Born from frustration with slow, expensive AI

InferGrove was founded in 2024 by a team of systems engineers and ML researchers who were frustrated by the state of AI inference. Running open-source models was painfully slow or prohibitively expensive, or it demanded deep infrastructure expertise.

We believed that the future of AI would be open source, and that the bottleneck wasn't model quality; it was inference speed and cost. So we set out to build the fastest, most efficient inference platform in the world.

Today, InferGrove serves over 50,000 developers and processes 2.4 million requests per second. Our custom inference engine, built from the ground up with optimized CUDA kernels and speculative decoding, delivers 2-5x better performance than alternatives.
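
The production engine lives in custom CUDA, but the idea behind speculative decoding is simple enough to sketch. The following is a minimal greedy-variant illustration, not our implementation: a cheap draft model proposes several tokens, and the large target model verifies them all in one forward pass. It assumes batch size 1, a Hugging Face-style `model(ids).logits` interface, and omits KV caching for clarity.

```python
import torch

def speculative_decode(target, draft, prompt_ids, k=4, max_new=64):
    """Greedy speculative decoding sketch (batch size 1, no KV cache)."""
    ids = prompt_ids
    goal = prompt_ids.shape[-1] + max_new
    while ids.shape[-1] < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal = ids
        for _ in range(k):
            logits = draft(proposal).logits[:, -1]
            proposal = torch.cat([proposal, logits.argmax(-1, keepdim=True)], dim=-1)
        # 2. The expensive target model scores every proposed position
        #    in one forward pass instead of k sequential passes.
        preds = target(proposal).logits.argmax(-1)
        # 3. Accept drafted tokens as long as the target agrees;
        #    preds[:, j] is the target's greedy choice for position j + 1.
        start = ids.shape[-1]
        n_accept = 0
        for i in range(k):
            if proposal[0, start + i] == preds[0, start + i - 1]:
                n_accept += 1
            else:
                break
        # 4. Keep the accepted prefix plus one token from the target,
        #    so every iteration makes progress even with zero accepts.
        ids = torch.cat(
            [proposal[:, :start + n_accept],
             preds[:, start + n_accept - 1:start + n_accept]],
            dim=-1,
        )
    return ids
```

When the draft model agrees with the target most of the time, each target forward pass emits several tokens instead of one, which is where the speedup comes from.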

50K+ Developers
2.4M Requests/sec
200+ Models
$180M Total Funding

What drives us

🚀 Speed is a Feature

Every millisecond matters. We obsess over latency at every layer of the stack, from CUDA kernels to network routing. Fast inference unlocks new use cases and better user experiences.

🌍 Open Source First

We believe the future of AI is open. We contribute to open-source projects, publish our research, and build tools that make open models competitive with proprietary alternatives.

🔬 Research-Driven

Our team publishes at top venues (NeurIPS, ICML, MLSys) and our production systems incorporate the latest advances in model optimization, quantization, and serving.

🤝 Developer Empathy

We're developers building for developers. Every API design decision, every error message, every piece of documentation is crafted with the developer experience in mind.

🔒 Trust & Transparency

Your data is yours. We never train on customer data, we publish our security practices, and we're transparent about our pricing, performance, and limitations.

♻️ Efficiency Matters

More efficient inference means lower costs and lower environmental impact. We're committed to reducing the compute required per token through better algorithms and hardware utilization.

Meet the team

World-class engineers and researchers from Google, Meta, NVIDIA, and top research labs.

Alex Kowalski

Co-founder & CEO

Previously: Staff Engineer at Google DeepMind. PhD in Distributed Systems, MIT.

Sophia Park

Co-founder & CTO

Previously: Principal Engineer at NVIDIA (CUDA team). PhD in Computer Architecture, Stanford.

Raj Jayaraman

VP of Engineering

Previously: Director of Infrastructure at Meta AI. Built PyTorch serving infrastructure.

Elena Chen

Head of Research

Previously: Research Scientist at Google Brain. 40+ papers on model optimization. PhD, Berkeley.

David Mueller

VP of Product

Previously: Product Lead at Vercel. Built developer tools used by millions.

Lisa Wang

VP of Sales

Previously: Enterprise Sales at Databricks. Grew ARR from $50M to $500M.

Marcus Okonkwo

Head of Security

Previously: Security Lead at Stripe. Built zero-trust infrastructure for financial systems.

Yuki Tanaka

Head of ML Infrastructure

Previously: Senior Staff at Google (TPU team). Expert in distributed training systems.

Company timeline

March 2026

Series C: $120M

Led by Andreessen Horowitz with participation from NVIDIA Ventures. Valued at $2.4B. Expanding to 12 global regions.

October 2025

200+ Models & 50K Developers

Reached 50,000 active developers. Launched AI Agents platform, batch processing, and fine-tuning capabilities.

June 2025

Series B: $45M

Led by Sequoia Capital. Launched dedicated GPU clusters and enterprise tier. Achieved 99.99% uptime.

January 2025

Public Launch

Launched publicly with 100+ models. Reached 10,000 developers in first month. Published SpecServe paper at MLSys.

August 2024

Series A: $15M

Led by Index Ventures. Hired founding team of 20 engineers from Google, Meta, and NVIDIA.

March 2024

Founded

Alex Kowalski and Sophia Park founded InferGrove in San Francisco with a mission to make AI inference fast and affordable.

Our investors

Supported by the world's leading technology investors.

Andreessen Horowitz
Sequoia Capital
Index Ventures
NVIDIA Ventures
Greylock

How we work

We're a remote-first company with a culture built on trust, ownership, and technical excellence.

🏠 Remote-First

Work from anywhere in the world. We have team members across 12 countries and 8 time zones. Offices in SF, NYC, and London for those who prefer in-person collaboration.

⚡ Ship Fast

We deploy to production multiple times per day. Small teams with full ownership move fast. We value iteration speed over perfection: ship, measure, improve.

📖 Learn Constantly

$5,000 annual learning budget. Conference attendance encouraged. Weekly tech talks and paper reading groups. Dedicated time for research and open-source.

๐Ÿฅ

Comprehensive Benefits

Premium health, dental, and vision insurance. 401(k) with 4% match. Generous parental leave. Mental health support and wellness stipend.

📈 Meaningful Equity

Every employee receives meaningful equity. We believe everyone who contributes to building InferGrove should share in its success.

🌴 Flexible Time Off

Unlimited PTO with a minimum of 4 weeks encouraged. We trust you to manage your time. Company-wide shutdown weeks in summer and winter.

Giving back to the community

We believe in open source and actively contribute to the ecosystem that makes our work possible.

InferEngine

Our open-source inference engine with custom CUDA kernels. Used by 5,000+ developers for self-hosted deployments. Apache 2.0 licensed.

⭐ 12.4K stars on GitHub

QuantForge

Our calibration-free quantization toolkit. Quantize any model to INT4/INT8 without calibration data. Published at ICML 2026.

⭐ 4.2K stars on GitHub
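
QuantForge's own API aside, the core trick of calibration-free quantization is easy to sketch: weight-only schemes derive their scales from the weights themselves, so no calibration data is needed. Here is a generic per-channel absmax INT8 example in PyTorch (names are illustrative, not QuantForge's API):

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Per-output-channel absmax INT8 quantization (calibration-free)."""
    # One scale per output channel, derived purely from the weights.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover approximate float weights for matmul."""
    return q.float() * scale

# Round-trip a random layer: the reconstruction error is bounded by
# half a quantization step per channel.
w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
print((dequantize(q, scale) - w).abs().max().item())
```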

FlashInfer

Adaptive KV-cache compression library. Enables 10x longer contexts with minimal quality loss. Published at NeurIPS 2026.

⭐ 3.8K stars on GitHub
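
The adaptive policy itself is described in the paper; as a simplified illustration of eviction-based KV-cache compression (not FlashInfer's actual algorithm), one can rank cached positions by the attention mass they have received and keep only the heaviest hitters plus a recency window:

```python
import torch

def compress_kv_cache(keys, values, attn, keep_ratio=0.25, recent=128):
    """Toy eviction-based KV-cache compression.

    keys/values: (seq_len, num_heads, head_dim)
    attn:        (num_heads, q_len, seq_len) attention weights
    """
    seq_len = keys.shape[0]
    budget = min(max(int(seq_len * keep_ratio), recent), seq_len)
    # Score each cached position by the total attention it received.
    scores = attn.sum(dim=(0, 1))
    # Never evict the most recent positions.
    scores[-recent:] = float("inf")
    # Keep the highest-scoring positions, in their original order.
    keep = torch.topk(scores, k=budget).indices.sort().values
    return keys[keep], values[keep], keep

k = torch.randn(1024, 8, 64)
v = torch.randn(1024, 8, 64)
attn = torch.softmax(torch.randn(8, 1, 1024), dim=-1)
k_small, v_small, kept = compress_kv_cache(k, v, attn)
print(k_small.shape)  # cache shrinks to ~25% of its original length
```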

InferGrove at a glance

120+ Team Members
12 Countries
15 Research Papers
3 Offices

In the news

TechCrunch · March 2026

"InferGrove raises $120M to challenge cloud giants in AI inference"

The startup's custom inference engine delivers 2-5x better performance than AWS and Azure alternatives.

The Information · January 2026

"The inference wars: How InferGrove is winning developers"

With 50,000 developers and growing, InferGrove has become the go-to platform for open-source model deployment.

VentureBeat · October 2025

"InferGrove's SpecServe paper shows 2.8x inference speedup"

The company's research on speculative decoding is now deployed in production, serving millions of requests daily.

Join us in building the future of AI

We're hiring world-class engineers and researchers. Come build the fastest inference platform on earth.