Pricing — InferGrove

Free

For experimentation and prototyping

$0/month

1,000 requests per day
Access to 50+ models
10 requests/second rate limit
Community support (Discord)
Basic usage dashboard
Single API key

Pro

For individual developers and startups

$0.20/1M input tokens

Unlimited requests
Access to 200+ models
1,000 requests/second rate limit
Email support (24h response)
Fine-tuning access
Batch processing (50% off)
Advanced analytics
Multiple API keys

Team

For growing teams and companies

$0.15/1M input tokens

Everything in Pro
25% volume discount
5,000 requests/second rate limit
Priority support (4h response)
Team management & RBAC
SSO / SAML integration
Dedicated account manager
Custom model deployment
99.9% uptime SLA

Enterprise

For large-scale production deployments

Custom

Everything in Team
Dedicated GPU clusters
Unlimited request rate
24/7 dedicated support
VPC peering & private endpoints
SOC 2 Type II & HIPAA
Custom SLA (up to 99.99%)
On-premise deployment option
Custom model training
Dedicated solutions engineer

Model Pricing

Per-token pricing by model

Prices are per 1 million tokens unless noted otherwise; image models are priced per image. For every text model listed, the output-token price is 3x the input-token price.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context Window	Throughput
Llama 4 Maverick 400B	$0.80	$2.40	1M tokens	120 tok/s
Llama 4 Scout 109B	$0.20	$0.60	512K tokens	340 tok/s
DeepSeek V3 685B	$1.00	$3.00	128K tokens	95 tok/s
Mixtral 8x22B v0.3	$0.12	$0.36	64K tokens	480 tok/s
Qwen 3 72B Instruct	$0.15	$0.45	128K tokens	410 tok/s
Gemma 3 27B	$0.07	$0.21	32K tokens	620 tok/s
Llama 4 Scout 17B	$0.04	$0.12	128K tokens	890 tok/s
Stable Diffusion 4	$0.003 per image (1024x1024)	—	—	1.2s/image
FLUX.2 Pro	$0.005 per image (1024x1024)	—	—	0.8s/image
BGE-M3 (Embeddings)	$0.01	—	8K tokens	10K tok/s
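As a rough sketch, the cost of a single serverless call can be worked out from this table by scaling each per-1M price down to one token. The names `TOKEN_PRICES`, `IMAGE_PRICES`, `request_cost`, and `image_cost` are illustrative only (not part of any InferGrove SDK), and the figures are the list prices above with no plan or batch discounts applied.

```python
# Illustrative cost calculator; prices (USD) are copied from the table above.
# Text and embedding models: (input price, output price) per 1M tokens.
TOKEN_PRICES = {
    "Llama 4 Maverick 400B": (0.80, 2.40),
    "Llama 4 Scout 109B": (0.20, 0.60),
    "DeepSeek V3 685B": (1.00, 3.00),
    "Mixtral 8x22B v0.3": (0.12, 0.36),
    "Qwen 3 72B Instruct": (0.15, 0.45),
    "Gemma 3 27B": (0.07, 0.21),
    "Llama 4 Scout 17B": (0.04, 0.12),
    "BGE-M3 (Embeddings)": (0.01, 0.0),  # no output-token price is listed
}

# Image models: price per 1024x1024 image.
IMAGE_PRICES = {
    "Stable Diffusion 4": 0.003,
    "FLUX.2 Pro": 0.005,
}


def request_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """List-price cost in USD of one token-billed request."""
    input_price, output_price = TOKEN_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def image_cost(model: str, images: int) -> float:
    """List-price cost in USD of generating `images` images at 1024x1024."""
    return images * IMAGE_PRICES[model]


if __name__ == "__main__":
    # 2,000 prompt tokens + 500 completion tokens on DeepSeek V3.
    print(f"${request_cost('DeepSeek V3 685B', 2_000, 500):.4f}")
    # 1,000 images on FLUX.2 Pro.
    print(f"${image_cost('FLUX.2 Pro', 1_000):.2f}")
```

For example, a DeepSeek V3 call with 2,000 input and 500 output tokens works out to (2,000 × $1.00 + 500 × $3.00) / 1,000,000 = $0.0035.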

Dedicated Clusters

Dedicated GPU pricing

For workloads that require guaranteed capacity and isolation. Billed hourly, with a 30% discount on 1-year reserved commitments.

Configuration	GPUs	Total GPU Memory	On-Demand ($/hr)	Reserved 1yr ($/hr)
Small	8x H100	640 GB	$24.00	$16.80 (30% off)
Medium	32x H100	2.5 TB	$92.00	$64.40 (30% off)
Large	64x H100	5 TB	$180.00	$126.00 (30% off)
XL	128x H100	10 TB	$350.00	$245.00 (30% off)
H200 Small	8x H200	1.1 TB	$36.00	$25.20 (30% off)
H200 Large	64x H200	9 TB	$272.00	$190.40 (30% off)
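To compare configurations of different sizes, it can help to normalize each rate to a single GPU and to a full month. The sketch below does that arithmetic on the table's figures; `CLUSTERS`, `per_gpu_hour`, and `monthly_cost` are hypothetical names, and the 730-hour month (8,760 hours / 12) is an assumption for continuous use, not a billing rule stated here.

```python
# Illustrative helper; rates (USD/hr) are copied from the table above.
# name: (GPU count, on-demand $/hr, reserved 1-year $/hr)
CLUSTERS = {
    "Small": (8, 24.00, 16.80),
    "Medium": (32, 92.00, 64.40),
    "Large": (64, 180.00, 126.00),
    "XL": (128, 350.00, 245.00),
    "H200 Small": (8, 36.00, 25.20),
    "H200 Large": (64, 272.00, 190.40),
}

# Assumption: an average month of round-the-clock use (8,760 hours / 12).
HOURS_PER_MONTH = 730


def per_gpu_hour(name: str, reserved: bool = False) -> float:
    """Hourly price of a single GPU in the given configuration."""
    gpus, on_demand, reserved_rate = CLUSTERS[name]
    return (reserved_rate if reserved else on_demand) / gpus


def monthly_cost(name: str, reserved: bool = False) -> float:
    """Cost of running the configuration continuously for one assumed month."""
    _, on_demand, reserved_rate = CLUSTERS[name]
    return (reserved_rate if reserved else on_demand) * HOURS_PER_MONTH


if __name__ == "__main__":
    for name in CLUSTERS:
        print(f"{name:<11} ${per_gpu_hour(name):.3f}/GPU-hr on demand, "
              f"${monthly_cost(name, reserved=True):,.0f}/month reserved")
```

Normalized this way, the on-demand H100 rate falls from $3.00 per GPU-hour (Small) to about $2.73 (XL), and every reserved rate is exactly 70% of its on-demand rate.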

All dedicated clusters include: managed infrastructure, automatic failover, 24/7 monitoring, and dedicated support. Custom configurations available for Enterprise customers.

Compare Plans

Feature comparison

See exactly what's included in each plan.

Feature	Free	Pro	Team	Enterprise
Models available	50+	200+	200+	200+ & custom
Rate limit	10 req/s	1,000 req/s	5,000 req/s	Unlimited
Daily request limit	1,000	Unlimited	Unlimited	Unlimited
Fine-tuning	—	✓	✓	✓ (priority)
Batch processing	—	✓ (50% off)	✓ (50% off)	✓ (custom)
AI Agents	Basic	✓	✓	✓ (advanced)
Function calling	✓	✓	✓	✓
Structured output	✓	✓	✓	✓
Team management	—	—	✓	✓
SSO / SAML	—	—	✓	✓
VPC peering	—	—	—	✓
Dedicated clusters	—	—	—	✓
SLA	—	99.5%	99.9%	Custom (up to 99.99%)
Support	Community	Email (24h)	Priority (4h)	24/7 dedicated
SOC 2 report	—	—	✓	✓
HIPAA BAA	—	—	—	✓

FAQ

Frequently asked questions

How does pay-per-token pricing work?

You're charged based on the number of tokens processed. Input tokens (your prompts) and output tokens (model responses) are priced separately. 1 token ≈ 4 characters in English. You only pay for what you use — no monthly minimums or commitments.
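The 4-characters-per-token rule of thumb is enough for a back-of-the-envelope estimate before you send anything. This is a minimal sketch assuming that heuristic; real token counts depend on each model's tokenizer, and `estimate_tokens` and `estimate_cost` are illustrative names.

```python
import math

# Rule of thumb from the FAQ: roughly 4 characters per token in English.
# Actual counts depend on each model's tokenizer, so results are estimates.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count for English text (rounded up)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(prompt: str, expected_output_chars: int,
                  input_price: float, output_price: float) -> float:
    """Rough USD cost of one request; prices are per 1M tokens."""
    input_tokens = estimate_tokens(prompt)
    output_tokens = math.ceil(expected_output_chars / CHARS_PER_TOKEN)
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    prompt = "x" * 8_000  # stand-in for an 8,000-character prompt
    print(estimate_tokens(prompt))  # 2000
    # Gemma 3 27B list prices: $0.07 input, $0.21 output per 1M tokens.
    print(f"${estimate_cost(prompt, 2_000, 0.07, 0.21):.6f}")
```

An 8,000-character prompt comes out at about 2,000 tokens; with a 2,000-character reply (about 500 tokens) on Gemma 3 27B, the estimate is (2,000 × $0.07 + 500 × $0.21) / 1,000,000 ≈ $0.000245.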

What's included in the free tier?

The free tier includes 1,000 requests per day across 50+ models, with a rate limit of 10 requests per second. It's perfect for prototyping and experimentation. No credit card required to sign up.
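If you want a client to stay inside the Free-tier limits on its own, a sliding-window guard is one way to do it. `FreeTierLimiter` below is a hypothetical helper, not part of an InferGrove SDK, and it assumes the daily quota is a rolling 24-hour window; this page does not say when the daily count resets.

```python
import time
from collections import deque
from typing import Callable


class FreeTierLimiter:
    """Client-side guard for 10 requests/second and 1,000 requests/day.

    Hypothetical helper. Assumes the daily quota is a rolling 24-hour
    window; the actual reset rule is not stated on the pricing page.
    """

    def __init__(self, per_second: int = 10, per_day: int = 1_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.per_second = per_second
        self.per_day = per_day
        self.clock = clock
        self._last_second = deque()  # timestamps within the last 1 s
        self._last_day = deque()     # timestamps within the last 24 h

    @staticmethod
    def _evict(window: deque, now: float, span: float) -> None:
        # Drop timestamps that have aged out of the sliding window.
        while window and now - window[0] >= span:
            window.popleft()

    def try_acquire(self) -> bool:
        """Record and allow a request if both limits permit it, else return False."""
        now = self.clock()
        self._evict(self._last_second, now, 1.0)
        self._evict(self._last_day, now, 86_400.0)
        if (len(self._last_second) >= self.per_second
                or len(self._last_day) >= self.per_day):
            return False
        self._last_second.append(now)
        self._last_day.append(now)
        return True
```

Call `try_acquire()` before each request and back off (for example, sleep briefly and retry) when it returns `False`. Injecting a fake `clock` makes the limiter easy to test without waiting in real time.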

Can I switch plans at any time?

Yes, you can upgrade or downgrade at any time. When upgrading, you get immediate access to new features. When downgrading, changes take effect at the end of your current billing cycle. There are no cancellation fees.

How does batch processing pricing work?

Batch processing offers a 50% discount on per-token pricing in exchange for asynchronous processing. Jobs are completed within your specified time window (typically 1-24 hours). This is ideal for non-time-sensitive workloads like data labeling or content generation.
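To see what the 50% batch discount means in dollars, the sketch below compares a workload at real-time and batch list prices. Batch processing is listed for Pro and above; `workload_cost` is an illustrative name, and the sketch assumes the discount applies equally to input and output tokens, as "per-token pricing" suggests.

```python
BATCH_DISCOUNT = 0.50  # from the FAQ: 50% off per-token pricing


def workload_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float,
                  batch: bool = False) -> float:
    """USD cost of a workload; prices are per 1M tokens at list rates."""
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    # Assumption: the batch discount applies to both input and output tokens.
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


if __name__ == "__main__":
    # A labeling job: 10M input + 2M output tokens on Qwen 3 72B Instruct ($0.15 / $0.45).
    realtime = workload_cost(10_000_000, 2_000_000, 0.15, 0.45)
    batched = workload_cost(10_000_000, 2_000_000, 0.15, 0.45, batch=True)
    print(f"real-time ${realtime:.2f}, batch ${batched:.2f}")
```

For that labeling job, the real-time cost is $1.50 + $0.90 = $2.40, while the same job submitted as a batch costs $1.20.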

What payment methods do you accept?

We accept all major credit cards (Visa, Mastercard, American Express), ACH bank transfers, and wire transfers for Enterprise customers. We also support billing through AWS Marketplace and GCP Marketplace.

Do you offer volume discounts?

Yes! The Team plan includes a 25% volume discount automatically. For Enterprise customers, we offer custom pricing based on committed usage. Contact our sales team for a custom quote tailored to your needs.

Is there a spending limit or budget cap?

Yes, you can set monthly spending limits in your dashboard. Once a limit is reached, further requests are rejected (not throttled) to prevent unexpected charges. You'll receive alerts at 50%, 80%, and 100% of your budget.
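The spending-cap and alert behavior in this answer can be modeled as a small state machine. This is a hypothetical client-side simulation of those rules, not InferGrove's implementation; it assumes each alert fires once, the first time cumulative spend reaches its threshold, and it only checks the cap before a request is sent.

```python
from dataclasses import dataclass, field

# Alert points from the FAQ, as fractions of the monthly limit.
ALERT_THRESHOLDS = (0.50, 0.80, 1.00)


@dataclass
class BudgetTracker:
    """Hypothetical model of a monthly spending cap with threshold alerts."""
    monthly_limit: float
    spend: float = 0.0
    alerts_sent: list = field(default_factory=list)

    def allow_request(self) -> bool:
        # At or above the cap, requests are rejected outright rather than throttled.
        return self.spend < self.monthly_limit

    def record_charge(self, cost: float) -> list:
        """Add a charge and return the thresholds it newly crossed."""
        self.spend += cost
        newly_crossed = [
            t for t in ALERT_THRESHOLDS
            if self.spend >= t * self.monthly_limit and t not in self.alerts_sent
        ]
        self.alerts_sent.extend(newly_crossed)
        return newly_crossed


if __name__ == "__main__":
    budget = BudgetTracker(monthly_limit=100.0)
    for charge in (40.0, 15.0, 30.0, 15.0):
        print(budget.record_charge(charge), budget.allow_request())
```

With a $100 limit, the four charges above trigger no alert, then the 50% alert, then the 80% alert, and finally the 100% alert, after which `allow_request()` returns `False`.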

What's the difference between dedicated clusters and serverless?

Serverless inference shares GPU resources across customers and charges per-token. Dedicated clusters provide isolated GPU hardware exclusively for your workloads with guaranteed capacity, lower latency, and custom configurations. Dedicated clusters are billed hourly based on GPU count.
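Because the two options are billed differently, one quick comparison is the sustained token volume at which serverless spend equals a cluster's hourly rate. The sketch below is price arithmetic only: `breakeven_tokens_per_hour` is an illustrative name, the input/output mix is an assumption you supply, and it says nothing about whether a given cluster can actually serve that volume, which this page does not state.

```python
def breakeven_tokens_per_hour(cluster_rate: float, input_price: float,
                              output_price: float, input_share: float) -> float:
    """Serverless tokens/hour whose cost equals `cluster_rate` USD/hour.

    Prices are per 1M tokens; `input_share` is the assumed fraction of
    all tokens that are input (prompt) tokens.
    """
    blended_price_per_token = (
        input_share * input_price + (1 - input_share) * output_price
    ) / 1_000_000
    return cluster_rate / blended_price_per_token


if __name__ == "__main__":
    # Small cluster on demand ($24.00/hr) vs. serverless Llama 4 Maverick 400B
    # ($0.80 in / $2.40 out), assuming 75% of tokens are input.
    tokens = breakeven_tokens_per_hour(24.00, 0.80, 2.40, 0.75)
    print(f"{tokens / 1e6:.1f}M tokens/hour")
```

At a 3:1 input-to-output mix, the blended serverless price is $1.20 per 1M tokens, so the Small cluster breaks even at about 20 million tokens per hour on demand, or about 14 million at the $16.80/hr reserved rate.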
