Free

For experimentation and prototyping

$0/month
  • 1,000 requests per day
  • Access to 50+ models
  • 10 requests/second rate limit
  • Community support (Discord)
  • Basic usage dashboard
  • Single API key
Get started free

Pro

For individual developers and startups

$0.20/1M input tokens
  • Unlimited requests
  • Access to 200+ models
  • 1,000 requests/second rate limit
  • Email support (24h response)
  • Fine-tuning access
  • Batch processing (50% off)
  • Advanced analytics
  • Multiple API keys
Start building →

Enterprise

For large-scale production deployments

Custom
  • Everything in Team
  • Dedicated GPU clusters
  • No rate limits
  • 24/7 dedicated support
  • VPC peering & private endpoints
  • SOC 2 Type II & HIPAA
  • Custom SLA (up to 99.99%)
  • On-premise deployment option
  • Custom model training
  • Dedicated solutions engineer
Contact sales

Per-token pricing by model

Prices are shown per 1 million tokens. Output tokens typically cost 3-4x as much as input tokens; for the text models listed below, the ratio is 3x.

Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Throughput
Llama 4 Maverick 400B | $0.80 | $2.40 | 1M tokens | 120 tok/s
Llama 4 Scout 109B | $0.20 | $0.60 | 512K tokens | 340 tok/s
DeepSeek V3 685B | $1.00 | $3.00 | 128K tokens | 95 tok/s
Mixtral 8x22B v0.3 | $0.12 | $0.36 | 64K tokens | 480 tok/s
Qwen 3 72B Instruct | $0.15 | $0.45 | 128K tokens | 410 tok/s
Gemma 3 27B | $0.07 | $0.21 | 32K tokens | 620 tok/s
Llama 4 Scout 17B | $0.04 | $0.12 | 128K tokens | 890 tok/s
Stable Diffusion 4 | $0.003 per image (1024x1024) | - | - | 1.2 s/image
FLUX.2 Pro | $0.005 per image (1024x1024) | - | - | 0.8 s/image
BGE-M3 (Embeddings) | $0.01 | - | 8K tokens | 10K tok/s

Dedicated GPU pricing

For workloads that require guaranteed capacity and isolation. Billed hourly with monthly commitment discounts.

Configuration | GPUs | Memory | On-Demand ($/hr) | Reserved 1yr ($/hr)
Small | 8x H100 | 640 GB | $24.00 | $16.80 (30% off)
Medium | 32x H100 | 2.5 TB | $92.00 | $64.40 (30% off)
Large | 64x H100 | 5 TB | $180.00 | $126.00 (30% off)
XL | 128x H100 | 10 TB | $350.00 | $245.00 (30% off)
H200 Small | 8x H200 | 1.1 TB | $36.00 | $25.20 (30% off)
H200 Large | 64x H200 | 9 TB | $272.00 | $190.40 (30% off)

All dedicated clusters include: managed infrastructure, automatic failover, 24/7 monitoring, and dedicated support. Custom configurations available for Enterprise customers.
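
To make the hourly rates easier to compare, the sketch below converts them to an approximate monthly cost for a cluster that runs continuously. The 730-hour month and the helper names (`monthly_cost`, `reserved_discount`) are assumptions made for illustration; actual invoices may be calculated differently.

```python
from decimal import Decimal

# Assumption: an average month has 730 hours (8,760 hours per year / 12).
HOURS_PER_MONTH = 730

# (on-demand $/hr, reserved 1yr $/hr), copied from the dedicated GPU table.
CLUSTERS = {
    "Small": (Decimal("24.00"), Decimal("16.80")),
    "H200 Small": (Decimal("36.00"), Decimal("25.20")),
}


def monthly_cost(hourly_rate: Decimal, hours: int = HOURS_PER_MONTH) -> Decimal:
    """Cost of running one cluster continuously for the given number of hours."""
    return hourly_rate * hours


def reserved_discount(on_demand: Decimal, reserved: Decimal) -> Decimal:
    """Fractional discount of the reserved rate relative to the on-demand rate."""
    return 1 - reserved / on_demand


on_demand, reserved = CLUSTERS["Small"]
savings = monthly_cost(on_demand) - monthly_cost(reserved)

print(f"On-demand: ${monthly_cost(on_demand):,.2f} per month")
print(f"Reserved:  ${monthly_cost(reserved):,.2f} per month")
print(f"Savings:   ${savings:,.2f} per month")
```

Under these assumptions, the difference between the two rates on the Small configuration comes to a little over $5,000 per month of continuous use.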

Feature comparison

See exactly what's included in each plan.

Feature | Free | Pro | Team | Enterprise
Models available | 50+ | 200+ | 200+ | 200+ & custom
Rate limit | 10 req/s | 1,000 req/s | 5,000 req/s | Unlimited
Daily request limit | 1,000 | Unlimited | Unlimited | Unlimited
Fine-tuning ✓ (priority)
Batch processing | - | ✓ (50% off) | ✓ (50% off) | ✓ (custom)
AI Agents Basic ✓ (advanced)
Function calling
Structured output
Team management
SSO / SAML
VPC peering
Dedicated clusters
SLA | - | 99.5% | 99.9% | 99.99%
Support | Community | Email (24h) | Priority (4h) | 24/7 dedicated
SOC 2 report
HIPAA BAA

Frequently asked questions

How does pay-per-token pricing work?
You're charged based on the number of tokens processed. Input tokens (your prompts) and output tokens (model responses) are priced separately. 1 token ≈ 4 characters in English. You only pay for what you use — no monthly minimums or commitments.
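
As a rough illustration of the arithmetic, the sketch below estimates the cost of a single request from its token counts. The prices are copied from the per-token table on this page; the function names (`estimate_tokens`, `request_cost`) are hypothetical, and the 4-characters-per-token rule is only an approximation, so real token counts will vary by model and language.

```python
from decimal import Decimal

# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "Llama 4 Maverick 400B": (Decimal("0.80"), Decimal("2.40")),
    "Gemma 3 27B": (Decimal("0.07"), Decimal("0.21")),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Input and output tokens are priced separately, then summed."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_MILLION


# Example: a 2,000-token prompt that produces a 500-token response.
cost = request_cost("Llama 4 Maverick 400B", 2_000, 500)
print(f"Estimated cost: ${cost:.4f}")
```
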
What's included in the free tier?
The free tier includes 1,000 requests per day across 50+ models, with a rate limit of 10 requests per second. It's perfect for prototyping and experimentation. No credit card required to sign up.
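
If you want to stay inside these limits from your own code, a client-side guard along the following lines is one option. This is a minimal sketch, not part of any official SDK: the class name `FreeTierLimiter` is hypothetical, and the assumption that the daily quota resets 24 hours after first use is ours, since the reset schedule is not stated here.

```python
import time


class FreeTierLimiter:
    """Client-side guard for 10 requests/second and 1,000 requests/day."""

    def __init__(self, per_second=10, per_day=1_000, clock=time.monotonic):
        self.per_second = per_second
        self.per_day = per_day
        self.clock = clock
        self.tokens = float(per_second)  # token bucket for the per-second limit
        self.last_refill = clock()
        self.day_start = self.last_refill
        self.used_today = 0

    def try_acquire(self) -> bool:
        """Return True if a request may be sent now, False otherwise."""
        now = self.clock()
        # Assumption: the daily counter resets 24 hours after the window began.
        if now - self.day_start >= 86_400:
            self.day_start = now
            self.used_today = 0
        # Refill in proportion to elapsed time, capped at one second's worth.
        elapsed = now - self.last_refill
        self.tokens = min(self.per_second, self.tokens + elapsed * self.per_second)
        self.last_refill = now
        if self.used_today >= self.per_day or self.tokens < 1:
            return False
        self.tokens -= 1
        self.used_today += 1
        return True


limiter = FreeTierLimiter()
if limiter.try_acquire():
    print("OK to send a request")
else:
    print("Limit reached; wait before retrying")
```

The `clock` parameter exists only so the limiter can be exercised with a fake clock in tests; in normal use the default `time.monotonic` is sufficient.
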
Can I switch plans at any time?
Yes, you can upgrade or downgrade at any time. When upgrading, you get immediate access to new features. When downgrading, changes take effect at the end of your current billing cycle. There are no cancellation fees.
How does batch processing pricing work?
Batch processing offers a 50% discount on per-token pricing in exchange for asynchronous processing. Jobs are completed within your specified time window (typically 1-24 hours). This is ideal for non-time-sensitive workloads like data labeling or content generation.
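
For a sense of scale, the sketch below compares the standard and batch cost of one job. It uses the Qwen 3 72B Instruct prices from the per-token table and assumes the 50% discount applies equally to input and output tokens; `job_cost` is a hypothetical helper, not an API function.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.50")  # 50% off per-token pricing

# Standard prices (USD per 1M tokens) for Qwen 3 72B Instruct.
INPUT_PRICE = Decimal("0.15")
OUTPUT_PRICE = Decimal("0.45")


def job_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> Decimal:
    """Cost of a job at standard rates, or at the batch rate if batch=True."""
    standard = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    # Assumption: the discount applies to input and output tokens alike.
    return standard * (1 - BATCH_DISCOUNT) if batch else standard


# Example: a labeling job with 10M input tokens and 2M output tokens.
print(f"Standard: ${job_cost(10_000_000, 2_000_000):.2f}")
print(f"Batch:    ${job_cost(10_000_000, 2_000_000, batch=True):.2f}")
```
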
What payment methods do you accept?
We accept all major credit cards (Visa, Mastercard, American Express), ACH bank transfers, and wire transfers for Enterprise customers. We also support billing through AWS Marketplace and GCP Marketplace.
Do you offer volume discounts?
Yes! The Team plan includes a 25% volume discount automatically. For Enterprise customers, we offer custom pricing based on committed usage. Contact our sales team for a custom quote tailored to your needs.
Is there a spending limit or budget cap?
Yes, you can set monthly spending limits in your dashboard. Once reached, requests will be rejected (not throttled) to prevent unexpected charges. You'll receive alerts at 50%, 80%, and 100% of your budget.
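
The behavior described above can be mirrored on the client side if you want to track spend yourself. The following is a sketch under stated assumptions: `BudgetTracker` is a hypothetical class, and the platform's own enforcement happens server-side and may differ in detail.

```python
from decimal import Decimal

# Alert thresholds as fractions of the monthly budget: 50%, 80%, 100%.
ALERT_THRESHOLDS = (Decimal("0.50"), Decimal("0.80"), Decimal("1.00"))


class BudgetTracker:
    """Tracks spend against a monthly limit and records each alert once."""

    def __init__(self, monthly_limit: str):
        self.limit = Decimal(monthly_limit)
        self.spent = Decimal("0")
        self.alerts_sent = []

    def allows_request(self) -> bool:
        """Requests are rejected (not throttled) once the limit is reached."""
        return self.spent < self.limit

    def record(self, cost: str) -> None:
        """Add a charge and note any thresholds that were newly crossed."""
        self.spent += Decimal(cost)
        fraction = self.spent / self.limit
        for threshold in ALERT_THRESHOLDS:
            if fraction >= threshold and threshold not in self.alerts_sent:
                self.alerts_sent.append(threshold)


tracker = BudgetTracker("100")
tracker.record("51")
print(tracker.alerts_sent)       # only the 50% threshold has been crossed
print(tracker.allows_request())  # still under the limit
```
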
What's the difference between dedicated clusters and serverless?
Serverless inference shares GPU resources across customers and is billed per token. Dedicated clusters provide isolated GPU hardware exclusively for your workloads, with guaranteed capacity, lower latency, and custom configurations. Dedicated clusters are billed hourly based on GPU count.

Start building for free

Get 1,000 free requests per day. No credit card required. Upgrade when you're ready.