Run AI with an API

Deploy machine learning models at scale. Pay per second of compute. No infrastructure to manage.

✓ No credit card required ✓ Free tier included ✓ Scale to zero
import infergrove

# Run a model with a single API call
model = infergrove.models.get("stability/sdxl")
output = model.predict(
    prompt="a serene mountain lake at sunset",
    width=1024,
    height=1024
)

# output → "https://inference.infergrove.com/out/abc123.png"

Explore models

Thousands of open-source models ready to run. Browse by category or search for what you need.

Image Generation · Speech & Audio · Video · Large Language Models · Image-to-Image · Upscaling · 3D · Music
🎨 stability/sdxl · Stability AI · 48.2M runs · Image Gen
🎨 black-forest/flux-pro · Black Forest Labs · 31.7M runs · Image Gen
🎙️ openai/whisper · OpenAI · 22.1M runs · Speech
🧠 meta/llama-3.1-70b · Meta AI · 18.9M runs · LLM
🎬 tencent/hunyuan-video · Tencent · 12.4M runs · Video
🔍 nightmareai/real-esrgan · NightmareAI · 9.8M runs · Upscaling
🖼️ jagilley/controlnet · jagilley · 7.2M runs · Image-to-Image
🎵 meta/musicgen · Meta AI · 5.1M runs · Audio
Browse all models →
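You can also discover models from code. A minimal sketch, assuming the SDK exposes a search helper (infergrove.models.search and the fields printed are hypothetical, not a confirmed API):

import infergrove

# Hypothetical discovery helper; the real listing API may differ
for model in infergrove.models.search("upscaling"):
    print(model.id, model.run_count)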

Works with your stack

Use our Python SDK, Node.js client, or call the HTTP API directly from any language.

Python · Node.js · HTTP · cURL
# Python SDK
import infergrove

# Initialize the client
client = infergrove.Client(api_token="r8_your_token")

# Run a prediction
output = client.run(
    "stability/sdxl:latest",
    input={
        "prompt": "a cyberpunk cityscape at night, neon lights",
        "negative_prompt": "blurry, low quality",
        "width": 1024,
        "height": 1024,
        "num_inference_steps": 30,
        "guidance_scale": 7.5
    }
)

# Stream results as they arrive
for event in output:
    print(event)
# → "https://inference.infergrove.com/out/img_001.png"

Trusted by teams building the future

Vercel Figma Notion Stripe Shopify Spotify Adobe Linear

Join 200,000+ developers running AI models on InferGrove

50M+
Predictions per day
12K+
Models available
200K+
Active developers
99.9%
API uptime

How it works

Three ways to run AI on InferGrove. From quick experiments to production deployments.

1

Run models

Run open-source models with a single API call. No setup, no GPUs to manage. Choose from thousands of community models.

import infergrove

output = infergrove.run(
  "stability/sdxl",
  input={
    "prompt": "an astronaut riding a horse",
    "num_outputs": 4,
    "guidance_scale": 7.5,
    "num_inference_steps": 30
  }
)

# Returns list of image URLs
for url in output:
    print(url)
2

Fine-tune models

Train models on your own data. Create custom versions optimized for your specific use case and brand.

training = infergrove.trainings.create(
  model="stability/sdxl",
  input={
    "input_images": "https://my-data.zip",
    "token_string": "TOK",
    "max_train_steps": 1000,
    "learning_rate": 1e-6
  },
  destination="yourname/custom-sdxl"
)

# Monitor training progress
training.reload()
print(training.status)  # "processing"
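In practice you would poll until the job reaches a terminal state; a small sketch built on the same reload() and status calls (the exact status strings are assumptions):

import time

# Poll every 30 seconds until training leaves the "processing" state
while training.status == "processing":
    time.sleep(30)
    training.reload()

print(training.status)  # e.g. "succeeded" or "failed"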
3

Deploy custom models

Package any model with Cog and deploy it to InferGrove's auto-scaling infrastructure.

# Define your model
from cog import BasePredictor, Input, Path

class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts: load weights here
        self.model = load_model()  # your model-loading code

    def predict(
        self,
        image: Path = Input(description="Input image"),
    ) -> Path:
        return self.model(image)

# Deploy: $ infergrove push my-model

Streaming & webhooks built in

Get real-time updates as predictions run. Perfect for LLMs, video generation, and long-running tasks.

# Stream tokens from an LLM
for token in infergrove.stream(
    "meta/llama-3.1-70b",
    input={"prompt": "Explain quantum computing"}
):
    print(token, end="")

# Or use webhooks for async
prediction = infergrove.predictions.create(
    model="stability/sdxl",
    input={"prompt": "..."},
    webhook="https://your-app.com/webhook"
)

Developer-first experience

Built by developers, for developers. Every API decision optimized for simplicity and power.

🐍

Python SDK

Type-safe client with async support, streaming, and automatic retries.

pip install infergrove
📦

Node.js Client

Full TypeScript support with ESM and CommonJS compatibility.

npm install infergrove
🌐

REST API

Simple HTTP endpoints. Works from any language or platform.

api.infergrove.com/v1
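To picture the async support called out in the Python SDK card above, here is a sketch of what it could look like (async_run is a hypothetical name; the SDK's actual async surface may differ):

import asyncio
import infergrove

async def main():
    # Hypothetical async variant of infergrove.run()
    output = await infergrove.async_run(
        "stability/sdxl",
        input={"prompt": "a watercolor fox"},
    )
    print(output)

asyncio.run(main())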

Up and running in 60 seconds

From zero to your first AI prediction in three simple steps.

1

Install the SDK

$ pip install infergrove
2

Set your API token

$ export INFERGROVE_API_TOKEN=r8_your_token_here
3

Run your first prediction

import infergrove
output = infergrove.run("stability/sdxl", input={"prompt": "hello world"})
print(output)
Read the full quickstart guide →

Pay per second of compute

No upfront costs. No minimum commitments. Scale to zero when you're not running predictions.

GPU VRAM Price / second Price / hour Best for
NVIDIA T4 16 GB $0.000225 $0.81 Lightweight inference, testing
NVIDIA L4 24 GB $0.000350 $1.26 Efficient inference, small models
NVIDIA L40S 48 GB $0.000725 $2.61 Image generation, medium models
NVIDIA A40 48 GB $0.000575 $2.07 Balanced performance, training
NVIDIA A100 80 GB $0.001150 $4.14 Large models, fine-tuning
NVIDIA H100 80 GB $0.001850 $6.66 LLMs, video generation
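To make the math concrete: a prediction that runs for 4 seconds on an L40S costs 4 × $0.000725 ≈ $0.003, and a 10-minute fine-tuning job on an A100 costs 600 × $0.001150 = $0.69. (The run times are illustrative; yours will vary by model and input.)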

Free tier

Get started with limited free predictions every day. No credit card required.

Volume discounts

Spend over $1,000/month? Contact us for custom pricing and committed use discounts.

Enterprise

Dedicated clusters, custom SLAs, and priority support for large-scale deployments.

View full pricing details

Automatic scaling

Your models scale up automatically when traffic spikes and scale back to zero when idle. You only pay for what you use.

  • Scale to zero — no idle costs
  • Auto-scale to hundreds of GPUs
  • Cold starts under 500ms
  • Global edge routing for low latency
  • Intelligent request queuing
  • Configurable min/max replicas (see the sketch below)
Learn more about scaling
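Configuring replica bounds might look like the following; infergrove.deployments.create and its parameters are assumptions for illustration, not a documented API.

import infergrove

# Hypothetical deployment API; parameter names are illustrative
deployment = infergrove.deployments.create(
    model="yourname/custom-sdxl",
    min_replicas=0,    # scale to zero when idle, so no idle cost
    max_replicas=50,   # cap how far the auto-scaler can grow
)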

Deploy any model in 3 steps

From open-source models to custom fine-tunes, go from code to production in minutes.

1
Choose a Model
Browse thousands of open-source models or upload your own custom model.
model = "stability/sdxl"
2
Run Inference
One API call. Auto-scaling GPUs handle the rest.
output = model.predict(input)
3
Scale & Ship
Auto-scale from 0 to thousands of requests. Pay per second.
deploy(model, scale="auto")

What developers are saying

Teams of all sizes trust InferGrove to power their AI features.

"InferGrove cut our inference costs by 60% and eliminated the need for a dedicated ML ops team. We went from managing GPU clusters to a single API call."

SC
Sarah Chen
CTO, PixelForge

"The auto-scaling is incredible. We handle 10x traffic spikes during product launches without any manual intervention. It just works."

MR
Marcus Rivera
Lead Engineer, Artisan AI

"We fine-tuned SDXL on our brand assets in 20 minutes. Now our design team generates on-brand visuals instantly. Game changer."

AP
Aisha Patel
VP Design, CreativeStack

Imagine what you can build

From creative tools to production pipelines, developers are building incredible things with InferGrove.

AI Art Generation

Create stunning visuals from text prompts

Video Editing AI

Automate video production workflows

Voice Synthesis

Generate natural speech in any language

Document Analysis

Extract insights from any document type

3D Asset Generation

Create 3D models from text or images

Music Composition

Generate original music and soundscapes

Face Restoration

Enhance and restore old or damaged photos

Code Generation

Build AI-powered developer tools

Style Transfer

Apply artistic styles to any image or video

Built for every use case

Whether you're building a startup or scaling enterprise AI, InferGrove adapts to your needs.

🎨

Creative tools

Build image editors, design assistants, and content creation platforms powered by state-of-the-art generative models.

🏥

Healthcare AI

Deploy medical imaging models, clinical NLP, and diagnostic assistants with HIPAA-compliant infrastructure.

🛒

E-commerce

Product image generation, virtual try-on, personalized recommendations, and automated product descriptions.

🎮

Gaming & entertainment

Generate game assets, NPC dialogue, procedural content, and real-time voice synthesis for immersive experiences.

📱

Mobile apps

Add AI features to iOS and Android apps without bundling large models. Low-latency API calls from anywhere.

🔬

Research & science

Run experiments at scale, iterate on model architectures, and share reproducible results with the community.

Built for enterprise

Deploy AI at scale with enterprise-grade security, compliance, and support.

  • SOC 2 Type II certified
  • Private model deployments
  • Dedicated GPU clusters
  • 99.9% uptime SLA
  • Priority support with dedicated account manager
  • Custom rate limits and quotas
  • VPC peering and private networking
  • SSO/SAML authentication
Contact sales
Enterprise dashboard: ● All systems operational · 99.9% uptime SLA · 12ms average latency
🔒 Security & compliance: SOC 2 · HIPAA · GDPR · SSO/SAML

Open source at the core

InferGrove is built on open-source principles. Our model packaging tool, Cog, is fully open source and used by thousands of developers worldwide (12.4k GitHub stars, 3.2k contributors).

Package your model as a standard Docker container with a simple configuration file. No vendor lock-in — your models run anywhere.

$ pip install cog
✓ Successfully installed cog-0.9.4
$ cog predict -i prompt="a cat"
Running prediction...
✓ Output saved to output.png

# cog.yaml
build:
  python_version: "3.11"
  python_packages:
    - "torch==2.1.0"
    - "transformers==4.36.0"

predict: "predict.py:Predictor"

Ready to run AI?

Get started in minutes. No credit card required for the free tier.

Start building for free Read the docs