Everything you need to integrate InferGrove into your applications. From quickstart to advanced deployment patterns.

- Quickstart: Get your first API call working in under 5 minutes. Install the SDK, get an API key, and run inference.
- API reference: Complete reference for all API endpoints, including chat completions, embeddings, images, and batch processing.
- Examples: Step-by-step tutorials for common use cases: chatbots, RAG, agents, content generation, and more.
- Migration guide: Switch from OpenAI to InferGrove in minutes. Our API is 100% compatible — just change the base URL and API key.
- Function calling: Let models call your functions. Define tools with JSON schemas and build powerful agentic workflows.
- Structured output: Get guaranteed JSON output matching your schema. No more parsing errors or retry loops.
- RAG: Combine embeddings with chat completions to build retrieval-augmented generation pipelines.
- Fine-tuning: Customize any model on your data. Prepare datasets, configure training, and deploy custom models.
- Agents: Build autonomous agents with tool calling, memory, and multi-step reasoning capabilities.

InferGrove provides a fast, reliable API for running AI model inference. Our platform is compatible with OpenAI's API format, making migration seamless for existing applications.
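Because the request format matches OpenAI's, an existing integration can be repointed by changing only the base URL and API key. A minimal sketch of what such a request looks like on the wire, using the endpoint and model name from this page (the `build_chat_request` helper is illustrative, not part of any SDK):

```python
import json

BASE_URL = "https://api.infergrove.com/v1"  # was: https://api.openai.com/v1

def build_chat_request(api_key, model, messages):
    """Assemble the URL, headers, and JSON body for a chat completion call.
    The body format is identical to OpenAI's; only the URL and key differ."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request(
    "ig-your-api-key",
    "meta-llama/Llama-4-Scout-109B",
    [{"role": "user", "content": "Hello!"}],
)
print(req["url"])  # https://api.infergrove.com/v1/chat/completions
```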
All API requests should be made to:
https://api.infergrove.com/v1
Authenticate your requests using an API key passed in the Authorization header:
curl https://api.infergrove.com/v1/chat/completions \
  -H "Authorization: Bearer ig-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-4-Scout-109B", "messages": [{"role": "user", "content": "Hello!"}]}'
We provide official SDKs for Python, TypeScript, Go, Rust, and Java:
# Python
pip install infergrove

# TypeScript / Node.js
npm install @infergrove/sdk

# Go
go get github.com/infergrove/infergrove-go

# Rust
cargo add infergrove
Here's a complete example of making a chat completion request:
from infergrove import InferGrove

client = InferGrove(api_key="ig-your-api-key")

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-109B",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is quantum computing?"}
    ],
    max_tokens=512,
    temperature=0.7
)

print(response.choices[0].message.content)
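The messages list carries the whole conversation, so multi-turn chat is just a matter of appending each exchange before the next request. A sketch of that bookkeeping (pure list handling; no API call is made here, and the helper names are my own):

```python
def make_history(system_prompt):
    """Start a conversation with a system message."""
    return [{"role": "system", "content": system_prompt}]

def add_turn(history, user_text, assistant_text):
    """Record one user/assistant exchange in the running history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

history = make_history("You are a helpful assistant.")
add_turn(history, "What is quantum computing?", "It uses qubits ...")
# On the next call, pass `history` plus the new user message as `messages=`.
print(len(history))  # 3
```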
For real-time applications, use streaming to receive tokens as they're generated:
stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-109B",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
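If you also need the full reply (for example, to store it in the conversation history), accumulate the deltas as they arrive. A sketch using stand-in objects with the same attribute shape as stream chunks (`collect_stream` and `fake_chunk` are illustrative, not part of the SDK):

```python
from types import SimpleNamespace

def collect_stream(stream):
    """Print each delta as it arrives and return the assembled text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="")
            parts.append(delta)
    return "".join(parts)

def fake_chunk(text):
    # Stand-in mimicking chunk.choices[0].delta.content from the real stream.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

full = collect_stream([fake_chunk("Roses "), fake_chunk("are "), fake_chunk("red"), fake_chunk(None)])
# full == "Roses are red"
```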
InferGrove uses standard HTTP status codes. Here are the most common errors:
400 Bad Request — Invalid parameters
401 Unauthorized — Invalid or missing API key
403 Forbidden — Insufficient permissions
404 Not Found — Model or resource not found
429 Rate Limited — Too many requests
500 Server Error — Internal error (retry)
503 Unavailable — Service temporarily down
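A common pattern is to treat 429, 500, and 503 as transient and everything else as a caller error that retrying will not fix. A small sketch of that classification (the set of retryable codes follows the table above; the helper itself is illustrative):

```python
# Transient statuses from the table above: rate limited, internal error, temporarily down.
RETRYABLE = {429, 500, 503}

def is_retryable(status: int) -> bool:
    """Return True when the request may succeed if retried later."""
    return status in RETRYABLE

print(is_retryable(429), is_retryable(401))  # True False
```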
Rate limits vary by plan. When you hit a rate limit, the API returns a 429 status code with a Retry-After header indicating when you can retry.
from infergrove import InferGrove, RateLimitError
import time

client = InferGrove(
    api_key="ig-...",
    max_retries=3,  # Auto-retry on rate limits
    timeout=30.0
)

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-109B",
        messages=[{"role": "user", "content": "Hi"}]
    )
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.retry_after}s")
    time.sleep(e.retry_after)