200+ optimized open-source models ready to deploy. Every model runs on custom CUDA kernels for maximum throughput.
Meta's flagship 400B-parameter model with a mixture-of-experts architecture. Best-in-class reasoning and instruction following.
Excellent balance of quality and speed: 109B parameters with a 10M-token context window. Ideal for production workloads.
DeepSeek's largest model with 685B total parameters. Exceptional at math, coding, and complex reasoning tasks.
Mistral's mixture-of-experts model: 8 experts of 22B parameters each, with only two routed per token. Excellent throughput for general tasks.
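For intuition, here is a minimal sketch of the sparse routing that keeps MoE inference fast: a gate scores all experts, only the top two run per token, and their outputs are mixed by the softmaxed gate weights. The shapes, gate, and toy linear experts below are illustrative, not the model's actual implementation.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Sparse MoE layer (schematic): route the token to its top-k experts
    and mix their outputs by softmaxed gate scores. Only top_k experts run
    per token, which is why throughput stays high despite the large total
    parameter count."""
    logits = x @ gate_w                      # one gate score per expert
    top = np.argsort(logits)[-top_k:]        # indices of the chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: 8 linear "experts" on a 16-dim token (dimensions illustrative).
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.standard_normal((16, 16)): x @ W for _ in range(8)]
gate_w = rng.standard_normal((16, 8))
y = moe_forward(rng.standard_normal(16), gate_w, experts)
```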
Alibaba's 72B model. Strong multilingual performance across 30+ languages.
Google's efficient 27B model. Punches above its weight class with excellent instruction following and reasoning.
State-of-the-art code generation. Supports 100+ programming languages with fill-in-the-middle capability.
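As a sketch of how fill-in-the-middle works: the code before and after the gap is wrapped in sentinel tokens, and the model generates only the missing span. The sentinels below are StarCoder-style and model-specific; check the model card for the exact format.

```python
# Fill-in-the-middle: the model completes the gap between a prefix and a
# suffix. Sentinel tokens vary by model; these are StarCoder-style examples.
prefix = "def median(values):\n    s = sorted(values)\n    "
suffix = "\n    return s[mid]"

prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
# Send `prompt` to the completion endpoint as a raw prompt; the model
# returns only the missing middle, e.g. "mid = len(s) // 2".
```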
BigCode's latest model, trained on The Stack v2. Well suited to code completion and generation tasks.
Meta's code-specialized Llama variant. Strong at code review, debugging, and complex refactoring tasks.
Stability AI's latest diffusion model. Photorealistic image generation with excellent prompt adherence.
Black Forest Labs' flagship model. Exceptional text rendering and compositional understanding.
Real-time image generation in a single step. Perfect for interactive applications and rapid prototyping.
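As a rough sketch of what single-step generation looks like in code, assuming a diffusers-compatible turbo-style checkpoint (sdxl-turbo is used here purely as an example):

```python
import torch
from diffusers import AutoPipelineForText2Image

# One denoising step: turbo-style models are distilled for single-step
# sampling, and classifier-free guidance is typically disabled for them.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a watercolor fox in a snowy forest",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("fox.png")
```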
OpenAI's latest speech recognition model. Supports 100+ languages with near-human accuracy.
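If you want to prototype locally before deploying, a minimal sketch using the open-source openai-whisper package (the checkpoint name and audio file are illustrative):

```python
import whisper  # pip install openai-whisper

# Load a checkpoint and transcribe a local file; the language is
# auto-detected unless you pass language="..." explicitly.
model = whisper.load_model("large-v3")
result = model.transcribe("meeting.mp3")
print(result["text"])
```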
High-quality text-to-speech with emotion control. Generate natural-sounding speech in multiple voices.
Generate music from text descriptions. Create background music, jingles, and soundscapes programmatically.
Multilingual, multi-granularity embedding model. Supports dense, sparse, and multi-vector (late-interaction) retrieval.
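To make the three retrieval modes concrete, here is a schematic scoring function: dense vectors give semantic similarity, sparse token weights give lexical matching, and per-token multi-vectors give ColBERT-style late interaction. The fusion weights are placeholders, not tuned values.

```python
import numpy as np

def hybrid_score(q_dense, d_dense, q_sparse, d_sparse, q_multi, d_multi,
                 weights=(0.4, 0.2, 0.4)):
    """Combine the three signals a multi-granularity embedder emits.

    q_sparse/d_sparse: {token_id: weight} lexical maps.
    q_multi/d_multi:   per-token vectors (late interaction).
    The fusion weights are illustrative, not tuned values.
    """
    dense = float(q_dense @ d_dense)  # cosine if inputs are L2-normalized
    sparse = sum(w * d_sparse.get(t, 0.0) for t, w in q_sparse.items())
    # MaxSim: each query token matches its best document token.
    multi = float(np.max(q_multi @ d_multi.T, axis=1).sum())
    w1, w2, w3 = weights
    return w1 * dense + w2 * sparse + w3 * multi
```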
Instruction-tuned embedding model based on Mistral 7B. Excellent for semantic search and RAG applications.
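Instruction-tuned embedders expect the task description prepended to the query, while documents are embedded as-is. Below is a minimal sketch of the common Instruct/Query template; the exact format is model-specific, so check the model card, and embed() is a hypothetical stand-in for the embeddings endpoint.

```python
def format_query(task: str, query: str) -> str:
    # Common template for instruction-tuned embedders; the exact format
    # is model-specific -- confirm it against the model card.
    return f"Instruct: {task}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages"
q = format_query(task, "how does speculative decoding work?")

# embed() is a hypothetical stand-in for your embeddings endpoint:
# q_vec = embed(q); scores = doc_matrix @ q_vec  # cosine on unit vectors
```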
Lightweight, high-quality embeddings with Matryoshka representation learning: truncate vectors to the dimensionality you need with minimal quality loss.
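A minimal sketch of how Matryoshka embeddings are used in practice: keep a prefix of the vector and re-normalize, choosing the dimension per index (the 1024 and 256 below are illustrative).

```python
import numpy as np

def shrink(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Matryoshka truncation: keep the first `dim` coordinates and
    re-normalize, trading a little quality for a much smaller index."""
    v = embedding[:dim]
    return v / np.linalg.norm(v)

full = np.random.randn(1024)              # stand-in for a model embedding
full /= np.linalg.norm(full)
small = shrink(full, 256)                 # 4x smaller vector, same space
```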
Every model on InferGrove is optimized with custom CUDA kernels, quantization, and speculative decoding for best-in-class performance.
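For intuition, here is a toy sketch of the speculative decoding loop. The two samplers stand in for a small draft model and the served target model; real systems verify all drafted tokens with one batched target pass and accept or reject them probabilistically.

```python
import random

# Toy stand-ins over a tiny vocabulary: a cheap "draft" sampler and the
# authoritative "target" sampler. Real deployments use a small LM as the
# draft and the served model as the target.
VOCAB = ["the", "cat", "sat", "on", "mat"]

def draft_next(ctx):   # fast, lower quality
    return random.choice(VOCAB)

def target_next(ctx):  # slow, authoritative
    return VOCAB[len(ctx) % len(VOCAB)]

def speculative_step(ctx, k=4):
    """One round of speculative decoding (schematic).

    The draft proposes k tokens; the target verifies them (in practice via
    a single batched forward pass), keeping the longest agreeing prefix
    plus one token of its own -- so every round emits at least one
    target-quality token, and often several.
    """
    proposed = []
    for _ in range(k):
        proposed.append(draft_next(ctx + proposed))
    accepted = []
    for tok in proposed:
        if tok == target_next(ctx + accepted):
            accepted.append(tok)
        else:
            break
    accepted.append(target_next(ctx + accepted))  # bonus token from target
    return accepted

print(speculative_step(["the"]))
```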
Benchmarks measured on InferGrove's production infrastructure with standard prompts. Actual performance may vary based on prompt length and concurrency.
We add new models every week. Request a model or bring your own custom model to deploy on our infrastructure.