CHAT COMPLETIONS API

One API, every chat model.

Access chat models from every major open-source provider through a single endpoint. DeepSeek, Llama, Qwen, Gemma, Mistral, and more. Standard OpenAI chat completions format with streaming, function calling, and JSON mode. Switch models with a single parameter change.

200+
Models
10+
Providers
/v1/chat/completions
Endpoint

QUICK START

Integrate in minutes.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Switch models by changing one parameter
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",  # or any of 200+ models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the best practices for API design?"}
    ],
    stream=True,
)
for chunk in response:
    # Guard against keep-alive chunks with no choices or empty deltas
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

AVAILABLE MODELS

Models you can use today.

deepseek-ai/DeepSeek-V3
DeepSeek · Per-token
128K context, MoE
meta-llama/Llama-4-Scout-17B-16E-Instruct
Meta · Per-token
17B MoE, 128K context
Qwen/Qwen3-32B
Alibaba · Per-token
32B, strong multilingual
google/gemma-3-27b-it
Google · Per-token
27B, instruction-tuned
mistralai/Mistral-Small-24B-Instruct-2501
Mistral · Per-token
24B, fast inference
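Model IDs follow an `org/model` pattern, so the provider is recoverable from the prefix. A minimal sketch of grouping a model list client-side (the list here is just the five models above):

```python
from collections import defaultdict

model_ids = [
    "deepseek-ai/DeepSeek-V3",
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "Qwen/Qwen3-32B",
    "google/gemma-3-27b-it",
    "mistralai/Mistral-Small-24B-Instruct-2501",
]

# Split each ID at the first "/" and bucket model names by provider org
by_provider = defaultdict(list)
for mid in model_ids:
    org, _, name = mid.partition("/")
    by_provider[org].append(name)

print(dict(by_provider)["google"])  # → ['gemma-3-27b-it']
```

If the service mirrors OpenAI's `/v1/models` endpoint (an assumption, not confirmed above), `client.models.list()` would give you live IDs to feed this instead of a hard-coded list.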

WHY RUNCRATE

Built for production.

Streaming Support

Server-sent events for real-time token streaming. Build responsive chat interfaces with time-to-first-token under 200ms.
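Under the hood the stream is server-sent events: each event is a `data:` line carrying a JSON chunk, and the stream ends with a `data: [DONE]` sentinel. A minimal sketch of parsing that wire format directly, with a made-up two-chunk stream for illustration (the SDK's `stream=True` does this for you):

```python
import json

def collect_stream(sse_lines):
    """Accumulate delta content from chat-completions SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            parts.append(delta)
    return "".join(parts)

# Simulated stream (illustrative payloads, not captured output)
events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(events))  # → Hello
```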

Function Calling

Standard OpenAI tools format for structured function calls. Build AI agents that interact with your APIs and databases.
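In the OpenAI tools format you describe each function as a JSON schema; the model replies with a `tool_calls` entry whose `arguments` field is a JSON string that your code executes. A minimal sketch, where the `get_weather` tool and its dispatcher are hypothetical, not part of the API:

```python
import json

# Tool definition in the standard OpenAI tools format (hypothetical tool)
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(name, arguments_json):
    """Route a model tool call to local code (illustrative stub)."""
    args = json.loads(arguments_json)
    if name == "get_weather":
        return {"city": args["city"], "temp_c": 21}  # stubbed result
    raise ValueError(f"unknown tool: {name}")

# The model returns a name and an arguments string like these:
result = dispatch("get_weather", '{"city": "Paris"}')
print(result["city"])  # → Paris
```

On a real call you would pass `tools=tools` to `client.chat.completions.create(...)`, read `response.choices[0].message.tool_calls`, dispatch each call, and send the result back as a `role: "tool"` message.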

JSON Mode

Force structured JSON output from any model. Reliable for data extraction, form filling, and API integrations.
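JSON mode is requested with the standard `response_format` parameter; the returned `message.content` is then a JSON string you parse directly. A sketch of the request shape and the parse step (the extraction prompt and the sample reply are illustrative):

```python
import json

# Request body in the standard chat-completions shape; response_format
# asks the model to emit valid JSON as its entire reply.
request = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
        {"role": "system", "content": "Extract the name and email as JSON."},
        {"role": "user", "content": "Reach me at jane@example.com. Jane"},
    ],
    "response_format": {"type": "json_object"},
}

# A reply content string like the model might return (illustrative):
content = '{"name": "Jane", "email": "jane@example.com"}'
data = json.loads(content)
print(data["email"])  # → jane@example.com
```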

Multi-Provider

One API key, one billing account, every provider. Compare models side by side and switch without re-integrating.
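Because every model sits behind the same endpoint, a side-by-side comparison is just the same request with the model ID swapped. A minimal sketch that builds one identical payload per model, using IDs from the list above:

```python
def make_request(model, prompt):
    """Identical chat-completions payload, parameterized by model ID."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

models = ["deepseek-ai/DeepSeek-V3", "Qwen/Qwen3-32B", "google/gemma-3-27b-it"]
payloads = [make_request(m, "Summarize REST in one sentence.") for m in models]
print([p["model"] for p in payloads])
```

Sending each payload through the same `client.chat.completions.create(**payload)` call lets you diff outputs across providers without touching the rest of your integration.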


Start building with the chat API.