CHAT COMPLETIONS API
Access chat models from every major open-source provider through a single endpoint. DeepSeek, Llama, Qwen, Gemma, Mistral, and more. Standard OpenAI chat completions format with streaming, function calling, and JSON mode. Switch models with a single parameter change.
QUICK START
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Switch models by changing one parameter
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",  # or any of 200+ models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the best practices for API design?"},
    ],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

AVAILABLE MODELS
| Model | Provider | Price | Detail |
|---|---|---|---|
| deepseek-ai/DeepSeek-V3 | DeepSeek | Per-token | 128K context, MoE |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | Meta | Per-token | 17B MoE, 128K context |
| Qwen/Qwen3-32B | Alibaba | Per-token | 32B, strong multilingual |
| google/gemma-3-27b-it | Google | Per-token | 27B, instruction-tuned |
| mistralai/Mistral-Small-24B-Instruct-2501 | Mistral | Per-token | 24B, fast inference |
WHY RUNCRATE
Server-sent events for real-time token streaming. Build responsive chat interfaces with time-to-first-token under 200ms.
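Beyond printing tokens as they arrive (as in the quick start), chat UIs typically accumulate deltas into the full reply. A minimal sketch of that pattern, using simulated delta strings in place of a live response iterator:

```python
# Accumulate streamed deltas into the final reply text.
# simulated_deltas stands in for the delta.content values a real
# streaming response would yield; some chunks carry None content.
simulated_deltas = ["API design ", "favors ", "consistency.", None]

parts = []
for delta in simulated_deltas:
    if delta:  # skip chunks with no content (e.g. role-only deltas)
        parts.append(delta)
full_reply = "".join(parts)
print(full_reply)
```

In real use, the loop body runs over `chunk.choices[0].delta.content` from the streaming response, so you can both render tokens live and keep the assembled text.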
Standard OpenAI tools format for structured function calls. Build AI agents that interact with your APIs and databases.
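A tool is declared as a JSON schema in the standard OpenAI `tools` format; the model replies with a `tool_calls` entry whose arguments arrive as a JSON string. A minimal sketch, where `get_weather` and its parameters are hypothetical placeholders:

```python
import json

# Tool definition in the OpenAI "tools" format.
# get_weather is a hypothetical example function.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# When the model calls the tool, its arguments come back as a JSON
# string; the sample below is illustrative, not real model output.
sample_arguments = '{"city": "Berlin"}'
args = json.loads(sample_arguments)
print(args["city"])
```

Pass `tools=tools` to `client.chat.completions.create` alongside `messages`; dispatching the parsed arguments to your own function and returning its result in a `tool` role message is up to your application.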
Force structured JSON output from any model. Reliable for data extraction, form filling, and API integrations.
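JSON mode is requested through the `response_format` parameter, after which the model's content is a single valid JSON object you can parse directly. A minimal sketch, with an illustrative sample standing in for real model output:

```python
import json

# Request payload with JSON mode enabled via response_format.
request = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
        {"role": "system", "content": "Extract name and email as JSON."},
        {"role": "user", "content": "Reach me at jane@example.com -- Jane"},
    ],
    "response_format": {"type": "json_object"},
}

# In JSON mode the content parses cleanly, no regex extraction needed.
# sample_content is an illustrative sample, not real model output.
sample_content = '{"name": "Jane", "email": "jane@example.com"}'
extracted = json.loads(sample_content)
print(extracted["email"])
```

The same payload works unchanged with the OpenAI client: `client.chat.completions.create(**request)`.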
One API key, one billing account, every provider. Compare models side by side and switch without re-integrating.
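Because every model sits behind the same endpoint and schema, a side-by-side comparison is just a loop over model IDs. A minimal sketch that builds one request per model from the table above (requests are constructed, not sent):

```python
# Compare models by reusing the same messages and swapping `model`.
# Model IDs are taken from the table above.
models = [
    "deepseek-ai/DeepSeek-V3",
    "Qwen/Qwen3-32B",
    "mistralai/Mistral-Small-24B-Instruct-2501",
]
messages = [{"role": "user", "content": "Summarize REST vs. gRPC."}]

# One request payload per model; pass each to
# client.chat.completions.create(**request) to run the comparison.
requests = [{"model": m, "messages": messages} for m in models]
print(len(requests))
```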
FAQ