CHAT COMPLETIONS API

One API, every chat model.

Access chat models from every major open-source provider through a single endpoint. DeepSeek, Llama, Qwen, Gemma, Mistral, and more. Standard OpenAI chat completions format with streaming, function calling, and JSON mode. Switch models with a single parameter change.

200+
Models
10+
Providers
/v1/chat/completions
Endpoint

QUICK START

Integrate in minutes.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Switch models by changing one parameter
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",  # or any of 200+ models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the best practices for API design?"}
    ],
    stream=True,
)
for chunk in response:
    # Guard against keep-alive chunks with no choices or empty deltas
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

AVAILABLE MODELS

Models you can use today.

deepseek-ai/DeepSeek-V3
DeepSeek · Per-token
128K context, MoE
meta-llama/Llama-4-Scout-17B-16E-Instruct
Meta · Per-token
17B MoE, 128K context
Qwen/Qwen3-32B
Alibaba · Per-token
32B, strong multilingual
google/gemma-3-27b-it
Google · Per-token
27B, instruction-tuned
mistralai/Mistral-Small-24B-Instruct-2501
Mistral · Per-token
24B, fast inference
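Model IDs follow an `org/model` pattern, so the provider is recoverable from the prefix. A minimal sketch of grouping a model list client-side (the list here is just the five models above):

```python
from collections import defaultdict

model_ids = [
    "deepseek-ai/DeepSeek-V3",
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "Qwen/Qwen3-32B",
    "google/gemma-3-27b-it",
    "mistralai/Mistral-Small-24B-Instruct-2501",
]

# Split each ID at the first "/" and bucket model names by provider org
by_provider = defaultdict(list)
for mid in model_ids:
    org, _, name = mid.partition("/")
    by_provider[org].append(name)

print(dict(by_provider)["google"])  # → ['gemma-3-27b-it']
```

If the service mirrors OpenAI's `/v1/models` endpoint (an assumption, not confirmed above), `client.models.list()` would give you live IDs to feed this instead of a hard-coded list.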

WHY RUNCRATE

Built for production.

Streaming Support

Server-sent events for real-time token streaming. Build responsive chat interfaces with time-to-first-token under 200ms.
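Under the hood the stream is server-sent events: each event is a `data:` line carrying a JSON chunk, and the stream ends with a `data: [DONE]` sentinel. A minimal sketch of parsing that wire format directly, with a made-up two-chunk stream for illustration (the SDK's `stream=True` does this for you):

```python
import json

def collect_stream(sse_lines):
    """Accumulate delta content from chat-completions SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            parts.append(delta)
    return "".join(parts)

# Simulated stream (illustrative payloads, not captured output)
events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(events))  # → Hello
```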

Function Calling

Standard OpenAI tools format for structured function calls. Build AI agents that interact with your APIs and databases.
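In the OpenAI tools format you describe each function as a JSON schema; the model replies with a `tool_calls` entry whose `arguments` field is a JSON string that your code executes. A minimal sketch, where the `get_weather` tool and its dispatcher are hypothetical, not part of the API:

```python
import json

# Tool definition in the standard OpenAI tools format (hypothetical tool)
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(name, arguments_json):
    """Route a model tool call to local code (illustrative stub)."""
    args = json.loads(arguments_json)
    if name == "get_weather":
        return {"city": args["city"], "temp_c": 21}  # stubbed result
    raise ValueError(f"unknown tool: {name}")

# The model returns a name and an arguments string like these:
result = dispatch("get_weather", '{"city": "Paris"}')
print(result["city"])  # → Paris
```

On a real call you would pass `tools=tools` to `client.chat.completions.create(...)`, read `response.choices[0].message.tool_calls`, dispatch each call, and send the result back as a `role: "tool"` message.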

JSON Mode

Force structured JSON output from any model. Reliable for data extraction, form filling, and API integrations.
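JSON mode is requested with the standard `response_format` parameter; the returned `message.content` is then a JSON string you parse directly. A sketch of the request shape and the parse step (the extraction prompt and the sample reply are illustrative):

```python
import json

# Request body in the standard chat-completions shape; response_format
# asks the model to emit valid JSON as its entire reply.
request = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
        {"role": "system", "content": "Extract the name and email as JSON."},
        {"role": "user", "content": "Reach me at jane@example.com. Jane"},
    ],
    "response_format": {"type": "json_object"},
}

# A reply content string like the model might return (illustrative):
content = '{"name": "Jane", "email": "jane@example.com"}'
data = json.loads(content)
print(data["email"])  # → jane@example.com
```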

Multi-Provider

One API key, one billing account, every provider. Compare models side by side and switch without re-integrating.
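Because every model sits behind the same endpoint, a side-by-side comparison is just the same request with the model ID swapped. A minimal sketch that builds one identical payload per model, using IDs from the list above:

```python
def make_request(model, prompt):
    """Identical chat-completions payload, parameterized by model ID."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

models = ["deepseek-ai/DeepSeek-V3", "Qwen/Qwen3-32B", "google/gemma-3-27b-it"]
payloads = [make_request(m, "Summarize REST in one sentence.") for m in models]
print([p["model"] for p in payloads])
```

Sending each payload through the same `client.chat.completions.create(**payload)` call lets you diff outputs across providers without touching the rest of your integration.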


Start building with the chat API.