
Chat Completions

The chat completions endpoint powers text generation across Chat, Code, Reasoning, and Vision models. It supports streaming, system prompts, multi-turn conversations, and image inputs for vision-capable models.

Endpoint

POST https://api.runcrate.ai/v1/chat/completions

Basic Usage

curl https://api.runcrate.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer rc_live_YOUR_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
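
For environments without curl, the same request can be built with Python's standard library. This is a minimal sketch: build_chat_request and send_chat_request are hypothetical helpers, and the response shape read by send_chat_request (choices[0].message.content) is an assumption based on the OpenAI-style client.chat.completions calls used elsewhere on this page.

```python
import json
import urllib.request

API_URL = "https://api.runcrate.ai/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str,
                       max_tokens: int = 512,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's text.

    Assumes an OpenAI-style body: {"choices": [{"message": {"content": ...}}]}.
    """
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Usage: print(send_chat_request(build_chat_request("rc_live_YOUR_API_KEY", "deepseek-ai/DeepSeek-V3", "Explain quantum computing in simple terms"))). Splitting construction from sending keeps the request inspectable without a network call.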

Streaming

Set stream: true (stream=True in Python) to receive tokens in real time as they are generated:
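from openai import OpenAI

# Assumes the OpenAI Python SDK pointed at the Runcrate base URL
client = OpenAI(base_url="https://api.runcrate.ai/v1", api_key="rc_live_YOUR_API_KEY")

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices (e.g. trailing metadata chunks)
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()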
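
Without an SDK, a streamed response has to be parsed by hand. This page does not document the wire format, so the sketch below assumes the common OpenAI-style server-sent events framing: each event is a "data: {json}" line and the stream ends with "data: [DONE]". iter_stream_text is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines (assumed OpenAI-style framing)."""
    for raw in lines:
        line = raw.strip()
        # Ignore blank separator lines, SSE comments, and non-data fields
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        # Only the first choice is read, matching the SDK example above
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```

With a live response from urllib.request.urlopen (and "stream": true in the request body), decode each line with line.decode("utf-8") before passing it in.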

Vision Models

Vision-capable models accept images in the message content. Send images as URLs or base64:
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
Vision-capable models include Gemini 2.5, Llama 4 Maverick, Gemma 3, GPT-4o, and others marked with “Vision” in the Model Catalog.
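
The example above passes an image URL. For base64 input the exact encoding is not shown here; a common convention among OpenAI-compatible APIs, assumed in this sketch, is a data URL (data:<mime>;base64,<payload>) in the same image_url.url field. image_message_from_bytes is a hypothetical helper.

```python
import base64


def image_message_from_bytes(image_bytes: bytes, question: str,
                             mime_type: str = "image/jpeg") -> dict:
    """Build a user message with an inline image as a base64 data URL (assumed format)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }
```

Usage: read a local file with open("photo.jpg", "rb"), pass its bytes to image_message_from_bytes, and send the result as one entry in messages.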

Reasoning Models

Reasoning models (DeepSeek-R1, QwQ, etc.) produce chain-of-thought output. The reasoning steps appear in the reasoning_content field of the streamed response delta, separate from the final answer in content.
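
The sketch below separates the two streams. It works on chunks already decoded to dicts and assumes reasoning_content sits alongside content in choices[0].delta, as described above; split_reasoning is a hypothetical helper. If you use an SDK whose delta type does not declare this field, reading it defensively (for example getattr(delta, "reasoning_content", None)) avoids attribute errors.

```python
from typing import Iterable, Tuple


def split_reasoning(chunks: Iterable[dict]) -> Tuple[str, str]:
    """Collect reasoning steps and the final answer from decoded stream chunks."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        # Chain-of-thought text arrives in reasoning_content
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        # The final answer arrives in content
        if delta.get("content"):
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)
```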

Parameters

Parameter | Type | Default | Description
model | string | required | Model ID (e.g., deepseek-ai/DeepSeek-V3)
messages | array | required | Conversation messages with role and content
max_tokens | integer | varies | Maximum tokens to generate
temperature | number | 0.7 | Randomness (0 = deterministic, 2 = very random)
stream | boolean | false | Enable streaming responses
top_p | number | 1.0 | Nucleus sampling threshold
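
The table can be mirrored in a small client-side check before sending a request. This is a sketch: the 0 to 2 temperature range follows the description above, the 0 to 1 top_p range reflects the usual meaning of a nucleus-sampling probability mass, and the server's own validation rules and per-model max_tokens limits are not documented here. build_payload is a hypothetical helper.

```python
from typing import Optional


def build_payload(model: str, messages: list, max_tokens: Optional[int] = None,
                  temperature: float = 0.7, stream: bool = False,
                  top_p: float = 1.0) -> dict:
    """Assemble a request body with the documented defaults and basic range checks."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must be a non-empty list")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    payload = {"model": model, "messages": messages, "temperature": temperature,
               "stream": stream, "top_p": top_p}
    # The max_tokens default varies by model, so omit it unless the caller sets it
    if max_tokens is not None:
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        payload["max_tokens"] = max_tokens
    return payload
```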

Message Roles

Role | Purpose
system | Sets the model’s behavior and personality
user | The user’s input
assistant | Previous model responses (for multi-turn)
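
Under the role scheme above, multi-turn context is carried by sending earlier replies back as assistant messages: the system message first, then alternating user and assistant turns. The sketch below keeps that history in a list; the Conversation class is hypothetical, and the actual API call is left to whichever client you use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Conversation:
    """Hold a chat history in the messages format used by this endpoint."""
    system_prompt: str
    history: List[Dict[str, str]] = field(default_factory=list)

    def add_user(self, text: str) -> None:
        self.history.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        # Store the model's reply so the next request includes it as context
        self.history.append({"role": "assistant", "content": text})

    def messages(self) -> List[Dict[str, str]]:
        """Return the list to pass as the messages parameter."""
        return [{"role": "system", "content": self.system_prompt}, *self.history]
```

After each response, call add_assistant with the returned text, then add_user with the next question, and send conv.messages() again.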

Models

Model | Context (tokens) | Best For
deepseek-ai/DeepSeek-V3 | 128K | General purpose, cost-effective
anthropic/claude-4-sonnet | 200K | Reasoning, analysis, coding
google/gemini-2.5-flash | 1M | Fast, multimodal, long context
meta-llama/Llama-4-Scout | 128K | Multilingual, efficient
Qwen/Qwen3-Max | 128K | Reasoning, multilingual