Chat Completions
The chat completions endpoint powers text generation across Chat, Code, Reasoning, and Vision models. It supports streaming, system prompts, multi-turn conversations, and image inputs for vision-capable models.

Endpoint
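The examples in this guide assume an OpenAI-compatible HTTP API. The base URL below is a placeholder; substitute your provider's actual host.

```python
# Hypothetical base URL -- replace with your provider's host.
BASE_URL = "https://api.example.com/v1"

# Requests are POSTed here as JSON with a Bearer token in the
# Authorization header.
CHAT_COMPLETIONS_URL = f"{BASE_URL}/chat/completions"
print(CHAT_COMPLETIONS_URL)
```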
Basic Usage
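A minimal request, sketched with the standard library. The URL and API key are placeholders, and the actual network call is left commented out:

```python
import json
import os
import urllib.request  # stdlib; any HTTP client works

# Hypothetical endpoint and key -- substitute your provider's values.
URL = "https://api.example.com/v1/chat/completions"
API_KEY = os.environ.get("API_KEY", "sk-...")

payload = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain nucleus sampling in one sentence."},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = json.load(urllib.request.urlopen(req))  # uncomment with a real endpoint
# print(resp["choices"][0]["message"]["content"])
```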
Streaming
Enable real-time token streaming with `stream: true`:
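When streaming is enabled, OpenAI-compatible APIs typically send server-sent events: one `data: <json>` line per chunk, terminated by `data: [DONE]`. A sketch of client-side parsing, using simulated event lines in place of a live connection:

```python
import json

# Simulated server-sent events as a streaming API would emit them.
sse_lines = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]

chunks = []
for line in sse_lines:
    body = line.removeprefix("data: ")
    if body == "[DONE]":  # end-of-stream sentinel
        break
    delta = json.loads(body)["choices"][0]["delta"]
    if delta.get("content"):
        chunks.append(delta["content"])

print("".join(chunks))  # → Hello!
```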
Vision Models
Vision-capable models accept images in the message content. Send images as URLs or base64-encoded data URIs:
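In the OpenAI-compatible message format, image inputs use content "parts": a list mixing `text` and `image_url` entries. Both messages below use placeholder URLs and stand-in image bytes:

```python
import base64

# Variant 1: reference the image by URL.
url_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}

# Variant 2: embed the image bytes as a base64 data URI.
fake_bytes = b"\x89PNG..."  # stand-in for real image bytes
b64 = base64.b64encode(fake_bytes).decode()
b64_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ],
}
```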
Reasoning Models
Reasoning models (DeepSeek-R1, QwQ, etc.) produce chain-of-thought output. The reasoning steps appear in the `reasoning_content` field of the streamed response delta, separate from the final answer in `content`.
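A sketch of separating the two fields while consuming a stream, using simulated deltas in place of live chunks:

```python
# Simulated streamed deltas from a reasoning model: chain-of-thought
# arrives in `reasoning_content`, the final answer in `content`.
deltas = [
    {"reasoning_content": "Compare digit by digit... "},
    {"reasoning_content": "0.11 < 0.90, so 9.11 < 9.9."},
    {"content": "9.9 is larger than 9.11."},
]

reasoning, answer = [], []
for delta in deltas:
    if delta.get("reasoning_content"):
        reasoning.append(delta["reasoning_content"])
    if delta.get("content"):
        answer.append(delta["content"])

print("".join(answer))  # → 9.9 is larger than 9.11.
```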
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | required | Model ID (e.g., `deepseek-ai/DeepSeek-V3`) |
| `messages` | array | required | Conversation messages with `role` and `content` |
| `max_tokens` | integer | varies | Maximum number of tokens to generate |
| `temperature` | number | 0.7 | Randomness (0 = deterministic, 2 = very random) |
| `stream` | boolean | false | Enable streaming responses |
| `top_p` | number | 1.0 | Nucleus sampling threshold |
Message Roles
| Role | Purpose |
|---|---|
| `system` | Sets the model's behavior and personality |
| `user` | The user's input |
| `assistant` | Previous model responses (for multi-turn conversations) |
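The roles above compose a multi-turn conversation: replay prior turns in order, then append the new user message, since the model sees the full history on every request. A small illustrative history:

```python
# A multi-turn `messages` array: system prompt first, then alternating
# user/assistant turns, ending with the new user message.
messages = [
    {"role": "system", "content": "You are a terse coding assistant."},
    {"role": "user", "content": "How do I reverse a list in Python?"},
    {"role": "assistant", "content": "Use my_list[::-1] or my_list.reverse()."},
    {"role": "user", "content": "Which one is in-place?"},
]

roles = [m["role"] for m in messages]
print(roles)  # → ['system', 'user', 'assistant', 'user']
```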
Popular Chat Models
| Model | Context | Best For |
|---|---|---|
| `deepseek-ai/DeepSeek-V3` | 128K | General purpose, cost-effective |
| `anthropic/claude-4-sonnet` | 200K | Reasoning, analysis, coding |
| `google/gemini-2.5-flash` | 1M | Fast, multimodal, long context |
| `meta-llama/Llama-4-Scout` | 128K | Multilingual, efficient |
| `Qwen/Qwen3-Max` | 128K | Reasoning, multilingual |