Chat Completions
The chat completions endpoint powers text generation across Chat, Code, Reasoning, and Vision models. It supports streaming, system prompts, multi-turn conversations, and image inputs for vision-capable models.

Endpoint
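The examples in this guide assume an OpenAI-compatible HTTP API. The base URL below is a placeholder; substitute your provider's actual host.

```python
# Hypothetical base URL -- replace with your provider's host.
BASE_URL = "https://api.example.com/v1"

# Requests are POSTed here as JSON with a Bearer token in the
# Authorization header.
CHAT_COMPLETIONS_URL = f"{BASE_URL}/chat/completions"
print(CHAT_COMPLETIONS_URL)
```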
Basic Usage
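A minimal request, sketched with the standard library. The URL and API key are placeholders, and the actual network call is left commented out:

```python
import json
import os
import urllib.request  # stdlib; any HTTP client works

# Hypothetical endpoint and key -- substitute your provider's values.
URL = "https://api.example.com/v1/chat/completions"
API_KEY = os.environ.get("API_KEY", "sk-...")

payload = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain nucleus sampling in one sentence."},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = json.load(urllib.request.urlopen(req))  # uncomment with a real endpoint
# print(resp["choices"][0]["message"]["content"])
```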
Streaming
Enable real-time token streaming with `stream: true`:
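When streaming is enabled, OpenAI-compatible APIs typically send server-sent events: one `data: <json>` line per chunk, terminated by `data: [DONE]`. A sketch of client-side parsing, using simulated event lines in place of a live connection:

```python
import json

# Simulated server-sent events as a streaming API would emit them.
sse_lines = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]

chunks = []
for line in sse_lines:
    body = line.removeprefix("data: ")
    if body == "[DONE]":  # end-of-stream sentinel
        break
    delta = json.loads(body)["choices"][0]["delta"]
    if delta.get("content"):
        chunks.append(delta["content"])

print("".join(chunks))  # → Hello!
```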
Vision Models
Vision-capable models accept images in the message content. Send images as URLs or base64-encoded data URIs:
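In the OpenAI-compatible message format, image inputs use content "parts": a list mixing `text` and `image_url` entries. Both messages below use placeholder URLs and stand-in image bytes:

```python
import base64

# Variant 1: reference the image by URL.
url_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}

# Variant 2: embed the image bytes as a base64 data URI.
fake_bytes = b"\x89PNG..."  # stand-in for real image bytes
b64 = base64.b64encode(fake_bytes).decode()
b64_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ],
}
```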
Reasoning Models
Reasoning models (DeepSeek-R1, QwQ, etc.) produce chain-of-thought output. The reasoning steps appear in the `reasoning_content` field of the streamed response delta, separate from the final answer in `content`.
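A sketch of separating the two fields while consuming a stream, using simulated deltas in place of live chunks:

```python
# Simulated streamed deltas from a reasoning model: chain-of-thought
# arrives in `reasoning_content`, the final answer in `content`.
deltas = [
    {"reasoning_content": "Compare digit by digit... "},
    {"reasoning_content": "0.11 < 0.90, so 9.11 < 9.9."},
    {"content": "9.9 is larger than 9.11."},
]

reasoning, answer = [], []
for delta in deltas:
    if delta.get("reasoning_content"):
        reasoning.append(delta["reasoning_content"])
    if delta.get("content"):
        answer.append(delta["content"])

print("".join(answer))  # → 9.9 is larger than 9.11.
```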
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | required | Model ID (e.g., `deepseek-ai/DeepSeek-V3`) |
| `messages` | array | required | Conversation messages with `role` and `content` |
| `max_tokens` | integer | varies | Maximum number of tokens to generate |
| `temperature` | number | 0.7 | Randomness (0 = deterministic, 2 = very random) |
| `stream` | boolean | false | Enable streaming responses |
| `top_p` | number | 1.0 | Nucleus sampling threshold |
Message Roles
| Role | Purpose |
|---|---|
| `system` | Sets the model's behavior and personality |
| `user` | The user's input |
| `assistant` | Previous model responses (for multi-turn conversations) |
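The roles above compose a multi-turn conversation: replay prior turns in order, then append the new user message, since the model sees the full history on every request. A small illustrative history:

```python
# A multi-turn `messages` array: system prompt first, then alternating
# user/assistant turns, ending with the new user message.
messages = [
    {"role": "system", "content": "You are a terse coding assistant."},
    {"role": "user", "content": "How do I reverse a list in Python?"},
    {"role": "assistant", "content": "Use my_list[::-1] or my_list.reverse()."},
    {"role": "user", "content": "Which one is in-place?"},
]

roles = [m["role"] for m in messages]
print(roles)  # → ['system', 'user', 'assistant', 'user']
```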
Popular Chat Models
| Model | Context | Best For |
|---|---|---|
| `deepseek-ai/DeepSeek-V3` | 128K | General purpose, cost-effective |
| `anthropic/claude-4-sonnet` | 200K | Reasoning, analysis, coding |
| `google/gemini-2.5-flash` | 1M | Fast, multimodal, long context |
| `meta-llama/Llama-4-Scout` | 128K | Multilingual, efficient |
| `Qwen/Qwen3-Max` | 128K | Reasoning, multilingual |