Qwen Models Guide — Chat, Code, Vision, TTS

Alibaba’s Qwen family covers chat, code generation, vision understanding, and text-to-speech — all available through the Runcrate API with a single API key.

Available Qwen models

Model	Category	Context	Strengths
Qwen3-Max	Chat	128K	Flagship reasoning and instruction following
Qwen3.5-397B-A17B	Chat	128K	MoE architecture, high throughput
Qwen3-Coder-480B-A35B-Instruct-Turbo	Code	256K	Code generation, debugging, refactoring
Qwen3-VL-235B-A22B-Instruct	Vision	128K	Image understanding, OCR, diagram analysis
Qwen3-TTS	TTS	—	Natural-sounding speech synthesis

Chat — Qwen3-Max

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="Qwen/Qwen3-Max",
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": "Compare microservices vs monolith for a team of 5 engineers."},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
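
The examples in this guide hard-code the API key for brevity. A minimal sketch of reading it from the environment instead is shown below. The variable name `RUNCRATE_API_KEY` and the helper `load_api_key` are hypothetical, not part of the Runcrate SDK. The `rc_live_` prefix check is inferred only from the placeholder key above, so drop it if your keys use a different prefix.

```python
import os


def load_api_key(env_var: str = "RUNCRATE_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    The variable name is an assumption made for this sketch. The prefix
    check mirrors the `rc_live_` placeholder used in this guide.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the client.")
    if not key.startswith("rc_live_"):
        raise RuntimeError(f"{env_var} does not look like a live key.")
    return key
```

The client can then be created with `Runcrate(api_key=load_api_key())`, which keeps the key out of source control.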

Code — Qwen3-Coder

Purpose-built for code generation with 256K context:

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "Review code for bugs, style issues, and performance."},
        {"role": "user", "content": "Review this:\n\n```python\ndef process(data):\n    result = []\n    for i in range(len(data)):\n        if data[i] != None:\n            result.append(data[i] * 2)\n    return result\n```"},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
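
To make use of the 256K context for multi-file review, the files have to fit in one prompt. The sketch below packs files into a single string under an estimated token budget. It rests on two assumptions: "256K" is read conservatively as 256,000 tokens, and tokens are approximated as four characters each, which is a rough heuristic and not the Qwen tokenizer. `pack_files` and `estimate_tokens` are illustrative helpers, not SDK functions.

```python
from pathlib import Path

CONTEXT_TOKENS = 256_000  # "256K" from the model table, read conservatively
CHARS_PER_TOKEN = 4       # rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` from its character length."""
    return len(text) // CHARS_PER_TOKEN + 1


def pack_files(paths, reserve_tokens=8_000, budget=CONTEXT_TOKENS):
    """Concatenate files into one prompt while the estimate fits the budget.

    `reserve_tokens` is held back for the instructions and the reply.
    Returns (prompt, skipped_paths).
    """
    remaining = budget - reserve_tokens
    parts, skipped = [], []
    for path in paths:
        path = Path(path)
        body = path.read_text(encoding="utf-8", errors="replace")
        chunk = f"# File: {path}\n{body}\n"
        cost = estimate_tokens(chunk)
        if cost <= remaining:
            parts.append(chunk)
            remaining -= cost
        else:
            skipped.append(str(path))  # too large for what is left
    return "\n".join(parts), skipped
```

The returned prompt can be sent as the user message content in the request above. Any paths in `skipped` need a second request or a larger share of the budget.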

Vision — Qwen3-VL

Analyze images, extract text, understand diagrams:

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="Qwen/Qwen3-VL-235B-A22B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart show? Summarize the key trends."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
    max_tokens=512,
)

print(response.choices[0].message.content)
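
The example above points at a public image URL. For a local file, one common approach with OpenAI-style chat APIs is to pass a base64 data URL in the same `url` field. Whether the Runcrate endpoint accepts data URLs is an assumption here, not something this guide confirms. `image_to_data_url` is a hypothetical helper built only on the standard library.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a `data:` URL.

    The MIME type is guessed from the file extension only.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

If data URLs are supported, the result would replace `"https://example.com/chart.png"` in the `image_url` entry.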

TTS — Qwen3-TTS

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.text_to_speech(
    model="Qwen/Qwen3-TTS",
    input="Welcome to Runcrate. Your GPU instances are ready.",
    voice="alloy",
)

with open("welcome.mp3", "wb") as f:
    f.write(response.content)

Choosing the right Qwen model

Task	Model	Why
General chat, reasoning	Qwen3-Max	Best overall quality
High-throughput chat	Qwen3.5-397B-A17B	MoE — fast and cheap per token
Code generation, review	Qwen3-Coder-480B-A35B-Instruct-Turbo	256K context, code-specialized
Image analysis, OCR	Qwen3-VL-235B-A22B-Instruct	Vision-language understanding
Speech synthesis	Qwen3-TTS	Natural TTS output
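
The table above can be encoded as a small lookup so application code names a task rather than a model string. This is a sketch: the task keys and `pick_model` are made up for illustration. The model IDs are the ones used in this guide's examples, except the Qwen3.5 entry, whose `Qwen/` prefix is assumed by analogy because only the bare name appears in the table.

```python
MODEL_FOR_TASK = {
    "chat": "Qwen/Qwen3-Max",
    # ID assumed by analogy; only the bare model name appears in this guide.
    "high_throughput_chat": "Qwen/Qwen3.5-397B-A17B",
    "code": "Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    "vision": "Qwen/Qwen3-VL-235B-A22B-Instruct",
    "tts": "Qwen/Qwen3-TTS",
}


def pick_model(task: str) -> str:
    """Return the model ID for a task key, or raise on an unknown task."""
    try:
        return MODEL_FOR_TASK[task]
    except KeyError:
        known = ", ".join(sorted(MODEL_FOR_TASK))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None
```

For example, `pick_model("code")` returns the Qwen3-Coder ID that is passed as `model=` in the code review example.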

Tips

Qwen3-Max is the safe default for most chat tasks.
Qwen3.5-397B-A17B is a MoE model that activates only 17B of its 397B parameters per token; use it when you need speed at scale.
Qwen3-Coder's 256K context leaves room for several files in one request, which helps with cross-file refactoring.
Qwen3-VL supports multiple images in a single request.
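
For the last tip, a multi-image request can be sketched by extending the single-image message from the vision example with more `image_url` parts. The assumption is that extra images go in the same `content` list. This guide does not state the exact shape or any per-request image limit. `build_vision_message` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def build_vision_message(prompt: str, image_urls: Sequence[str]) -> Dict:
    """Build one user message with a text part followed by image parts."""
    if not image_urls:
        raise ValueError("Provide at least one image URL.")
    content: List[Dict] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

The returned dict would be passed as `messages=[build_vision_message(...)]` in the same `chat_completion` call used in the vision example.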
