Alibaba’s Qwen family covers chat, code generation, vision understanding, and text-to-speech — all available through the Runcrate API with a single API key.

Available Qwen models

| Model | Category | Context | Strengths |
| --- | --- | --- | --- |
| Qwen3-Max | Chat | 128K | Flagship reasoning and instruction following |
| Qwen3.5-397B-A17B | Chat | 128K | MoE architecture, high throughput |
| Qwen3-Coder-480B-A35B-Instruct-Turbo | Code | 256K | Code generation, debugging, refactoring |
| Qwen3-VL-235B-A22B-Instruct | Vision | 128K | Image understanding, OCR, diagram analysis |
| Qwen3-TTS | TTS | - | Natural-sounding speech synthesis |

Chat — Qwen3-Max

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="Qwen/Qwen3-Max",
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": "Compare microservices vs monolith for a team of 5 engineers."},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)

Code — Qwen3-Coder

Qwen3-Coder is purpose-built for code generation and accepts up to 256K tokens of context:

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "Review code for bugs, style issues, and performance."},
        {"role": "user", "content": "Review this:\n\n```python\ndef process(data):\n    result = []\n    for i in range(len(data)):\n        if data[i] != None:\n            result.append(data[i] * 2)\n    return result\n```"},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
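
To make use of the 256K context for cross-file work, several files can be packed into a single user message. The sketch below is one possible way to do that; `build_refactor_prompt`, the `### File:` separator, and the 4-characters-per-token estimate are illustrative assumptions, not part of the Runcrate SDK or Qwen's tokenizer.

```python
# Hypothetical helper: packs several source files into one prompt string.
# Assumption: ~4 characters per token is a rough estimate only; it is not
# Qwen's real tokenizer, so leave generous headroom.
CHARS_PER_TOKEN = 4
CONTEXT_TOKENS = 256_000  # context size listed for Qwen3-Coder


def build_refactor_prompt(files: dict[str, str], instruction: str,
                          reserve_tokens: int = 8_000) -> str:
    """Join source files into one prompt, stopping before the estimated
    character budget (context minus reserved output tokens) is exceeded."""
    budget_chars = (CONTEXT_TOKENS - reserve_tokens) * CHARS_PER_TOKEN
    parts = [instruction]
    used = len(instruction)
    for path, source in files.items():
        block = f"\n\n### File: {path}\n{source}"
        if used + len(block) > budget_chars:
            break  # the next file would not fit in the estimated budget
        parts.append(block)
        used += len(block)
    return "".join(parts)


prompt = build_refactor_prompt(
    {"models.py": "class User: ...", "views.py": "def index(): ..."},
    "Rename User to Account across these files.",
)
# Pass `prompt` as the user message content in client.models.chat_completion(...).
```

Files that do not fit are simply left out, so for large repositories it may be worth ordering the dictionary with the most relevant files first.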

Vision — Qwen3-VL

Qwen3-VL can analyze images, extract text, and interpret diagrams:

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="Qwen/Qwen3-VL-235B-A22B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart show? Summarize the key trends."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
    max_tokens=512,
)

print(response.choices[0].message.content)
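
The example above references an image by public URL. For a local file, OpenAI-style chat APIs commonly accept a base64 `data:` URL in the same `image_url` field. Whether Runcrate accepts this is an assumption to verify against the API reference; `image_to_data_url` is a hypothetical helper, not an SDK function.

```python
import base64
import mimetypes


def image_to_data_url(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL, inferring the MIME type
    from the file name's extension."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file name: {filename}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Usage sketch (assumes the endpoint accepts data URLs):
#   with open("chart.png", "rb") as f:
#       url = image_to_data_url(f.read(), "chart.png")
#   {"type": "image_url", "image_url": {"url": url}}
```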

TTS — Qwen3-TTS

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.text_to_speech(
    model="Qwen/Qwen3-TTS",
    input="Welcome to Runcrate. Your GPU instances are ready.",
    voice="alloy",
)

with open("welcome.mp3", "wb") as f:
    f.write(response.content)

Choosing the right Qwen model

| Task | Model | Why |
| --- | --- | --- |
| General chat, reasoning | Qwen3-Max | Best overall quality |
| High-throughput chat | Qwen3.5-397B-A17B | MoE: fast and cheap per token |
| Code generation, review | Qwen3-Coder-480B | 256K context, code-specialized |
| Image analysis, OCR | Qwen3-VL-235B | Vision-language understanding |
| Speech synthesis | Qwen3-TTS | Natural TTS output |
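
The table can be encoded as a small lookup so application code selects a model by task. This is a minimal sketch: the task keys and `pick_model` are hypothetical names, and the `Qwen/Qwen3.5-397B-A17B` ID is assumed from the naming pattern of the other model IDs on this page, so confirm it before relying on it.

```python
# Task-to-model lookup mirroring the table above.
QWEN_MODELS = {
    "chat": "Qwen/Qwen3-Max",
    "code": "Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    "vision": "Qwen/Qwen3-VL-235B-A22B-Instruct",
    "tts": "Qwen/Qwen3-TTS",
    # Assumption: this ID is inferred from the pattern of the IDs above.
    "high_throughput_chat": "Qwen/Qwen3.5-397B-A17B",
}


def pick_model(task: str) -> str:
    """Return the model ID for a task key, or raise ValueError if unknown."""
    try:
        return QWEN_MODELS[task]
    except KeyError:
        known = ", ".join(sorted(QWEN_MODELS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


model_id = pick_model("code")
# Pass `model_id` as the `model` argument of client.models.chat_completion(...).
```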

Tips

  • Qwen3-Max is the safe default for most chat tasks.
  • Qwen3.5-397B-A17B is a MoE model that activates only 17B of its 397B parameters per token; use it when you need speed at scale.
  • Qwen3-Coder handles 256K context for cross-file refactoring.
  • Qwen3-VL supports multiple images in a single request.
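
For the multi-image tip, one plausible request shape is to add one `image_url` part per image to the same user message, following the content format of the vision example. The helper name `multi_image_message` is hypothetical, and the exact limits on image count are not stated here.

```python
def multi_image_message(prompt: str, image_urls: list[str]) -> dict:
    """Build a single user message with one text part followed by one
    image_url part per URL, in the order given."""
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return {"role": "user", "content": content}


message = multi_image_message(
    "Compare these two charts and describe what changed.",
    ["https://example.com/q1.png", "https://example.com/q2.png"],
)
# Pass as messages=[message] with model="Qwen/Qwen3-VL-235B-A22B-Instruct".
```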
