Route Between AI Models

Not every request needs the same model. Use a fast, cheap model to classify each request, then send the generation task to the model best suited to it. With Runcrate, every model shares the same API, so switching models means changing one string.

The routing pattern

User request
    ↓
Classify intent (fast model — DeepSeek V3.2)
    ↓
┌─────────────────────────────────────────────┐
│  simple question → DeepSeek V3.2 ($0.30/M)  │
│  creative writing → Claude 4 Sonnet ($3/M)  │
│  code generation → Qwen3 Coder ($0.20/M)    │
│  unsafe content → blocked                   │
└─────────────────────────────────────────────┘
    ↓
Response

Build a model router

from openai import OpenAI
import json

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Step 1: Classify the request with a fast, cheap model
def classify_intent(user_message: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3.2",
        messages=[
            {
                "role": "system",
                "content": 'Classify the user message into exactly one category: "simple_qa", "creative", "code", "unsafe". Return only the category string, no quotes, no explanation.',
            },
            {"role": "user", "content": user_message},
        ],
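        max_tokens=16,
        temperature=0,  # keep labels as stable as possible for the same input
    )
    # content can be None; also strip stray quotes or a trailing period
    label = response.choices[0].message.content or ""
    return label.strip().strip("\"'.").lower()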

# Step 2: Route to the right model
MODEL_MAP = {
    "simple_qa": "deepseek-ai/DeepSeek-V3.2",
    "creative": "anthropic/claude-4-sonnet",
    "code": "Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
}

def route_and_generate(user_message: str) -> str:
    intent = classify_intent(user_message)

    if intent == "unsafe":
        return "I can't help with that request."

    # Unrecognized labels fall back to the cheap default model
    model = MODEL_MAP.get(intent, "deepseek-ai/DeepSeek-V3.2")

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_message}],
        max_tokens=2048,
    )
    return response.choices[0].message.content


# Try it
queries = [
    "What is the capital of Japan?",
    "Write a short story about a robot learning to paint.",
    "Write a Python function to parse CSV files with error handling.",
]

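for query in queries:
    # Classified here only for display. route_and_generate() classifies again
    # internally, so each query costs two classification calls in this demo.
    intent = classify_intent(query)
    model = "blocked" if intent == "unsafe" else MODEL_MAP.get(intent, "deepseek-ai/DeepSeek-V3.2")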
    print(f"Query: {query}")
    print(f"Intent: {intent} → Model: {model}")
    print(f"Response: {route_and_generate(query)[:100]}...")
    print()
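
The router above trusts whatever string the classifier returns, so a slightly off label such as `"Unsafe."` would fall through to the default model instead of being blocked. The following is a minimal sketch of one way to validate the label before routing. The helper names `normalize_intent` and `choose_model` and the choice of `simple_qa` as the fallback are assumptions for illustration, not part of the Runcrate API; the model IDs are copied from `MODEL_MAP` above so the block runs on its own, without any network calls.

```python
import string
from typing import Optional

ALLOWED_INTENTS = {"simple_qa", "creative", "code", "unsafe"}
FALLBACK_INTENT = "simple_qa"  # assumption: unknown labels go to the cheap model

# Same mapping as MODEL_MAP in the router, repeated so this sketch is self-contained.
MODEL_MAP = {
    "simple_qa": "deepseek-ai/DeepSeek-V3.2",
    "creative": "anthropic/claude-4-sonnet",
    "code": "Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
}


def normalize_intent(raw: Optional[str]) -> str:
    """Map raw classifier output onto one of the allowed labels (hypothetical helper)."""
    if not raw:
        return FALLBACK_INTENT
    # Remove surrounding whitespace, quotes, and punctuation, then lowercase.
    cleaned = raw.strip().strip(string.punctuation + string.whitespace).lower()
    return cleaned if cleaned in ALLOWED_INTENTS else FALLBACK_INTENT


def choose_model(intent: str) -> Optional[str]:
    """Return the model ID for a normalized intent, or None if the request is blocked."""
    if intent == "unsafe":
        return None
    return MODEL_MAP[intent]


for raw in ['"Creative"', " unsafe.\n", "I think this is code", None]:
    intent = normalize_intent(raw)
    print(repr(raw), "->", intent, "->", choose_model(intent))
```

Because both helpers are pure functions, the routing decision can be unit-tested without spending tokens; only the classification and generation calls need the API.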

Cost comparison

Strategy	Avg cost per request	Quality
Always use Claude 4 Sonnet	~$0.015	Highest
Always use DeepSeek V3.2	~$0.001	Good
Routed (this example)	~$0.003	Highest where it matters

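Routing typically cuts costs 60-80% compared to always using the strongest model, with minimal quality loss, because most requests are simple Q&A that a fast model handles well. In the table above, the routed strategy at ~$0.003 per request is about 80% cheaper than ~$0.015 for always using Claude 4 Sonnet. The actual saving depends on your traffic mix.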
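
To see how a routed average like the one in the table can arise, the sketch below computes a blended cost per request from a traffic mix. The per-request costs for DeepSeek V3.2 and Claude 4 Sonnet come from the table; the Qwen3 Coder cost, the classification overhead, and the 70/15/15 traffic mix are assumed values chosen only for illustration, so substitute your own measurements.

```python
# Per-request costs in USD.
COST_PER_REQUEST = {
    "simple_qa": 0.001,  # DeepSeek V3.2 (from the table)
    "creative": 0.015,   # Claude 4 Sonnet (from the table)
    "code": 0.001,       # Qwen3 Coder (assumed; not listed in the table)
}
CLASSIFY_OVERHEAD = 0.0002  # assumed cost of the short classification call


def blended_cost(mix):
    """Expected cost per request for a traffic mix whose shares sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    routed = sum(share * COST_PER_REQUEST[intent] for intent, share in mix.items())
    return routed + CLASSIFY_OVERHEAD


def savings_vs_strongest(mix):
    """Fractional saving relative to sending every request to Claude 4 Sonnet."""
    return 1 - blended_cost(mix) / COST_PER_REQUEST["creative"]


# Hypothetical traffic mix: 70% simple Q&A, 15% creative, 15% code.
mix = {"simple_qa": 0.70, "creative": 0.15, "code": 0.15}
print(f"Blended cost per request: ${blended_cost(mix):.4f}")
print(f"Saving vs always Claude 4 Sonnet: {savings_vs_strongest(mix):.0%}")
```

Under these assumptions the blended cost is $0.0033 per request, a saving of 78%. Raising the share of creative requests pushes the average toward the Claude 4 Sonnet price, which is why the saving varies with the workload.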

Next steps

AI content moderation — add Llama Guard as a safety check before routing.
Build an AI SaaS backend — full production backend with routing, billing, and rate limiting.
Model catalog — compare models by price, speed, and capability.
