MISTRAL API

Mistral models, zero setup.

Run Mistral models through a single OpenAI-compatible endpoint. From lightweight instruction-following to powerful code generation, Mistral models deliver strong performance at competitive token costs. Same API format as OpenAI, same SDK, different model parameter.

OpenAI SDK
Compatibility
<100ms P50
Latency
Up to 128K
Context

QUICK START

Integrate in minutes.

from openai import OpenAI

client = OpenAI(
    # Point the OpenAI SDK at the RunCrate endpoint; everything else is unchanged.
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)
print(response.choices[0].message.content)
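Because the endpoint mirrors OpenAI's chat-completions API, streaming should work the same way. A minimal sketch, assuming the gateway supports the standard stream=True flag:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Request a streamed response; tokens arrive as they are generated.
stream = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    messages=[
        {"role": "user", "content": "Explain binary search in one paragraph."}
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta with the next slice of text; some chunks are empty.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()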

AVAILABLE MODELS

Models you can use today.

mistralai/Mistral-Small-24B-Instruct-2501
Mistral · Per-token
24B params, strong instruction following

mistralai/Voxtral-Small
Mistral · Per-minute
Audio transcription, multilingual (example below)
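Voxtral-Small is billed per minute of audio rather than per token. A minimal transcription sketch, assuming the gateway exposes the OpenAI-compatible audio transcriptions endpoint for this model (the filename is a placeholder):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Assumption: OpenAI-style transcription requests are routed to Voxtral.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="mistralai/Voxtral-Small",
        file=audio_file,
    )
print(transcript.text)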

WHY RUNCRATE

Built for production.

Efficient Architecture

Mistral models use sliding-window attention and grouped-query attention for fast inference with a lower memory footprint, which translates to lower per-token costs.
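To make the sliding-window idea concrete, here is an illustrative sketch (not Mistral's or RunCrate's implementation) of the mask that lets each token attend only to itself and the previous few positions:

import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # True where attention is allowed: token i may attend to token j
    # only if j <= i (causal) and i - j < window (sliding window).
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

# With window=3, token 5 attends only to positions 3, 4, and 5, so
# attention cost grows with the window size rather than the sequence length.
print(sliding_window_mask(6, 3).astype(int))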

Strong Code Generation

Mistral models excel at code generation, bug fixing, and code review, and stay competitive with much larger models on coding benchmarks.

Multilingual

Solid performance across English, French, German, Spanish, Italian, and many other languages, making these models a fit for global applications.

Instruction Following

Instruct-tuned variants follow complex instructions precisely, making them reliable for structured outputs, JSON generation, and tool use.
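For example, OpenAI-style JSON mode should carry over unchanged. A minimal sketch, assuming the gateway forwards the response_format parameter to the model:

import json

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    # Assumption: response_format passes through as in the OpenAI API.
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply with a JSON object only."},
        {"role": "user", "content": "List three European capitals and their countries."},
    ],
)
print(json.loads(response.choices[0].message.content))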

Start building with Mistral.