LLAMA API

Llama 4 and 3, hosted and ready.

Run Meta's latest open-weight models without managing infrastructure. Llama 4 Scout brings Mixture-of-Experts efficiency with 128K context. Llama 3.3 and 3.1 remain strong choices for general chat and code. All served via an OpenAI-compatible endpoint.

Latest model: Llama 4 Scout
Context: Up to 128K
License: Open-weight

QUICK START

Integrate in minutes.

from openai import OpenAI

# Point the standard OpenAI client at the Runcrate endpoint.
client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",  # your Runcrate API key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "user", "content": "Explain how MoE models work."}
    ],
)
print(response.choices[0].message.content)
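
Streaming works with the same client. The sketch below assumes Runcrate supports the standard OpenAI streaming protocol (stream=True), as OpenAI-compatible endpoints typically do:

# Stream tokens as they arrive; reuses the client defined above.
stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Explain how MoE models work."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)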

AVAILABLE MODELS

Models you can use today.

MODEL                                        PROVIDER  PRICING    NOTES
meta-llama/Llama-4-Scout-17B-16E-Instruct    Meta      Per-token  17B-active MoE, 128K context, newest
meta-llama/Llama-3.3-70B-Instruct            Meta      Per-token  70B dense, strong reasoning
meta-llama/Llama-3.1-8B-Instruct             Meta      Per-token  8B dense, lightweight and fast

WHY RUNCRATE

Built for production.

Llama 4 MoE

Llama 4 Scout activates 17B parameters per token from a 16-expert mixture, delivering strong quality with efficient inference.
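
To make the idea concrete, here is a toy top-1 MoE layer in NumPy. It is purely illustrative: the expert count matches Scout's 16, but the shapes, router, and top-k policy are simplifications, not Llama's actual implementation:

import numpy as np

# Toy top-1 mixture-of-experts layer. Illustrative only; real MoE
# routers, shapes, and top-k policies (including Llama 4's) differ.
num_experts, d_model = 16, 8
rng = np.random.default_rng(0)
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts))

def moe_forward(x):                  # x: (n_tokens, d_model)
    scores = x @ router              # router logits, one per expert
    chosen = scores.argmax(axis=-1)  # top-1 expert per token
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        out[i] = x[i] @ experts[e]   # only the chosen expert runs
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)     # (4, 8)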

Open Weights

All Llama models are open-weight. Use the API for convenience, or download weights to self-host on Runcrate GPU instances for full control.
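
For the self-hosting path, a minimal sketch using Hugging Face transformers (assumes you have accepted Meta's license for the model on Hugging Face and have a GPU; device_map="auto" also requires the accelerate package):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Loads open weights locally instead of calling the hosted API.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain how MoE models work."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))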

Full Model Range

The lineup spans the lightweight 8B model for fast prototyping up to the 70B model for maximum quality. Pick the size that fits your latency and cost budget.
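
Because every model sits behind the same endpoint, switching sizes is a one-line change. One simple pattern, reusing the client from the quick start (the tier names here are illustrative, not a Runcrate feature):

# Map workload tiers to model IDs from the table above.
MODELS = {
    "fast": "meta-llama/Llama-3.1-8B-Instruct",             # lowest latency and cost
    "balanced": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "quality": "meta-llama/Llama-3.3-70B-Instruct",         # strongest reasoning
}

def ask(tier, prompt):
    response = client.chat.completions.create(
        model=MODELS[tier],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("fast", "Summarize MoE routing in one sentence."))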

128K Context

On Runcrate, Llama 4 Scout is served with a 128K-token context window, enough for entire codebases, lengthy documents, or extended multi-turn conversations.
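
In practice that means a whole document can go in a single request. A sketch reusing the quick-start client ("report.txt" is a placeholder path):

# Pass a long document in one request; 128K tokens is roughly
# 100,000 words of English text, depending on tokenization.
with open("report.txt") as f:  # placeholder path
    document = f.read()

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "user",
         "content": f"Summarize the key findings of this report:\n\n{document}"},
    ],
)
print(response.choices[0].message.content)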

Start building with Llama.