LLAMA API
Run Meta's latest open-weight models without managing infrastructure. Llama 4 Scout brings Mixture-of-Experts efficiency with 128K context, while Llama 3.3 and 3.1 remain strong choices for general chat and code. All models are served via an OpenAI-compatible endpoint.

QUICK START
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "user", "content": "Explain how MoE models work."}
    ],
)

print(response.choices[0].message.content)

AVAILABLE MODELS
| Model | Provider | Price | Detail |
|---|---|---|---|
| meta-llama/Llama-4-Scout-17B-16E-Instruct | Meta | Per-token | 17B MoE, 128K context, newest |
| meta-llama/Llama-3.3-70B-Instruct | Meta | Per-token | 70B dense, strong reasoning |
| meta-llama/Llama-3.1-8B-Instruct | Meta | Per-token | 8B, lightweight, fast |
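The table above maps roughly to a speed/quality trade-off. One way to keep that choice in a single place is a small helper like the sketch below; the tier names and the pick_model function are illustrative conventions of ours, not part of the API.

```python
# Hypothetical helper: map a speed/quality preference to a model ID
# from the table above. The tier names are illustrative, not API features.
MODELS = {
    "fast": "meta-llama/Llama-3.1-8B-Instruct",              # lightweight, low latency
    "balanced": "meta-llama/Llama-4-Scout-17B-16E-Instruct", # MoE, 128K context
    "quality": "meta-llama/Llama-3.3-70B-Instruct",          # dense 70B, strongest reasoning
}

def pick_model(tier: str = "balanced") -> str:
    """Return the model ID for a tier, falling back to 'balanced'."""
    return MODELS.get(tier, MODELS["balanced"])

print(pick_model("fast"))  # meta-llama/Llama-3.1-8B-Instruct
```

The returned string drops straight into the model parameter of client.chat.completions.create from the quick start.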
WHY RUNCRATE
Llama 4 Scout uses 17B active parameters from a 16-expert mixture, delivering strong quality with efficient inference.
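In a Mixture-of-Experts layer, a router activates only a few experts per token, so compute scales with active parameters rather than total parameters. A toy sketch of top-k routing (purely illustrative; real MoE layers use learned routers and large neural experts, and this is not Scout's actual architecture):

```python
# Toy top-k expert routing: only k of n experts run per token.
# Illustrative only; not the real Llama 4 Scout layer.

def route(scores, k=1):
    """Pick indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def moe_layer(x, experts, router_scores, k=1):
    """Run only the selected experts and sum their outputs."""
    active = route(router_scores, k)
    return sum(experts[i](x) for i in active)

# 16 tiny stand-in "experts"; with k=1, the other 15 do no work for this token.
experts = [lambda x, i=i: x * (i + 1) for i in range(16)]
scores = [0.1] * 16
scores[3] = 0.9  # router prefers expert 3

print(moe_layer(2.0, experts, scores, k=1))  # 8.0 (expert 3 computes 2.0 * 4)
```

This is why a 16-expert model can have far more total parameters than the 17B that are active on any given forward pass.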
All Llama models are open-weight. Use the API for convenience, or download weights to self-host on Runcrate GPU instances for full control.
Model sizes range from the lightweight 8B for fast prototyping to the 70B powerhouse for maximum quality, so you can pick the right size for your latency and cost budget.
Llama 4 Scout supports 128K tokens of context, enough for entire codebases, lengthy documents, or extended multi-turn conversations.
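A 128K window removes most chunking work, but it can still help to sanity-check input size before sending a request. A rough sketch using the common ~4-characters-per-token heuristic (an approximation only; for exact counts, run the model's actual tokenizer):

```python
CONTEXT_WINDOW = 131_072  # 128K tokens (Llama 4 Scout)

def rough_token_count(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return len(text) // 4

def fits_in_context(text: str, reserve_for_output: int = 4_096) -> bool:
    """Check whether a prompt likely fits, leaving room for the reply."""
    return rough_token_count(text) <= CONTEXT_WINDOW - reserve_for_output

doc = "word " * 50_000  # ~250K characters, ~62K estimated tokens
print(fits_in_context(doc))  # True: well inside the 128K window
```

Reserving a slice of the window for the model's reply matters because input and output tokens share the same context budget.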
FAQ