LLAMA API
Run Meta's latest open-weight models without managing infrastructure. Llama 4 Scout brings Mixture-of-Experts efficiency with 128K context, while Llama 3.3 and 3.1 remain strong choices for general chat and code. All models are served via an OpenAI-compatible endpoint.

QUICK START
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "user", "content": "Explain how MoE models work."}
    ],
)

print(response.choices[0].message.content)

AVAILABLE MODELS
| Model | Provider | Price | Detail |
|---|---|---|---|
| meta-llama/Llama-4-Scout-17B-16E-Instruct | Meta | Per-token | 17B MoE, 128K context, newest |
| meta-llama/Llama-3.3-70B-Instruct | Meta | Per-token | 70B dense, strong reasoning |
| meta-llama/Llama-3.1-8B-Instruct | Meta | Per-token | 8B, lightweight, fast |
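The table above maps roughly to a speed/quality trade-off. One way to keep that choice in a single place is a small helper like the sketch below; the tier names and the pick_model function are illustrative conventions of ours, not part of the API.

```python
# Hypothetical helper: map a speed/quality preference to a model ID
# from the table above. The tier names are illustrative, not API features.
MODELS = {
    "fast": "meta-llama/Llama-3.1-8B-Instruct",              # lightweight, low latency
    "balanced": "meta-llama/Llama-4-Scout-17B-16E-Instruct", # MoE, 128K context
    "quality": "meta-llama/Llama-3.3-70B-Instruct",          # dense 70B, strongest reasoning
}

def pick_model(tier: str = "balanced") -> str:
    """Return the model ID for a tier, falling back to 'balanced'."""
    return MODELS.get(tier, MODELS["balanced"])

print(pick_model("fast"))  # meta-llama/Llama-3.1-8B-Instruct
```

The returned string drops straight into the model parameter of client.chat.completions.create from the quick start.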
WHY RUNCRATE
Llama 4 Scout uses 17B active parameters from a 16-expert mixture, delivering strong quality with efficient inference.
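In a Mixture-of-Experts layer, a router activates only a few experts per token, so compute scales with active parameters rather than total parameters. A toy sketch of top-k routing (purely illustrative; real MoE layers use learned routers and large neural experts, and this is not Scout's actual architecture):

```python
# Toy top-k expert routing: only k of n experts run per token.
# Illustrative only; not the real Llama 4 Scout layer.

def route(scores, k=1):
    """Pick indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def moe_layer(x, experts, router_scores, k=1):
    """Run only the selected experts and sum their outputs."""
    active = route(router_scores, k)
    return sum(experts[i](x) for i in active)

# 16 tiny stand-in "experts"; with k=1, the other 15 do no work for this token.
experts = [lambda x, i=i: x * (i + 1) for i in range(16)]
scores = [0.1] * 16
scores[3] = 0.9  # router prefers expert 3

print(moe_layer(2.0, experts, scores, k=1))  # 8.0 (expert 3 computes 2.0 * 4)
```

This is why a 16-expert model can have far more total parameters than the 17B that are active on any given forward pass.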
All Llama models are open-weight. Use the API for convenience, or download weights to self-host on Runcrate GPU instances for full control.
Model sizes range from the lightweight 8B for fast prototyping to the 70B powerhouse for maximum quality, so you can pick the right size for your latency and cost budget.
Llama 4 Scout supports 128K tokens of context, enough for entire codebases, lengthy documents, or extended multi-turn conversations.
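A 128K window removes most chunking work, but it can still help to sanity-check input size before sending a request. A rough sketch using the common ~4-characters-per-token heuristic (an approximation only; for exact counts, run the model's actual tokenizer):

```python
CONTEXT_WINDOW = 131_072  # 128K tokens (Llama 4 Scout)

def rough_token_count(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return len(text) // 4

def fits_in_context(text: str, reserve_for_output: int = 4_096) -> bool:
    """Check whether a prompt likely fits, leaving room for the reply."""
    return rough_token_count(text) <= CONTEXT_WINDOW - reserve_for_output

doc = "word " * 50_000  # ~250K characters, ~62K estimated tokens
print(fits_in_context(doc))  # True: well inside the 128K window
```

Reserving a slice of the window for the model's reply matters because input and output tokens share the same context budget.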
FAQ