OLLAMA IN THE CLOUD

Your Ollama models, cloud-powered.

Love Ollama but limited by your local GPU? Run the same models in the cloud through Runcrate's API. Llama, Qwen, DeepSeek, Gemma, Mistral, and more, all accessible via an OpenAI-compatible endpoint. No VRAM constraints, no thermal throttling, no model downloads. Just swap the endpoint.

200+ Models
OpenAI-compatible API format
< 60s Setup time

QUICK START

Integrate in minutes.

from openai import OpenAI

# Same code you'd use with Ollama's OpenAI-compatible mode
client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "user", "content": "Summarize the benefits of cloud inference."}
    ],
)
print(response.choices[0].message.content)
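If Runcrate forwards OpenAI's streaming semantics along with the rest of the compatibility surface (an assumption based on the claim above, not something stated on this page), the same client streams tokens with stream=True:

stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "user", "content": "Write a haiku about GPUs."}
    ],
    stream=True,  # assumed supported, as in the standard OpenAI SDK
)

for chunk in stream:
    # Each chunk carries a delta with the next slice of generated text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)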

AVAILABLE MODELS

Models you can use today.

Model | Vendor | Pricing | Notes
meta-llama/Llama-4-Scout-17B-16E-Instruct | Meta | Per-token | 17B MoE, 128K context
Qwen/Qwen3-32B | Alibaba | Per-token | 32B, strong multilingual
deepseek-ai/DeepSeek-V3 | DeepSeek | Per-token | 128K context, MoE
google/gemma-3-27b-it | Google | Per-token | 27B, instruction-tuned
mistralai/Mistral-Small-24B-Instruct-2501 | Mistral | Per-token | 24B, fast inference
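If the endpoint also exposes OpenAI's standard /v1/models route (a common feature of OpenAI-compatible APIs, assumed here rather than confirmed above), the full catalog can be listed programmatically instead of copied from this table:

# Sketch: enumerate the catalog, assuming the /v1/models route exists.
# `client` is the one configured in the Quick Start above.
for model in client.models.list():
    print(model.id)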

WHY RUNCRATE

Built for production.

No Local GPU Needed

Stop waiting for model downloads and fighting with CUDA versions. Your models run on H100s in the cloud, accessible from anywhere.

Same Models, Bigger Scale

The models you know from Ollama, running on enterprise-grade hardware. No VRAM limits, no thermal throttling, no 4-bit quantization compromises.

OpenAI-Compatible API

If your code works with Ollama's OpenAI compatibility mode, it works with Runcrate. Change the base URL and the API key, and you're done.
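As a concrete sketch of that swap, here is the only part of an Ollama-based script that has to change (Ollama's compatibility mode listens on localhost:11434 and accepts any placeholder key):

from openai import OpenAI

# Before: local Ollama in OpenAI compatibility mode
# client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# After: Runcrate, same client and same calls
client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

Every completion call below this line stays untouched.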

Pay Per Token

No idle GPU costs. You pay for the tokens you generate, not for a machine sitting idle. Ideal for bursty or unpredictable workloads.
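Because billing is per token, the usage block the OpenAI SDK returns on each completion is the number to meter against. A minimal sketch (the field names are the SDK's standard ones; prices are not listed on this page, so none appear here):

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "Hello!"}],
)

# usage reports exactly what per-token billing is based on
print("prompt tokens:    ", response.usage.prompt_tokens)
print("completion tokens:", response.usage.completion_tokens)
print("total tokens:     ", response.usage.total_tokens)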

COMPARISON

Runcrate vs Local Ollama.

Feature | Runcrate | Local Ollama
GPU hardware | H100 / H200 cloud | Your local GPU
Model size limit | No VRAM limit | Limited by your GPU
Setup | API key + one line | Download + install
Cost model | Per-token, no idle cost | Electricity + GPU purchase
Available models | 200+ via API | Community-maintained

Run Ollama models in the cloud.