OLLAMA IN THE CLOUD
Love Ollama but limited by your local GPU? Run the same models in the cloud through Runcrate's API. Llama, Qwen, DeepSeek, Gemma, Mistral, and more are all accessible via an OpenAI-compatible endpoint. No VRAM constraints, no thermal throttling, no model downloads. Just swap the endpoint.

QUICK START
from openai import OpenAI

# Same code you'd use with Ollama's OpenAI-compatible mode
client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {"role": "user", "content": "Summarize the benefits of cloud inference."}
    ],
)

print(response.choices[0].message.content)

AVAILABLE MODELS
| Model | Provider | Price | Detail |
|---|---|---|---|
| meta-llama/Llama-4-Scout-17B-16E-Instruct | Meta | Per-token | 17B MoE, 128K context |
| Qwen/Qwen3-32B | Alibaba | Per-token | 32B, strong multilingual |
| deepseek-ai/DeepSeek-V3 | DeepSeek | Per-token | 128K context, MoE |
| google/gemma-3-27b-it | Google | Per-token | 27B, instruction-tuned |
| mistralai/Mistral-Small-24B-Instruct-2501 | Mistral | Per-token | 24B, fast inference |
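Because the endpoint speaks the standard OpenAI chat-completions protocol over HTTPS, you don't even need the `openai` package: any HTTP client works. Below is a stdlib-only sketch using `urllib`. The URL path follows the OpenAI convention (`/v1/chat/completions`), and `RUNCRATE_API_KEY` is an assumed environment-variable name for illustration.

```python
import json
import os
import urllib.request

API_URL = "https://api.runcrate.ai/v1/chat/completions"

def build_chat_request(model: str, user_content: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }

def chat(payload: dict, api_key: str) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    payload = build_chat_request(
        "Qwen/Qwen3-32B", "Summarize the benefits of cloud inference."
    )
    reply = chat(payload, os.environ["RUNCRATE_API_KEY"])
    print(reply["choices"][0]["message"]["content"])
```

Any model ID from the table above drops into the `model` field unchanged.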
WHY RUNCRATE
- **Zero setup.** Stop waiting for model downloads and fighting with CUDA versions. Your models run on H100s in the cloud, accessible from anywhere.
- **Full-size models.** The models you know from Ollama, running on enterprise-grade hardware. No VRAM limits, no thermal throttling, no 4-bit quantization compromises.
- **Drop-in compatible.** If your code works with Ollama's OpenAI compatibility mode, it works with Runcrate. Change the base URL and API key, done.
- **Pay per token.** No idle GPU costs. You pay for the tokens you generate, not for a machine sitting warm. Ideal for bursty or unpredictable workloads.
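Since the API shape matches Ollama's OpenAI-compatible mode, switching between local and cloud inference can be a pure configuration decision. A minimal sketch, assuming a `RUNCRATE_API_KEY` environment variable signals cloud use; Ollama's OpenAI-compatible server listens on `http://localhost:11434/v1` by default.

```python
import os

def inference_endpoint(env: dict) -> tuple[str, str]:
    """Pick (base_url, api_key): Runcrate when a key is set, local Ollama otherwise.

    RUNCRATE_API_KEY is an assumed env-var name for this sketch.
    """
    key = env.get("RUNCRATE_API_KEY")
    if key:
        return "https://api.runcrate.ai/v1", key
    # Ollama ignores the API key, but the OpenAI client requires a non-empty one.
    return "http://localhost:11434/v1", "ollama"

base_url, api_key = inference_endpoint(dict(os.environ))
# Pass these straight to OpenAI(base_url=base_url, api_key=api_key).
```

The rest of your code never needs to know which backend it is talking to.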
COMPARISON
| Feature | Runcrate | Local Ollama |
|---|---|---|
| GPU hardware | H100 / H200 cloud | Your local GPU |
| Model size limit | No VRAM limit | Limited by your GPU |
| Setup | API key + one line | Download + install |
| Cost model | Per-token, no idle cost | Electricity + GPU purchase |
| Available models | 200+ via API | Community-maintained |
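To see the full catalog programmatically, OpenAI-compatible APIs conventionally expose a `GET /v1/models` route; assuming Runcrate follows that convention, a stdlib-only sketch:

```python
import json
import urllib.request

# Assumed standard OpenAI-style route; verify against Runcrate's API docs.
MODELS_URL = "https://api.runcrate.ai/v1/models"

def model_ids(models_response: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style GET /v1/models response."""
    return [m["id"] for m in models_response.get("data", [])]

def list_models(api_key: str) -> list[str]:
    req = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))

if __name__ == "__main__":
    import os
    print(list_models(os.environ["RUNCRATE_API_KEY"]))
```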
FAQ