Runcrate Pricing
Public rate card on every open-source model. Per-second GPU compute, no commit. Dedicated capacity when the rate card starts to hurt. BYOC (bring your own cloud) when compliance demands it.
PLANS
Get an API key. Ship today. Talk to nobody.
$0
per month · pay as you go
What's included
Deployment
When the rate card starts to hurt.
Custom
volume discounts
What's included
Deployment
Run it our way. Or run it in your VPC.
Custom
tailored contract
What's included
Deployment
INFERENCE · PAY PER TOKEN
No minimum. Stop calling the API, stop paying.
See all 170 models
INFERENCE · DEDICATED
Reserve GPU capacity sized to your traffic, served on Arc — our serving stack that fits 2–3× more tokens per GPU than vanilla vLLM. Commit to a monthly minimum, get a discounted rate on every token. No more 429s on burst, no more TPM ceilings.
Past ~$10k/month, dedicated is cheaper and the bill stops surprising you. We run a 7-day pilot in parallel with your current provider, so you can verify cost and latency before flipping the switch.
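To make the break-even reasoning concrete, here is a minimal sketch that compares a pay-per-token bill with a committed bill. The 25% discount and the $10,000 monthly minimum are hypothetical placeholders, not published terms, and the billing model (discounted usage, floored at the monthly minimum) is an assumption about how a commitment like this typically works.

```python
def pay_as_you_go_bill(list_price_usage: float) -> float:
    """Pay per token: the bill is simply usage at the public rate card."""
    return list_price_usage


def dedicated_bill(list_price_usage: float, discount: float, monthly_minimum: float) -> float:
    """Assumed committed model: discounted usage, never below the monthly minimum."""
    discounted = list_price_usage * (1.0 - discount)
    return max(discounted, monthly_minimum)


def cheaper_plan(list_price_usage: float, discount: float, monthly_minimum: float) -> str:
    """Name the plan with the lower bill; ties go to pay-as-you-go."""
    payg = pay_as_you_go_bill(list_price_usage)
    committed = dedicated_bill(list_price_usage, discount, monthly_minimum)
    return "dedicated" if committed < payg else "pay-as-you-go"


if __name__ == "__main__":
    # Hypothetical terms: 25% discount, $10,000 monthly minimum.
    for usage in (8_000.0, 16_000.0):
        print(usage, cheaper_plan(usage, discount=0.25, monthly_minimum=10_000.0))
```

Under these assumptions the commitment only pays off once list-price usage exceeds the monthly minimum; below it, you pay the minimum regardless of traffic.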
What you get
COMPUTE · PER SECOND
# real-time GPU pricing
GET /api/gpu-pricing
Refreshed every 5 minutes. Hit it from anywhere.
21 GPU SKUs available. Volume + reserved discounts on Dedicated.
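A rough sketch of how a client might poll the pricing endpoint and estimate a per-second charge. Only the path `/api/gpu-pricing` and the 5-minute refresh come from this page. The host, the assumption that the response is JSON, and the idea that rates are quoted per hour are all placeholders, so check the real response before relying on any of it.

```python
import json
import time
import urllib.request

# Assumption: the page gives only the path; this host is a placeholder.
BASE_URL = "https://example.invalid"
PRICING_PATH = "/api/gpu-pricing"
CACHE_TTL_SECONDS = 5 * 60  # prices are refreshed every 5 minutes

_cache = {"fetched_at": None, "payload": None}


def fetch_gpu_pricing(base_url=BASE_URL, now=time.monotonic, opener=urllib.request.urlopen):
    """Return the decoded payload, re-fetching at most once per cache window."""
    t = now()
    fetched_at = _cache["fetched_at"]
    if fetched_at is not None and t - fetched_at < CACHE_TTL_SECONDS:
        return _cache["payload"]
    # Assumption: the endpoint answers with a JSON document.
    with opener(base_url + PRICING_PATH, timeout=10) as response:
        payload = json.load(response)
    _cache["fetched_at"] = t
    _cache["payload"] = payload
    return payload


def per_second_cost(hourly_rate_usd, seconds_used):
    """Per-second billing: pay only for the seconds actually used.

    Assumes the rate is quoted in USD per hour, which this page does not state.
    """
    return hourly_rate_usd / 3600 * seconds_used
```

Caching for the same 5 minutes as the refresh interval avoids calls that cannot return anything new. For example, `per_second_cost(hourly_rate, 90)` estimates the charge for a 90-second job at whatever hourly rate the endpoint reports.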
TALK TO AN ENGINEER
Self-serve covers most jobs. For dedicated inference, reserved compute, BYOC, or specialty SKUs — tell us your workload. We respond within 24 hours with a rate card and a pilot plan.
Prefer Slack?
Create a Slack Connect channel
FAQ