Runcrate Pricing

Pricing built for the workload.
Not the quota.

Public rate card on every open-source model. Per-second GPU compute, no commit. Dedicated capacity when the rate card starts to hurt. BYOC when compliance demands it.

170+
Models
40–60%
Off aggregators
Per-sec
GPU billing
$0
To start

PLANS

Three plans. No surprises.

Self-Serve

Get an API key. Ship today. Talk to nobody.

$0

per month · pay as you go

What's included

  • Pay-per-token inference on every model
  • Per-second GPU compute
  • OpenAI-compatible endpoint
  • Public rate card, no negotiations
  • Multi-cloud autoscaling
  • Email + community Discord

Deployment

Runcrate Cloud
Get an API key

Dedicated

Most popular

When the rate card starts to hurt.

Custom

volume discounts

What's included

  • Everything in Self-Serve
  • 40–60% off the public rate card
  • Dedicated inference on Arc
  • Reserved GPU capacity, no scale-to-zero
  • Hands-on engineering setup
  • 7-day pilot, parallel to your current provider
  • 99.9% uptime SLA
  • Slack Connect to our engineers

Deployment

Runcrate Cloud
Talk to an engineer

Enterprise

Run it our way. Or run it in your VPC.

Custom

tailored contract

What's included

  • Everything in Dedicated
  • Custom contract pricing on inference
  • BYOC + self-hosted deployments
  • Region pinning — US / EU / APAC
  • HIPAA-eligible
  • SOC 2 Type II datacenter partners
  • 99.95% / 99.99% SLA with credits
  • Named CSM + on-call engineering

Deployment

Runcrate CloudBYOCHybrid
Talk to sales

INFERENCE · PAY PER TOKEN

The open-source frontier. One API. Public rate card.

Moonshot AI
Kimi K2.5
Moonshot AI
TEXT
$0.45 / $2.25/ 1M
262K
DeepSeek AI
DeepSeek V4 Flash
DeepSeek AI
TEXT
$0.14 / $0.28/ 1M
1M
Meta
Llama 4 Maverick 17B 128E Instruct FP8
Meta
TEXT
$0.15 / $0.60/ 1M
1M
Alibaba Cloud
Qwen3 Coder 480B A35B Instruct Turbo
Alibaba Cloud
TEXT
$0.30 / $1.00/ 1M
262K
Alibaba Cloud
Qwen3 235B A22B Thinking 2507
Alibaba Cloud
TEXT
$0.23 / $2.30/ 1M
262K
Zhipu AI
GLM 5
Zhipu AI
TEXT
$0.60 / $2.08/ 1M
203K
OpenAI
Gpt Oss 120b
OpenAI
TEXT
$0.039 / $0.19/ 1M
131K
MiniMax AI
MiniMax M2.5
MiniMax AI
TEXT
$0.15 / $1.15/ 1M
197K
NVIDIA
Llama 3.3 Nemotron Super 49B V1.5
NVIDIA
TEXT
$0.40 / $0.40/ 1M
131K
Mistral AI
Mixtral 8x7B Instruct V0.1
Mistral AI
TEXT
$0.54 / $0.54/ 1M
33K

No minimum. Stop calling the API, stop paying.

See all 170 models

INFERENCE · DEDICATED

When the rate card starts to hurt, reserve capacity.

Reserve GPU capacity sized to your traffic, served on Arc — our serving stack that fits 2–3× more tokens per GPU than vanilla vLLM. Commit to a monthly minimum, get a discounted rate on every token. No more 429s on burst, no more TPM ceilings.

Past ~$10k/month, dedicated is cheaper and the bill stops surprising you. We run a 7-day pilot parallel to your current provider — verify cost and latency before flipping a switch.

What you get

  • Dedicated capacity sized to your traffic
  • 40–60% off the public rate card
  • p99 latency floor written into the contract
  • Burst-tolerant — no 429s, no TPM ceilings
  • 99.9% uptime SLA
  • OpenAI-compatible — swap one base URL

COMPUTE · PER SECOND

GPUs by the second.
Stop the instance, stop the meter.

Live API · no key required

# real-time GPU pricing

GET /api/gpu-pricing

Refreshed every 5 minutes. Hit it from anywhere.

B300Blackwell
288GB
$8.48/hr
B200Blackwell
192GB
$7.17/hr
H200Hopper
141GB
$3.78/hr
H100Hopper
80GB
$2.09/hr
GH200Hopper
96GB
$2.52/hr
A100Ampere
80GB
$1.52/hr
A100 40GBAmpere
40GB
$1.50/hr
L40SAda Lovelace
48GB
$0.97/hr
L40Ada Lovelace
48GB
$0.97/hr
RTX 5090Blackwell
32GB
$0.72/hr
RTX 4090Ada Lovelace
24GB
$0.66/hr
RTX 6000 AdaAda Lovelace
48GB
$1.07/hr
RTX PRO 6000Blackwell
96GB
$2.15/hr
A6000Ampere
48GB
$0.55/hr
A5000Ampere
24GB
$1.56/hr
L4Ada Lovelace
24GB
$1.05/hr
A4000Ampere
16GB
$0.17/hr
A10Ampere
24GB
$1.42/hr
A16Ampere
16GB
$0.56/hr
V100 32GBVolta
32GB
$2.57/hr
V100 16GBVolta
16GB
$0.43/hr

21 GPU SKUs available. Volume + reserved discounts on Dedicated.

TALK TO AN ENGINEER

Tell us what you're shipping.

Self-serve covers most jobs. For dedicated inference, reserved compute, BYOC, or specialty SKUs — tell us your workload. We respond within 24 hours with a rate card and a pilot plan.

Response within 24 hours
7-day pilot, no commitment
Custom rate card for your workload

Prefer Slack?

Create a Slack Connect channel

FAQ

Questions worth answering.

Stop overpaying for inference.