Inference and compute, one platform. Pay-per-token API on every open-source model. Per-second GPU compute. Dedicated capacity when the rate card starts to hurt.
● Inference first · Compute when you need it
Inference API
200+ models across chat, code, image, video, audio, and more. One endpoint, every provider.
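The pay-per-token endpoint could be called like this. A minimal sketch only: the base URL, model name, and OpenAI-compatible request schema are assumptions for illustration, not documented Runcrate API details.

```python
import json

# Hypothetical endpoint URL -- assumed, not taken from Runcrate docs.
BASE_URL = "https://api.runcrate.example/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> str:
    """Build a JSON chat-completion payload in the common OpenAI-compatible shape."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

# Any hosted model slots into the same request shape -- that is the
# "one endpoint, every provider" idea.
body = build_chat_request("llama-3.1-70b-instruct", "Summarize our GPU invoice.")
print(body)
```

Swapping models means changing one string; the payload shape stays identical across providers.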
Infrastructure
Raw bare metal. Full root access. Pick your hardware, deploy in 60 seconds. Per-minute billing. Scale from 1 node to 128.
Full control over your environment. SSH, Docker, custom images.
Scale horizontally on demand. Add nodes in seconds, release when done.
No minimum commitments. Spin up for 5 minutes or 5 months.
Platform
You're billed by the minute. Stop an instance, stop paying. It's that simple.
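Per-minute billing is easy to reason about. As an illustrative sketch (the $2.40/hr rate is a made-up example, not a quoted price):

```python
def minute_cost(hourly_rate: float, minutes: int) -> float:
    """Cost of one instance billed per minute, rounded to cents."""
    return round(hourly_rate / 60 * minutes, 2)

# A hypothetical $2.40/hr GPU stopped after 5 minutes costs 20 cents.
print(minute_cost(2.40, 5))  # 0.2
```

Stop the instance at minute five and the meter stops at minute five; there is no hourly rounding.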
Self-Serve
Everything you need to build, monitor, and scale your AI workloads — no DevOps expertise required.
VS Code Server, Jupyter notebooks, and a terminal — all pre-configured in the browser.
Real-time GPU metrics, spend tracking, and uptime dashboards for every workload.
SSH keys, encrypted connections, and role-based team permissions built in.
Pricing
Compare our rates against AWS, GCP, and Azure. No hidden fees, no egress charges.
Live GPU rates
Why Runcrate
AI teams shouldn't have to choose between cheap and reliable. Between managed and flexible. Between one provider and five contracts.
We built Runcrate to be the single platform for every AI compute need. Deploy a model endpoint in seconds. Spin up bare metal for training. Reserve a 128-node cluster for production.
All from one dashboard, one API, one invoice.