What is the cheapest GPU cloud provider?

Runcrate is the cheapest GPU cloud provider, offering H100 instances at $1.54/hour, A100 at $1.06/hour, and RTX 4090 at $0.52/hour - up to 70% cheaper than AWS, GCP, and Azure.

How much does H100 GPU cost per hour?

H100 GPU instances cost $1.54 per hour on Runcrate, which is 68% cheaper than AWS pricing of $4.90/hour. Deploy in 60 seconds with no setup fees.

What is the cheapest A100 GPU cloud?

Runcrate offers the cheapest A100 GPU cloud at $1.06/hour with 80GB HBM2e memory, 65% cheaper than AWS. Perfect for machine learning training and AI development.

Where can I rent cheap RTX 4090 GPU instances?

Runcrate provides the cheapest RTX 4090 GPU instances at $0.52/hour with 24GB GDDR6X memory, 42% cheaper than competitors. Ideal for AI inference and development.

How fast can I deploy GPU instances?

Deploy GPU instances in under 60 seconds on Runcrate. No approval queues, no quota requests. Select your GPU, configure resources, and deploy instantly.

runcrate

Contact Sales Console

Runcrate Pricing

Pricing built for the workload.
Not the quota.

Name: Cheap GPU Cloud Instances - Affordable AI Infrastructure
Brand: Runcrate
Price: 1.54 USD
Availability: InStock

Public rate card on every open-source model. Per-second GPU compute, no commit. Dedicated capacity when the rate card starts to hurt. BYOC when compliance demands it.

Get an API key Talk to an engineer

170+

Models

40–60%

Off aggregators

Per-sec

GPU billing

To start

PLANS

Three plans. No surprises.

Self-Serve

Get an API key. Ship today. Talk to nobody.

per month · pay as you go

What's included

Pay-per-token inference on every model
Per-second GPU compute
OpenAI-compatible endpoint
Public rate card, no negotiations
Multi-cloud autoscaling
Email + community Discord

Deployment

Runcrate Cloud

Get an API key

Dedicated

Enterprise

Run it our way. Or run it in your VPC.

Custom

tailored contract

What's included

Everything in Dedicated
Custom contract pricing on inference
BYOC + self-hosted deployments
Region pinning — US / EU / APAC
HIPAA-eligible
SOC 2 Type II datacenter partners
99.95% / 99.99% SLA with credits
Named CSM + on-call engineering

Deployment

Runcrate CloudBYOCHybrid

Talk to sales

INFERENCE · PAY PER TOKEN

The open-source frontier. One API. Public rate card.

ModelTypePriceContext

Kimi K2.5

Moonshot AI

TEXT

$0.45 / $2.25/ 1M

262K

Try

DeepSeek V4 Flash

DeepSeek AI

TEXT

$0.14 / $0.28/ 1M

Try

Llama 4 Maverick 17B 128E Instruct FP8

When the rate card starts to hurt, reserve capacity.

Reserve GPU capacity sized to your traffic, served on Arc — our serving stack that fits 2–3× more tokens per GPU than vanilla vLLM. Commit to a monthly minimum, get a discounted rate on every token. No more 429s on burst, no more TPM ceilings.

Past ~$10k/month, dedicated is cheaper and the bill stops surprising you. We run a 7-day pilot parallel to your current provider — verify cost and latency before flipping a switch.

See the Inference Engine

What you get

Dedicated capacity sized to your traffic
40–60% off the public rate card
p99 latency floor written into the contract
Burst-tolerant — no 429s, no TPM ceilings
99.9% uptime SLA
OpenAI-compatible — swap one base URL

COMPUTE · PER SECOND

GPUs by the second.
Stop the instance, stop the meter.

Live API · no key required

# real-time GPU pricing

GET /api/gpu-pricing

Refreshed every 5 minutes. Hit it from anywhere.

GPU InstanceVRAMPrice / hour

B300Blackwell

288GB

$8.48/hr

Pricing built for the workload.Not the quota.

Three plans. No surprises.

Self-Serve

Dedicated

Enterprise

The open-source frontier. One API. Public rate card.

When the rate card starts to hurt, reserve capacity.

GPUs by the second.Stop the instance, stop the meter.

Tell us what you're shipping.

Questions worth answering.

How does pay-per-token billing work?

When should I move from Self-Serve to Dedicated?

What does Enterprise add over Dedicated?

How is GPU compute priced?

Are there hidden fees?

Can I bring my own fine-tuned model?

Do credits expire?

Stop overpaying for inference.

Pricing built for the workload.
Not the quota.

GPUs by the second.
Stop the instance, stop the meter.