Runcrate Compute

GPU compute,
built for the way AI actually ships.

Instances and Crates on the GPU lineup that matters — L40S to B200, single-node to 8× H200 clusters. Sub-minute cold starts, per-second billing, no commit required.

Multi-cloud · 12 regions · 24/7 capacity

<60s
Cold-start boot
Per-sec
Billing granularity
20+
GPU SKUs

WHY RUNCRATE COMPUTE

Stop fighting the cloud to do AI work.

<60s

Speed

Cold-start to SSH. Template-cached environments mean you wait on your code to load, not your VM.

20+

Variety

GPU SKUs from L40S dev boxes to 8× B200 training clusters. Match the hardware to the job, then switch it for the next one.

Per-sec

Cost

Billed by the second. No reservation commits, no minimum runtime, no idle charges. Stop the instance, stop the meter.

WHAT YOU GET

Compute that behaves like a tool, not a project.

Boot in seconds

Template-cached environments. SSH or HTTPS access in under 60 seconds — no AMI fetch, no Docker pull, no IAM dance.

Per-second billing

No minimum runtime, no idle charges. Stop the instance, stop the meter. An hour-long job bills as an hour-long job, nothing rounds up.
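To see what per-second granularity buys, here's a quick sketch comparing it against per-hour rounding. The $2.50/hr H100 rate is a hypothetical number for illustration only, not Runcrate's actual price; check the public rate card for real figures.

```python
import math

# Hypothetical rate for illustration -- not Runcrate's actual price.
HOURLY_RATE = 2.50  # $/hr for a single H100 (assumed)

def per_second_cost(runtime_seconds: int) -> float:
    """Bill exactly the seconds used."""
    return runtime_seconds * HOURLY_RATE / 3600

def per_hour_cost(runtime_seconds: int) -> float:
    """Per-hour billing rounds up to the next full hour."""
    hours_billed = math.ceil(runtime_seconds / 3600)
    return hours_billed * HOURLY_RATE

# A 61-minute fine-tuning run:
runtime = 61 * 60  # 3660 seconds
print(f"per-second: ${per_second_cost(runtime):.2f}")  # $2.54
print(f"per-hour:   ${per_hour_cost(runtime):.2f}")    # $5.00 -- rounded to 2 hours
```

The one-minute overrun is the whole difference: per-second billing charges for 61 minutes, hourly billing charges for 120.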

Full GPU lineup

L40S, A100, H100, H200, B200, RTX 6000 Ada. Single-GPU dev boxes to 8× training clusters in the same workspace.

Templates library

PyTorch, vLLM, Axolotl, ComfyUI, JupyterLab — start from a working stack, not an empty Ubuntu image.

Persistent volumes

Durable disks that outlive the instance. Snapshot a dataset on an L40S, attach it to an 8× H200 cluster next.

SSH + HTTPS access

Real shell, real port forwarding, real tooling. Not a notebook trapped in a browser UI.

TWO SHAPES, ONE PLATFORM

Instances + Crates. Different jobs, same plumbing.

Instances run as containers, VMs, or bare-metal — pick what the job wants, get a full shell on all of them. For dev, training, fine-tuning, or anything that needs a real environment. Crates are pre-configured deployments — model runners, app servers, one-click stacks where you never want to open a terminal.

Both share the same provisioning layer: template cache, multi-cloud autoscaler, per-second billing, persistent volumes. Pick the shape that fits the job. Switch when the job changes.

Start from a template
                 Instance                       Crate
Form             Container · VM · Bare-metal    Container
Access           SSH + HTTPS                    HTTPS endpoint only
Typical job      Dev, training, custom builds   Model serving, app deploy
Configuration    Pick a template, drop in       Pick an app, hit deploy
Workload         Interactive, multi-step        Long-running, autoscaled
Best for         Power users                    Shipping a product

VS THE TRADITIONAL CLOUDS

Built different from AWS, GCP, and Azure, for AI work.

Boot time
Traditional: Minutes — AMI fetch + IAM + Docker pull
Runcrate: <60s, template-cached
Billing
Traditional: Per-hour or per-minute
Runcrate: Per-second
Reservation commit
Traditional: 1–3 year RIs for the discounted rate
Runcrate: No commit, list = your rate
GPU availability
Traditional: Capacity errors on H200 / B200
Runcrate: Multi-cloud, multi-region
Pricing model
Traditional: List + opaque enterprise discount
Runcrate: Public per-second rate card
Setup
Traditional: Custom AMI + Docker + IAM dance
Runcrate: One-click template
Dev environments
Traditional: Cobbled together with SageMaker
Runcrate: First-class shape (Instances)
Multi-cloud
Traditional: Painful, lock-in by design
Runcrate: The default

ON-DEMAND GPUS

The lineup that matters, ready in seconds.

Blackwell

B200

192GB · 8TB/s

Hopper

H200

141GB · 4.8TB/s

Hopper

H100

80GB · 3.3TB/s

Ampere

A100

80GB · 2.0TB/s

Ada

L40S

48GB · 864GB/s

Ada

RTX 6000 Ada

48GB · 960GB/s

Single-GPU dev boxes to 8× multi-node clusters. Switch SKU on the next job, not the next contract.

Browse all SKUs

TALK TO AN ENGINEER

Need reserved capacity?

Self-serve covers most jobs. For reserved multi-node clusters, BYOC deployments, or specialty SKUs, tell us your workload and we'll come back with a rate card within 24 hours.

Response within 24 hours
Reserved-capacity rate card
BYOC and custom-image support

Prefer Slack?

Create a Slack Connect channel

FAQ

The questions every team asks.

Spin up the instance. Ship the job.