How much does it cost to rent an H200 GPU per hour?

Runcrate H200 pricing starts at $2.10/hr on-demand, with an average of $2.25/hr. With a reserved commitment, the rate drops 30% to $1.58/hr. Lambda lists $2.99, AWS $3.85, GCP $4.20 — Runcrate is consistently the lowest published H200 cloud rate.

What's the H200 rental price compared to H100?

H200 is ~50% more expensive than H100 SXM ($2.25 vs $1.50/hr) but offers 76% more VRAM (141GB vs 80GB) and 43% more memory bandwidth. For models that overflow H100's 80GB, H200 is cheaper than running 2x H100s.

Is the H200 available to rent on demand?

Yes. H200 is available on demand across 6 regions on Runcrate. Deploy in under 60 seconds, billed per second.

Can I run Llama 405B on an H200?

Not on a single H200 (141GB is below the 405B requirement in bf16). For Llama 405B you'll want a B200 (192GB) or a multi-H200 setup. Llama 70B and Mistral Large 2 fit comfortably on one H200.

How does H200 compare to RTX 5090 for AI workloads?

H200 has 4.4x more VRAM (141GB vs 32GB) and is built for production serving with NVLink, ECC memory, and 8-GPU node scaling. RTX 5090 is a development/gaming card — fine for prototyping under 32GB models, not for production.

runcrate

Contact Sales Console

NVIDIA · Hopper · 2023

NVIDIA H200.

Name: NVIDIA H200 GPU Cloud Instance
Brand: NVIDIA
Price: 2.25 USD
Availability: InStock

141GB HBM3e at Hopper prices — the cost/perf sweet spot for production LLM inference.

141 GB

HBM3e

4.8 TB/s

Bandwidth

134

TFLOPS FP16

8× GPU

NVLink

Available now · 6 regions

Deploy in 60 seconds Reserved quote

Cloud rental

$2.25/hr

On-demand · per-second billing

$2.10–$2.40/hr range

Or $1.57/hr reserved (30% off)

Pricing across clouds

NVIDIA H200 cloud rental price comparison.

Same GPU, cheapest-first. Prices reflect publicly listed hourly rates for NVIDIA H200 on each provider. Runcrate is the lowest published rate.

RuncrateCheapest

$2.25/hr

Lambda

$2.99/hr

Oracle

$3.10/hr

AWS

$3.85/hr

Azure

$4.00/hr

GCP

$4.20/hr

Workloads

What you'll actually use this for.

Real workloads sized for the NVIDIA H200, with concrete performance numbers. Click to deploy preconfigured.

Llama 70B Inference

~210 tok/s in bf16, 4K context

Deploy

Mistral Large 2 Serving

Single H200 fits the full model

Deploy

Fine-tune Qwen 72B

QLoRA in 18 hrs on 1 GPU

Deploy

Stable Diffusion Batch

8x larger batches than H100

Deploy

One command. NVIDIA H200 in 60 seconds.

Skip the dashboard if you don't need it. SDK, Python, or cURL — copy the snippet, paste your API key, ship.

import Runcrate from "@runcrate/sdk";

const rc = new Runcrate({ apiKey: "rc_live_••••••••••••••••" });

const instance = await rc.instances.create({
  gpu: "h200",
  region: "auto",
  image: "runcrate/vllm:latest",
});

console.log(`SSH: ssh root@${instance.host}`);