How much does an H100 cost per hour?

Runcrate H100 SXM starts at $1.35/hr (avg $1.50/hr) and H100 PCIe starts at $1.25/hr (avg $1.40/hr). Both are billed per second with no commitments. Reserved pricing cuts the rate by a further 30%. For comparison, AWS charges $4.10/hr for H100 SXM, GCP $3.67/hr, and Azure $3.96/hr.
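
Because billing is per second, a job's cost is the hourly rate scaled by the seconds it actually runs. The sketch below assumes the 30% reserved discount is a flat cut to the listed hourly rate, and for simplicity prices every provider the same way. `jobCost` is a hypothetical helper, not part of any Runcrate API.

```typescript
// Hypothetical helper (not a Runcrate API): cost of a job billed per second.
// reservedDiscount is a fraction, e.g. 0.3 for a 30% reserved discount.
function jobCost(hourlyRate: number, seconds: number, reservedDiscount = 0): number {
  const effectiveHourly = hourlyRate * (1 - reservedDiscount);
  // Per-second billing: pay for exactly the elapsed seconds.
  return (effectiveHourly / 3600) * seconds;
}

// H100 SXM hourly rates quoted above.
const sxmRates: Record<string, number> = {
  "Runcrate (from)": 1.35,
  GCP: 3.67,
  Azure: 3.96,
  AWS: 4.1,
};

const runSeconds = 90 * 60; // example: a 90-minute job

for (const [provider, rate] of Object.entries(sxmRates)) {
  console.log(`${provider}: $${jobCost(rate, runSeconds).toFixed(2)}`);
}
// Assumption: the reserved discount applies to the $1.35/hr starting rate.
console.log(`Runcrate (reserved): $${jobCost(1.35, runSeconds, 0.3).toFixed(2)}`);
```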

What's the difference between H100 SXM and PCIe?

H100 SXM has 67% more memory bandwidth (3,350 vs 2,000 GB/s), full NVLink support for 8-GPU clusters, and 18% higher FP16 throughput (120 vs 102 TFLOPS). PCIe is cheaper, runs in standard servers, and caps at 4 GPUs per node with no NVLink. Pick SXM for multi-GPU training and PCIe for single-GPU inference.
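
Both percentages are relative to the PCIe figure, that is (SXM − PCIe) / PCIe. A quick check with the numbers quoted above:

```typescript
// Relative advantage of a over b, in percent: (a - b) / b * 100.
function percentMore(a: number, b: number): number {
  return ((a - b) / b) * 100;
}

// H100 SXM vs H100 PCIe figures quoted above.
const bandwidthGBs = { sxm: 3350, pcie: 2000 };
const fp16Tflops = { sxm: 120, pcie: 102 };

const bwGain = percentMore(bandwidthGBs.sxm, bandwidthGBs.pcie); // 67.5
const fp16Gain = percentMore(fp16Tflops.sxm, fp16Tflops.pcie); // about 17.6

console.log(`Memory bandwidth: +${bwGain.toFixed(1)}%`);
console.log(`FP16 throughput: +${fp16Gain.toFixed(1)}%`);
```

The 67% figure is 67.5% rounded down, and 18% is 17.6% rounded up.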

Is the H100 still worth renting in 2026?

Yes. H100 remains the most-deployed datacenter GPU and the best cost/performance pick for production inference and fine-tuning of models up to 70B. Newer GPUs like H200 and B200 offer more VRAM but at a higher hourly rate. For workloads that fit in 80GB, H100 is the value sweet spot.
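
Whether a model fits in 80GB is mostly a question of weight size: parameter count times bytes per parameter. This sketch estimates weights only and multiplies by an assumed 1.2× headroom factor for KV cache, activations, and runtime overhead. Real usage varies with batch size, context length, and framework, so treat it as a first filter; the function names are hypothetical.

```typescript
// Bytes per parameter for common precisions.
const BYTES_PER_PARAM = { fp32: 4, bf16: 2, int8: 1, int4: 0.5 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

// Weight-only footprint in GB: 1B params at N bytes each = N GB (1 GB = 1e9 bytes).
function weightsGB(paramsBillions: number, precision: Precision): number {
  return paramsBillions * BYTES_PER_PARAM[precision];
}

// Assumed 1.2x headroom for KV cache, activations, and overhead (illustrative only).
function fitsOnGpu(paramsBillions: number, precision: Precision, vramGB = 80, headroom = 1.2): boolean {
  return weightsGB(paramsBillions, precision) * headroom <= vramGB;
}

console.log(fitsOnGpu(7, "bf16")); // true: 14 GB of weights
console.log(fitsOnGpu(70, "bf16")); // false: 140 GB of weights
console.log(fitsOnGpu(70, "int4")); // true: 35 GB of weights
```

A 70B model in bf16 does not fit on one card, while the same model quantized to 4 bits does, which is why single-GPU 70B fine-tuning relies on QLoRA.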

Can I use NVLink with H100 PCIe?

No. NVLink is exclusive to the H100 SXM form factor. PCIe cards communicate via standard PCIe lanes, limiting multi-GPU bandwidth. If you need fast GPU-to-GPU communication for distributed training, use H100 SXM.
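
To see why interconnect bandwidth matters for distributed training, consider a gradient all-reduce. A common idealized model of ring all-reduce sends about 2(N−1)/N times the buffer size over each GPU's link, so time scales inversely with link bandwidth. The per-direction bandwidths below (about 450 GB/s for H100 NVLink, about 64 GB/s for a PCIe Gen5 x16 slot) are assumptions based on published peak figures, not Runcrate measurements, and real PCIe topologies often do worse.

```typescript
// Idealized ring all-reduce time, bandwidth term only (latency ignored):
//   t ≈ 2 * (N - 1) / N * bytes / perDirectionBandwidth
function ringAllReduceSeconds(bytes: number, gpus: number, linkGBps: number): number {
  return (((2 * (gpus - 1)) / gpus) * bytes) / (linkGBps * 1e9);
}

// Assumed peak per-direction bandwidths (illustrative, not measured):
const NVLINK_GBPS = 450; // H100 SXM NVLink: ~900 GB/s total, ~450 GB/s each way
const PCIE_GEN5_GBPS = 64; // PCIe Gen5 x16: ~64 GB/s each way

// Example: all-reducing bf16 gradients of a 7B-parameter model (7e9 params * 2 bytes).
const gradBytes = 7e9 * 2;
const tNvlink = ringAllReduceSeconds(gradBytes, 4, NVLINK_GBPS);
const tPcie = ringAllReduceSeconds(gradBytes, 4, PCIE_GEN5_GBPS);

console.log(`4 GPUs over NVLink: ${(tNvlink * 1000).toFixed(1)} ms per all-reduce`);
console.log(`4 GPUs over PCIe: ${(tPcie * 1000).toFixed(1)} ms per all-reduce`);
```

Because this step repeats every training iteration, a roughly 7× gap in link bandwidth translates directly into idle GPU time on PCIe.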

How does H100 compare to H200 and B200?

H200 has 76% more VRAM (141GB vs 80GB) and 43% more memory bandwidth than H100 SXM, at roughly 50% higher cost ($2.25/hr). B200 raises FP16 throughput to 180 TFLOPS (1.5× the H100 SXM's 120 TFLOPS) with 192GB of HBM3e, at $3.40/hr. For production inference that fits in 80GB, H100 is the cheapest option. For 70B+ models or frontier training, H200/B200 justify the premium.
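
One way to read these numbers is the hourly cost of enough GPUs to hold a job's memory footprint. The sketch below uses the prices and VRAM quoted above (H100 SXM at its $1.50/hr average) and assumes the job shards evenly across GPUs with no extra overhead, which is optimistic. `cheapestFor` is a hypothetical helper.

```typescript
interface GpuOffer {
  model: string;
  vramGB: number;
  hourlyUsd: number;
}

// VRAM and hourly prices quoted above.
const offers: GpuOffer[] = [
  { model: "H100 SXM", vramGB: 80, hourlyUsd: 1.5 },
  { model: "H200", vramGB: 141, hourlyUsd: 2.25 },
  { model: "B200", vramGB: 192, hourlyUsd: 3.4 },
];

// For each GPU type: GPUs needed to hold `memoryGB` and their combined hourly cost,
// sorted cheapest first. Assumes perfect, overhead-free sharding.
function cheapestFor(memoryGB: number, list: GpuOffer[]) {
  return list
    .map((o) => {
      const gpus = Math.ceil(memoryGB / o.vramGB);
      return { model: o.model, gpus, hourlyUsd: gpus * o.hourlyUsd };
    })
    .sort((a, b) => a.hourlyUsd - b.hourlyUsd);
}

for (const memoryGB of [60, 120]) {
  const best = cheapestFor(memoryGB, offers)[0];
  console.log(`${memoryGB} GB: ${best.gpus}x ${best.model} at $${best.hourlyUsd.toFixed(2)}/hr`);
}
```

With these prices, a 60 GB job is cheapest on one H100 SXM, while a 120 GB job is cheaper on a single H200 ($2.25/hr) than on two H100s ($3.00/hr).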

NVIDIA · Hopper · 2022

NVIDIA H100.

The most-deployed datacenter GPU for AI training and inference. Available in SXM (NVLink, 3.35 TB/s) and PCIe (standard servers, lower cost) form factors, both with 80GB of memory (HBM3 on SXM, HBM2e on PCIe). From $1.25/hr.

80 GB HBM3 (SXM)

3.35 TB/s SXM bandwidth

120 TFLOPS FP16 (SXM)

8× GPU NVLink (SXM)

Available now · 8 regions

Cloud rental

$1.25/hr

PCIe from $1.25/hr · avg $1.40/hr

SXM from $1.35/hr · avg $1.50/hr

Per-second billing · no commitments

Choose your variant

H100 SXM vs PCIe.

Same Hopper architecture, same 80GB capacity (HBM3 on SXM, HBM2e on PCIe). SXM trades a higher price for NVLink and 67% more memory bandwidth. PCIe fits standard servers at a lower hourly rate.

NVIDIA H100 SXM

NVIDIA H100 PCIe

Best value

$1.40/hr avg ($1.25–$1.55)

80GB HBM2e
2,000 GB/s bandwidth
102 TFLOPS FP16
PCIe · 4 GPUs/node

Pricing across clouds

NVIDIA H100 SXM cloud rental price comparison.

Same GPU, sorted cheapest first. Prices reflect publicly listed hourly rates for NVIDIA H100 SXM on each provider; Runcrate has the lowest published rate.

Runcrate · Cheapest

$1.50/hr

RunPod

$1.99/hr

Oracle

$2.40/hr

Lambda

$2.49/hr

GCP

$3.67/hr

Azure

$3.96/hr

AWS

$4.10/hr
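
The savings implied by this list are straightforward to compute. The sketch below sorts the listed H100 SXM rates and reports each provider's monthly cost and the share of that bill saved by switching to the cheapest rate; the 730-hour month (one GPU running continuously) is an assumption for illustration.

```typescript
// Publicly listed H100 SXM hourly rates from the comparison above.
const listedRates: Record<string, number> = {
  Runcrate: 1.5,
  RunPod: 1.99,
  Oracle: 2.4,
  Lambda: 2.49,
  GCP: 3.67,
  Azure: 3.96,
  AWS: 4.1,
};

const HOURS_PER_MONTH = 730; // assumption: continuous use, 8,760 h / 12

function compare(rates: Record<string, number>) {
  const cheapest = Math.min(...Object.values(rates));
  return Object.entries(rates)
    .sort(([, a], [, b]) => a - b) // cheapest first
    .map(([provider, rate]) => ({
      provider,
      monthlyUsd: rate * HOURS_PER_MONTH,
      // Share of this provider's bill saved by switching to the cheapest rate.
      savingPct: ((rate - cheapest) / rate) * 100,
    }));
}

for (const row of compare(listedRates)) {
  console.log(`${row.provider}: $${row.monthlyUsd.toFixed(0)}/month, ${row.savingPct.toFixed(1)}% saved by switching`);
}
```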

Workloads

What you'll actually use this for.

Real workloads sized for the NVIDIA H100 SXM, with concrete performance numbers. Each is available as a preconfigured deployment.

Llama 70B Fine-tune

QLoRA in 12 hrs on 1 GPU

Mistral 7B Inference

~480 tok/s in bf16

Stable Diffusion XL

1024x1024 in 1.2s

Multi-GPU Training

8x H100 SXM with NVLink
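
Combined with the $1.50/hr SXM average, these figures translate into rough unit costs. The sketch assumes the quoted throughput is sustained with the GPU fully busy and ignores setup time; actual numbers depend on batch size, prompt length, and software stack. The helper names are hypothetical.

```typescript
const SXM_HOURLY_USD = 1.5; // H100 SXM average rate quoted on this page

// Dollars per million generated tokens at a sustained tokens/sec rate.
function usdPerMillionTokens(tokensPerSec: number, hourlyUsd: number): number {
  const hours = 1e6 / tokensPerSec / 3600;
  return hours * hourlyUsd;
}

// Dollars per 1,000 images at a fixed seconds-per-image latency.
function usdPerThousandImages(secondsPerImage: number, hourlyUsd: number): number {
  const hours = (1000 * secondsPerImage) / 3600;
  return hours * hourlyUsd;
}

// Llama 70B QLoRA: 12 GPU-hours on one GPU.
const fineTuneUsd = 12 * SXM_HOURLY_USD;

console.log(`Llama 70B QLoRA run: $${fineTuneUsd.toFixed(2)}`);
console.log(`Mistral 7B at 480 tok/s: $${usdPerMillionTokens(480, SXM_HOURLY_USD).toFixed(2)} per 1M tokens`);
console.log(`SDXL at 1.2 s/image: $${usdPerThousandImages(1.2, SXM_HOURLY_USD).toFixed(2)} per 1,000 images`);
```

That works out to about $18 for the fine-tune, roughly $0.87 per million Mistral 7B tokens, and about $0.50 per 1,000 SDXL images.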

One command. NVIDIA H100 SXM in 60 seconds.

Skip the dashboard if you don't need it. Use the SDK, Python, or cURL: copy the snippet, paste your API key, and ship.

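```typescript
// Runcrate SDK example. Top-level await needs an ES module context.
import Runcrate from "@runcrate/sdk";

// Paste your own API key in place of the masked placeholder.
const rc = new Runcrate({ apiKey: "rc_live_••••••••••••••••" });

// Request one H100 SXM instance running the vLLM image; region "auto" presumably lets Runcrate pick.
const instance = await rc.instances.create({
  gpu: "h100-sxm",
  region: "auto",
  image: "runcrate/vllm:latest",
});

// Print the SSH command for the new instance.
console.log(`SSH: ssh root@${instance.host}`);
```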

Decision guide

Which H100 should I choose?

Both variants share the same Hopper architecture and 80GB of memory (HBM3 on SXM, HBM2e on PCIe). The differences are interconnect, memory bandwidth, and price.

Choose SXM if

Multi-GPU distributed training (2-8 GPUs with NVLink)
Workloads that saturate memory bandwidth (large-batch inference, long-context)
Production serving that needs maximum throughput per GPU
Fine-tuning 70B+ models across multiple GPUs

Choose PCIe if

Single-GPU inference for models under 70B
Budget-conscious teams that need Hopper at the lowest hourly rate
Standard server deployments without SXM baseboard requirements
QLoRA fine-tuning on a single GPU
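
The checklist can be encoded as a small rule set. This is a sketch of the guidance above, not a sizing tool; the `Workload` fields and `chooseVariant` are hypothetical names chosen for illustration.

```typescript
type Variant = "H100 SXM" | "H100 PCIe";

// Hypothetical description of a workload, mirroring the checklist above.
interface Workload {
  gpus: number; // GPUs in the job (1-8 on a single node)
  bandwidthBound: boolean; // e.g. large-batch inference or long context
  needsMaxThroughput: boolean; // production serving that needs peak per-GPU throughput
}

function chooseVariant(w: Workload): Variant {
  // Multi-GPU training and multi-GPU 70B+ fine-tuning want NVLink, which only SXM offers.
  if (w.gpus > 1) return "H100 SXM";
  // Single-GPU work limited by memory bandwidth or per-GPU throughput.
  if (w.bandwidthBound || w.needsMaxThroughput) return "H100 SXM";
  // Everything else, such as single-GPU inference or QLoRA, fits the cheaper PCIe card.
  return "H100 PCIe";
}

console.log(chooseVariant({ gpus: 8, bandwidthBound: false, needsMaxThroughput: false })); // H100 SXM
console.log(chooseVariant({ gpus: 1, bandwidthBound: false, needsMaxThroughput: false })); // H100 PCIe
```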

Full specs

H100 SXM vs PCIe — side by side.

Spec	H100 SXM	H100 PCIe
Price (avg)	$1.50/hr	$1.40/hr
Price (range)	$1.35–$1.65/hr	$1.25–$1.55/hr
VRAM	80 GB	80 GB
Memory type	HBM3	HBM2e
Memory bandwidth	3.35 TB/s	2.0 TB/s
FP32	60 TFLOPS	51 TFLOPS
FP16	120 TFLOPS	102 TFLOPS
INT8	2400 TOPS	2040 TOPS
Tensor cores	528	456
CUDA cores	16,896	14,592
TDP	700W	350W
Form factor	SXM	PCIe
NVLink	Yes	No
Max GPUs/node	8	4
Architecture	Hopper	Hopper
Release year	2022	2022
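
Dividing the table's figures by price and power gives a quick value comparison. The sketch uses the average hourly prices, FP16 TFLOPS, and TDP from the table, plus the 3.35 TB/s SXM bandwidth quoted at the top of the page. TDP is a power ceiling rather than measured draw, so the per-watt figures are indicative only.

```typescript
interface VariantSpec {
  usdPerHour: number; // average hourly price
  fp16Tflops: number;
  bandwidthTBs: number;
  tdpW: number;
}

const specs: Record<"H100 SXM" | "H100 PCIe", VariantSpec> = {
  "H100 SXM": { usdPerHour: 1.5, fp16Tflops: 120, bandwidthTBs: 3.35, tdpW: 700 },
  "H100 PCIe": { usdPerHour: 1.4, fp16Tflops: 102, bandwidthTBs: 2.0, tdpW: 350 },
};

function efficiency(s: VariantSpec) {
  return {
    tflopsPerUsd: s.fp16Tflops / s.usdPerHour, // FP16 TFLOPS per $/hr
    tbsPerUsd: s.bandwidthTBs / s.usdPerHour, // TB/s of bandwidth per $/hr
    tflopsPerWatt: s.fp16Tflops / s.tdpW, // FP16 TFLOPS per watt of TDP
  };
}

for (const [variant, s] of Object.entries(specs)) {
  const e = efficiency(s);
  console.log(
    `${variant}: ${e.tflopsPerUsd.toFixed(1)} TFLOPS per $/hr, ` +
      `${e.tbsPerUsd.toFixed(2)} TB/s per $/hr, ${e.tflopsPerWatt.toFixed(3)} TFLOPS/W`,
  );
}
```

On these figures SXM delivers more compute and bandwidth per dollar, while PCIe delivers more FP16 throughput per watt of TDP.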