How much does the NVIDIA B200 cost per hour to rent?

On Runcrate, the B200 starts at $3.20/hr on-demand (avg $3.40/hr) and $2.38/hr with a reserved commitment. Compared to AWS ($5.20), GCP ($6.10), and Azure ($5.85), Runcrate is the cheapest published rate for B200 cloud rental.

Is the B200 actually available right now?

Yes. We have B200 capacity across 4 regions for both reserved and on-demand workloads. Most users deploy in under 60 seconds from signup.

B200 vs H200 — which should I rent?

B200 has ~2x the FP16 throughput of H200 and the same 192GB HBM3e memory tier. For frontier-model training and large-batch inference, B200 wins. For everyday LLM inference under 70B parameters, H200 is the better cost/perf pick.

Can I rent multiple B200 GPUs together for distributed training?

Yes, up to 8 B200 GPUs in a single SXM node with NVLink. For multi-node clusters, contact us for a reserved quote with InfiniBand interconnect.

What can I run on a single B200?

Llama 405B, DeepSeek-V3, Mistral Large 2, Qwen3-235B inference all fit on one B200 in bf16. Training-wise, you can fine-tune up to 70B with full FT or 405B with LoRA.

runcrate

Contact Sales Console

NVIDIA · Blackwell · 2024

NVIDIA B200.

Name: NVIDIA B200 GPU Cloud Instance
Brand: NVIDIA
Price: 3.40 USD
Availability: InStock

Frontier-model training and inference at the highest memory bandwidth shipping today.

192 GB

HBM3e

8.0 TB/s

Bandwidth

180

TFLOPS FP16

8× GPU

NVLink

Available now · 4 regions

Deploy in 60 seconds Reserved quote

Cloud rental

$3.40/hr

On-demand · per-second billing

$3.20–$3.60/hr range

Or $2.38/hr reserved (30% off)

Pricing across clouds

NVIDIA B200 cloud rental price comparison.

Same GPU, cheapest-first. Prices reflect publicly listed hourly rates for NVIDIA B200 on each provider. Runcrate is the lowest published rate.

RuncrateCheapest

$3.40/hr

Oracle

$4.95/hr

AWS

$5.20/hr

Azure

$5.85/hr

GCP

$6.10/hr

Workloads

What you'll actually use this for.

Real workloads sized for the NVIDIA B200, with concrete performance numbers. Click to deploy preconfigured.

Llama 405B Inference

~145 tok/s on a single B200

Deploy

Train DeepSeek-V3

Full bf16 fits in 8x B200 node

Deploy

Fine-tune Llama 70B

QLoRA on 1 GPU, full FT on 4

Deploy

Multi-modal serving

Vision + LLM in one process

Deploy

One command. NVIDIA B200 in 60 seconds.

Skip the dashboard if you don't need it. SDK, Python, or cURL — copy the snippet, paste your API key, ship.

import Runcrate from "@runcrate/sdk";

const rc = new Runcrate({ apiKey: "rc_live_••••••••••••••••" });

const instance = await rc.instances.create({
  gpu: "b200",
  region: "auto",
  image: "runcrate/vllm:latest",
});

console.log(`SSH: ssh root@${instance.host}`);