ML INFRASTRUCTURE
Everything an AI team needs in one platform. Serverless API inference for 200+ models, dedicated GPU instances for custom workloads, managed storage for datasets, and team collaboration tools. Start with the API for prototyping, scale to dedicated instances for production. One account, one billing dashboard, no vendor sprawl.
AVAILABLE GPUS
| Model | Provider | Price | Detail |
|---|---|---|---|
| NVIDIA H200 141GB | Nvidia | From $2.41/hr | 141GB HBM3e, 4.8TB/s bandwidth |
| NVIDIA H100 80GB | Nvidia | From $1.54/hr | 80GB HBM3, 3.35TB/s, NVLink |
| NVIDIA B200 192GB | Nvidia | From $3.20/hr | 192GB HBM3e, Blackwell arch |
| NVIDIA A100 80GB | Nvidia | From $1.06/hr | 80GB HBM2e, 2TB/s bandwidth |
| NVIDIA L40S 48GB | Nvidia | From $0.80/hr | 48GB GDDR6X, Ada Lovelace |
| NVIDIA RTX 4090 24GB | Nvidia | From $0.52/hr | 24GB GDDR6X, best value |
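Picking a GPU from the table usually comes down to the cheapest card with enough VRAM. A minimal sketch of that selection (the rates are copied from the table above; the helper itself is hypothetical, not part of the SDK):

```typescript
// Catalog mirroring the pricing table above (rates in USD/hr, starting prices).
interface GpuOption {
  gpu: string;
  vramGB: number;
  hourlyRate: number;
}

const catalog: GpuOption[] = [
  { gpu: "H200", vramGB: 141, hourlyRate: 2.41 },
  { gpu: "H100", vramGB: 80, hourlyRate: 1.54 },
  { gpu: "B200", vramGB: 192, hourlyRate: 3.2 },
  { gpu: "A100", vramGB: 80, hourlyRate: 1.06 },
  { gpu: "L40S", vramGB: 48, hourlyRate: 0.8 },
  { gpu: "RTX 4090", vramGB: 24, hourlyRate: 0.52 },
];

// Cheapest GPU with at least `minVramGB` of memory, or undefined if none fits.
function cheapestWithVram(minVramGB: number): GpuOption | undefined {
  return catalog
    .filter((g) => g.vramGB >= minVramGB)
    .sort((a, b) => a.hourlyRate - b.hourlyRate)[0];
}

console.log(cheapestWithVram(80)?.gpu);  // "A100" — cheaper than H100 at 80GB
console.log(cheapestWithVram(100)?.gpu); // "H200" — cheaper than B200 above 80GB
```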
WHY RUNCRATE
200+ models via serverless API for instant inference. Or deploy dedicated H100/H200/B200/A100 instances for training, fine-tuning, and self-hosted models. Both from one account.
Select GPU, pick a template (PyTorch, CUDA, Jupyter, VS Code), and launch. SSH access, port forwarding, and browser IDE included. No VPC, no IAM, no cloud-architect PhD.
Pay per minute for GPU instances, per token for API inference. No reservations, no minimum spend, no contracts. Spin up for 10 minutes or 10 months.
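Per-minute billing makes cost estimates straightforward: GPU-minutes times the per-minute rate. A quick sketch (rates taken from the pricing table above; rounding to cents is a display choice here, not documented billing behavior):

```typescript
// Estimated cost in USD for `gpuCount` GPUs billed at `hourlyRate` USD/hr
// per GPU, running for `minutes` minutes.
function estimateCost(hourlyRate: number, gpuCount: number, minutes: number): number {
  const raw = (hourlyRate / 60) * minutes * gpuCount;
  return Math.round(raw * 100) / 100; // round to cents for display
}

// 4x H100 at $1.54/hr for a 90-minute run:
console.log(estimateCost(1.54, 4, 90)); // 9.24

// A 10-minute experiment on a single RTX 4090:
console.log(estimateCost(0.52, 1, 10)); // 0.09
```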
Scale from 1 to 128 GPUs per node. NVLink and NVSwitch interconnects on H100/H200. DeepSpeed, FSDP, and Megatron-LM ready out of the box.
COMPARISON
| Feature | Runcrate | AWS SageMaker |
|---|---|---|
| Setup time | 60 seconds | Hours to days |
| H100 80GB price | $1.54/hr | $4.90/hr (p5.48xlarge, per GPU) |
| A100 80GB price | $1.06/hr | $4.10/hr (p4d.24xlarge, per GPU) |
| Pre-built inference | 200+ models, one API call | JumpStart marketplace + deploy |
| Billing | Per-minute, prepaid credits | Per-second, invoice billing |
| Complexity | API key + SDK | IAM, VPC, endpoints, roles... |
GET STARTED
import Runcrate from "@runcrate/sdk";
const rc = new Runcrate({ apiKey: "rc_live_YOUR_API_KEY" });
// List available GPU types and pricing
const gpus = await rc.instances.listTypes({ gpuType: "H100" });
console.log(gpus); // [{ gpuType: "H100", hourlyRate: 1.54, ... }]
// Deploy an H100 instance with PyTorch
const instance = await rc.instances.create({
name: "training-run-1",
gpuType: "H100",
gpuCount: 4,
sshKeyId: "sk_...",
template: "pytorch-cuda",
storage: 200,
});
console.log(instance.id, instance.status);
// Check status
const status = await rc.instances.getStatus(instance.id);
console.log(status.ip); // SSH into your instance
// Also run inference via API
const response = await rc.chat.completions.create({
model: "deepseek-ai/DeepSeek-V3",
messages: [{ role: "user", content: "Hello!" }],
});
// Terminate when done
await rc.instances.terminate(instance.id);
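Provisioning is not instant, so in practice you poll until the instance reports a running state before SSHing in. A generic retry helper like the one below pairs with the getStatus call from the snippet above (the helper is a sketch; the "running" status value and the SDK's actual lifecycle states are assumptions):

```typescript
// Poll `check` until its result satisfies `done`, or give up after
// `maxAttempts` tries spaced `intervalMs` apart.
async function pollUntil<T>(
  check: () => Promise<T>,
  done: (value: T) => boolean,
  { intervalMs = 5000, maxAttempts = 60 } = {},
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = await check();
    if (done(value)) return value;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}

// Usage with the snippet above (assumed status value "running"):
// const status = await pollUntil(
//   () => rc.instances.getStatus(instance.id),
//   (s) => s.status === "running",
// );
// console.log(status.ip); // now safe to SSH
```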