GPU Cloud Cost Optimization

GPU compute is billed by the hour. Every idle instance is wasted money.

Monitor your spend

runcrate billing balance     # Current credit balance
runcrate billing usage       # Per-instance spending breakdown
runcrate ps                  # List running instances (all billing right now)

Pick the right GPU

Workload	Recommended GPU	Hourly cost
Inference (7B-8B models)	RTX 4090	~$0.35/hr
Inference (70B models)	A100 80 GB	~$1.60/hr
Fine-tuning (7B QLoRA)	RTX 4090	~$0.35/hr
Fine-tuning (70B QLoRA)	A100 80 GB	~$1.60/hr
Training (custom models)	H100	~$2.50/hr

runcrate instances types     # Browse GPUs and pricing

Delete instances when done

runcrate instances delete <name>
runcrate ps                  # Verify nothing is left running

Use volumes to avoid re-setup costs

Re-downloading models wastes 10-30 minutes of GPU time per session:

runcrate volumes create --name workspace --size 100
runcrate instances create --name dev --gpu RTX4090 --template ubuntu-devbox --storage workspace

Models and packages at /workspace/ persist across deploys.

Right-size your instance

runcrate ssh <instance> -- nvidia-smi

nvidia-smi reading	Action
GPU-Util: 90%+, Memory: 80%+	Correctly sized
GPU-Util: 90%+, Memory: 40%	Consider a GPU with less VRAM
GPU-Util: 20%, Memory: 20%	Overpaying — use a smaller GPU

Batch your work

Deploy, process, tear down — pay only for the minutes your job runs:

runcrate instances create --name batch --gpu A100 --template ubuntu-inference
runcrate cp ./inputs/ batch:/workspace/inputs/
runcrate ssh batch -- "cd /workspace && python process.py"
runcrate cp batch:/workspace/outputs/ ./outputs/
runcrate instances delete batch

Use the Models API for light workloads

For inference under ~1,000 requests/day, the Models API is cheaper than a dedicated GPU:

curl https://api.runcrate.ai/v1/chat/completions \
  -H "Authorization: Bearer rc_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello."}],
    "max_tokens": 128
  }'

Quick checklist

Run runcrate ps daily — kill anything not in use.
Run runcrate billing usage weekly — spot unexpected charges early.
Use volumes for models and data — avoid re-downloads.
Match GPU to workload — check nvidia-smi utilization.
Delete instances immediately after batch jobs complete.

​Monitor your spend

​Pick the right GPU

​Delete instances when done

​Use volumes to avoid re-setup costs

​Right-size your instance

​Batch your work

​Use the Models API for light workloads

​Quick checklist