GPU compute is billed by the hour. Every idle instance is wasted money.
Monitor your spend
Pick the right GPU
| Workload | Recommended GPU | Hourly cost |
|---|---|---|
| Inference (7B-8B models) | RTX 4090 | ~$0.35/hr |
| Inference (70B models) | A100 80 GB | ~$1.60/hr |
| Fine-tuning (7B QLoRA) | RTX 4090 | ~$0.35/hr |
| Fine-tuning (70B QLoRA) | A100 80 GB | ~$1.60/hr |
| Training (custom models) | H100 | ~$2.50/hr |
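To make the table concrete, the arithmetic can be sketched in a few lines of Python. The rates are the approximate figures from the table above. The function name `job_cost` and the example durations (a 3-hour fine-tune, a 48-hour idle weekend, a 30-day month) are illustrative assumptions, not measured values.

```python
# Approximate hourly rates (USD) copied from the table above.
HOURLY_RATE_USD = {
    "RTX 4090": 0.35,
    "A100 80 GB": 1.60,
    "H100": 2.50,
}


def job_cost(gpu: str, hours: float) -> float:
    """Estimated cost of keeping one instance of `gpu` running for `hours`."""
    return round(HOURLY_RATE_USD[gpu] * hours, 2)


# Hypothetical 3-hour QLoRA fine-tune of a 7B model on an RTX 4090.
print(job_cost("RTX 4090", 3))
# The same instance forgotten over a 48-hour weekend.
print(job_cost("RTX 4090", 48))
# An A100 80 GB left running for 30 days.
print(job_cost("A100 80 GB", 24 * 30))
```

The point of the comparison is that idle time, not the job itself, tends to dominate the bill once an instance is left running.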
Delete instances when done
Use volumes to avoid re-setup costs
Re-downloading models wastes 10-30 minutes of GPU time per session. Files stored on a volume under /workspace/ persist across deploys.
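One way to take advantage of this is to make the download step conditional on the volume's contents. The following is a minimal sketch, assuming a volume is mounted at /workspace/ and that models are stored one directory per model. The helper `ensure_cached` and the `download` callable are hypothetical placeholders for whatever tool you use to fetch weights.

```python
from pathlib import Path
from typing import Callable


def ensure_cached(target: Path, download: Callable[[Path], None]) -> bool:
    """Call `download(target)` only if the directory `target` is missing or empty.

    Returns True if a download ran, False if the existing copy was reused.
    `target` is expected to be a directory path, not a file.
    """
    if target.exists() and any(target.iterdir()):
        return False  # Already on the volume: skip the download.
    target.mkdir(parents=True, exist_ok=True)
    download(target)
    return True


# Example call (hypothetical path and downloader):
# ensure_cached(Path("/workspace/models/my-model"), fetch_model)
```

On the first session the download runs once. On later sessions that attach the same volume, the check returns immediately and no GPU time is spent waiting on the network.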
Right-size your instance
| nvidia-smi reading | Action |
|---|---|
| GPU-Util: 90%+, Memory: 80%+ | Correctly sized |
| GPU-Util: 90%+, Memory: 40% | Consider a GPU with less VRAM |
| GPU-Util: 20%, Memory: 20% | Overpaying — use a smaller GPU |
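The table can be turned into a small check that runs on the instance. This sketch reads utilization through `nvidia-smi --query-gpu` and maps the numbers onto the three rows above. The table only gives three sample readings, so the exact cutoffs used here (90% and 20% utilization; 80%, 50%, and 20% memory) are assumptions, as are the function names.

```python
import subprocess


def read_gpu_stats() -> list[tuple[float, float]]:
    """Return (gpu_util_percent, memory_used_percent) for each GPU via nvidia-smi."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=utilization.gpu,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    stats = []
    for line in out.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append((util, 100.0 * used / total))
    return stats


def sizing_advice(gpu_util: float, mem_pct: float) -> str:
    """Map a reading onto the table's actions. Cutoffs are assumed, not official."""
    if gpu_util >= 90 and mem_pct >= 80:
        return "Correctly sized"
    if gpu_util >= 90 and mem_pct <= 50:
        return "Consider a GPU with less VRAM"
    if gpu_util <= 20 and mem_pct <= 20:
        return "Overpaying: use a smaller GPU"
    return "No clear signal: keep monitoring"
```

To use it, call `sizing_advice(util, mem)` for each pair returned by `read_gpu_stats()` while the workload is under typical load. A single reading taken during startup or between batches can be misleading.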
Batch your work
Deploy, process, tear down, and pay only for the minutes your job runs.
Use the Models API for light workloads
For inference under ~1,000 requests/day, the Models API is cheaper than a dedicated GPU.
Quick checklist
- Run `runcrate ps` daily and kill anything not in use.
- Run `runcrate billing usage` weekly to spot unexpected charges early.
- Use volumes for models and data to avoid re-downloads.
- Match GPU to workload by checking `nvidia-smi` utilization.
- Delete instances immediately after batch jobs complete.
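Deleting instances as soon as a batch job completes is easiest to honor when teardown is part of the job script itself. The sketch below shows the general pattern with `try`/`finally`. The three callables `deploy`, `process`, and `delete` are hypothetical stand-ins for however you create, use, and remove an instance; they are not part of any documented API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_batch(
    deploy: Callable[[], str],
    process: Callable[[str], T],
    delete: Callable[[str], None],
) -> T:
    """Deploy an instance, run the job on it, and always delete it afterwards."""
    instance_id = deploy()
    try:
        return process(instance_id)
    finally:
        # Runs on success and on failure, so a crashed job
        # does not leave a billed instance behind.
        delete(instance_id)
```

The `finally` clause is the design choice that matters: without it, an exception in the job would skip teardown and the instance would keep accruing charges until someone noticed.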