# GPU Cloud Cost Optimization

> Monitor spend, choose the right GPU for your workload, auto-terminate idle instances, and reduce cloud GPU costs.

GPU compute is billed by the hour, so an idle instance costs exactly as much as a busy one. Every hour an instance sits unused is wasted money.

## Monitor your spend

```bash theme={"theme":"github-dark"}
runcrate billing balance     # Current credit balance
runcrate billing usage       # Per-instance spending breakdown
runcrate ps                  # List running instances (each one is accruing charges)
```

## Pick the right GPU

| Workload                 | Recommended GPU | Hourly cost |
| ------------------------ | --------------- | ----------- |
| Inference (7B-8B models) | RTX 4090        | \~\$0.35/hr |
| Inference (70B models)   | A100 80 GB      | \~\$1.60/hr |
| Fine-tuning (7B QLoRA)   | RTX 4090        | \~\$0.35/hr |
| Fine-tuning (70B QLoRA)  | A100 80 GB      | \~\$1.60/hr |
| Training (custom models) | H100            | \~\$2.50/hr |

```bash theme={"theme":"github-dark"}
runcrate instances types     # Browse GPUs and pricing
```
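To put the approximate hourly rates in the table into perspective, the sketch below multiplies them out for an instance that is left running over a weekend (48 hours) and for 30 days (720 hours). The `cost_for` helper is hypothetical and the rates are the rounded figures from the table, so treat the output as an estimate and check `runcrate instances types` for current pricing.

```bash
#!/usr/bin/env bash
# cost_for RATE HOURS -> dollars with two decimals.
# RATE is an approximate hourly price copied from the table above.
cost_for() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# Continuous runtime for a weekend (48 h) and for 30 days (720 h).
for entry in "RTX 4090:0.35" "A100 80 GB:1.60" "H100:2.50"; do
  gpu="${entry%%:*}"    # text before the colon
  rate="${entry##*:}"   # text after the colon
  echo "$gpu: \$$(cost_for "$rate" 48) per weekend, \$$(cost_for "$rate" 720) per 30 days"
done
```

At these rates, a forgotten H100 costs about \$120 over a weekend, while an RTX 4090 costs about \$17.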

## Delete instances when done

```bash theme={"theme":"github-dark"}
runcrate instances delete <name>
runcrate ps                  # Verify nothing is left running
```

## Use volumes to avoid re-setup costs

Re-downloading models and reinstalling packages can waste 10-30 minutes of paid GPU time per session. Keep them on a persistent volume instead:

```bash theme={"theme":"github-dark"}
runcrate volumes create --name workspace --size 100
runcrate instances create --name dev --gpu RTX4090 --template ubuntu-devbox --storage workspace
```

Models and packages stored under `/workspace/` persist across deploys, so a new instance attached to the same volume can start working right away.

## Right-size your instance

```bash theme={"theme":"github-dark"}
runcrate ssh <instance> -- nvidia-smi
```

| nvidia-smi reading           | Action                         |
| ---------------------------- | ------------------------------ |
| GPU-Util: 90%+, Memory: 80%+ | Correctly sized                |
| GPU-Util: 90%+, Memory: 40%  | Consider a GPU with less VRAM  |
| GPU-Util: 20%, Memory: 20%   | Overpaying — use a smaller GPU |
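The table can be turned into a small check that runs on the instance itself. This is a sketch rather than an official tool: `classify_gpu` is a hypothetical helper, and because the table only gives three sample readings, the cut-offs between them (50% and 30%) are assumptions you may want to adjust. It relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`.

```bash
#!/usr/bin/env bash
# classify_gpu UTIL_PCT MEM_USED_MIB MEM_TOTAL_MIB
# Maps one GPU's readings to the actions in the table above.
# The 50% and 30% cut-offs are assumptions, not documented thresholds.
classify_gpu() {
  local util="$1" mem_pct=$(( $2 * 100 / $3 ))
  if   (( util >= 90 && mem_pct >= 80 )); then echo "correctly sized"
  elif (( util >= 90 && mem_pct <= 50 )); then echo "consider a GPU with less VRAM"
  elif (( util <= 30 && mem_pct <= 30 )); then echo "overpaying: use a smaller GPU"
  else echo "no clear signal: keep monitoring"
  fi
}

# nvidia-smi prints one CSV line per GPU, e.g. "93, 70123, 81920".
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
             --format=csv,noheader,nounits |
  while IFS=', ' read -r util used total; do
    echo "GPU-Util ${util}%: $(classify_gpu "$util" "$used" "$total")"
  done
fi
```

A single reading can mislead, so sample while the workload is at steady state, not during model loading or between batches.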

## Batch your work

Deploy, process, and tear down in one pass so that you pay only for the time the job actually runs:

```bash theme={"theme":"github-dark"}
runcrate instances create --name batch --gpu A100 --template ubuntu-inference
runcrate cp ./inputs/ batch:/workspace/inputs/
runcrate ssh batch -- "cd /workspace && python process.py"
runcrate cp batch:/workspace/outputs/ ./outputs/
runcrate instances delete batch
```
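If any step in that sequence fails, the final `delete` never runs and the instance keeps billing. One way to guard against this, sketched below, is to wrap the same commands in a function that registers a `trap` on `EXIT`. The `run_batch` name and its arguments are hypothetical; the `runcrate` commands are the ones shown above, and the script assumes Bash.

```bash
#!/usr/bin/env bash
# run_batch NAME GPU TEMPLATE
# Runs the batch steps from the example above and deletes the instance
# afterwards, even when a step fails. Hypothetical wrapper, Bash only.
run_batch() {
  local name="$1" gpu="$2" template="$3"
  (
    # Nothing to clean up if the instance was never created.
    runcrate instances create --name "$name" --gpu "$gpu" --template "$template" || exit 1

    # From here on, delete the instance whenever this subshell exits.
    # Double quotes expand $name now, while it is known to be set.
    trap "runcrate instances delete \"$name\"" EXIT

    # Stop at the first failing step; the trap still fires.
    runcrate cp ./inputs/ "${name}:/workspace/inputs/" &&
      runcrate ssh "$name" -- "cd /workspace && python process.py" &&
      runcrate cp "${name}:/workspace/outputs/" ./outputs/
  )
}
```

Calling `run_batch batch A100 ubuntu-inference` reproduces the sequence above. It is still worth running `runcrate ps` afterwards, since the trap cannot help if the delete call itself fails.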

## Use the Models API for light workloads

For inference under \~1,000 requests/day, the Models API is usually cheaper than keeping a dedicated GPU running:

```bash theme={"theme":"github-dark"}
curl https://api.runcrate.ai/v1/chat/completions \
  -H "Authorization: Bearer rc_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello."}],
    "max_tokens": 128
  }'
```
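To see why low request volumes favor the API, it helps to work out what each request costs on a dedicated GPU that stays up around the clock. The sketch below does that arithmetic for the \~\$0.35/hr RTX 4090 rate; `per_request_cost` is a hypothetical helper, and the 24-hour uptime is an assumption. The Models API's own pricing is not listed here, so compare the result against the rate on your account.

```bash
#!/usr/bin/env bash
# per_request_cost HOURLY_RATE REQUESTS_PER_DAY
# Cost per request for a dedicated GPU that runs 24 hours a day (assumed).
per_request_cost() {
  awk -v rate="$1" -v reqs="$2" 'BEGIN { printf "%.4f\n", rate * 24 / reqs }'
}

for reqs in 100 1000 10000; do
  echo "$reqs requests/day on an RTX 4090: \$$(per_request_cost 0.35 "$reqs") per request"
done
```

The daily cost is fixed at about \$8.40, so the per-request figure falls as volume grows: roughly \$0.084 at 100 requests/day and \$0.0084 at 1,000.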

## Quick checklist

* [ ] Run `runcrate ps` daily — kill anything not in use.
* [ ] Run `runcrate billing usage` weekly — spot unexpected charges early.
* [ ] Use volumes for models and data — avoid re-downloads.
* [ ] Match GPU to workload — check `nvidia-smi` utilization.
* [ ] Delete instances immediately after batch jobs complete.
