# Manage GPU Infrastructure with AI Agents

> Use the Runcrate MCP server to let Claude Code, Cursor, or Windsurf deploy instances, run commands, and manage your cloud.

export const RuncrateStyles = () => {
  if (typeof document !== 'undefined' && !document.getElementById('runcrate-overrides')) {
    const s = document.createElement('style');
    s.id = 'runcrate-overrides';
    s.textContent = `
      /* Match Runcrate's rounding scale (--radius: 0.75rem) */
      .rounded-sm { border-radius: 0.5rem !important; }   /* 8px */
      .rounded-md { border-radius: 0.625rem !important; } /* 10px */
      .rounded-lg { border-radius: 0.75rem !important; }  /* 12px */
      .rounded-l-sm { border-top-left-radius: 0.5rem !important; border-bottom-left-radius: 0.5rem !important; }
      .rounded-r-sm { border-top-right-radius: 0.5rem !important; border-bottom-right-radius: 0.5rem !important; }
      .rounded-l-md { border-top-left-radius: 0.625rem !important; border-bottom-left-radius: 0.625rem !important; }
      .rounded-r-md { border-top-right-radius: 0.625rem !important; border-bottom-right-radius: 0.625rem !important; }
      .rounded-l-lg { border-top-left-radius: 0.75rem !important; border-bottom-left-radius: 0.75rem !important; }
      .rounded-r-lg { border-top-right-radius: 0.75rem !important; border-bottom-right-radius: 0.75rem !important; }

      /* Cards: never pure white in light mode */
      .card { background-color: #fcfcfc !important; border-radius: 0.75rem !important; }
      html.dark .card { background-color: #141414 !important; }

      /* Docs hero box */
      .rc-hero { background-color: #fcfcfc; border: 1px solid #e0e0e0; }
      html.dark .rc-hero { background-color: #141414; border-color: #242424; }
      html.dark .rc-hero h1 { color: #f5f5f5; }

      /* Runcrate scrollbar — thin, transparent track, hide-until-hover thumb */
      ::-webkit-scrollbar { width: 6px; height: 6px; background-color: transparent; }
      ::-webkit-scrollbar-track { background-color: transparent; }
      ::-webkit-scrollbar-thumb { background-color: rgba(155, 155, 155, 0.5); border-radius: 10px; transition: opacity 0.3s ease; opacity: 0; }
      ::-webkit-scrollbar-thumb:hover { background-color: rgba(155, 155, 155, 0.7); }
      *:hover::-webkit-scrollbar-thumb,
      *:focus::-webkit-scrollbar-thumb,
      *:active::-webkit-scrollbar-thumb { opacity: 1; }
      * { scrollbar-width: thin; scrollbar-color: rgba(155, 155, 155, 0.5) transparent; }
    `;
    document.head.appendChild(s);
  }
  return null;
};

<RuncrateStyles />

Connect the Runcrate MCP server to your AI coding assistant and manage your GPU cloud in natural language. Deploy instances, run commands on them over SSH, check costs, and tear down resources, all without leaving your editor.

## What you'll do

Use Claude Code, Cursor, or any MCP-compatible agent to:

* Deploy and terminate GPU instances on demand
* Run shell commands on remote machines
* Monitor GPU utilization and training progress
* Track spending and manage storage
* Automate multi-step workflows like "deploy, train, download, teardown"

***

## Setup

### Claude Code

Create `.mcp.json` in your project root:

```json theme={"theme":"github-dark"}
{
  "mcpServers": {
    "runcrate": {
      "type": "sse",
      "url": "https://mcp.runcrate.ai/sse"
    }
  }
}
```

### Cursor

Add to Cursor Settings > MCP Servers:

```json theme={"theme":"github-dark"}
{
  "mcpServers": {
    "runcrate": {
      "url": "https://mcp.runcrate.ai/sse"
    }
  }
}
```

The first connection opens a browser window for OAuth. Authorize once and your agent has access.

***

## Real conversations

### "What GPUs do I have running and what are they costing me?"

The agent calls `list_instances` and summarizes:

> You have 3 running instances:
>
> * **training-v2** — 1x A100, $1.20/hr, running 14 hours ($16.80 so far)
> * **dev-box** — 1x RTX 4090, $0.35/hr, running 3 hours ($1.05)
> * **inference-prod** — 1x H100, $2.50/hr, running 48 hours ($120.00)
>
> Total burn rate: $4.05/hr. Balance: $342.15 (\~84 hours at current rate).
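The burn rate and runway in this reply are plain arithmetic over the instance list. A minimal sketch in JavaScript, assuming each instance is a plain object with hypothetical `name`, `hourlyRate`, and `hoursRunning` fields (an illustration, not the actual `list_instances` response shape):

```javascript
// Hypothetical instance records: the field names are illustrative and do not
// reflect the actual list_instances response shape.
const instances = [
  { name: "training-v2", hourlyRate: 1.2, hoursRunning: 14 },
  { name: "dev-box", hourlyRate: 0.35, hoursRunning: 3 },
  { name: "inference-prod", hourlyRate: 2.5, hoursRunning: 48 },
];

// Total burn rate: the sum of hourly rates of all running instances.
function burnRate(list) {
  return list.reduce((total, inst) => total + inst.hourlyRate, 0);
}

// Spend so far for one instance: hourly rate times hours running.
function spentSoFar(inst) {
  return inst.hourlyRate * inst.hoursRunning;
}

// Hours until the balance runs out at a constant burn rate.
function runwayHours(balance, rate) {
  return rate > 0 ? balance / rate : Infinity;
}

for (const inst of instances) {
  console.log(`${inst.name}: $${spentSoFar(inst).toFixed(2)} so far`);
}
const rate = burnRate(instances);
console.log(`Burn rate: $${rate.toFixed(2)}/hr, runway ~${Math.floor(runwayHours(342.15, rate))} hours`);
```

Rounding with `toFixed(2)` only at display time keeps floating-point noise (for example `1.2 * 14`) out of the printed figures.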

### "My training script is throwing CUDA OOM errors. Check GPU memory on training-v2."

The agent calls `ssh_execute` with `nvidia-smi`:

> training-v2 GPU memory usage:
>
> * GPU 0: 38,912 MiB / 40,960 MiB (95% used)
>
> You're almost maxing out the A100 40GB. Options:
>
> 1. Reduce batch size
> 2. Enable gradient checkpointing
> 3. Switch to an A100 80GB instance
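The default `nvidia-smi` table is meant for humans. Running `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` instead prints one comma-separated line per GPU, with memory in MiB, which is easier to parse. A minimal sketch of turning that output into per-GPU usage; the 90% warning threshold is an arbitrary choice for illustration:

```javascript
// Parse output of:
//   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
// Each line looks like "0, 38912, 40960" (memory values in MiB).
function parseGpuMemory(csv) {
  return csv
    .trim()
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [index, used, total] = line.split(",").map((field) => Number(field.trim()));
      return { index, usedMiB: used, totalMiB: total, percent: (100 * used) / total };
    });
}

// Return the GPUs at or above a usage threshold (90% is an illustrative default).
function nearlyFull(gpus, thresholdPercent = 90) {
  return gpus.filter((gpu) => gpu.percent >= thresholdPercent);
}

const sample = "0, 38912, 40960\n";
for (const gpu of parseGpuMemory(sample)) {
  console.log(`GPU ${gpu.index}: ${gpu.usedMiB} / ${gpu.totalMiB} MiB (${gpu.percent.toFixed(0)}% used)`);
}
```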

### "Deploy a fresh H100 for a quick benchmark, run my benchmark script, and tear it down."

The agent orchestrates the full workflow:

1. `create_instance` — deploys an H100 called "benchmark"
2. `instance_status` — polls until deployed
3. `ssh_execute` — `pip install torch && python -c "import torch; ...benchmark code..."`
4. Returns the benchmark results
5. `delete_instance` — tears down immediately
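Step 5 only saves money if it runs even when the benchmark fails, so a script that drives the same tools should put teardown in a `finally` block. A minimal sketch, assuming a hypothetical `callTool(name, args)` function that invokes an MCP tool and resolves with its result. The tool names come from the steps above; the argument and result fields (`name`, `gpu`, `id`, `status`, `command`) and the `"running"` status value are illustrative guesses, not the documented Runcrate schema. The benchmark command is whatever you pass in.

```javascript
// callTool(name, args) is a hypothetical helper that invokes an MCP tool and
// returns a Promise of its result. Argument/result field names are assumptions.
async function benchmarkAndTearDown(callTool, command, { pollMs = 10000, maxPolls = 60, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  const instance = await callTool("create_instance", { name: "benchmark", gpu: "H100" });
  try {
    // Poll until the instance reports ready, giving up after maxPolls attempts.
    for (let attempt = 0; ; attempt++) {
      const { status } = await callTool("instance_status", { id: instance.id });
      if (status === "running") break;
      if (attempt + 1 >= maxPolls) throw new Error(`instance not ready after ${maxPolls} polls`);
      await wait(pollMs);
    }
    // Run the benchmark and hand back whatever the tool returned.
    return await callTool("ssh_execute", { id: instance.id, command });
  } finally {
    // Always tear down, even if polling timed out or the benchmark threw.
    await callTool("delete_instance", { id: instance.id });
  }
}
```

Passing a fake `callTool` that records the tools it was asked to call is an easy way to check that `delete_instance` is reached on both the success and the failure path.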

### "Kill anything that's been idle for more than 6 hours."

The agent calls `list_instances`, checks each instance's creation time, uses `ssh_execute` to check GPU utilization on each, and calls `delete_instance` on the ones idle for more than 6 hours:

> Checked 3 instances:
>
> * **training-v2**: GPU at 98% utilization, kept
> * **dev-box**: GPU at 0% for 8 hours, terminated
> * **old-experiment**: GPU at 0% for 12 hours, terminated
>
> Saved $0.35/hr + $1.20/hr = \$1.55/hr.
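A single `nvidia-smi` reading shows utilization right now, not how long a GPU has been idle, so an "idle for more than N hours" rule needs a record of when each GPU was last busy. A minimal sketch of the decision step, assuming hypothetical `utilizationPercent` and `idleSinceMs` fields collected beforehand (for example by sampling `nvidia-smi` periodically, or falling back to the creation time). It only decides; issuing `delete_instance` is left to the caller. The `scratch-box` entry is a made-up instance under the threshold.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Decide which instances to terminate. Each record has hypothetical fields:
//   utilizationPercent: current GPU utilization
//   idleSinceMs: timestamp when the GPU was last seen busy (or creation time)
function decideIdle(instances, nowMs, thresholdHours = 6) {
  return instances.map((inst) => {
    const idleHours = inst.utilizationPercent > 0 ? 0 : (nowMs - inst.idleSinceMs) / HOUR_MS;
    return { name: inst.name, idleHours, terminate: idleHours > thresholdHours };
  });
}

const now = Date.now();
const decisions = decideIdle(
  [
    { name: "training-v2", utilizationPercent: 98, idleSinceMs: now },
    { name: "old-experiment", utilizationPercent: 0, idleSinceMs: now - 12 * HOUR_MS },
    { name: "scratch-box", utilizationPercent: 0, idleSinceMs: now - 2 * HOUR_MS }, // hypothetical
  ],
  now,
);
for (const d of decisions) {
  console.log(`${d.name}: idle ${d.idleHours.toFixed(1)} h, ${d.terminate ? "terminate" : "keep"}`);
}
```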

### "I need a persistent volume for my dataset. Create 500GB in us-east, then show me what regions are available."

The agent calls `list_storage_regions`, then `create_volume`:

> Created volume **imagenet-data** (500 GB) in us-east-1.
> Cost: \~$3.75/week ($15/month).
>
> Available regions: us-east-1, us-west-2, eu-central-1, ap-southeast-1.

### "Set up a vLLM server on an A100 with Llama 3.1 8B and give me the endpoint URL."

The agent chains:

1. `create_instance` with `gpu: "A100"`
2. `instance_status` polling
3. `ssh_execute`: `pip install vllm`
4. `ssh_execute`: starts vLLM in the background with `--model meta-llama/Llama-3.1-8B-Instruct --port 8000 --host 0.0.0.0`
5. `get_instance`: returns the IP

> Your vLLM server is live at:
>
> ```
> http://203.0.113.42:8000/v1/chat/completions
> ```
>
> Use it as a drop-in OpenAI endpoint.
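Because vLLM's server speaks the OpenAI chat completions protocol, any HTTP client can call it. A minimal sketch using the global `fetch` built into Node 18+; the `model` argument must match the `--model` value the server was started with, and `max_tokens: 128` is an arbitrary illustrative limit:

```javascript
// Build a request for an OpenAI-compatible /v1/chat/completions endpoint.
// `model` must match the --model value the vLLM server was started with.
function chatRequest(baseUrl, model, userMessage) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: userMessage }],
        max_tokens: 128,
      }),
    },
  };
}

// Send the request with the global fetch (Node 18+) and return the reply text.
async function chat(baseUrl, model, userMessage) {
  const { url, init } = chatRequest(baseUrl, model, userMessage);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`HTTP ${response.status}: ${await response.text()}`);
  const data = await response.json();
  return data.choices[0].message.content;
}
```

For example, `chat("http://203.0.113.42:8000", "<your --model value>", "Say hello")`. The official `openai` Node client works too if you point its `baseURL` option at `http://203.0.113.42:8000/v1`.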

### "What templates and environments do I have?"

The agent calls `list_templates` and `list_environments`:

> **Templates:** ubuntu-devbox, ubuntu-train, ubuntu-inference
>
> **Environments:** main (default), staging, production

### "Upload my train.py script to the training-run instance at /workspace/train.py"

The agent calls `file_upload` with the file content and remote path:

> Uploaded **train.py** (4.2 KB) to **training-run** at `/workspace/train.py`.

### "Create a new environment called 'staging' in my workspace, then list all environments"

The agent calls `create_environment` with name "staging", then `list_environments`:

> Created environment **staging**.
>
> **Environments:** main (default), staging

***

## What it can't do (yet)

* Port forwarding or SSH tunnels (use native SSH)
* Modify billing settings (use the dashboard)
* Create or delete workspaces (use the dashboard)

<Note>Creating and deleting environments *is* supported via MCP: use the `create_environment` and `delete_environment` tools.</Note>

***

## Tips

* Be specific with instance names — the agent uses them to target `ssh_execute` and `delete_instance`
* When debugging, have the agent check `nvidia-smi` and `df -h` first; GPU OOM and a full disk are the most common culprits
* Chain requests: a single message such as "deploy, install, run, download, tear down" works
* The agent remembers instance IDs within a conversation, so you can say "check the status of that instance" after deploying one