
# Run Quick ML Experiments with AI Agents

> Use MCP tools to deploy a GPU, run an experiment, get results, and tear down — all in one conversation. Zero infrastructure left behind.

export const RuncrateStyles = () => {
  if (typeof document !== 'undefined' && !document.getElementById('runcrate-overrides')) {
    const s = document.createElement('style');
    s.id = 'runcrate-overrides';
    s.textContent = `
      /* Match Runcrate's rounding scale (--radius: 0.75rem) */
      .rounded-sm { border-radius: 0.5rem !important; }   /* 8px */
      .rounded-md { border-radius: 0.625rem !important; } /* 10px */
      .rounded-lg { border-radius: 0.75rem !important; }  /* 12px */
      .rounded-l-sm { border-top-left-radius: 0.5rem !important; border-bottom-left-radius: 0.5rem !important; }
      .rounded-r-sm { border-top-right-radius: 0.5rem !important; border-bottom-right-radius: 0.5rem !important; }
      .rounded-l-md { border-top-left-radius: 0.625rem !important; border-bottom-left-radius: 0.625rem !important; }
      .rounded-r-md { border-top-right-radius: 0.625rem !important; border-bottom-right-radius: 0.625rem !important; }
      .rounded-l-lg { border-top-left-radius: 0.75rem !important; border-bottom-left-radius: 0.75rem !important; }
      .rounded-r-lg { border-top-right-radius: 0.75rem !important; border-bottom-right-radius: 0.75rem !important; }

      /* Cards: never pure white in light mode */
      .card { background-color: #fcfcfc !important; border-radius: 0.75rem !important; }
      html.dark .card { background-color: #141414 !important; }

      /* Docs hero box */
      .rc-hero { background-color: #fcfcfc; border: 1px solid #e0e0e0; }
      html.dark .rc-hero { background-color: #141414; border-color: #242424; }
      html.dark .rc-hero h1 { color: #f5f5f5; }

      /* Runcrate scrollbar — thin, transparent track, hide-until-hover thumb */
      ::-webkit-scrollbar { width: 6px; height: 6px; background-color: transparent; }
      ::-webkit-scrollbar-track { background-color: transparent; }
      ::-webkit-scrollbar-thumb { background-color: rgba(155, 155, 155, 0.5); border-radius: 10px; transition: opacity 0.3s ease; opacity: 0; }
      ::-webkit-scrollbar-thumb:hover { background-color: rgba(155, 155, 155, 0.7); }
      *:hover::-webkit-scrollbar-thumb,
      *:focus::-webkit-scrollbar-thumb,
      *:active::-webkit-scrollbar-thumb { opacity: 1; }
      * { scrollbar-width: thin; scrollbar-color: rgba(155, 155, 155, 0.5) transparent; }
    `;
    document.head.appendChild(s);
  }
  return null;
};

<RuncrateStyles />

Run a one-off experiment without any setup overhead. Tell your AI agent what you want to test, and it handles deployment, execution, result collection, and cleanup in a single conversation.

***

## "Run a quick benchmark comparing PyTorch and JAX matmul performance on an RTX 4090."

The agent handles everything end-to-end:

1. **`create_instance`** — deploys `bench-run` with RTX 4090
2. **`instance_status`** — waits until running
3. **`ssh_execute`** — `pip install torch jax[cuda12]`
4. **`file_upload`** — uploads the benchmark script
5. **`ssh_execute`** — `python /root/benchmark.py`
6. **`delete_instance`** — terminates immediately
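
The six steps above follow a provision, run, tear-down pattern. Below is a minimal sketch of the same sequence as a script. It assumes a hypothetical `call_tool(name, **args)` helper that stands in for whatever MCP client you use. The argument and result field names (`name`, `gpu`, `instance_id`, `command`, `remote_path`, `status`, `stdout`) are illustrative and are not the documented tool schemas. The work sits inside `try`/`finally`, so `delete_instance` runs even when a step fails. That is what keeps a failed experiment from leaving a GPU running.

```python
import time
from typing import Any, Callable

# Hypothetical MCP client hook: call_tool(tool_name, **arguments) -> dict.
# Argument and result field names are illustrative, not Runcrate's documented schema.
ToolCaller = Callable[..., dict[str, Any]]


def wait_until_running(call_tool: ToolCaller, instance_id: str,
                       poll_s: float = 5.0, timeout_s: float = 600.0) -> None:
    """Poll instance_status until the instance reports 'running' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while call_tool("instance_status", instance_id=instance_id)["status"] != "running":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{instance_id} not running after {timeout_s:.0f}s")
        time.sleep(poll_s)


def run_benchmark(call_tool: ToolCaller, script: str, poll_s: float = 5.0) -> str:
    """Deploy, run one script, return its stdout, and always delete the instance."""
    instance_id = call_tool("create_instance", name="bench-run", gpu="RTX 4090")["id"]
    try:
        wait_until_running(call_tool, instance_id, poll_s=poll_s)
        call_tool("ssh_execute", instance_id=instance_id,
                  command='pip install torch "jax[cuda12]"')
        call_tool("file_upload", instance_id=instance_id,
                  remote_path="/root/benchmark.py", content=script)
        result = call_tool("ssh_execute", instance_id=instance_id,
                           command="python /root/benchmark.py")
        return result["stdout"]
    finally:
        # Runs whether the steps succeeded or raised, so no instance is left behind.
        call_tool("delete_instance", instance_id=instance_id)
```

The `finally` block is the design choice that matters here. A polling timeout, a failed `pip install`, or a crashing script all still end with `delete_instance`.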

> **Benchmark results (RTX 4090, 4096x4096 matmul, 1000 iterations):**
>
> | Framework   | Mean time | Std dev | TFLOPS |
> | ----------- | --------- | ------- | ------ |
> | PyTorch 2.4 | 2.31 ms   | 0.04 ms | 59.4   |
> | JAX 0.4.31  | 2.28 ms   | 0.03 ms | 60.2   |
>
> Both frameworks perform nearly identically on this workload. JAX is marginally faster (\~1.3%), possibly because of XLA compilation; a gap this small may not hold across repeated runs.
>
> **Total cost: \$0.12** (20 minutes on RTX 4090 at \$0.35/hr).
> Instance terminated. Nothing left running.
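
The TFLOPS column follows from the standard operation count for multiplying two n×n matrices. That count is 2n³ floating-point operations, one multiply and one add per inner-product term, divided by the mean time per iteration. The cost line is the minutes used divided by 60, times the hourly rate. Below is a small sketch of both calculations. It assumes per-iteration timings in milliseconds, and the function names are illustrative. If you write the benchmark yourself, each timing must wait for the GPU to finish before it is recorded: use `torch.cuda.synchronize()` in PyTorch or `.block_until_ready()` in JAX. Both frameworks launch GPU work asynchronously.

```python
import statistics


def matmul_tflops(n: int, mean_ms: float) -> float:
    """Throughput of an n x n @ n x n matmul: 2 * n**3 FLOPs divided by mean time."""
    flops = 2 * n ** 3  # one multiply and one add per inner-product term
    return flops / (mean_ms / 1e3) / 1e12


def summarize(times_ms: list[float], n: int) -> dict[str, float]:
    """Mean, sample standard deviation (needs at least 2 timings), and TFLOPS."""
    mean = statistics.mean(times_ms)
    return {"mean_ms": mean, "std_ms": statistics.stdev(times_ms), "tflops": matmul_tflops(n, mean)}


def run_cost(minutes: float, usd_per_hour: float) -> float:
    """Hourly GPU price prorated to the minutes used, rounded to cents."""
    return round(minutes / 60 * usd_per_hour, 2)


print(f"{matmul_tflops(4096, 2.31):.1f}")  # 59.5
print(run_cost(20, 0.35))                  # 0.12
```

With the PyTorch row's rounded 2.31 ms, this prints 59.5, about 0.1 above the table's 59.4. The JAX row gives 60.3 against the table's 60.2. A gap that size is consistent with the table having been computed from unrounded timings.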

***

## "Test if my fine-tuned model loads correctly. Deploy a GPU, load it, run 5 prompts, and shut down."

The agent runs the validation:

1. **`create_instance`** — deploys `model-test` with RTX 4090
2. **`instance_status`** — waits until running
3. **`ssh_execute`** — `pip install torch transformers`
4. **`file_upload`** — uploads `test_model.py` with 5 test prompts
5. **`ssh_execute`** — `python /root/test_model.py`
6. **`billing_usage`** — checks cost
7. **`delete_instance`** — terminates
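
A rough sketch of what a `test_model.py` like the one uploaded in step 4 could look like. It grades each prompt with a simple predicate and measures end-to-end tokens per second with `time.perf_counter`, so prompt processing is included. The `generate` callable is injected so the harness can be exercised without a GPU. The Hugging Face `transformers` loader in `load_generator` is one possible backend, not necessarily the one used here. The model path, prompts, and pass checks are placeholders, not the ones behind the results below.

```python
import time
from typing import Callable

# A generate function maps a prompt to (completion_text, number_of_new_tokens).
GenerateFn = Callable[[str], tuple[str, int]]
# Each case pairs a prompt with a predicate that decides pass/fail from the completion.
Case = tuple[str, Callable[[str], bool]]


def load_generator(model_path: str, max_new_tokens: int = 128) -> GenerateFn:
    """One possible backend: Hugging Face transformers on a single CUDA GPU."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")

    def generate(prompt: str) -> tuple[str, int]:
        inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
        new_tokens = output[0][inputs["input_ids"].shape[1]:]  # drop the echoed prompt
        return tokenizer.decode(new_tokens, skip_special_tokens=True), len(new_tokens)

    return generate


def run_checks(generate: GenerateFn, cases: list[Case]) -> dict:
    """Run every prompt once, grade it, and report aggregate generation speed."""
    results, total_tokens, total_s = [], 0, 0.0
    for prompt, check in cases:
        start = time.perf_counter()
        text, n_tokens = generate(prompt)
        total_s += time.perf_counter() - start
        total_tokens += n_tokens
        results.append((prompt, "pass" if check(text) else "fail"))
    return {"results": results, "tok_per_s": total_tokens / total_s if total_s > 0 else 0.0}
```

On the instance, you would time the call to `load_generator("/root/model")` to get a load-time figure like the one reported below. Then pass the returned function to `run_checks` with your five prompts. `/root/model` is a placeholder path.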

> **Model validation results:**
>
> | Prompt                     | Output                         | Quality |
> | -------------------------- | ------------------------------ | ------- |
> | "Summarize this paper..."  | 3-sentence summary, accurate   | pass    |
> | "Translate to French..."   | Correct translation            | pass    |
> | "Extract entities from..." | Found 4/4 entities             | pass    |
> | "Generate SQL for..."      | Valid query, correct joins     | pass    |
> | "Classify sentiment..."    | Correct label, 0.94 confidence | pass    |
>
> All 5 test prompts passed. Model loads in 8.2 seconds, inference at \~45 tok/s.
> **Cost: \$0.06** (10 minutes). Instance terminated.

***

## "I want to try 3 different LoRA ranks and see which gives the best eval loss. Quick and dirty."

The agent runs a fast comparison:

1. **`create_instance`** — deploys `lora-test` with A100 80GB
2. **`instance_status`** — waits until running
3. **`ssh_execute`** — installs dependencies
4. **`file_upload`** — uploads training script and small eval dataset
5. **`ssh_execute`** — runs with `--lora-rank 8` (500 steps)
6. **`ssh_execute`** — runs with `--lora-rank 16` (500 steps)
7. **`ssh_execute`** — runs with `--lora-rank 32` (500 steps)
8. **`delete_instance`** — terminates
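
Steps 5 to 7 run the same command with one flag changed, so they reduce to a loop. The sketch below assumes a hypothetical `/root/train.py` that accepts `--lora-rank` and a made-up `--max-steps` flag. It also assumes the script prints its final eval loss as a line such as `eval_loss=1.687`. Only `--lora-rank` comes from this page; the script path, the step flag, and the output format are placeholders. The selection rule keeps the smallest rank whose loss is within a tolerance of the best, so a larger rank has to earn its extra cost.

```python
import re
import subprocess
from typing import Callable

# Hypothetical output contract: train.py prints lines like "eval_loss=1.687".
EVAL_LOSS = re.compile(r"eval_loss=([0-9]*\.?[0-9]+)")


def run_rank(rank: int, steps: int = 500) -> str:
    """Run one training job on the instance and return its stdout.

    /root/train.py and --max-steps are placeholders; only --lora-rank comes from this page.
    """
    cmd = ["python", "/root/train.py", "--lora-rank", str(rank), "--max-steps", str(steps)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def sweep(ranks: list[int], runner: Callable[[int], str] = run_rank) -> dict[int, float]:
    """Run each rank sequentially and keep the last eval loss each run reports."""
    losses: dict[int, float] = {}
    for rank in ranks:
        found = EVAL_LOSS.findall(runner(rank))
        if not found:
            raise ValueError(f"no eval_loss line in output for rank {rank}")
        losses[rank] = float(found[-1])
    return losses


def pick_rank(losses: dict[int, float], tol: float = 0.01) -> int:
    """Smallest rank whose eval loss is within tol of the best one."""
    best = min(losses.values())
    return min(rank for rank, loss in losses.items() if loss <= best + tol)
```

Given the eval losses in the table below, `pick_rank` returns 16. Rank 16 has the lowest loss, and rank 32 does not beat it.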

> **LoRA rank comparison (500 steps each, Llama 8B, 1K-sample dataset):**
>
> | Rank | Trainable params | Eval loss | Train time |
> | ---- | ---------------- | --------- | ---------- |
> | 8    | 3.4M             | 1.842     | 4 min      |
> | 16   | 6.8M             | 1.687     | 5 min      |
> | 32   | 13.6M            | 1.691     | 7 min      |
>
> **Rank 16 wins.** It achieves the lowest eval loss. Rank 32 does not improve on it (1.691 vs. 1.687) and takes 40% longer to train (7 vs. 5 minutes). The extra parameters at rank 32 are likely overfitting this small dataset.
>
> **Total cost: \$0.43** (16 minutes on A100 at \$1.60/hr). Instance terminated.
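
The trainable-parameter column scales linearly with rank. A LoRA adapter on a `d_out × d_in` weight adds two low-rank factors, `A` (`r × d_in`) and `B` (`d_out × r`), for `r · (d_in + d_out)` parameters. The page does not say which modules were adapted. The sketch below assumes adapters on `q_proj` and `v_proj` in all 32 layers of a Llama 3 8B-shaped model. That model has hidden size 4096 and grouped-query attention with 8 KV heads of dimension 128, so `v_proj` outputs 1024. Under that assumption, the formula reproduces the 3.4M / 6.8M / 13.6M figures after rounding.

```python
def lora_params(rank: int, shapes: list[tuple[int, int]], layers: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) per adapted weight: r * (d_in + d_out)."""
    return layers * sum(rank * (d_in + d_out) for d_in, d_out in shapes)


# Assumed Llama 3 8B shapes as (d_in, d_out): q_proj 4096 -> 4096, v_proj 4096 -> 1024.
LLAMA3_8B_QV = [(4096, 4096), (4096, 1024)]

for r in (8, 16, 32):
    print(r, f"{lora_params(r, LLAMA3_8B_QV, layers=32) / 1e6:.1f}M")  # 3.4M, 6.8M, 13.6M
```

Each doubling of rank doubles the adapter size. Training time in the table grows more slowly than that, plausibly because the frozen 8B base model dominates the compute.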

***

## Tools used in this workflow

| Tool                                  | Purpose                             |
| ------------------------------------- | ----------------------------------- |
| `create_instance` / `instance_status` | Provision and wait for GPU          |
| `ssh_execute`                         | Install packages, run experiments   |
| `file_upload`                         | Transfer experiment scripts         |
| `billing_usage`                       | Check the cost of the run           |
| `delete_instance`                     | Tear down immediately after results |
