Run a one-off experiment without any setup overhead. Tell your AI agent what you want to test, and it handles deployment, execution, result collection, and cleanup in a single conversation.

“Run a quick benchmark comparing PyTorch and JAX matmul performance on an RTX 4090.”

The agent handles everything end-to-end:
  1. create_instance: deploys bench-run with RTX 4090
  2. instance_status: waits until running
  3. ssh_execute: `pip install torch jax[cuda12]`
  4. file_upload: uploads the benchmark script
  5. ssh_execute: `python /root/benchmark.py`
  6. delete_instance: terminates immediately
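The benchmark script uploaded in step 4 is not shown on this page. As a rough sketch of the timing logic such a script might use, the helper below runs a few warmup iterations and then reports the mean and standard deviation per call. The name `time_op` and its parameters are illustrative assumptions, not part of Runcrate or either framework. On a GPU, the timed callable has to block until the kernel finishes (for example by calling `torch.cuda.synchronize()` in PyTorch or `.block_until_ready()` on a JAX array); otherwise only the kernel launch is timed.

```python
import statistics
import time
from typing import Callable


def time_op(fn: Callable[[], object], iters: int = 1000, warmup: int = 10) -> tuple[float, float]:
    """Return (mean_ms, std_ms) for calling fn() `iters` times.

    Hypothetical helper for illustration. `fn` must block until its work
    is done; on a GPU that means synchronizing inside fn. Needs iters >= 2.
    """
    # Warmup runs absorb one-time costs such as JIT compilation and caching.
    for _ in range(warmup):
        fn()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


if __name__ == "__main__":
    # CPU stand-in workload so the sketch runs without a GPU.
    mean_ms, std_ms = time_op(lambda: sum(range(10_000)), iters=100)
    print(f"mean {mean_ms:.3f} ms, std {std_ms:.3f} ms")
```

Discarding the warmup iterations matters most for JAX, where the first call to a jitted function includes compilation time.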
Benchmark results (RTX 4090, 4096x4096 matmul, 1000 iterations):

| Framework | Mean time | Std dev | TFLOPS |
| --- | --- | --- | --- |
| PyTorch 2.4 | 2.31 ms | 0.04 ms | 59.4 |
| JAX 0.4.31 | 2.28 ms | 0.03 ms | 60.2 |

Both frameworks perform nearly identically on this workload. JAX is marginally faster (~1.3%) due to XLA compilation. **Total cost: $0.12** (20 minutes on RTX 4090 at $0.35/hr). Instance terminated. Nothing left running.
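The throughput and cost figures can be sanity-checked with a little arithmetic. The sketch below assumes the usual 2n³ floating-point operation count for an n×n matrix multiply and simple pro-rata hourly billing. Both are assumptions, since the page states neither the FLOP convention nor the billing granularity, and the function names are illustrative.

```python
def matmul_tflops(n: int, mean_seconds: float) -> float:
    """Throughput of an (n x n) @ (n x n) matmul, counting 2*n**3 FLOPs (assumed convention)."""
    return 2 * n**3 / mean_seconds / 1e12


def run_cost(rate_per_hour: float, minutes: float) -> float:
    """Pro-rata cost in dollars, assuming billing is linear in elapsed time."""
    return rate_per_hour * minutes / 60


if __name__ == "__main__":
    print(f"PyTorch: {matmul_tflops(4096, 2.31e-3):.1f} TFLOPS")
    print(f"JAX:     {matmul_tflops(4096, 2.28e-3):.1f} TFLOPS")
    print(f"Cost:    ${run_cost(0.35, 20):.2f}")
```

With the rounded mean times from the results, this lands within about 0.1 TFLOPS of the reported values; the small gap is presumably rounding in the published timings. Twenty minutes at $0.35/hr comes to roughly $0.12.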

“Test if my fine-tuned model loads correctly. Deploy a GPU, load it, run 5 prompts, and shut down.”

The agent runs the validation:
  1. create_instance: deploys model-test with RTX 4090
  2. instance_status: waits until running
  3. ssh_execute: `pip install torch transformers`
  4. file_upload: uploads test_model.py with 5 test prompts
  5. ssh_execute: `python /root/test_model.py`
  6. billing_usage: checks cost
  7. delete_instance: terminates
Model validation results:

| Prompt | Output | Quality |
| --- | --- | --- |
| “Summarize this paper…” | 3-sentence summary, accurate | pass |
| “Translate to French…” | Correct translation | pass |
| “Extract entities from…” | Found 4/4 entities | pass |
| “Generate SQL for…” | Valid query, correct joins | pass |
| “Classify sentiment…” | Correct label, 0.94 confidence | pass |

All 5 test prompts passed. Model loads in 8.2 seconds, inference at ~45 tok/s. Cost: $0.06 (10 minutes). Instance terminated.
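The contents of test_model.py are not shown. One plausible shape for such a script is a loop that times each generation and applies a simple pass/fail check per prompt. In the sketch below, `generate` is a hypothetical callable standing in for the real model call (for example a wrapper around a Hugging Face `model.generate`), and the checks are placeholders. None of these names come from the page.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class PromptResult:
    prompt: str
    output: str
    passed: bool
    tokens_per_second: float


def run_prompts(
    generate: Callable[[str], str],
    cases: list[tuple[str, Callable[[str], bool]]],
) -> list[PromptResult]:
    """Run each (prompt, check) pair and record pass/fail plus rough throughput.

    Tokens are approximated by whitespace splitting, which is only a
    stand-in for a real tokenizer count.
    """
    results = []
    for prompt, check in cases:
        start = time.perf_counter()
        output = generate(prompt)
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against division by zero
        n_tokens = len(output.split())
        results.append(PromptResult(prompt, output, check(output), n_tokens / elapsed))
    return results


if __name__ == "__main__":
    # Fake model so the sketch runs anywhere; swap in the real generate call.
    def fake_generate(prompt: str) -> str:
        return "positive" if "sentiment" in prompt else "SELECT 1"

    cases = [
        ("Classify sentiment: great product", lambda out: out in {"positive", "negative"}),
        ("Generate SQL for a health check", lambda out: out.upper().startswith("SELECT")),
    ]
    results = run_prompts(fake_generate, cases)
    for r in results:
        print("pass" if r.passed else "FAIL", "|", r.prompt)
    print(f"{sum(r.passed for r in results)}/{len(results)} passed")
```

Keeping the checks as small functions next to each prompt makes it easy to see why a given prompt failed when the agent reports results back.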

“I want to try 3 different LoRA ranks and see which gives the best eval loss. Quick and dirty.”

The agent runs a fast comparison:
  1. create_instance: deploys lora-test with A100 80GB
  2. instance_status: waits until running
  3. ssh_execute: installs dependencies
  4. file_upload: uploads training script and small eval dataset
  5. ssh_execute: runs with `--lora-rank 8` (500 steps)
  6. ssh_execute: runs with `--lora-rank 16` (500 steps)
  7. ssh_execute: runs with `--lora-rank 32` (500 steps)
  8. delete_instance: terminates
LoRA rank comparison (500 steps each, Llama 8B, 1K sample dataset):

| Rank | Trainable params | Eval loss | Train time |
| --- | --- | --- | --- |
| 8 | 3.4M | 1.842 | 4 min |
| 16 | 6.8M | 1.687 | 5 min |
| 32 | 13.6M | 1.691 | 7 min |

Rank 16 wins. It achieves the lowest eval loss; rank 32 doesn’t improve further and trains 40% slower. The extra parameters at rank 32 are likely overfitting on this small dataset. **Total cost: $0.43** (16 minutes on A100 at $1.60/hr). Instance terminated.
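The trainable-parameter counts scale linearly with rank, as expected for LoRA: each adapted weight matrix with input width d_in and output width d_out gains two low-rank factors holding r·(d_in + d_out) parameters in total. The sketch below reproduces the reported counts under the assumption that the model has Llama 3 8B dimensions (hidden size 4096, key/value projection width 1024, 32 layers) and that only the query and value projections are adapted. The page states neither detail, so treat this as one configuration that happens to match, not a description of the actual run.

```python
def lora_params(rank: int, shapes: list[tuple[int, int]], n_layers: int) -> int:
    """Trainable LoRA parameters: rank * (d_in + d_out) per adapted matrix, per layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * n_layers


# Assumed Llama 3 8B projection shapes as (d_in, d_out); not stated in the source.
Q_PROJ = (4096, 4096)
V_PROJ = (4096, 1024)  # grouped-query attention: 8 KV heads x 128 dims each

if __name__ == "__main__":
    for rank in (8, 16, 32):
        n = lora_params(rank, [Q_PROJ, V_PROJ], n_layers=32)
        print(f"rank {rank:>2}: {n / 1e6:.1f}M trainable parameters")
```

Under these assumptions the counts come to about 3.4M, 6.8M, and 13.6M, in line with the comparison results. Doubling the rank doubles the adapter size, which is why the loss plateau between ranks 16 and 32 is informative: the added capacity buys nothing on this dataset.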

Tools used in this workflow

| Tool | Purpose |
| --- | --- |
| create_instance / instance_status | Provision and wait for GPU |
| ssh_execute | Install packages, run experiments |
| file_upload | Transfer experiment scripts |
| delete_instance | Tear down immediately after results |
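All three runs follow the same lifecycle: provision, wait, execute, tear down. If you were to script that flow yourself, the key design point is that teardown should run even when a step fails, so no instance is left billing. The sketch below shows the pattern with `try`/`finally`. The `client` object, its method signatures, and the `"running"` status string are hypothetical stand-ins named after the tools above; the real tool parameters are not documented on this page.

```python
import os
import time
from typing import Protocol


class GpuClient(Protocol):
    """Hypothetical interface named after the tools above; signatures are assumptions."""

    def create_instance(self, name: str, gpu: str) -> str: ...
    def instance_status(self, instance_id: str) -> str: ...
    def ssh_execute(self, instance_id: str, command: str) -> str: ...
    def file_upload(self, instance_id: str, local_path: str, remote_path: str) -> None: ...
    def delete_instance(self, instance_id: str) -> None: ...


def run_experiment(
    client: GpuClient,
    name: str,
    gpu: str,
    setup_command: str,
    script_path: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Provision, run one script, and always tear down. Returns the script's output."""
    instance_id = client.create_instance(name, gpu)
    try:
        # Poll until the instance reports "running" (status string is assumed).
        for _ in range(max_polls):
            if client.instance_status(instance_id) == "running":
                break
            time.sleep(poll_seconds)
        else:
            raise TimeoutError(f"{name} did not reach 'running' in time")

        client.ssh_execute(instance_id, setup_command)
        remote_path = "/root/" + os.path.basename(script_path)
        client.file_upload(instance_id, script_path, remote_path)
        return client.ssh_execute(instance_id, f"python {remote_path}")
    finally:
        # Runs on success and on failure, so nothing is left billing.
        client.delete_instance(instance_id)
```

A call such as `run_experiment(client, "bench-run", "RTX 4090", "pip install torch jax[cuda12]", "benchmark.py")` would mirror the first workflow, given a client object that implements these methods.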