Documentation Index
Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Deploy a fleet of instances, run the same command on all of them, and monitor everything at once. Your AI agent handles the coordination.
”Deploy 4 RTX 4090 instances for a hyperparameter sweep.”
The agent provisions the fleet:
list_gpu_types — confirms RTX 4090 availability and pricing
create_instance — creates sweep-1 with RTX 4090
create_instance — creates sweep-2 with RTX 4090
create_instance — creates sweep-3 with RTX 4090
create_instance — creates sweep-4 with RTX 4090
instance_status — polls all four until running
Deployed 4 instances:
| Instance | GPU | Status | Cost |
|---|
| sweep-1 | RTX 4090 | running | $0.35/hr |
| sweep-2 | RTX 4090 | running | $0.35/hr |
| sweep-3 | RTX 4090 | running | $0.35/hr |
| sweep-4 | RTX 4090 | running | $0.35/hr |
Fleet burn rate: $1.40/hr. All 4 are ready for commands.
”Install dependencies on all of them and start training with different learning rates.”
The agent runs commands across the fleet:
ssh_execute on each instance — pip install torch transformers datasets
file_upload on each instance — uploads train.py
ssh_execute on each instance — starts training with different args:
- sweep-1:
python train.py --lr 1e-4
- sweep-2:
python train.py --lr 5e-5
- sweep-3:
python train.py --lr 1e-5
- sweep-4:
python train.py --lr 5e-6
All 4 instances are running training with different learning rates:
| Instance | Learning Rate | Status |
|---|
| sweep-1 | 1e-4 | training started |
| sweep-2 | 5e-5 | training started |
| sweep-3 | 1e-5 | training started |
| sweep-4 | 5e-6 | training started |
Check back in ~30 minutes for results, or ask me to monitor progress.
”Check training status on all instances and tear down any that are done.”
The agent checks each instance and cleans up:
ssh_execute on each — tail -5 /root/train.log
ssh_execute on each — nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader
| Instance | GPU Util | Last Log Line | Status |
|---|
| sweep-1 | 0% | Training complete. Best loss: 0.312 | done |
| sweep-2 | 0% | Training complete. Best loss: 0.287 | done |
| sweep-3 | 94% | Epoch 4/5 Step 800/1000 Loss: 0.301 | running |
| sweep-4 | 91% | Epoch 3/5 Step 620/1000 Loss: 0.445 | running |
sweep-2 (lr=5e-5) has the best loss so far at 0.287.
Terminating finished instances to save money.
file_download on sweep-1 and sweep-2 — downloads results
delete_instance — terminates sweep-1 and sweep-2
Downloaded results from sweep-1 and sweep-2.
Terminated both. Fleet cost reduced to $0.70/hr (2 remaining).
| Tool | Purpose |
|---|
list_gpu_types | Check availability before bulk deployment |
create_instance / instance_status | Deploy fleet and wait for readiness |
file_upload / ssh_execute | Distribute code and run commands |
file_download | Retrieve results from completed runs |
delete_instance | Tear down finished instances |