Solutions
Fine-Tuning
Run LoRA, QLoRA, or full-parameter fine-tuning on bare-metal GPUs. Hugging Face Transformers, Axolotl, and LLaMA-Factory come pre-installed. Bring your dataset, pick a base model, and start adapting -- per-minute billing means you only pay for active training time.
The Workflow
Train lightweight adapters on top of frozen base models. QLoRA with 4-bit quantization lets you fine-tune 70B models on a single GPU.
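The single-GPU claim follows from simple arithmetic: at 4-bit quantization the frozen base weights of a 70B model occupy roughly 35 GB, leaving headroom on an 80 GB card for adapters, optimizer state, and activations. A back-of-envelope sketch (weights only; this is an estimate, not a guarantee for any specific model):

```python
# Back-of-envelope memory estimate for QLoRA base weights.
# Assumption: 4-bit quantized frozen base model; LoRA adapters,
# optimizer state, and activations add several GB on top of this.
def base_weight_memory_gb(params_billions: float, bits: int = 4) -> float:
    bytes_total = params_billions * 1e9 * bits / 8
    return bytes_total / 1e9

print(base_weight_memory_gb(70))  # 70B model at 4-bit: 35.0 GB
print(base_weight_memory_gb(13))  # 13B model at 4-bit: 6.5 GB
```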
Unfreeze all layers for maximum performance. Multi-GPU instances with NVLink handle models that need full fine-tuning.
Hugging Face Transformers, Axolotl, LLaMA-Factory, PEFT, DeepSpeed, and bitsandbytes come ready out of the box. No setup required.
Upload datasets directly to your instance via SCP, or pull from Hugging Face Hub. Persistent storage across sessions so you never re-upload.
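As a concrete example of "bring your dataset": instruction-tuning data is commonly shipped as JSONL, which Axolotl and LLaMA-Factory can both ingest. The instruction/input/output field names below follow the common alpaca-style convention and are an assumption; check your trainer config for the schema it expects.

```python
import json

# Write instruction-tuning examples as JSONL (one JSON object per line).
# Field names follow the alpaca-style convention (an assumption here).
examples = [
    {"instruction": "Summarize the text.",
     "input": "Per-minute billing means you only pay for active training time.",
     "output": "You are billed only while training runs."},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

From a local machine, something like `scp train.jsonl user@your-instance:/workspace/data/` (hypothetical paths) gets the file onto persistent storage; then point your trainer config at it.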
Weights & Biases, MLflow, and TensorBoard all work out of the box. Track loss curves, hyperparameters, and checkpoints across runs.
Export your adapter or merged model directly to the Runcrate Inference API or a dedicated serving instance. No data transfer needed.
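For context on what "merged model" means: a LoRA adapter is a low-rank update that can be folded into the base weights as W + (alpha / r) · B · A, after which no adapter code is needed at serving time. A toy illustration of that fold, in pure Python with tiny matrices (standard LoRA scaling assumed):

```python
# Toy illustration of merging a LoRA adapter into base weights:
# W_merged = W + (alpha / r) * B @ A  (standard LoRA scaling, assumed).
def merge_lora(W, B, A, alpha, r):
    rows, cols = len(W), len(W[0])
    scale = alpha / r
    delta = [[scale * sum(B[i][k] * A[k][j] for k in range(r))
              for j in range(cols)] for i in range(rows)]
    return [[W[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2x2)
B = [[1.0], [2.0]]            # trained low-rank factor (2x1)
A = [[3.0, 4.0]]              # trained low-rank factor (1x2)
print(merge_lora(W, B, A, alpha=2, r=1))  # [[7.0, 8.0], [12.0, 17.0]]
```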
Recommended Setups
LoRA on a single L40S or full fine-tuning across H100s -- pick the setup that fits your model size and budget.
QLoRA on 7-13B models · L40S · 48 GB · From $0.50/hr
LoRA on 70B models · A100 · 80 GB · From $1.10/hr
Full fine-tune up to 70B · H100 · 80 GB · From $2.49/hr
Full fine-tune 100B+ · H200 · 141 GB · From $3.49/hr
How It Works
Choose any open-source model from Hugging Face. Select LoRA, QLoRA, or full fine-tuning based on your model size and dataset.
Upload your training data via SCP or Hugging Face Hub. Use Axolotl or LLaMA-Factory config files to set hyperparameters, or write your own training script.
Launch training with real-time GPU monitoring. Evaluate against your test set, then push your model directly to a Runcrate serving instance.
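The configure step above typically comes down to a short YAML file. The keys below mirror common Axolotl options (base_model, adapter, lora_r, and so on); the specific values are illustrative assumptions, not a tuned recipe, shown here as a Python dict for clarity:

```python
# Illustrative QLoRA hyperparameters, mirroring common Axolotl YAML keys.
# All values are assumptions for illustration, not a tested recipe.
config = {
    "base_model": "meta-llama/Llama-3.1-8B",  # any Hugging Face model id
    "adapter": "qlora",
    "load_in_4bit": True,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "datasets": [{"path": "train.jsonl", "type": "alpaca"}],
    "learning_rate": 2e-4,
    "num_epochs": 3,
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 8,
}
# Effective batch size = micro batch size x gradient accumulation steps.
print(config["micro_batch_size"] * config["gradient_accumulation_steps"])  # 16
```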