# Fine-Tune an LLM with LoRA

> Fine-tune Llama, Mistral, or Qwen on your own data using LoRA/QLoRA on a cloud GPU.

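Train a domain-specific LLM on your own dataset using LoRA (Low-Rank Adaptation). A single A100 can fine-tune a 7B–8B model in well under an hour, and an A100 80GB can handle a 70B model in a few hours with QLoRA, which trains the adapter on top of a 4-bit quantized base model. QLoRA also brings 7B fine-tuning within reach of an RTX 4090 (24 GB VRAM).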

## What you'll build

A fine-tuned model adapter that specializes an open-source LLM for your use case — customer support, medical QA, code generation, legal analysis, or anything else. The adapter merges back into the base model and can be served with vLLM.

## GPU sizing

| Model Size | Method        | GPU       | VRAM Needed | Time (1K samples) |
| ---------- | ------------- | --------- | ----------- | ----------------- |
| 7B–8B      | QLoRA (4-bit) | RTX 4090  | \~12 GB     | \~30 min          |
| 7B–8B      | LoRA (FP16)   | A100 40GB | \~30 GB     | \~20 min          |
| 13B        | QLoRA (4-bit) | RTX 4090  | \~18 GB     | \~45 min          |
| 70B        | QLoRA (4-bit) | A100 80GB | \~48 GB     | \~3 hrs           |
| 70B        | LoRA (FP16)   | 2x H100   | \~140 GB    | \~2 hrs           |
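
The VRAM column has to cover more than the weights: activations, adapter gradients, optimizer state, and CUDA overhead all sit on top. As a rough sanity check, the base weights alone take parameters × bits per parameter ÷ 8 bytes. The sketch below (the function name is illustrative) gives that lower bound only and does not reproduce the table's estimates:

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Lower bound on VRAM: memory for the base model weights only (decimal GB)."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# Weight-only footprint for the smallest and largest rows of the table
for size, bits in [(7, 4), (7, 16), (70, 4), (70, 16)]:
    print(f"{size}B @ {bits}-bit: {weight_memory_gb(size, bits):.1f} GB")
```

A 70B model in 16-bit needs 140 GB for the weights alone, which is why that row calls for two GPUs.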

## LoRA rank selection

| Rank  | Use Case                                          |
| ----- | ------------------------------------------------- |
| 8     | Formatting, tone, and style changes               |
| 16–32 | Moderate domain shift (e.g., medical terminology) |
| 64    | Substantial knowledge injection                   |
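
Rank controls how many parameters the adapter trains: LoRA adds two small matrices to each adapted weight, costing `rank × (d_in + d_out)` parameters per matrix. The sketch below counts them for adapters on the four attention projections. The shapes are assumptions based on Llama 3.1 8B (hidden size 4096, 1024-wide key/value projections, 32 layers), so other models will give different totals:

```python
# Assumed Llama 3.1 8B attention shapes: (input_dim, output_dim) per projection.
ATTENTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),  # grouped-query attention: 8 KV heads x 128 dims
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32

def lora_trainable_params(rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to every adapted matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in ATTENTION_SHAPES.values())
    return per_layer * NUM_LAYERS

for rank in (8, 16, 32, 64):
    print(f"r={rank}: {lora_trainable_params(rank):,} trainable parameters")
```

The count scales linearly with rank, so rank 64 trains eight times as many parameters as rank 8.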

***

## Step-by-step (CLI)

### 1. Prepare your dataset

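Create a JSONL file (`train_data.jsonl`) with one conversation per line. Each line is a JSON object whose `messages` list holds `role`/`content` pairs: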

```json theme={"theme":"github-dark"}
{"messages": [{"role": "user", "content": "What's your refund policy?"}, {"role": "assistant", "content": "We offer full refunds within 30 days of purchase. After 30 days, we provide store credit."}]}
{"messages": [{"role": "user", "content": "How do I track my order?"}, {"role": "assistant", "content": "Go to Orders in your account dashboard and click on the order number. You'll see real-time tracking."}]}
```
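
A malformed line usually surfaces only after the GPU instance is running, so it can be worth checking the file locally first. The following is a minimal sketch of such a check. The function name and the set of accepted roles are assumptions for illustration, not part of any library:

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: standard chat roles

def validate_jsonl(lines):
    """Return a list of (line_number, problem) for lines that don't fit the chat format."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict):
                problems.append((number, "message is not an object"))
            elif message.get("role") not in ALLOWED_ROLES:
                problems.append((number, f"unexpected role: {message.get('role')!r}"))
            elif not isinstance(message.get("content"), str):
                problems.append((number, "message 'content' must be a string"))
        if isinstance(messages[-1], dict) and messages[-1].get("role") != "assistant":
            problems.append((number, "last message should come from the assistant"))
    return problems
```

Calling `validate_jsonl(open("train_data.jsonl"))` returns an empty list when every line passes.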

### 2. Deploy a GPU and upload your data

```bash theme={"theme":"github-dark"}
runcrate instances create --name finetune --gpu A100

# Upload dataset and training script
runcrate cp ./train_data.jsonl finetune:/root/
runcrate cp ./finetune.py finetune:/root/
```

### 3. Install dependencies

```bash theme={"theme":"github-dark"}
runcrate ssh finetune -- "pip install torch transformers datasets peft trl accelerate bitsandbytes"
```

### 4. Training script (`finetune.py`)

```python theme={"theme":"github-dark"}
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"

# QLoRA: load model in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training: freezes the base weights,
# upcasts norm layers to fp32, and enables gradient checkpointing
model = prepare_model_for_kbit_training(model)

# LoRA config — rank 16 for domain adaptation
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Load dataset
dataset = load_dataset("json", data_files="/root/train_data.jsonl", split="train")

# Train
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="/root/output",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        logging_steps=10,
        save_strategy="epoch",
        bf16=True,
    ),
    processing_class=tokenizer,
)

trainer.train()
trainer.save_model("/root/output/final")
tokenizer.save_pretrained("/root/output/final")
print("Training complete.")
```
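
The batch settings in `SFTConfig` determine how many optimizer steps the run takes: each step consumes `per_device_train_batch_size × gradient_accumulation_steps` samples per GPU. The sketch below estimates the step count for the 1K-sample dataset used in the sizing table. The helper is illustrative, and the trainer's own count may differ by a step or so depending on how the installed library version rounds partial batches:

```python
import math

def training_steps(num_samples, per_device_batch=4, grad_accum=4, epochs=3, num_gpus=1):
    """Approximate optimizer steps for a run; defaults mirror the SFTConfig above."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch * epochs

effective_batch, total_steps = training_steps(1000)
print(f"effective batch size: {effective_batch}, optimizer steps: ~{total_steps}")
```

With `logging_steps=10`, a run of this size produces roughly 18 loss entries, which is a useful reference point when reading the training output.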

### 5. Run training

```bash theme={"theme":"github-dark"}
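# Run training and keep a copy of the output in the log file that step 6 reads
runcrate ssh finetune -- "cd /root && mkdir -p output && python finetune.py 2>&1 | tee output/training.log"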
```

### 6. Monitor training

```bash theme={"theme":"github-dark"}
# Check progress
runcrate ssh finetune -- "tail -20 /root/output/training.log"

# Watch GPU utilization
runcrate ssh finetune -- "nvidia-smi"
```
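
Plain `nvidia-smi` prints a single human-readable snapshot. For scripted checks, its query mode emits one CSV line per GPU. The sketch below parses that output, assuming it was captured from the `QUERY` command shown in the code. The sample line is illustrative, not a measured result:

```python
# Run this on the instance and pass its stdout to parse_gpu_stats()
QUERY = (
    "nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total "
    "--format=csv,noheader,nounits"
)

def parse_gpu_stats(csv_output: str):
    """Parse one CSV line per GPU into dicts (utilization in %, memory in MiB)."""
    stats = []
    for line in csv_output.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats

# Illustrative sample line, not a measured result
sample = "97, 11850, 24564\n"
for gpu in parse_gpu_stats(sample):
    share = gpu["mem_used_mib"] / gpu["mem_total_mib"]
    print(f"{gpu['util_pct']}% busy, {share:.0%} of VRAM in use")
```

Low utilization combined with low memory use during training usually means the batch size has room to grow.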

### 7. Download the adapter and clean up

```bash theme={"theme":"github-dark"}
# Download the LoRA adapter
runcrate cp -r finetune:/root/output/final/ ./my-adapter/

# Tear down
runcrate instances delete finetune
```

### 8. Merge and serve

After downloading, merge the adapter into the base model locally or on another instance:

```python theme={"theme":"github-dark"}
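import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"

# Load the base model in bf16 (the default fp32 load needs roughly twice the memory)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach the LoRA adapter, then fold its weights into the base model
model = PeftModel.from_pretrained(base_model, "./my-adapter")
merged = model.merge_and_unload()

# Save the merged weights and the tokenizer side by side for serving
merged.save_pretrained("./merged-model")
AutoTokenizer.from_pretrained(base_id).save_pretrained("./merged-model")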
```

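Then copy `./merged-model` to `/root/merged-model` on your serving instance (named `server` in the command below) and start vLLM (see [Deploy a vLLM Inference Server](/examples/vllm-inference-server)):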

```bash theme={"theme":"github-dark"}
runcrate ssh server -- "python -m vllm.entrypoints.openai.api_server --model /root/merged-model --port 8000 --host 0.0.0.0"
```
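
The vLLM server exposes an OpenAI-compatible API, so a plain HTTP POST to `/v1/chat/completions` is enough to test the model. Below is a minimal sketch using only the standard library. `SERVER_IP` is a placeholder, the helper names are illustrative, and the model name assumes vLLM serves the model under the path passed to `--model`:

```python
import json
import urllib.request

def build_chat_request(base_url: str, prompt: str, model: str = "/root/merged-model"):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,  # assumption: vLLM serves the model under its --model path
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(base_url: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

# Example (requires a running server; SERVER_IP is a placeholder):
# print(ask("http://SERVER_IP:8000", "What's your refund policy?"))
```

Asking a question from the training data is a quick way to confirm that the fine-tuned behavior survived the merge.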

***

## Using the Python SDK

```python theme={"theme":"github-dark"}
from runcrate import Runcrate
import time

client = Runcrate(api_key="rc_live_...")

# Deploy with all dependencies pre-installed
instance = client.instances.create(
    name="finetune",
    gpu_type="A100",
    gpu_count=1,
    startup_commands=[
        "pip install torch transformers datasets peft trl accelerate bitsandbytes",
    ],
)

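# Wait for deployment (give up after 15 minutes instead of polling forever)
deadline = time.monotonic() + 15 * 60
while True:
    status = client.instances.get_status(instance.id)
    if status.status == "deployed":
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Instance still '{status.status}' after 15 minutes")
    time.sleep(10)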

print(f"Ready — SSH: root@{status.ip}")
```

***

## Using MCP (via Claude Code / Cursor)

> "Spin up an A100 called 'finetune'. Install torch, transformers, peft, trl, accelerate, and bitsandbytes. Then show me a training script for QLoRA fine-tuning Llama 3.1 8B."

The agent deploys the instance, installs packages via `ssh_execute`, and generates the training script for you.
