# GPU Clusters

> Bare-metal GPU clusters sized from 16 to 128+ nodes.

Runcrate dedicated clusters are bare-metal GPU deployments tailored to your workload. Each cluster is single-tenant: the hardware is dedicated to your team, and no other customer can access it.

## Cluster Configurations

| Size        | Nodes  | GPUs (8 per node) | Best For                                            |
| ----------- | ------ | ----------------- | --------------------------------------------------- |
| **Starter** | 16     | 128               | Fine-tuning large models, multi-node training       |
| **Growth**  | 32–64  | 256–512           | Pre-training mid-size models, large-scale inference |
| **Scale**   | 64–128 | 512–1,024         | Frontier model training, massive parallel workloads |
| **Custom**  | 128+   | 1,024+            | Custom configurations for unique requirements       |
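The GPU counts in the table follow directly from the node counts at 8 GPUs per node. As a rough sizing aid, the sketch below encodes the tiers and converts between GPUs and nodes. The `TIERS` dictionary and both helper functions are illustrative names, not part of any Runcrate API.

```python
from typing import Optional, Tuple

GPUS_PER_NODE = 8  # from the table header: 8 GPUs per node

# (min_nodes, max_nodes) per tier, copied from the table above.
# None means the table gives no upper bound.
TIERS = {
    "Starter": (16, 16),
    "Growth": (32, 64),
    "Scale": (64, 128),
    "Custom": (128, None),
}


def gpu_range(tier: str) -> Tuple[int, Optional[int]]:
    """Return the (min, max) GPU count for a tier; max is None if unbounded."""
    min_nodes, max_nodes = TIERS[tier]
    max_gpus = None if max_nodes is None else max_nodes * GPUS_PER_NODE
    return min_nodes * GPUS_PER_NODE, max_gpus


def nodes_needed(gpus: int) -> int:
    """Smallest whole number of nodes that provides at least `gpus` GPUs."""
    return -(-gpus // GPUS_PER_NODE)  # ceiling division


if __name__ == "__main__":
    for name in TIERS:
        print(name, gpu_range(name))
    print("Nodes for 200 GPUs:", nodes_needed(200))
```

Because nodes are allocated whole, a workload that needs 200 GPUs rounds up to 25 nodes, which falls between the Starter and Growth node counts listed above.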

## What's Included

Every dedicated cluster includes:

* **Bare-metal servers** — No virtualization overhead. Full hardware access with root.
* **NVIDIA GPUs** — 8 GPUs per node with NVLink for intra-node communication.
* **InfiniBand networking** — High-bandwidth, low-latency interconnect between nodes for distributed training.
* **High-performance storage** — NVMe SSDs for fast data access during training.
* **Dedicated networking** — Private network with no shared bandwidth.
* **24/7 monitoring** — Infrastructure health monitoring and alerting.

## Cluster Architecture

```mermaid theme={"theme":"github-dark"}
graph TB
    subgraph "Your Dedicated Cluster"
        subgraph "Node 1"
            G1[8x GPUs]
            N1[NVMe Storage]
        end
        subgraph "Node 2"
            G2[8x GPUs]
            N2[NVMe Storage]
        end
        subgraph "Node N"
            G3[8x GPUs]
            N3[NVMe Storage]
        end
    end
    IB[InfiniBand Fabric]
    G1 --- IB
    G2 --- IB
    G3 --- IB
```
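To make the two communication paths in the diagram concrete, the sketch below maps a global GPU rank to its node and reports which interconnect a pair of GPUs would use. It assumes ranks are laid out contiguously (ranks 0 to 7 on the first node, 8 to 15 on the second, and so on), which is a common launcher convention but not something this page specifies. The function names are hypothetical.

```python
from typing import Tuple

GPUS_PER_NODE = 8  # each node in the diagram holds 8 GPUs


def locate(global_rank: int) -> Tuple[int, int]:
    """Map a global rank to (node_index, local_gpu_index).

    Assumes contiguous rank placement: ranks 0-7 on node 0, 8-15 on node 1, ...
    """
    if global_rank < 0:
        raise ValueError("global_rank must be non-negative")
    return divmod(global_rank, GPUS_PER_NODE)


def link_between(rank_a: int, rank_b: int) -> str:
    """Name the interconnect two ranks would use to exchange data."""
    if rank_a == rank_b:
        return "same-gpu"
    node_a, _ = locate(rank_a)
    node_b, _ = locate(rank_b)
    # GPUs on one node talk over NVLink; GPUs on different nodes
    # cross the InfiniBand fabric.
    return "nvlink" if node_a == node_b else "infiniband"


if __name__ == "__main__":
    print(locate(13))
    print(link_between(0, 7))
    print(link_between(7, 8))
```

Parallelism strategies usually try to keep the most bandwidth-hungry traffic (for example, tensor-parallel groups) within a node and send less frequent traffic across nodes.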

## Use Cases

### Distributed Training

Train large language, vision, or multimodal models across hundreds of GPUs. The InfiniBand fabric between nodes keeps gradient synchronization efficient, so less of each training step is spent on communication.
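For a sense of how much data gradient synchronization moves, the sketch below estimates per-GPU traffic for one all-reduce under the standard ring algorithm, in which each participant sends $2(N-1)/N$ times the payload. This is an illustrative estimate only: collective libraries such as NCCL may pick other algorithms, and sharded approaches such as FSDP use different collectives. The 7-billion-parameter model and 2-byte gradients in the example are assumptions, not Runcrate figures.

```python
def gradient_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Size of one full gradient buffer (2 bytes per value for fp16/bf16)."""
    return num_params * bytes_per_param


def ring_allreduce_bytes_per_gpu(payload_bytes: float, world_size: int) -> float:
    """Bytes each GPU sends during one ring all-reduce of `payload_bytes`.

    A ring all-reduce is a reduce-scatter followed by an all-gather; each
    phase sends (N - 1) / N of the payload, giving 2 * (N - 1) / N in total.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    return 2 * (world_size - 1) / world_size * payload_bytes


if __name__ == "__main__":
    grads = gradient_bytes(7_000_000_000)  # hypothetical 7B-parameter model
    for gpus in (8, 128, 1024):
        sent = ring_allreduce_bytes_per_gpu(grads, gpus)
        print(f"{gpus:>5} GPUs: {sent / 1e9:.2f} GB sent per GPU per all-reduce")
```

The per-GPU volume approaches twice the payload as the GPU count grows, so inter-node bandwidth and latency, rather than total data volume, tend to set how well synchronization scales.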

### Production Inference

Serve models at scale with predictable latency. Because the hardware is dedicated, there are no cold starts and no resource contention from other tenants.

### Fine-Tuning at Scale

Run multiple fine-tuning jobs in parallel across your cluster. You have full control over scheduling and resource allocation.
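As a minimal illustration of splitting a cluster between parallel jobs, the sketch below hands out contiguous blocks of whole nodes in submission order and leaves out any job that does not fit. In practice a scheduler such as Slurm or Kubernetes would make these decisions. The `Job` class, `assign_nodes` function, and job names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    nodes: int  # number of whole nodes the job requests


def assign_nodes(jobs: List[Job], total_nodes: int) -> Dict[str, range]:
    """Assign each job a contiguous block of node indices, first come first served.

    Jobs that do not fit in the remaining nodes are skipped (left to queue);
    later, smaller jobs may still be placed.
    """
    placements: Dict[str, range] = {}
    next_free = 0
    for job in jobs:
        if next_free + job.nodes <= total_nodes:
            placements[job.name] = range(next_free, next_free + job.nodes)
            next_free += job.nodes
    return placements


if __name__ == "__main__":
    queue = [Job("sft-a", 4), Job("sft-b", 8), Job("sft-c", 6), Job("sft-d", 4)]
    for name, nodes in assign_nodes(queue, total_nodes=16).items():
        print(f"{name}: nodes {nodes.start}-{nodes.stop - 1}")
```

On a 16-node cluster this places three of the four example jobs; the 6-node job waits because only 4 nodes remain when it is considered.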

### Research

Experiment with new architectures, training techniques, and scaling laws on dedicated infrastructure without worrying about availability or spot interruptions.

## Software Stack

You have full control over the software stack. Common setups include:

* **Kubernetes** (managed or self-managed)
* **Slurm** for HPC-style job scheduling
* **Docker / Podman** for containerized workloads
* **NVIDIA NCCL** for multi-GPU communication
* **DeepSpeed, Megatron, FSDP** for distributed training

<Note>
  Runcrate can assist with cluster setup and configuration. Managed Kubernetes and Slurm options are available.
</Note>
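As one example of a self-managed setup, the sketch below builds the per-node launch command for a multi-node PyTorch job using `torchrun`, with one process per GPU. It assumes PyTorch is installed on every node; the hostname `node-0` and the script name `train.py` are placeholders, and the helper function is hypothetical. With Slurm or Kubernetes, the scheduler would typically supply the node rank and master address instead.

```python
import shlex


def torchrun_command(
    node_rank: int,
    nnodes: int,
    master_addr: str,
    script: str,
    gpus_per_node: int = 8,
    master_port: int = 29500,
) -> str:
    """Build the torchrun command line to run on one node of the job."""
    args = [
        "torchrun",
        f"--nnodes={nnodes}",                # total nodes in the job
        f"--nproc_per_node={gpus_per_node}",  # one process per GPU
        f"--node_rank={node_rank}",          # this node's index, 0-based
        f"--master_addr={master_addr}",      # rendezvous host (placeholder)
        f"--master_port={master_port}",
        script,
    ]
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Print the command for the first two nodes of a 16-node job.
    for rank in range(2):
        print(torchrun_command(rank, nnodes=16, master_addr="node-0", script="train.py"))
```

Each node runs the same command with a different `--node_rank`, which yields 128 processes on a 16-node cluster.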