Documentation Index
Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
What is Runcrate?
What is Runcrate?
Runcrate is built around two products on a single account:
- Inference Engine — One OpenAI-compatible API for 140+ open-source models. Billed per token.
- Compute — On-demand GPU instances, persistent storage, and dedicated clusters. Containers, VMs, or bare-metal.
Who is Runcrate for?
Who is Runcrate for?
Runcrate is built for AI teams running real workloads:
- AI product teams building inference-heavy features on open-source models
- ML teams that need on-demand GPUs for training, fine-tuning, or evaluation
- AI companies that have outgrown aggregators and want predictable per-token pricing or reserved capacity
- Research labs that need bare-metal access without long-term commitments
What models are available?
What models are available?
140+ open-source models across 8 categories: Chat, Reasoning, Code, Vision, Image Generation, Video Generation, Text-to-Speech, and Speech-to-Text. Families include Llama, DeepSeek, Qwen, GLM, Kimi, Mistral, FLUX, and more. See the Model Catalog for the full list.
Do I need to install anything?
Do I need to install anything?
No. The Models API is accessible via HTTP requests from any language or framework. The dashboard is fully web-based. For GPU instances, you only need an SSH client (built into macOS, Linux, and Windows).
Is there a free tier?
Is there a free tier?
There is no free tier. Runcrate uses a prepaid credit system — you add credits and only pay for what you use. You can start with as little as $5.
How fast are deployments?
How fast are deployments?
- Models API — Instant. Make your first API call as soon as you have a key.
- GPU Instances — Typically 1 to 3 minutes from deployment to SSH access.
How can I get support?
How can I get support?
- Discord — Join our Discord community for real-time help
- Email — Contact us at support@runcrate.ai