SERVERLESS INFERENCE
Stop managing GPU instances, CUDA versions, model loading, and autoscaling. Runcrate's serverless inference runs 200+ AI models on dedicated hardware with usage-based billing, per token, image, or minute. No cold starts, no idle costs, no infrastructure to maintain. Send requests, get results, pay for what you use.
AVAILABLE MODELS
| Model | Provider | Billing | Details |
|---|---|---|---|
| deepseek-ai/DeepSeek-V3 | DeepSeek | Per-token | 128K context, MoE architecture |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | Meta | Per-token | 17B MoE, 128K context |
| black-forest-labs/FLUX.1-dev | Black Forest Labs | Per-image | 12B, photorealistic images |
| openai/whisper-large-v3 | OpenAI | Per-minute | Speech-to-text, 100+ languages |
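The table above can be folded into a small task-to-model lookup for application code. This is a convenience sketch only: the model IDs come from the table, but the task labels are my own grouping, not part of Runcrate's API.

```python
# Map common tasks to the model IDs listed in the table above.
# The task names ("chat", "image", ...) are an illustrative grouping.
MODELS = {
    "chat": "deepseek-ai/DeepSeek-V3",
    "chat-light": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "image": "black-forest-labs/FLUX.1-dev",
    "speech-to-text": "openai/whisper-large-v3",
}

def model_for(task: str) -> str:
    """Return the model ID for a task, with a clear error for unknown tasks."""
    try:
        return MODELS[task]
    except KeyError:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(MODELS)}")

print(model_for("image"))  # black-forest-labs/FLUX.1-dev
```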
WHY RUNCRATE
No GPU provisioning, no Docker containers, no autoscaling policies, no CUDA debugging. Send a request, get a result. Runcrate handles everything else.
Models are always warm and ready. First request is as fast as the thousandth. No waiting for model loading, container spin-up, or weight downloads.
Pay per token, per image, per second of audio, or per second of video. No idle GPU costs, no monthly minimums, no seat licenses. Credits never expire.
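To make usage-based billing concrete, here is a back-of-the-envelope cost estimate across modalities. The rates below are hypothetical placeholders, not Runcrate's actual prices; substitute the real rates from your pricing page.

```python
# Rough usage-based cost estimate. All rates are HYPOTHETICAL
# placeholders for illustration, not Runcrate's actual prices.
RATE_PER_1M_TOKENS = 0.50      # hypothetical: $0.50 per million tokens
RATE_PER_IMAGE = 0.02          # hypothetical: $0.02 per generated image
RATE_PER_AUDIO_MINUTE = 0.006  # hypothetical: $0.006 per transcribed minute

def estimate_cost(tokens: int = 0, images: int = 0, audio_minutes: float = 0.0) -> float:
    """Sum usage across modalities into a single dollar estimate."""
    return (
        tokens / 1_000_000 * RATE_PER_1M_TOKENS
        + images * RATE_PER_IMAGE
        + audio_minutes * RATE_PER_AUDIO_MINUTE
    )

# 2M chat tokens + 10 images + 30 minutes of transcription
print(round(estimate_cost(tokens=2_000_000, images=10, audio_minutes=30), 4))  # 1.38
```

Because there are no idle or minimum charges, the estimate above is also the total bill for that usage.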
Chat, image generation, video generation, speech-to-text, text-to-speech, embeddings, and vision, all through one API and one billing account.
COMPARISON
| Feature | Runcrate | Self-Hosted GPU |
|---|---|---|
| Setup time | < 60 seconds | Hours to days |
| Cold starts | None | Model loading time |
| Scaling | Automatic | Manual autoscaling |
| Idle cost | $0 | Full GPU cost 24/7 |
| Maintenance | Zero | CUDA, drivers, monitoring |
GET STARTED
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Chat completion
chat = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello, world!"}],
)

# Image generation
image = client.images.generate(
    model="black-forest-labs/FLUX.1-dev",
    prompt="A futuristic cityscape",
)

# Speech-to-text
transcript = client.audio.transcriptions.create(
    model="openai/whisper-large-v3",
    file=open("audio.mp3", "rb"),
)
FAQ