SERVERLESS INFERENCE

GPU inference, zero infrastructure.

Stop managing GPU instances, CUDA versions, model loading, and autoscaling. Runcrate's serverless inference runs 200+ AI models on dedicated hardware with per-usage billing. No cold starts, no idle costs, no infrastructure to maintain. Send requests, get results, pay for what you use.

Models: 200+
Cold starts: None
Modalities: Chat, image, video, audio

AVAILABLE MODELS

Models you can run today.

deepseek-ai/DeepSeek-V3
DeepSeek · Per-token
128K context, MoE architecture

meta-llama/Llama-4-Scout-17B-16E-Instruct
Meta · Per-token
17B MoE, 128K context

black-forest-labs/FLUX.1-dev
Black Forest Labs · Per-image
12B, photorealistic images

openai/whisper-large-v3
OpenAI · Per-minute
Speech-to-text, 100+ languages
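
Because the API is OpenAI-compatible (see the quickstart below), the full catalog can likely be queried programmatically rather than browsed. A minimal sketch, assuming Runcrate exposes the standard /v1/models listing route; the base URL and key format are taken from the quickstart:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# List every model available to the account. Assumes the standard
# OpenAI-compatible /v1/models route is exposed by Runcrate.
for model in client.models.list():
    print(model.id)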

WHY RUNCRATE

Built for production.

Zero Infrastructure

No GPU provisioning, no Docker containers, no autoscaling policies, no CUDA debugging. Send a request, get a result. Runcrate handles everything else.

No Cold Starts

Models are always warm and ready. First request is as fast as the thousandth. No waiting for model loading, container spin-up, or weight downloads.

Per-Usage Billing

Pay per token, per image, per second of audio, or per second of video. No idle GPU costs, no monthly minimums, no seat licenses. Credits never expire.
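
Because responses follow the OpenAI schema, each chat completion reports its own token counts, so per-request cost can be estimated in code. A minimal sketch; the prices below are placeholders for illustration, not Runcrate's actual rates:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

chat = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello, world!"}],
)

# Placeholder USD prices per token; check the pricing page for real rates.
INPUT_PRICE = 0.30 / 1_000_000
OUTPUT_PRICE = 1.20 / 1_000_000

usage = chat.usage  # token counts come back with every completion
cost = usage.prompt_tokens * INPUT_PRICE + usage.completion_tokens * OUTPUT_PRICE
print(f"{usage.total_tokens} tokens, ~${cost:.6f}")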

Multi-Modal

Chat, image generation, video generation, speech-to-text, text-to-speech, embeddings, and vision, all through one API and one billing account.
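
The quickstart below covers chat, image generation, and transcription; embeddings and text-to-speech go through the same client via the standard OpenAI-compatible routes. A sketch with illustrative model names, not confirmed catalog entries:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Embeddings (model name is a placeholder; pick one from the catalog)
emb = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input="Serverless GPU inference",
)
print(len(emb.data[0].embedding))  # vector dimensionality

# Text-to-speech (model and voice are placeholders)
speech = client.audio.speech.create(
    model="hexgrad/Kokoro-82M",
    voice="alloy",
    input="Hello from Runcrate.",
)
speech.write_to_file("hello.mp3")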

COMPARISON

Runcrate vs Self-Hosted GPU.

               Runcrate              Self-Hosted GPU
Setup time     < 60 seconds          Hours to days
Cold starts    None                  Model loading time
Scaling        Automatic             Manual autoscaling
Idle cost      $0                    Full GPU cost 24/7
Maintenance    Zero                  CUDA, drivers, monitoring

GET STARTED

Try it now.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Chat completion
chat = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(chat.choices[0].message.content)

# Image generation
image = client.images.generate(
    model="black-forest-labs/FLUX.1-dev",
    prompt="A futuristic cityscape",
)
print(image.data[0].url)  # URL (or b64_json) per the OpenAI images schema

# Speech-to-text
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=f,
    )
print(transcript.text)


Start with serverless inference.