WHISPER API

Whisper inference, no GPU required.

Run OpenAI's Whisper Large V3 and Whisper V3 Turbo without managing GPUs. Send audio, get transcripts back. The API is OpenAI-compatible, so your existing code works after changing only the base URL. For latency-sensitive workloads, Turbo runs about 8x faster at nearly identical accuracy.

Turbo price: $0.02/min
Languages: 100+
Max file size: 25 MB

QUICK START

Integrate in minutes.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Use Turbo for faster results at near-identical accuracy
with open("podcast.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3-turbo",
        file=audio_file,
    )
print(transcript.text)
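Uploads are capped at 25 MB, so it can save a failed round trip to check the file size before sending. A minimal sketch (the helper name and constant are our own, not part of the API):

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB upload cap

def fits_upload_limit(path: str) -> bool:
    """Return True if the file fits under the 25 MB upload limit."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES
```

Files over the limit need to be split into shorter segments before uploading.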

AVAILABLE MODELS

Models you can use today.

openai/whisper-large-v3
OpenAI · $0.045/min
Highest accuracy, best for critical transcription
openai/whisper-large-v3-turbo
OpenAI · $0.02/min
8x faster inference, ideal for real-time
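If you switch between the two models programmatically, a small lookup keeps the trade-off explicit. This is an illustrative helper of our own, using the model IDs and per-minute prices listed above:

```python
# Model catalogue from the table above: priority -> (model id, USD per minute)
MODELS = {
    "accuracy": ("openai/whisper-large-v3", 0.045),
    "speed": ("openai/whisper-large-v3-turbo", 0.02),
}

def pick_model(priority: str) -> str:
    """Return the model ID for a given priority: 'accuracy' or 'speed'."""
    model_id, _price = MODELS[priority]
    return model_id
```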

WHY RUNCRATE

Built for production.

No GPU Management

Skip the CUDA setup, model loading, and GPU provisioning. Send a file, receive text. Runcrate handles the infrastructure.

Turbo Mode

Whisper V3 Turbo delivers results 8x faster than the full model with near-identical word error rates. Pay less, get results faster.
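At the listed rates, the savings are easy to estimate. A quick sketch using the per-minute prices from the models table (the helper itself is our own):

```python
# Per-minute prices from the models table (USD)
PRICE_PER_MIN = {
    "openai/whisper-large-v3": 0.045,
    "openai/whisper-large-v3-turbo": 0.02,
}

def transcription_cost(model: str, minutes: float) -> float:
    """Estimated cost in USD for transcribing `minutes` of audio."""
    return round(PRICE_PER_MIN[model] * minutes, 2)

# A 60-minute podcast: $2.70 with Large V3, $1.20 with Turbo
```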

Automatic Language Detection

Whisper auto-detects the spoken language from 100+ options. No need to specify the language parameter for most use cases.

Production Ready

Handles concurrent requests, retries transparently, and scales automatically. No cold starts, no queue management.
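Retries happen transparently on the service side; if you also want client-side retries for transient network failures, the OpenAI Python SDK accepts a `max_retries` client option, or you can schedule your own exponential backoff. A sketch of a backoff schedule (the function and its default values are illustrative, not part of the API):

```python
def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff delays in seconds: base * 2^n, capped at `cap`."""
    return [min(base * (2 ** n), cap) for n in range(attempts)]

# backoff_delays(5) -> [0.5, 1.0, 2.0, 4.0, 8.0]
```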


Start transcribing with Whisper.