WHISPER API
Run OpenAI's Whisper Large V3 and Whisper V3 Turbo without managing GPUs. Send audio, get transcripts back. The API is OpenAI-compatible, so existing code works after a one-line change to the base URL. For latency-sensitive workloads, Turbo delivers 8x faster inference at nearly identical accuracy.

QUICK START
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Use Turbo for faster results
with open("podcast.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3-turbo",
        file=audio_file,
    )

print(transcript.text)

AVAILABLE MODELS
| Model | Provider | Price | Notes |
|---|---|---|---|
| openai/whisper-large-v3 | OpenAI | $0.045/min | Highest accuracy, best for critical transcription |
| openai/whisper-large-v3-turbo | OpenAI | $0.02/min | 8x faster inference, ideal for real-time |
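Switching models is a one-parameter change. Reusing the client from the Quick Start, pass the full model's ID when accuracy matters more than latency (the file name here is illustrative):

# Full Whisper Large V3 for accuracy-critical transcription
with open("interview.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=audio_file,
    )
print(transcript.text)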
WHY RUNCRATE
Skip the CUDA setup, model loading, and GPU provisioning. Send a file, receive text. Runcrate handles the infrastructure.
Whisper V3 Turbo delivers results 8x faster than the full model with near-identical word error rates. Pay less, get results faster.
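At the prices above, a 60-minute recording costs 60 × $0.02 = $1.20 with Turbo versus 60 × $0.045 = $2.70 with the full model.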
Whisper auto-detects the spoken language from 100+ options. No need to specify the language parameter for most use cases.
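If you do need to pin the language (short clips, heavy code-switching), the OpenAI-compatible language parameter accepts an ISO-639-1 code. A minimal sketch, assuming Runcrate forwards the parameter to Whisper and reusing the client from the Quick Start:

# Force German instead of relying on auto-detection
with open("meeting_de.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3-turbo",
        file=audio_file,
        language="de",  # ISO-639-1 code
    )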
The API handles concurrent requests, retries failures transparently, and scales automatically. No cold starts, no queue management.
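On the client side, this means you can fan several files out in parallel without extra plumbing. A minimal sketch using a thread pool (file names and worker count are illustrative):

from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

def transcribe(path):
    # Each worker uploads one file; Runcrate handles the server-side scaling
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            model="openai/whisper-large-v3-turbo",
            file=audio_file,
        ).text

files = ["ep01.mp3", "ep02.mp3", "ep03.mp3"]
with ThreadPoolExecutor(max_workers=3) as pool:
    for path, text in zip(files, pool.map(transcribe, files)):
        print(path, text[:80])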
FAQ