TEXT TO SPEECH API
Generate human-quality audio from text using models like Qwen3-TTS, Kokoro, Orpheus, HiggsAudio, Chatterbox, and Zonos. Multiple voices, emotional control, and multilingual output. OpenAI-compatible TTS endpoint with streaming support.

QUICK START
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

response = client.audio.speech.create(
    model="Qwen/Qwen3-TTS",
    input="Welcome to Runcrate. Your deployment is ready.",
    voice="alloy",
    response_format="wav",  # match the .wav file written below
)

# Write the returned audio bytes to disk
with open("output.wav", "wb") as f:
    f.write(response.content)
```

AVAILABLE MODELS
| Model | Provider | Price | Detail |
|---|---|---|---|
| Qwen/Qwen3-TTS | Alibaba | Per-token | Multilingual, natural prosody |
| kokoro | Kokoro | Per-token | Fast, expressive voice synthesis |
| orpheus | Orpheus | Per-token | Emotional control, multiple styles |
| higgs-audio | HiggsAudio | Per-token | High-fidelity audio output |
| chatterbox | Chatterbox | Per-token | Conversational TTS |
| zonos | Zonos | Per-token | Low-latency streaming TTS |
WHY RUNCRATE
Choose from six TTS engines, each with different strengths: natural prosody, emotional range, speed, or multilingual coverage.
Start playback before generation completes. Stream audio chunks to your client for minimal time-to-first-byte.
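The chunked consumption pattern behind streaming playback can be sketched with the standard library alone. The `iter_chunks` helper below is hypothetical (not part of any SDK), but it mirrors what streaming response helpers do: read fixed-size chunks from a binary stream and hand each one to the player or file writer as soon as it arrives.

```python
import io

def iter_chunks(stream, chunk_size=4096):
    """Yield fixed-size chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Illustrative: a BytesIO stands in for the network response body.
audio_stream = io.BytesIO(b"RIFF....WAVEfmt fake-audio-bytes")
received = b"".join(iter_chunks(audio_stream, chunk_size=8))
```

In a real client, each chunk would be appended to a playback buffer immediately instead of joined at the end, which is what keeps time-to-first-byte low.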
Uses the standard /v1/audio/speech endpoint. If your app works with OpenAI TTS, it works here with a base URL change.
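For clients not using the OpenAI SDK, the endpoint can be called with plain HTTP. A minimal stdlib sketch, assuming the request shape of the standard OpenAI speech API (the `build_speech_request` helper and the placeholder key are illustrative):

```python
import json
import urllib.request

BASE_URL = "https://api.runcrate.ai/v1"  # the only change from an OpenAI setup

def build_speech_request(api_key, model, text, voice):
    """Build a POST request for the OpenAI-compatible /audio/speech endpoint."""
    payload = json.dumps({"model": model, "input": text, "voice": voice}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_speech_request("rc_live_YOUR_API_KEY", "Qwen/Qwen3-TTS", "Hello!", "alloy")
# urllib.request.urlopen(req) would return the raw audio bytes
```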
Select from preset voices or adjust parameters for speed, pitch, and emotional tone depending on the model.
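Because these knobs are model-dependent, a request builder that omits unset parameters is a convenient pattern. A sketch, assuming only that unsupported parameters should not be sent (`speech_payload` is a hypothetical helper; `speed` follows the OpenAI spec, while names like `pitch` vary by engine):

```python
def speech_payload(model, text, voice, **extras):
    """Assemble a /v1/audio/speech payload, dropping parameters left unset.

    Extra knobs (e.g. speed, response_format) are model-dependent; pass only
    the ones the chosen engine supports.
    """
    payload = {"model": model, "input": text, "voice": voice}
    payload.update({k: v for k, v in extras.items() if v is not None})
    return payload

body = speech_payload("orpheus", "Great news!", "alloy", speed=1.2, pitch=None)
# pitch=None is dropped, so only explicitly set knobs reach the API
```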
FAQ