TEXT TO SPEECH API
Generate human-quality audio from text using models like Qwen3-TTS, Kokoro, Orpheus, HiggsAudio, Chatterbox, and Zonos. Multiple voices, emotional control, and multilingual output. OpenAI-compatible TTS endpoint with streaming support.

QUICK START
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

response = client.audio.speech.create(
    model="Qwen/Qwen3-TTS",
    input="Welcome to Runcrate. Your deployment is ready.",
    voice="alloy",
    response_format="wav",  # match the .wav file written below
)

# Write the returned audio bytes to disk
with open("output.wav", "wb") as f:
    f.write(response.content)
```

AVAILABLE MODELS
| Model | Provider | Price | Detail |
|---|---|---|---|
| Qwen/Qwen3-TTS | Alibaba | Per-token | Multilingual, natural prosody |
| kokoro | Kokoro | Per-token | Fast, expressive voice synthesis |
| orpheus | Orpheus | Per-token | Emotional control, multiple styles |
| higgs-audio | HiggsAudio | Per-token | High-fidelity audio output |
| chatterbox | Chatterbox | Per-token | Conversational TTS |
| zonos | Zonos | Per-token | Low-latency streaming TTS |
WHY RUNCRATE
Choose from six TTS engines, each with different strengths: natural prosody, emotional range, speed, or multilingual coverage.
Start playback before generation completes. Stream audio chunks to your client for minimal time-to-first-byte.
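The chunked consumption pattern behind streaming playback can be sketched with the standard library alone. The `iter_chunks` helper below is hypothetical (not part of any SDK), but it mirrors what streaming response helpers do: read fixed-size chunks from a binary stream and hand each one to the player or file writer as soon as it arrives.

```python
import io

def iter_chunks(stream, chunk_size=4096):
    """Yield fixed-size chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Illustrative: a BytesIO stands in for the network response body.
audio_stream = io.BytesIO(b"RIFF....WAVEfmt fake-audio-bytes")
received = b"".join(iter_chunks(audio_stream, chunk_size=8))
```

In a real client, each chunk would be appended to a playback buffer immediately instead of joined at the end, which is what keeps time-to-first-byte low.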
Uses the standard /v1/audio/speech endpoint. If your app works with OpenAI TTS, it works here with a base URL change.
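For clients not using the OpenAI SDK, the endpoint can be called with plain HTTP. A minimal stdlib sketch, assuming the request shape of the standard OpenAI speech API (the `build_speech_request` helper and the placeholder key are illustrative):

```python
import json
import urllib.request

BASE_URL = "https://api.runcrate.ai/v1"  # the only change from an OpenAI setup

def build_speech_request(api_key, model, text, voice):
    """Build a POST request for the OpenAI-compatible /audio/speech endpoint."""
    payload = json.dumps({"model": model, "input": text, "voice": voice}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_speech_request("rc_live_YOUR_API_KEY", "Qwen/Qwen3-TTS", "Hello!", "alloy")
# urllib.request.urlopen(req) would return the raw audio bytes
```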
Select from preset voices or adjust parameters for speed, pitch, and emotional tone depending on the model.
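Because these knobs are model-dependent, a request builder that omits unset parameters is a convenient pattern. A sketch, assuming only that unsupported parameters should not be sent (`speech_payload` is a hypothetical helper; `speed` follows the OpenAI spec, while names like `pitch` vary by engine):

```python
def speech_payload(model, text, voice, **extras):
    """Assemble a /v1/audio/speech payload, dropping parameters left unset.

    Extra knobs (e.g. speed, response_format) are model-dependent; pass only
    the ones the chosen engine supports.
    """
    payload = {"model": model, "input": text, "voice": voice}
    payload.update({k: v for k, v in extras.items() if v is not None})
    return payload

body = speech_payload("orpheus", "Great news!", "alloy", speed=1.2, pitch=None)
# pitch=None is dropped, so only explicitly set knobs reach the API
```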
FAQ