Text-to-Speech

Generate speech audio from text input using TTS models.

Endpoint

POST https://api.runcrate.ai/v1/audio/speech

Basic Usage

curl https://api.runcrate.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer rc_live_YOUR_API_KEY" \
  --output speech.mp3 \
  -d '{
    "model": "hexgrad/Kokoro-82M",
    "input": "Hello, welcome to Runcrate!",
    "voice": "af_heart"
  }'
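Here is the same request from Python, using only the standard library's `urllib.request`. This is a minimal sketch, not an official Runcrate SDK. The helper names `build_speech_request` and `synthesize_to_file` are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.runcrate.ai/v1/audio/speech"


def build_speech_request(api_key: str, model: str, text: str, voice: str) -> urllib.request.Request:
    """Build the same POST request as the curl example: JSON body plus bearer auth."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def synthesize_to_file(api_key: str, model: str, text: str, voice: str, path: str) -> None:
    """Send the request and write the raw audio bytes to `path` (like curl's --output)."""
    request = build_speech_request(api_key, model, text, voice)
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `synthesize_to_file(os.environ["RUNCRATE_API_KEY"], "hexgrad/Kokoro-82M", "Hello, welcome to Runcrate!", "af_heart", "speech.mp3")` reproduces the curl call. `RUNCRATE_API_KEY` is a hypothetical environment variable name.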

Parameters

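| Parameter       | Type   | Required | Description                 |
|-----------------|--------|----------|-----------------------------|
| model           | string | Yes      | Model ID                    |
| input           | string | Yes      | Text to synthesize          |
| voice           | string | No       | Voice preset name           |
| response_format | string | No       | Output format: mp3 or pcm   |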
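The request body can be assembled in a small helper that enforces the parameter table. The helper always sends the two required fields and includes optional fields only when they are set, so the server can apply its own defaults. This is a hedged Python sketch: `build_payload` is a hypothetical name, and limiting `response_format` to the two values listed in the table is a client-side assumption. The server's validation is authoritative.

```python
from typing import Optional

# The formats listed in the parameter table; treat this as a client-side assumption.
SUPPORTED_FORMATS = {"mp3", "pcm"}


def build_payload(model: str, text: str, voice: Optional[str] = None,
                  response_format: Optional[str] = None) -> dict:
    """Assemble a request body: model and input are required, the rest are optional."""
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("input is required")
    payload = {"model": model, "input": text}
    # Only include optional fields when set, leaving defaults to the server.
    if voice is not None:
        payload["voice"] = voice
    if response_format is not None:
        if response_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        payload["response_format"] = response_format
    return payload
```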

Available Models & Voices

Kokoro 82M

Lightweight, fast TTS with natural-sounding voices (model ID: hexgrad/Kokoro-82M). Voices: af_heart, af_bella, af_nicole, af_sky, am_adam, am_michael, bf_emma, bf_isabella, bm_george, bm_lewis

Orpheus 3B

High-quality expressive speech synthesis. Voices: tara, leah, jess, leo, dan, mia, zac, zoe

Qwen3-TTS

Multilingual TTS from Alibaba. Voices: Vivian, Serena, Uncle_Fu, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee
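Each model has its own voice set, so a client can catch a mismatched voice before calling the API. The following minimal Python sketch does this. The dictionary keys are the display names from this page, not API model IDs, because only `hexgrad/Kokoro-82M` appears as an ID here. Exact, case-sensitive matching is an assumption, and `PRESET_VOICES` and `is_preset_voice` are hypothetical names.

```python
# Preset voices per model, copied from the lists on this page.
# Keys are display names, NOT API model IDs: apart from "hexgrad/Kokoro-82M",
# the IDs are not given here, so map them yourself before sending a request.
PRESET_VOICES = {
    "Kokoro 82M": {
        "af_heart", "af_bella", "af_nicole", "af_sky", "am_adam",
        "am_michael", "bf_emma", "bf_isabella", "bm_george", "bm_lewis",
    },
    "Orpheus 3B": {"tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe"},
    "Qwen3-TTS": {
        "Vivian", "Serena", "Uncle_Fu", "Dylan", "Eric",
        "Ryan", "Aiden", "Ono_Anna", "Sohee",
    },
}


def is_preset_voice(model_name: str, voice: str) -> bool:
    """Return True if `voice` is listed for `model_name` (exact, case-sensitive match)."""
    return voice in PRESET_VOICES.get(model_name, set())
```

For example, `is_preset_voice("Kokoro 82M", "tara")` returns `False`, which flags an Orpheus voice sent to Kokoro. Models without a preset list also return `False`.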

Voice-Clone Models

Some models (HiggsAudio, Zonos, Chatterbox) support voice cloning. Instead of offering preset voices, they clone a voice from reference audio.

Response

The response body is raw binary audio in the requested format (MP3 or PCM). Save it directly to a file, or stream it to an audio player.
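The sketch below shows both options in Python, using only the standard library. `save_stream` copies the body to a file in chunks instead of buffering it all in memory. You can pass it the object returned by `urllib.request.urlopen`. `pcm_to_wav` adds a WAV header so that ordinary players can open `pcm` output. The sample rate, channel count, and sample width are assumptions, because this page does not document the PCM layout. Both function names are hypothetical.

```python
import shutil
import wave
from typing import BinaryIO


def save_stream(response: BinaryIO, out: BinaryIO, chunk_size: int = 64 * 1024) -> None:
    """Copy the binary response body to `out` in fixed-size chunks."""
    shutil.copyfileobj(response, out, chunk_size)


def pcm_to_wav(pcm: bytes, out: BinaryIO, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw PCM samples in a WAV container.

    The defaults (24 kHz, mono, 16-bit samples) are ASSUMPTIONS, not documented
    values; confirm the PCM layout for the model you call before relying on them.
    """
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

If the assumed layout is wrong, the WAV file still opens but plays at the wrong speed or as noise. That symptom tells you to adjust the parameters.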