AI Voice Cloning API

Clone a voice from a short reference audio sample and generate new speech in that voice. Useful for personalized TTS, localization, audiobook narration, and character voices.

Available models

Model	Languages	Strengths
HiggsAudio V2.5	20+	Highest fidelity, emotion preservation
Zonos v0.1	10+	Fast inference, real-time apps
Chatterbox Multilingual	30+	Widest language coverage, cross-lingual
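
One way to carry the table into application code is a small lookup keyed by whatever matters most for the job. The model IDs are the ones used in the examples on this page; the `MODEL_BY_PRIORITY` mapping, its keys, and the `pick_model` helper are hypothetical names for illustration and are not part of the Runcrate SDK.

```python
# Hypothetical lookup; the model IDs are taken from the examples on this page.
MODEL_BY_PRIORITY = {
    "fidelity": "bosonai/HiggsAudioV2.5",
    "speed": "Zyphra/Zonos-v0.1-transformer",
    "languages": "ResembleAI/chatterbox-multilingual",
}


def pick_model(priority: str) -> str:
    """Return the model ID for a priority, or raise ValueError on an unknown key."""
    try:
        return MODEL_BY_PRIORITY[priority]
    except KeyError:
        options = ", ".join(sorted(MODEL_BY_PRIORITY))
        raise ValueError(
            f"unknown priority {priority!r}; expected one of: {options}"
        ) from None
```

The result of `pick_model("speed")` can be passed directly as the `model` argument of a text-to-speech call.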

Basic voice cloning

Provide a 10-30 second audio sample of the target voice:

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")
audio = client.models.text_to_speech(
    model="bosonai/HiggsAudioV2.5",
    input="Welcome to our quarterly earnings call. Strong growth across all segments.",
    reference_audio="./ceo-sample.mp3",
)
with open("cloned-speech.mp3", "wb") as f:
    f.write(audio)

reference_audio accepts a local file path (the file is base64-encoded automatically), a URL, or raw base64 data.
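
If you would rather do the encoding yourself, for example to cache the encoded sample or to send it from a service with no shared filesystem, a minimal standard-library sketch follows. `encode_reference_audio` is a hypothetical helper name. This page does not say whether the API expects a bare base64 string or a data URI, so the sketch assumes a bare string.

```python
import base64
from pathlib import Path


def encode_reference_audio(path: str) -> str:
    """Read an audio file and return its bytes as a base64 ASCII string."""
    raw = Path(path).read_bytes()
    # Assumption: the API takes plain base64 with no "data:" prefix.
    return base64.b64encode(raw).decode("ascii")
```

Under that assumption, the returned string could be passed as `reference_audio` in place of the file path.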

All examples below reuse the same client.

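Cross-lingual cloning

The reference sample and the generated speech can be in different languages. Here an English speaker's voice reads Spanish text, with language set to the language of the input:

audio = client.models.text_to_speech(
    model="ResembleAI/chatterbox-multilingual",
    input="Bienvenidos a nuestra presentación trimestral.",
    reference_audio="./english-speaker.mp3",
    language="es",
)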
with open("spanish-clone.mp3", "wb") as f:
    f.write(audio)

Real-time cloning (Zonos)

Use Zonos when fast inference matters, for example for short notifications generated on demand:

audio = client.models.text_to_speech(
    model="Zyphra/Zonos-v0.1-transformer",
    input="Your order has been confirmed and will arrive by Thursday.",
    reference_audio="./brand-voice.mp3",
)
with open("notification.mp3", "wb") as f:
    f.write(audio)
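
For real-time use it helps to measure how long a synthesis call actually takes in your own environment. The sketch below times any zero-argument callable with `time.perf_counter`. `timed` is a hypothetical helper, not an SDK function, and it measures the whole request rather than the time to first audio byte.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Call fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

Wrapping the call above as `audio, seconds = timed(lambda: client.models.text_to_speech(...))` would return the audio together with the time that single request took.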

Batch narration

Reuse one reference sample to narrate several chapters in the same voice:

chapters = [
    {"file": "ch01.mp3", "text": "Chapter one. The morning light crept through the curtains."},
    {"file": "ch02.mp3", "text": "Chapter two. The letter arrived on a Tuesday."},
    {"file": "ch03.mp3", "text": "Chapter three. Three weeks had passed since the call."},
]
for ch in chapters:
    audio = client.models.text_to_speech(
        model="bosonai/HiggsAudioV2.5",
        input=ch["text"],
        reference_audio="./narrator.mp3",
    )
    with open(ch["file"], "wb") as f:
        f.write(audio)
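
Long narration jobs can fail partway through. A minimal sketch of a resumable loop follows: it skips chapters whose output file already exists and records failures instead of stopping. `narrate_chapters` and its `synthesize` parameter are hypothetical. `synthesize` stands for any function that takes text and returns audio bytes, such as a thin wrapper around the `client.models.text_to_speech` call used in the loop above.

```python
from pathlib import Path
from typing import Callable, Dict, List


def narrate_chapters(
    chapters: List[Dict[str, str]],
    synthesize: Callable[[str], bytes],
    out_dir: str = ".",
) -> List[str]:
    """Write one audio file per chapter and return the file names that failed."""
    failed: List[str] = []
    for ch in chapters:
        target = Path(out_dir) / ch["file"]
        if target.exists():
            continue  # already generated on a previous run
        try:
            audio = synthesize(ch["text"])
        except Exception as exc:
            # Broad on purpose: the SDK's error types are not documented here.
            print(f"{ch['file']}: {exc}")
            failed.append(ch["file"])
            continue
        target.write_bytes(audio)
    return failed
```

Running the function again with the same arguments retries only the chapters that are still missing.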

Tips

Reference quality matters. Use a clean recording with minimal background noise and 10-30 seconds of clear speech.
HiggsAudio for fidelity. Choose it when the clone must be indistinguishable from the original voice.
Chatterbox for languages. It covers 30+ languages and supports cross-lingual cloning.
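
The 10-30 second guideline can be checked before a sample is sent. The sketch below uses the standard-library `wave` module, so it only works on uncompressed WAV files; an MP3 reference like the ones in the examples would need a third-party decoder. `wav_duration_seconds` and `reference_duration_ok` are hypothetical helper names.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_duration_ok(path: str, min_s: float = 10.0, max_s: float = 30.0) -> bool:
    """Return True if the WAV file's length is within the recommended range."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

This checks length only. Noise and clarity still need a listen.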
