Chatterbox Turbo is now available on fal
Chatterbox Turbo is an open-source, ultra-fast text-to-speech model built for real-time voice AI. It combines sub‑150 ms time to first sound with expressive paralinguistic prompting and instant voice cloning, so agents can speak naturally in the voices your users expect. It's available on fal from day 0, so you can try it now and ship today.
Understanding Chatterbox Turbo’s core strengths
- Paralinguistic prompting for human reactions
Add non-speech sounds directly in your script to convey emotion and pacing. The model performs reactions like [laugh], [sigh], and [chuckle] in the same cloned voice, so your agent can breathe, hesitate, and react like a person.
Example prompt: Alright, let me check that for you. [typing] Hm. [sigh] Looks like your subscription expired yesterday, but I can renew it now if you want.
- Instant voice cloning from 5 seconds
Drop in a short reference clip and get a high‑fidelity clone that stays expressive. Chatterbox Turbo preserves timbre and style while supporting natural paralinguistic cues in the same voice.
Example prompt: Hey there. [chuckle] I pulled your latest status report. Want me to summarize the highlights?
- 6x faster than real time for live agents
Distilled single‑step inference and a streamlined 350M‑parameter architecture deliver sub‑200 ms responses even before extra optimizations. That is fast enough for live conversations, voice UIs, and on-device experiences.
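To make the list above concrete, here is a minimal sketch of calling Chatterbox Turbo through the fal Python client with a paralinguistic prompt and a short cloning reference. The endpoint id and argument names (`text`, `audio_url`) are assumptions for illustration; check the model's API page on fal for the exact values.

```python
# Minimal sketch of calling Chatterbox Turbo via the fal Python client.
# The endpoint id and argument names are assumptions -- see the model's
# API docs on fal for the exact values.
import fal_client

# Upload a short (~5 s) reference clip for instant voice cloning.
reference_url = fal_client.upload_file("reference_voice.wav")

result = fal_client.subscribe(
    "resemble-ai/chatterbox-turbo",  # hypothetical endpoint id
    arguments={
        "text": (
            "Alright, let me check that for you. [typing] Hm. [sigh] "
            "Looks like your subscription expired yesterday, "
            "but I can renew it now if you want."
        ),
        "audio_url": reference_url,  # voice to clone
    },
)

# The response typically contains a link to the generated audio.
print(result)
```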
What’s new under the hood
- Single‑step inference: Distilled from multi‑step CFM to one step for a dramatic latency reduction.
- Leaner architecture: The larger LLaMA backbone is swapped for a faster 350M‑parameter GPT‑2 backbone, cutting both latency and cost.
- Built‑in trust: Every output is watermarked with PerTh, an inaudible watermark that makes the AI-generated audio verifiable.
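As a rough, purely illustrative sketch of why distilling multi‑step CFM sampling into a single step matters for latency (the step count and per‑step cost below are made‑up numbers, not measurements):

```python
# Back-of-the-envelope only: numbers are illustrative, not measured.
# If each decoder pass costs roughly the same, collapsing N CFM sampling
# steps into one removes (N - 1) passes from the time-to-first-sound budget.
per_step_ms = 25      # hypothetical cost of one decoder pass
cfm_steps = 8         # hypothetical multi-step sampling schedule

print(f"multi-step decode: {cfm_steps * per_step_ms} ms")   # 200 ms
print(f"single-step decode: {1 * per_step_ms} ms")          # 25 ms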
Why Chatterbox Turbo over alternatives
- Real-time feel: Consistent sub‑150 ms time to first sound helps you keep turn‑taking natural in live calls and chats.
- Expressive control: Paralinguistic tags produce natural laughs, sighs, gasps, and more without extra post-production.
- Zero-shot cloning: Generate a convincing voice from 5 seconds of audio.
- Safety and provenance: PerTh watermarking for enterprise and regulatory needs.
Examples
Sample 1
Prompt: Hey, [chuckle] sorry, I'm just so excited. I processed your entire calendar in point zero two seconds and... wow! You have so much going on... but I think you're tired... the new OS1 update lets me handle the boring stuff, so you can just... be you. Shall I download the update for us? Or we can do it together.
Sample 2
Prompt: Hello! Thanks for calling today. I'm Alex, your support agent. [chuckle] Let's take a look at what’s going on with your account. Don’t worry — we’ll sort this out together. [laugh] Go ahead and tell me what you’re experiencing, and I'll walk you through the fix step by step.
Paralinguistic tags available
[laugh], [chuckle], [sigh], [gasp], [cough], [clear throat], [sniff], [groan], [shush]
Prompting and input tips
- Use tags sparingly: A single [chuckle] or [sigh] goes a long way; overuse sounds theatrical.
- Place tags at natural boundaries: Before a clause or sentence where an emotion would occur in real speech.
- Keep clones clean: Provide a reference voice with minimal background noise and strong diction.
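If you assemble prompts programmatically, a small guard can catch unsupported or overused tags before you send a request. The helper below is purely illustrative: it only encodes the tag list and tips above, and the `max_tags` threshold is an assumption, not a documented limit.

```python
import re

# Tags Chatterbox Turbo recognizes (from the list above).
SUPPORTED_TAGS = {
    "laugh", "chuckle", "sigh", "gasp", "cough",
    "clear throat", "sniff", "groan", "shush",
}

def check_prompt(text: str, max_tags: int = 2) -> list[str]:
    """Return warnings for unsupported or overused paralinguistic tags."""
    tags = re.findall(r"\[([^\]]+)\]", text)
    warnings = [f"unsupported tag: [{t}]" for t in tags if t.lower() not in SUPPORTED_TAGS]
    if len(tags) > max_tags:
        warnings.append(f"{len(tags)} tags in one prompt; consider using fewer")
    return warnings

print(check_prompt("Hey. [chuckle] Ok. [sigh] Hmm. [laugh] Done!"))
# -> ['3 tags in one prompt; consider using fewer']
```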
Use cases
- Voice agents and IVR: Natural turn‑taking for support, sales, and booking flows.
- Creative narration: Character voices with light emotional cues for videos and podcasts.
- Accessibility: Fast, clear TTS for screen readers and assistive tools that need immediate feedback.
- Gaming and interactive experiences: Reactive NPCs that laugh, gasp, or hesitate in sync with gameplay.
- On-device and private deployments: Open-source model suitable for controlled environments.
Getting started on fal
- Try it in the Playground to audition voices, test tags, and tune pacing before you integrate.
- Read the API docs to connect production workloads and manage audio outputs at scale.
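For production workloads, a queue-style request is often more convenient than a blocking call. The sketch below follows the fal Python client's submit/get pattern; the endpoint id, payload keys, and response field names are the same assumptions as in the earlier example, so verify them against the API docs.

```python
# Sketch of a production-style call via the fal queue API (Python client).
# Endpoint id, payload keys, and response fields are assumptions.
import fal_client
import urllib.request

handle = fal_client.submit(
    "resemble-ai/chatterbox-turbo",  # hypothetical endpoint id
    arguments={
        "text": "Hello! Thanks for calling today. [chuckle] Let's take a look.",
        "audio_url": "https://example.com/reference_voice.wav",  # placeholder reference clip
    },
)

result = handle.get()  # blocks until the job finishes

# Assuming the response includes an audio URL; the exact field name may differ.
audio_url = result["audio"]["url"]
urllib.request.urlretrieve(audio_url, "output.wav")
```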
Follow us
Stay tuned to our YouTube, Reddit, blog, Twitter, or Discord for the latest updates on generative media and new model releases!