Inworld TTS-1.5 Max Now Available on fal
We’re excited to add Inworld TTS-1.5 Max to fal, expanding our set of cutting-edge real-time voice models on the platform. The model focuses on low-latency speech generation, improved expressiveness, and multilingual support for production use cases.
As voice becomes a core interface across applications, from assistants to media experiences, developers need models that balance latency, quality, and cost. TTS-1.5 Max is designed to operate within these constraints while supporting real-time interactions.
What is Inworld TTS-1.5 Max?
Inworld TTS-1.5 Max is a text-to-speech model built for expressive, low-latency voice synthesis. It is part of the TTS-1.5 family, which includes both Max (higher quality) and Mini (lower latency) variants.
The Max model is positioned as the default option for most applications, prioritizing voice quality and expressive range while maintaining near-real-time responsiveness.
Key characteristics
Real-time latency
TTS-1.5 Max achieves a P90 time-to-first-audio of under roughly 250 ms, meaning about 90% of requests begin returning audio within that window. This enables conversational and interactive use cases where response time affects user experience.
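To make the P90 figure concrete, here is a minimal sketch of how a P90 time-to-first-audio could be computed from your own measurements. The sample values are hypothetical, not benchmark results, and the nearest-rank percentile method is one common convention; the source does not state which method was used for the published number.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of the samples
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical time-to-first-audio measurements in milliseconds (illustrative only)
ttfa_ms = [180, 195, 210, 205, 190, 240, 220, 185, 310, 200]

p90 = percentile_nearest_rank(ttfa_ms, 90)
print(f"P90 time-to-first-audio: {p90} ms")
```

A single slow request (310 ms in this made-up sample) does not move the P90, which is why tail percentiles are often preferred over averages when describing interactive latency.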
Improved expressiveness and accuracy
Compared to earlier versions, the model offers a wider expressive range and a lower word error rate. This reduces artifacts such as mispronunciations, cutoffs, and unnatural pacing.
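For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference text and a transcript, divided by the number of reference words. The sketch below is a generic implementation of that standard definition, not Inworld's evaluation pipeline. It assumes simple lowercase, whitespace-based tokenization, and the sentences are made-up examples. For TTS, the hypothesis would typically be a transcript of the generated audio compared against the input text.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of five reference words
wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps")
print(f"WER: {wer:.2f}")
```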
Multilingual support
The model supports 15 languages, broadening coverage for global applications and for use cases such as localization and translation.
Cost profile
Pricing is approximately $10 per million characters, or roughly $0.01 per minute of audio, positioning it as a lower-cost option relative to many comparable real-time TTS systems.
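As a rough illustration of how the two quoted prices relate, the sketch below estimates the cost of a piece of text from its character count and derives the speaking rate implied by the per-minute figure. The two prices come from the paragraph above; the function name is hypothetical, and counting every character with len() is an assumption, since actual billing rules may count characters differently.

```python
PRICE_PER_MILLION_CHARS_USD = 10.0  # quoted per-character pricing
PRICE_PER_MINUTE_USD = 0.01         # quoted approximate per-minute pricing


def estimate_cost_usd(text, price_per_million=PRICE_PER_MILLION_CHARS_USD):
    """Estimate synthesis cost, assuming every character in text is billed."""
    return len(text) * price_per_million / 1_000_000


# Characters per minute of audio implied by the two quoted prices
implied_chars_per_minute = PRICE_PER_MINUTE_USD * 1_000_000 / PRICE_PER_MILLION_CHARS_USD

# Hypothetical 2,500-character script (illustrative only)
script = "x" * 2_500
print(f"Estimated cost: ${estimate_cost_usd(script):.4f}")
print(f"Implied speaking rate: {implied_chars_per_minute:.0f} characters per minute")
```

The two figures agree only if a minute of speech corresponds to about 1,000 characters, so slower or faster voices will shift the effective per-minute cost.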
Try it on fal
You can start using Inworld TTS-1.5 Max on fal to generate expressive speech, test latency-performance tradeoffs, and integrate voice into your applications.
Follow us on X, our blog, or Reddit for the latest updates on generative media and new model releases!