Ovi is Now Available on fal

Team fal

Oct 5, 2025 • 6 min read

We’re excited to introduce Ovi, available on fal from day 0. Built by Character AI, Ovi is the first open-source video model with native audio generation: it delivers synchronized picture and sound from a single prompt, with no post-production required. To help you get started, use coupon code Ovi on your fal Billing page to receive $20 in free generations.

Ovi supports both text-to-video and image-to-video, and introduces a simple, structured way to describe dialogue, sound design, and visuals together, so your scenes look and sound the way you imagine.

Understanding Ovi’s Core Strengths

Human-Centric Performances

Ovi excels at portraying people as expressive, emotionally grounded characters. Subtle cues like facial tension, eye movement, and posture unfold in sync with tone and dialogue, creating performances that feel deliberate and human. Thanks to Ovi’s structured prompting system, creators can control exactly when a character speaks, how expressions shift before, during, and after each line, and even coordinate multiple speakers in sequence. This level of temporal and emotional precision makes it possible to direct scenes with cinematic intent where every pause, glance, and gesture lands exactly where you want it.

[5-second video] Prompt:

A man with dark hair, wearing a dark top, is shown in a close-up, illuminated by an intense, deep red and pink light that casts a strong hue over the entire scene. His head is slightly tilted down, and his eyes are initially looking downwards. In the blurred background, another figure, possibly a child, is faintly visible to the left. The man's eyes briefly close, then open, and he looks slightly to his right, his expression appearing contemplative. He then articulates, <S>Who the hell is Mark?<E> His gaze remains fixed as he asks the question, his brow slightly furrowed. <AUDCAP>A continuous low-frequency hum, a very faint, almost inaudible high-pitched tone just before the dialogue, a man's voice.<ENDAUDCAP>

[5-second video] Prompt:

A Black man with a short beard and dark hair stands center stage, illuminated by stage lights against a rippled royal blue curtain backdrop. He wears a white denim jacket over a mustard yellow t-shirt and holds a silver microphone in his right hand. He begins speaking, looking towards his left with a slight frown, <S>to<E> <S>cold like a month ago<E>. He then turns his gaze slightly right, his expression becoming more animated, his mouth open as if exclaiming. He shifts his weight, gesturing subtly with his microphone hand while continuing to speak, his head nodding slightly. He turns back to the left, his eyes wide and mouth open again, before stating with a direct look, <S>It's cold now, okay?<E> He then looks to his right, and his mouth briefly opens as if to say more, <S>I could<E>. <AUDCAP>Male speaking voice, sound of audience laughter.<ENDAUDCAP>

Multi-Speaker Dialogue

Ovi makes multi-character scenes feel natural and conversational, not stitched together. Its architecture synchronizes timing, lip motion, and vocal tone across multiple speakers, so exchanges flow with believable rhythm and pacing. Each voice keeps its own tonality and emotional delivery, while expressions and gestures update seamlessly as the dialogue unfolds. The structured prompt format lets you define clear turns in the conversation: who speaks, when the others react, and how each speaker's tone shifts. This gives creators full control over interaction timing without losing spontaneity. The result is dialogue that feels alive, with characters listening, interrupting, and responding as if they truly share the same space.

[5-second video] Prompt:

A zoomed-in close-up shot of a man in a dark apron standing behind a cafe counter, leaning slightly on the polished surface. Across from him in the same frame, a woman in a beige coat holds a paper cup with both hands, her expression playful. The woman says <S>You always give me extra foam.<E> The man smirks, tilting his head toward the cup. The man says <S>That’s how I bribe loyal customers.<E> Warm cafe lights reflect softly on the counter between them as the background remains blurred. <AUDCAP>Female and male voices speaking English casually, faint hiss of a milk steamer, cups clinking, low background chatter.<ENDAUDCAP>

Sound Effects & Music

The model natively produces environmental ambience, sound effects, and music in sync with the visuals, so every footstep, breath, or gust of wind lands exactly when it should. This fusion of audio and video makes scenes feel cohesive and cinematic right out of the model with no editing or layering required. Whether you describe subtle details like “raindrops on metal” or broader cues such as “melancholic piano score,” Ovi interprets and times them naturally within the scene. The result is immersive storytelling where atmosphere, motion, and emotion work together, allowing creators to shape not just how their videos look, but how they feel.

[5-second video] Prompt:

Close-up shot of a pianist’s hands moving gracefully across the keys of a grand piano. Warm, golden stage lighting reflects softly on the polished surface of the instrument as the camera lingers on the fluid motion of fingers and the delicate press of each key. The background fades into a soft blur, focusing attention on precision and emotion in every touch. <AUDCAP>gentle microphone perspective; rich acoustic piano sound in a classical tone; faint resonance of the hall; subtle mechanical noises from the keys and pedals; soft audience hush<ENDAUDCAP>

[5-second video] Prompt:

A wide coastal shot in the morning. Waves crash rhythmically against dark jagged rocks, sending bursts of white foam into the air. The camera slowly pans along the shoreline as the morning light glows gold across the sea surface. A few seagulls circle overhead, gliding effortlessly against the wind. <AUDCAP> ambient ocean waves; splashes of water hitting rocks; distant seagulls calling; soft wind gusts; gentle echo of the surf <ENDAUDCAP>

Prompting Guide

Ovi uses a structured prompting system that gives you precise control over speech, ambient audio, and sound effects. By separating dialogue from general audio descriptions, you can define exactly what is heard, when it happens, and how it sounds, resulting in tighter synchronization and more expressive storytelling.

Speech Tags

Use <S> and <E> to mark the start and end of spoken dialogue.
Everything inside these tags is converted into synchronized speech that matches lip motion and tone. You can include emotional or stylistic directions in parentheses to refine delivery:

Example: <S>(soft whisper) The storm is coming.<E>

These cues help Ovi interpret tone and pacing, resulting in more lifelike vocal performances.
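To make the tag syntax concrete, here is a minimal Python sketch of a helper that wraps one line of dialogue in speech tags and places an optional delivery cue in parentheses. The function name `speech` is hypothetical. It is not part of Ovi or fal's client libraries, and it only builds the prompt string.

```python
from typing import Optional


def speech(line: str, direction: Optional[str] = None) -> str:
    """Hypothetical helper: wrap one spoken line in <S>...<E> tags."""
    text = line.strip()
    # Tags inside the dialogue itself would break the <S>...<E> pairing.
    if "<S>" in text or "<E>" in text:
        raise ValueError("dialogue text must not contain <S> or <E>")
    if direction:
        # The delivery cue goes in parentheses before the line, as in the example above.
        text = f"({direction.strip()}) {text}"
    return f"<S>{text}<E>"


print(speech("The storm is coming.", direction="soft whisper"))
# <S>(soft whisper) The storm is coming.<E>
```

The output matches the example string, so the helper is a convenience for generating prompts programmatically, not a new syntax.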

Audio Description Tags

Use <AUDCAP> and <ENDAUDCAP> to describe background ambience, sound effects, or music. This section defines all non-spoken sounds, from subtle environmental layers to rich soundscapes.

Example: <AUDCAP>gentle rain hitting leaves; distant thunder; soft piano melody<ENDAUDCAP>

By describing the desired atmosphere, you guide Ovi to generate coherent, well-timed audio that complements the visuals.
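The same idea applies to audio descriptions. The hypothetical `audio_caption` helper below is not an Ovi or fal API. It joins a list of sound descriptions into a single block. The semicolon separator is an assumption copied from the examples in this post, not a documented requirement. Some of the showcase prompts above use commas instead.

```python
from typing import Iterable


def audio_caption(sounds: Iterable[str]) -> str:
    """Hypothetical helper: join sound descriptions into one <AUDCAP> block."""
    # Drop empty entries and stray whitespace.
    parts = [s.strip() for s in sounds if s.strip()]
    if not parts:
        raise ValueError("at least one sound description is required")
    # Semicolons mirror the examples in this post (a convention, not a rule).
    return "<AUDCAP>" + "; ".join(parts) + "<ENDAUDCAP>"


print(audio_caption(["gentle rain hitting leaves", "distant thunder", "soft piano melody"]))
# <AUDCAP>gentle rain hitting leaves; distant thunder; soft piano melody<ENDAUDCAP>
```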

Combining Both

You can combine both tag types in the same prompt to build a complete audiovisual scene: describe your setting visually, define the spoken line within <S>...<E>, and then follow it with an <AUDCAP>...<ENDAUDCAP> block for environmental audio.

Example: A lone wanderer stands on a cliff overlooking a stormy sea.
<S> (calm, steady voice) I’ve been waiting for this moment.<E>
<AUDCAP> wind whistling; crashing waves; distant seagulls; rolling thunder <ENDAUDCAP>
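Putting it together, you can model a scene as three parts: a visual description, an ordered list of spoken lines, and a list of sounds. The sketch below is illustrative only. `Scene`, `SpokenLine`, and `to_prompt` are hypothetical names. Joining the parts with newlines simply mirrors the layout of the example above. The sketch also emits the tags without inner spaces (`<S>(calm...` rather than `<S> (calm...`), and assumes Ovi treats both forms the same way.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpokenLine:
    text: str
    direction: Optional[str] = None  # optional delivery cue, e.g. "calm, steady voice"


@dataclass
class Scene:
    visual: str  # short description of composition and movement
    dialogue: List[SpokenLine] = field(default_factory=list)  # spoken in this order
    sounds: List[str] = field(default_factory=list)  # non-spoken audio

    def to_prompt(self) -> str:
        parts = [self.visual.strip()]
        # One <S>...<E> pair per spoken line, kept in sequence.
        for line in self.dialogue:
            body = line.text.strip()
            if line.direction:
                body = f"({line.direction}) {body}"
            parts.append(f"<S>{body}<E>")
        # All non-spoken sound goes into a single AUDCAP block at the end.
        if self.sounds:
            parts.append("<AUDCAP>" + "; ".join(self.sounds) + "<ENDAUDCAP>")
        return "\n".join(parts)


scene = Scene(
    visual="A lone wanderer stands on a cliff overlooking a stormy sea.",
    dialogue=[SpokenLine("I’ve been waiting for this moment.", "calm, steady voice")],
    sounds=["wind whistling", "crashing waves", "distant seagulls", "rolling thunder"],
)
print(scene.to_prompt())
```

Each `SpokenLine` becomes its own <S>...<E> pair, in order. This matches how the multi-speaker tip below suggests handling several voices. The sketch does not generate speaker attribution such as "The woman says" from the cafe example. You would write that into the visual text yourself.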

Tips for Best Results

Keep your visual description short and clear; Ovi uses it to set composition and movement.
Use specific audio descriptors (e.g., “soft metallic clink” instead of “metal sound”).
Avoid overloading the prompt with too many simultaneous sound sources; less is often more.
For multiple speakers, use separate <S> tags for each voice in sequence.
Combine natural ambience (wind, water, room tone) with expressive cues (music, tone shifts) for cinematic balance.
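You can check several of these tips mechanically before spending a generation. The following is a rough, hypothetical linter, not an official tool. It flags three problems:

- unpaired or nested speech tags
- unbalanced audio tags
- AUDCAP blocks with many comma- or semicolon-separated sound sources

The default limit of five sources is an arbitrary illustration, not a documented Ovi limit.

```python
import re
from typing import List

_SPEECH_TAG = re.compile(r"<S>|<E>")
_AUDCAP = re.compile(r"<AUDCAP>(.*?)<ENDAUDCAP>", re.DOTALL)


def check_prompt(prompt: str, max_sounds: int = 5) -> List[str]:
    """Hypothetical heuristic lint for Ovi-style prompts; returns warnings."""
    warnings: List[str] = []

    # Each spoken line should be its own <S>...<E> pair: no nesting, no strays.
    open_speech = False
    for match in _SPEECH_TAG.finditer(prompt):
        if match.group() == "<S>":
            if open_speech:
                warnings.append("<S> opened before the previous line was closed with <E>")
            open_speech = True
        else:
            if not open_speech:
                warnings.append("<E> found without a matching <S>")
            open_speech = False
    if open_speech:
        warnings.append("last <S> is never closed with <E>")

    # Audio descriptions must be wrapped in matching AUDCAP tags.
    if prompt.count("<AUDCAP>") != prompt.count("<ENDAUDCAP>"):
        warnings.append("unbalanced <AUDCAP>/<ENDAUDCAP> tags")

    # Rough count of sound sources: items separated by ';' or ','.
    for body in _AUDCAP.findall(prompt):
        sources = [s for s in re.split(r"[;,]", body) if s.strip()]
        if len(sources) > max_sounds:
            warnings.append(
                f"{len(sources)} sound sources in one <AUDCAP> block; consider trimming"
            )
    return warnings
```

Running it on the cliff example from the Combining Both section returns an empty list. Treat its warnings as prompts to reconsider, not hard errors.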

Getting Started with Ovi

The easiest way to explore Ovi's capabilities is the fal Playground, where you can experiment with prompts and see results immediately. A detailed guide to integrating Ovi into your platform is available in our API documentation. And don't forget: coupon code Ovi on your fal Billing page gets you $20 in free generations.

Stay tuned to our Reddit, blog, Twitter, or Discord for the latest updates on generative media and new model releases!