Wan 2.5 Preview is now available on fal

We are excited to announce that Wan 2.5 is available on fal from day 0. This latest model in the Wan family introduces new capabilities for both text-to-video and image-to-video generation. For the first time, Wan videos come with native audio generation, opening the door to richer, more immersive storytelling out of the box.

What's new in Wan 2.5

Native Audio Generation - Dialogues

Wan 2.5 introduces native audio generation, allowing creators to produce videos with synchronized sound directly alongside visuals. This integration streamlines creative workflows by eliminating the need for separate tools and post-production steps.

[Video, 0:05]

A cozy, warmly lit coffee shop interior in the late morning. Sunlight filters through tall windows, casting golden rays across wooden tables and shelves lined with mugs and bags of beans. A young woman in casual clothes steps up to the counter, her posture relaxed but purposeful. Behind the counter, a friendly barista in an apron stands ready, with the soft hiss of the espresso machine punctuating the atmosphere. Other customers chat quietly in the background, their voices blending into a gentle ambient hum. The mood is inviting and everyday-realistic, grounded in natural detail. Woman: “Hi, I’ll have a cappuccino, please.” Barista (nodding as he rings it up): “Of course. That’ll be five dollars.”

With this capability, scenes can now include dialogue that matches the visual context. Whether it’s a character delivering lines in a dramatic sequence, the subtle sounds of a scene's environment, or the atmospheric score of a short film, audio and video are generated together as a cohesive whole.

[Video, 0:05]

A bustling restaurant kitchen glows under warm overhead lights, filled with the rhythmic clatter of pots, knives, and sizzling pans. In the center, a chef in a crisp white uniform and apron stands over a hot skillet. He lays a thick cut of steak onto the pan, and immediately it begins to sizzle loudly, sending up curls of steam and the rich aroma of searing meat. Beads of oil glisten and pop around the edges as the chef expertly flips the steak with tongs, revealing a perfectly caramelized crust. The camera captures close-up shots of the steak searing, the chef’s focused expression, and wide shots of the lively kitchen bustling behind him. The mood is intense yet precise, showcasing the artistry and energy of fine dining.

This advancement expands the expressive range of Wan outputs, enabling richer, more immersive storytelling across filmmaking, advertising and content creation.

Native Audio Generation - Background Audio

In addition to dialogue, Wan 2.5 excels at producing ambient sound and background audio that enhance the mood and realism of a scene. Subtle details such as the rustle of leaves in a park, distant city sounds, or the roar of an F1 engine add texture and authenticity to video generations.

[Video, 0:05]

A person jogs steadily through a lush, green park on a bright morning. The camera tracks them from the side capturing the rhythmic motion of their strides and the subtle bounce of their arms. Sunlight filters through the canopy of tall trees, casting shifting patterns of light and shadow across the winding path. Around them, the park is alive with detail—birds chirping, distant voices of children playing, the rustle of leaves in the breeze. The soundscape is natural and immersive, filled with ambient park sounds: footsteps on gravel, the faint hum of cicadas, and the occasional bark of a dog in the distance. The atmosphere is peaceful yet full of energy, evoking the serenity and vitality of a morning run.

By unifying visuals with contextual sound, Wan 2.5 delivers more immersive and emotionally resonant outputs, supporting a wide range of creative applications from film prototyping to advertising and content production.

[Video, 0:05]

A cinematic tracking shot of a Ferrari Formula 1 car racing through the iconic Monaco Grand Prix circuit. The camera is fixed on the side of the car that is moving at high speed, capturing the sleek red bodywork glistening under the Mediterranean sun. The reflections of luxury yachts and waterfront buildings shimmer off its polished surface as it roars past. Crowds cheer from balconies and grandstands, while the blur of barriers and trackside advertisements emphasizes the car’s velocity. The sound design should highlight the high-pitched scream of the F1 engine, echoing against the tight urban walls. The atmosphere is glamorous, fast-paced, and intense, showcasing the thrill of racing in Monaco.

Strong Prompt Adherence

Wan 2.5 demonstrates substantially improved prompt adherence compared to Wan 2.2, enabling outputs that more faithfully reflect detailed creative instructions. Creators can design more complex scenes that combine audio, camera movements, and stylistic direction while maintaining coherence, and the model captures both broad concepts and subtle nuances.

[Video, 0:05]

A sleek blue Lamborghini speeds through a long tunnel at golden hour. Sunlight beams directly into the camera as the car approaches the tunnel exit, creating dramatic lens flares and warm highlights across the glossy paint. The camera begins locked in a steady side view of the car, holding the composition as it races forward. As the Lamborghini nears the end of the tunnel, the camera smoothly pulls back, revealing the tunnel opening ahead as golden light floods the frame. The atmosphere is cinematic and dynamic, emphasizing speed, elegance, and the interplay of light and motion.

Enhanced Style Adaptation

Wan 2.5 offers greater flexibility in adapting to a wide range of visual styles, from photorealistic cinematic shots to highly stylized aesthetics such as anime and illustration. The model responds more accurately to detailed stylistic instructions, preserving character consistency and scene composition across different artistic directions.

[Video, 0:05]

Japanese anime style with a cyberpunk aesthetic. A lone figure in a hooded jacket stands on a rain-soaked street at night, neon signs flickering in pink, blue, and green above. The camera tracks slowly from behind as the character walks forward, puddles rippling beneath their boots, reflecting glowing holograms and towering skyscrapers. Crowds of shadowy figures move along the sidewalks, illuminated by shifting holographic billboards. Drones buzz overhead, their red lights cutting through the mist. The atmosphere is moody and futuristic, with a pulsing synthwave soundtrack feel. The art style is detailed and cinematic, with glowing highlights, sharp contrasts, and dramatic framing straight out of a cyberpunk anime film.

Getting Started with Wan 2.5

The easiest way to explore Wan 2.5's capabilities is through fal's Playground, where you can experiment with prompts and see immediate results. A detailed guide on how to integrate Wan 2.5 into your platform is available in our API documentation.
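
As a rough illustration of what an integration might look like, the sketch below uses the `fal_client` Python package (`pip install fal-client`, with a `FAL_KEY` environment variable set). The endpoint ID and the argument names (`prompt`, `negative_prompt`) are assumptions made for this example, as are the helper names; check the API documentation for the exact schema before relying on them.

```python
from typing import Any, Optional

# Assumed endpoint ID for illustration; confirm the exact value in the API docs.
WAN_25_T2V_ENDPOINT = "fal-ai/wan-25-preview/text-to-video"


def build_arguments(prompt: str, negative_prompt: Optional[str] = None) -> dict[str, Any]:
    """Assemble the request payload. The field names here are assumptions."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    arguments: dict[str, Any] = {"prompt": prompt.strip()}
    if negative_prompt:
        arguments["negative_prompt"] = negative_prompt
    return arguments


def generate_video(prompt: str, negative_prompt: Optional[str] = None) -> Any:
    """Submit the request and block until the result is ready."""
    # Imported lazily so build_arguments can be used without the package installed.
    import fal_client

    return fal_client.subscribe(
        WAN_25_T2V_ENDPOINT,
        arguments=build_arguments(prompt, negative_prompt),
    )
```

Calling `generate_video("A jogger runs through a sunlit park, birds chirping.")` would then submit the job and return the response once generation finishes. The structure of that response is not covered here.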

Prompting Guide

To achieve the best results with Wan 2.5, it is essential to provide detailed and structured prompts. The model responds accurately to guidance on both visual and auditory elements, allowing you to shape outputs with cinematic precision.

Key recommendations:

  • Specify dialogue clearly
    • Write the exact words to be spoken.
    • Indicate who speaks and in what order to maintain clarity in multi-character scenes.
    • Example: “Character A: ‘We have to keep moving.’ Character B: ‘Not until we find shelter.’”
  • Control when no dialogue is needed
    • When a scene should have no dialogue, explicitly include terms such as “dialogue” or “actors speaking” in the negative prompt. This ensures the model does not introduce unintended speech.
  • Define ambient and background audio
    • Describe the type of environmental sound or music required.
    • Example: “soft rain tapping on windows with distant thunder” or “fast-paced action music with heavy percussion.”
  • Describe scene elements in detail
    • The more descriptive you are with setting, lighting, mood, and camera work, the more immersive and accurate the generation will be.
    • Example: “A wide shot of a mountain road at sunset, warm golden light across the sky, a cyclist racing downhill, accompanied by energetic background music.”

By layering these elements together, creators can generate outputs that feel intentional and professional.
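
The recommendations above could be captured in a small helper that assembles a prompt from its parts. This is only a sketch: the `ScenePrompt` class, the "Background audio:" label, and the comma-separated negative prompt are illustrative choices for this example, not a format the model requires.

```python
from dataclasses import dataclass, field


@dataclass
class ScenePrompt:
    """Hypothetical container for the prompt elements described in the guide."""

    scene: str  # setting, lighting, mood, camera work
    # (speaker, line) pairs, listed in speaking order
    dialogue: list[tuple[str, str]] = field(default_factory=list)
    audio: str = ""  # ambient sound or music description

    def prompt(self) -> str:
        """Join the scene, audio, and dialogue into a single prompt string."""
        parts = [self.scene.strip()]
        if self.audio:
            parts.append(f"Background audio: {self.audio.strip().rstrip('.')}.")
        # Write the exact words to be spoken and attribute each line to a speaker.
        parts.extend(f'{speaker}: "{line}"' for speaker, line in self.dialogue)
        return " ".join(parts)

    def negative_prompt(self) -> str:
        """With no dialogue lines, ask the model not to introduce speech."""
        return "" if self.dialogue else "dialogue, actors speaking"
```

For instance, a `ScenePrompt` with two dialogue entries produces a prompt that ends with the attributed lines in order and leaves the negative prompt empty, while a scene with no dialogue yields a negative prompt of `dialogue, actors speaking`.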


Stay tuned to our blog, Twitter, or Discord for the latest updates on generative media and new model releases!