Veo 3.1 is now available on fal

We are pleased to announce the release of Veo 3.1, available on fal from day 0. This latest evolution of Google DeepMind’s Veo model series brings a new level of cinematic control to video generation, combining expressive visuals with native audio generation for synchronized, sound-integrated storytelling straight out of the model.
Veo 3.1 introduces frame interpolation, which lets users define the first and last frames of a video and transition smoothly between them, as well as reference-image conditioning, which lets creators guide visual style, character consistency, and scene composition using image inputs. Together, these capabilities unlock new use cases that rely on precise scene control, consistent characters, and brand-aligned aesthetics.
All generation modes (text-to-video, image-to-video, and reference-to-video) are available through the standard Veo 3.1 endpoints and their high-speed counterparts on Veo 3.1 Fast, allowing creators to balance fidelity and iteration speed according to their workflow.
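As a rough illustration of how the modes and the two speed tiers might map onto endpoint IDs, here is a minimal Python sketch. The `fal-ai/veo3.1` prefix, the `/fast` segment, and the mode suffixes are assumptions based on common fal naming and are not stated in this announcement; `veo_endpoint` is a hypothetical helper. Check the model pages for the exact IDs.

```python
# Hypothetical endpoint IDs: the "fal-ai/veo3.1" prefix, the "/fast" segment,
# and the mode suffixes are assumptions. Verify them on fal before use.
BASE_ENDPOINT = "fal-ai/veo3.1"

MODE_SUFFIXES = {
    "text-to-video": "",
    "image-to-video": "/image-to-video",
    "reference-to-video": "/reference-to-video",
}


def veo_endpoint(mode: str, fast: bool = False) -> str:
    """Return the (assumed) endpoint ID for a mode, optionally on Veo 3.1 Fast."""
    if mode not in MODE_SUFFIXES:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_SUFFIXES)}"
        )
    prefix = f"{BASE_ENDPOINT}/fast" if fast else BASE_ENDPOINT
    return prefix + MODE_SUFFIXES[mode]


if __name__ == "__main__":
    # Iterate quickly on the Fast tier, then switch to the standard tier.
    print(veo_endpoint("image-to-video", fast=True))
    print(veo_endpoint("image-to-video"))
```

Keeping the tier as a single flag makes it easy to draft on Veo 3.1 Fast and rerun the same request on the standard endpoint for the final render.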
Understanding Veo 3.1’s New Capabilities
Frame Interpolation
Veo 3.1 introduces a major advancement in scene control through frame interpolation, allowing creators to define both the starting and ending frames of a video. This feature enables natural, cinematic transitions, from subtle pans to complex scene evolutions, with smooth, temporally consistent motion.
In this example, the first frame was set to an image of a gamer sitting at their setup while the last frame was defined as a screenshot from Call of Duty gameplay. Veo 3.1 interpolates between these two moments, creating a seamless transition that moves fluidly from the real-world environment into the in-game action. The result feels deliberate and cinematic, as if the camera itself is crossing the boundary between reality and the virtual world.
A confident gamer sitting at their gaming setup with colorful LED lights in the background. The gamer looks excited and speaks directly to the camera, saying enthusiastically: “Today we’ll be playing Call of Duty zombies!” Their tone is energetic and engaging, like a live streamer about to begin a gaming session. Then the camera cuts to the gameplay.
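A request for this kind of first-to-last-frame generation could be assembled roughly as follows. This is a sketch under stated assumptions: the field names `first_frame_url` and `last_frame_url` are not given in this post and should be checked against the API schema, and `build_first_last_frame_request` is a hypothetical helper that only validates inputs and builds the payload.

```python
from urllib.parse import urlparse


def _check_url(name: str, url: str) -> str:
    """Reject values that are not absolute http(s) URLs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{name} must be an http(s) URL, got {url!r}")
    return url


def build_first_last_frame_request(
    prompt: str, first_frame_url: str, last_frame_url: str
) -> dict:
    """Build a payload with a prompt plus the two boundary frames.

    The keys "first_frame_url" and "last_frame_url" are assumed names.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt.strip(),
        "first_frame_url": _check_url("first_frame_url", first_frame_url),
        "last_frame_url": _check_url("last_frame_url", last_frame_url),
    }


if __name__ == "__main__":
    # Placeholder URLs: a photo of the gamer and a gameplay screenshot.
    print(
        build_first_last_frame_request(
            "A confident gamer at their setup, then the camera cuts to the gameplay",
            "https://example.com/gamer-setup.png",
            "https://example.com/gameplay.png",
        )
    )
```

Validating the frame URLs locally catches malformed inputs before a generation request is sent.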
Reference-Image Conditioning
Veo 3.1 expands creative control through reference-image conditioning, a feature that allows you to upload multiple images that act as scene ingredients. Each reference contributes visual cues such as character design, color palette, lighting, or setting, which the model blends together into a cohesive moving scene. This gives creators the ability to maintain stylistic consistency, merge concepts, or evolve ideas fluidly across different visual domains.
In this example, a ballerina, an open field, and a circus tent are provided as reference images. Veo 3.1 interprets each element as part of a single creative vision and integrates all three into the same scene.
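The idea of passing several images as scene ingredients can be sketched as a small payload builder. The `image_urls` field name and the default limit of three references are assumptions (the limit simply mirrors the three-image example above), and `build_reference_request` is a hypothetical helper rather than part of any SDK.

```python
def build_reference_request(
    prompt: str, image_urls: list[str], max_images: int = 3
) -> dict:
    """Build a reference-to-video payload from a prompt and reference images.

    "image_urls" is an assumed field name and max_images=3 is an assumed
    limit; confirm both against the API schema.
    """
    # Drop exact duplicates while keeping the caller's ordering.
    unique_urls = list(dict.fromkeys(image_urls))
    if not unique_urls:
        raise ValueError("at least one reference image is required")
    if len(unique_urls) > max_images:
        raise ValueError(
            f"got {len(unique_urls)} reference images, expected at most {max_images}"
        )
    return {"prompt": prompt, "image_urls": unique_urls}


if __name__ == "__main__":
    # Placeholder URLs standing in for the ballerina, field, and tent images.
    print(
        build_reference_request(
            "A ballerina dances in an open field beside a circus tent",
            [
                "https://example.com/ballerina.png",
                "https://example.com/open-field.png",
                "https://example.com/circus-tent.png",
            ],
        )
    )
```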
Improvements from Veo 3
Improved Realism
Veo 3.1 elevates cinematic realism to a new standard, capturing human performance and emotional nuance with striking authenticity. Actors generated by the model exhibit believable expressions, subtle eye movements, and natural body language that align seamlessly with spoken dialogue. Conversations feel grounded; pauses, gestures, and tone shifts occur with the rhythm of real interaction, creating scenes that play out like live-action footage rather than synthetic animation.
Lighting, camera work, and framing have also been refined to mirror professional cinematography. Depth of field behaves naturally, facial highlights respond to environmental light, and shot composition maintains visual coherence across cuts. Whether rendering a close-up monologue or a multi-character exchange, Veo 3.1 delivers performances that feel intentional, emotionally resonant, and cinematic in every frame.
Improved Audio Generation
With native audio generation, Veo 3.1 produces soundscapes that feel organic and emotionally in sync with the visuals. The model captures ambient tone, dialogue pacing, spatial cues, and music with a precision that enhances immersion.
A cinematic shot of a talented violinist performing passionately on stage in a grand opera theater filled with warm golden light. The camera slowly circles around as the violinist plays under a single spotlight, with the orchestra faintly visible behind them and an elegant audience watching in silence. Capture the emotion and intensity in their face, the graceful motion of the bow, and the deep resonance of the moment. The atmosphere should feel dramatic, refined, and immersive — like the climax of a world-class performance film.
Advanced Word Understanding & Physics
Veo 3.1 also demonstrates stronger semantic comprehension and physical accuracy, enabling scenes that make narrative and spatial sense. The model interprets complex prompts — including multi-step actions, abstract descriptions, or nested relationships — with improved consistency and coherence. Motion now aligns naturally with real-world physics: objects accelerate and collide with realistic inertia, gravity influences how materials behave, and camera movement feels stable and deliberate.
In the example below, a metal ball drops down a metallophone, striking each bar in sequence. The generated sound aligns precisely with the notes being played, while reflections from the colorful bars ripple across the polished surface of the ball. Every element — from motion to sound to light — interacts believably, showcasing Veo 3.1’s ability to unify physical simulation and audiovisual realism within a single generation.
A beautifully shot, high-detail cinematic scene of a small silver ball rolling down a vertical wall covered with colorful xylophone bars arranged like a cascading path. As the ball bounces from bar to bar, each impact produces a bright, melodic note, creating a playful musical sequence. The camera smoothly follows the ball’s motion in slow motion, capturing reflections, depth of field, and rich acoustic ambiance. The lighting is soft and warm, highlighting the glossy metal of the ball and the vibrant colors of the xylophones. The mood is whimsical, precise, and musically satisfying — like a blend of art installation and physics experiment.
Getting Started with Veo 3.1
The easiest way to explore Veo 3.1's capabilities is through fal's Playground, where you can experiment with prompts and see immediate results. A detailed guide on how to integrate Veo 3.1 into your platform is available in our API documentation.
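For a sense of what a programmatic call might look like, here is a minimal sketch using the `fal-client` Python package and its `subscribe` function, which reads credentials from the `FAL_KEY` environment variable. The endpoint ID `fal-ai/veo3.1` and the `{"video": {"url": ...}}` response shape are assumptions; `generate_video` is a hypothetical wrapper, and the API documentation remains the authoritative reference.

```python
from typing import Any, Callable, Optional


def generate_video(
    endpoint: str,
    arguments: dict,
    subscribe: Optional[Callable[..., Any]] = None,
) -> str:
    """Submit a request and return the URL of the generated video.

    `subscribe` is injectable so the function can be exercised without
    network access; by default it resolves to fal_client.subscribe.
    """
    if subscribe is None:
        # Imported lazily so this module loads without the package installed.
        import fal_client  # pip install fal-client; needs FAL_KEY set

        subscribe = fal_client.subscribe
    result = subscribe(endpoint, arguments=arguments)
    try:
        # Assumed response shape: {"video": {"url": "..."}}
        return result["video"]["url"]
    except (KeyError, TypeError) as exc:
        raise RuntimeError(f"unexpected response shape: {result!r}") from exc


if __name__ == "__main__":
    # "fal-ai/veo3.1" is an assumed endpoint ID; running this makes a real,
    # billable request and requires FAL_KEY in the environment.
    url = generate_video(
        "fal-ai/veo3.1",
        {"prompt": "A violinist performing on stage in a grand opera theater"},
    )
    print(url)
```

Passing a stand-in for `subscribe` keeps the wrapper easy to unit test, while the default path goes through the real client.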
Stay tuned to our Reddit, blog, Twitter, or Discord for the latest updates on generative media and new model releases!