Kling 3.0 Prompting Guide

Kling 3.0 is designed to understand cinematic intent, not just visual descriptions. The model performs best when prompts are written like directions to a scene rather than a list of objects. Clear structure, explicit motion, and intentional shot language all lead to noticeably better results. For developers building with this new model, the Kling 3.0 API is available exclusively on fal.

Below are the key principles to follow when prompting Kling 3.0.

Think in Shots, Not Clips

One of the biggest upgrades in Kling 3.0 is native multi-shot generation, which supports storyboards of up to six shots in a single output. When writing prompts, describe each shot explicitly as part of a sequence rather than compressing everything into one paragraph.

For multi-shot prompts, clearly label shots and describe each one’s framing, subject, and motion. Kling understands cinematic language such as profile shots, macro close-ups, tracking shots, POV, and shot-reverse-shot dialogue. This allows the model to automatically vary camera angles and compositions while maintaining narrative continuity.

Well-structured shot prompts result in smoother transitions, better coverage, and a more intentional cinematic flow.
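The shot-by-shot structure described above can be assembled programmatically. The following is a minimal sketch in Python, not an official format: the `Shot` class, the `build_multishot_prompt` helper, and the "Shot N:" label are all assumptions made for illustration. The only value taken from this guide is the six-shot limit.

```python
from dataclasses import dataclass

MAX_SHOTS = 6  # storyboard limit stated in this guide


@dataclass
class Shot:
    framing: str  # e.g. "Macro close-up", "Tracking shot", "POV"
    subject: str  # who or what the shot is about
    motion: str   # subject movement and camera behavior


def build_multishot_prompt(shots):
    """Return one prompt with a labeled line per shot (hypothetical layout)."""
    if not shots:
        raise ValueError("at least one shot is required")
    if len(shots) > MAX_SHOTS:
        raise ValueError(f"at most {MAX_SHOTS} shots, got {len(shots)}")
    lines = []
    for number, shot in enumerate(shots, start=1):
        # Strip trailing periods so the joined line never contains ".."
        parts = [p.strip().rstrip(".") for p in (shot.framing, shot.subject, shot.motion)]
        lines.append(f"Shot {number}: " + ". ".join(parts) + ".")
    return "\n".join(lines)


print(build_multishot_prompt([
    Shot("Wide shot", "A cyclist crests a foggy hill", "Camera tracks alongside at the same speed"),
    Shot("Macro close-up", "Gloved hands tighten on the handlebars", "Camera holds still"),
]))
```

Keeping framing, subject, and motion as separate fields makes it harder to forget one of them for any shot, which is the main point of labeling shots in the first place.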


Anchor Your Subjects Early for Consistency

Kling 3.0 introduces significantly stronger element and subject consistency, especially for characters. To take advantage of this, define your core subjects clearly at the beginning of the prompt and keep descriptions consistent across shots.

Whether you’re working from text alone, reference images, or image-to-video, the model can lock in key traits of characters, objects, and environments. Once established, these elements remain stable even as the camera moves or scenes evolve, which is especially important for multi-character and multi-shot narratives.

Prompt

A dim kitchen late at night.
Only the refrigerator hum fills the silence.

A plate is set down too hard.
Ceramic clinks sharply.

[Character A: Exhausted Partner, trembling frustrated voice]: “You never listen to me.”

Immediately, the other partner turns around, eyes wide.
[Character B: Defensive Partner, shouting loudly]: “Because you never stop blaming!”

The exhausted partner exhales shakily.
[Exhausted Partner, voice cracking]: “I’m not blaming… I’m begging.”

Silence.

The defensive partner sighs heavily.
[Defensive Partner, softly, regretful]: “I don’t know how to fix this.”

A sad piano chord enters quietly.

As the prompt above shows, establishing the characters early helps the model work out which character speaks each line. The tone descriptions inside the brackets (for example, "trembling frustrated voice") help the model deliver realistic dialogue.
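The tagging pattern used in that prompt (a full "[Character A: Name, tone]" tag on first mention, then a shorter "[Name, tone]" tag afterwards) can be generated consistently with a small helper. This is a sketch under stated assumptions: the `DialogueScript` class and its method names are hypothetical, and the tag layout is simply copied from the example prompt rather than from any formal specification.

```python
import string


class DialogueScript:
    """Collects action lines and dialogue lines with consistent character tags."""

    def __init__(self):
        self._letters = {}  # character name -> assigned letter (A, B, C, ...)
        self.lines = []

    def action(self, text):
        """Add a plain description line (setting, movement, sound)."""
        self.lines.append(text)

    def say(self, name, tone, text):
        """Add a spoken line; the first line by a character introduces it."""
        if name not in self._letters:
            if len(self._letters) >= len(string.ascii_uppercase):
                raise ValueError("this sketch supports at most 26 characters")
            letter = string.ascii_uppercase[len(self._letters)]
            self._letters[name] = letter
            tag = f"[Character {letter}: {name}, {tone}]"
        else:
            tag = f"[{name}, {tone}]"
        self.lines.append(f"{tag}: “{text}”")

    def render(self):
        return "\n".join(self.lines)


script = DialogueScript()
script.action("A plate is set down too hard.")
script.say("Exhausted Partner", "trembling frustrated voice", "You never listen to me.")
script.action("Immediately, the other partner turns around, eyes wide.")
script.say("Defensive Partner", "shouting loudly", "Because you never stop blaming!")
script.say("Exhausted Partner", "voice cracking", "I’m not blaming… I’m begging.")
print(script.render())
```

Because the name is the dictionary key, a character is always referred to by exactly the same label, which avoids accidental synonyms between lines.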

Describe Motion Explicitly

Kling 3.0 responds extremely well to explicit motion instructions. This includes both subject movement and camera behavior.

Instead of vague phrasing, describe how the camera behaves over time: tracking, following, freezing, panning, or moving in sync with the subject. Long takes work particularly well when the camera’s relationship to the subject is clearly defined, such as staying in a medium shot, freezing when the subject pauses, or resuming movement smoothly.

Clear motion descriptions lead to fewer artifacts, smoother pacing, and more realistic scene progression, especially in fast-paced or continuous shots.

Use Native Audio Intentionally

Kling 3.0 supports native audio output, including dialogue, ambient sound, and voice tone control. When native audio is enabled, prompts should explicitly indicate who is speaking and when, especially in multi-character scenes.

The model can now precisely reference characters during dialogue, eliminating ambiguity about who is talking. It also supports multiple languages, dialects, accents, and even multilingual code-switching within the same scene. When used well, lip movement, facial expression, and voice timing remain coherent and natural.

Prompt

A sleek modern interrogation room with cold LED lighting.
Muted gray walls, a glass window, security cameras blinking red.
Low atmospheric suspense music hums with deep bass drones.

A detective in a navy suit leans forward slowly.
His hands rest calmly on the table.

[Character A: Lead Detective, controlled serious voice]: “Let’s stop pretending.”

Immediately, the suspect shifts in their chair, tense.
[Character B: Prime Suspect, sharp defensive voice]: “I already told you everything.”

The detective slides a folder across the table.
Paper scraping sound.

[Lead Detective, calm but threatening tone]: “Then explain why your fingerprints are here.”

The suspect’s breathing quickens.

[Prime Suspect, voice trembling]: “That’s impossible…”

The detective stands suddenly, chair scraping back.
Music tightens with a rising pulse.

As the prompt example above shows, it is essential to establish the characters and the sequence of the dialogue clearly. Adding keywords that describe the pace and tonality of each line improves the output quality.

Kling 3.0 understands cinematic language remarkably well. Prompts that reference filmmaking concepts such as scene coverage, composition, pacing, and continuity consistently outperform prompts that focus only on visual attributes.

This is especially true for dialogue scenes, long takes, and narrative sequences. Instead of listing visual traits, describe what the audience is meant to see and feel as the scene unfolds.

The model is optimized to translate this kind of intent into expressive acting, realistic gestures, and dynamic performances.

Take Advantage of Longer Durations

With support for flexible output up to 15 seconds, Kling 3.0 allows for real narrative development inside a single generation. This makes it possible to stage longer actions, multi-beat performances, or evolving scenes without cutting between generations.

Longer durations work best when prompts describe progression over time: how actions unfold, how the camera reacts, and how scenes transition. This is where Kling 3.0 truly separates itself from earlier models, enabling continuous storytelling rather than fragmented assembly.

Prompt

Master Prompt: Joker begins his iconic dance descent down the stairs, arms outstretched, pure chaotic joy.

Multi shot Prompt 1: Man in red suit starts dancing at top of stairs, taking first exaggerated steps down, arms spreading wide, head tilting back in ecstasy, cigarette smoke trailing (Duration: 5 seconds)

Multi shot Prompt 2: Continuing wild dance down concrete steps, spinning and kicking, coat flapping dramatically, pure liberation and madness, reaching the bottom with triumphant pose (Duration: 5 seconds)
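
The master-prompt and per-shot layout used in the prompt above can be checked against the limits mentioned in this guide before it is submitted. The sketch below is illustrative only: `format_timed_shots` is a hypothetical helper, and it assumes that the 15-second ceiling applies to the sum of the per-shot durations, which this guide does not state explicitly.

```python
MAX_TOTAL_SECONDS = 15  # "up to 15 seconds" per this guide
MAX_SHOTS = 6           # "up to six shots" per this guide


def format_timed_shots(master, shots):
    """Build a master + multi-shot prompt from (description, seconds) pairs.

    Assumption: the total of all shot durations must fit within
    MAX_TOTAL_SECONDS. Raises ValueError if a limit is exceeded.
    """
    if not shots:
        raise ValueError("at least one shot is required")
    if len(shots) > MAX_SHOTS:
        raise ValueError(f"at most {MAX_SHOTS} shots, got {len(shots)}")
    if any(seconds <= 0 for _, seconds in shots):
        raise ValueError("every shot needs a positive duration")
    total = sum(seconds for _, seconds in shots)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"total duration {total}s exceeds {MAX_TOTAL_SECONDS}s")

    parts = [f"Master Prompt: {master}"]
    for number, (description, seconds) in enumerate(shots, start=1):
        parts.append(f"Multi shot Prompt {number}: {description} (Duration: {seconds} seconds)")
    return "\n\n".join(parts)


print(format_timed_shots(
    "A dancer descends a long outdoor staircase in one joyful sequence.",
    [
        ("Dancer starts at the top of the stairs, arms spreading wide", 5),
        ("Dancer spins down the last steps and lands in a triumphant pose", 5),
    ],
))
```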

Image-to-Video: Lock First, Then Move

When using image-to-video, treat the input image as an anchor. Kling 3.0 excels at preserving the identity, layout, and text details of the source image while introducing motion and depth.

Prompts should focus on how the scene evolves from the image: subtle movements, camera motion, or environmental changes. The model can maintain text, signage, and visual details from the original image, making it particularly effective for advertising, branded content, and realistic scene extensions.

Audio Prompt Guide

Multi-character Dialogue Prompt Examples and Guidelines

P1. Structured Naming
Guideline: Character labels must be unique and consistent. Avoid pronouns or synonyms.
Correct example: [Character A: Black-suited Agent] and [Character B: Female Assistant]
Incorrect example: [Agent] says... Then, he says...

P2. Visual Anchoring
Guideline: Bind dialogue to a character’s unique actions. Describe the action first, then the dialogue.
Correct example: The black-suited agent slams his hand on the table. [Black-suited Agent, angrily shouting]: “Where is the truth?”
Incorrect example: [Black-suited Agent]: “Where is the truth?” (Model won’t know who slammed the table)

P3. Audio Details
Guideline: Assign unique tone and emotion labels to each character.
Correct example: [Black-suited Agent, raspy, deep voice]: “Don’t move.” [Female Assistant, clear, fearful voice]: “I’m scared.”
Incorrect example: “[Man] says…” “[Woman] says…” (Voice descriptions too vague)

P4. Temporal Control
Guideline: Use clear linking words to control sequence and rhythm. Optionally insert: “this is when the speaker switches.”
Correct example: [Black-suited Agent]: “Why?” Immediately, [Female Assistant]: “Because it’s time.”
Incorrect example: [Black-suited Agent]: “Why?” [Female Assistant]: “Because it’s time.” (Model may merge speech)
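
Two of these principles lend themselves to a quick automated check before a prompt is submitted. The following is a rough sketch, not an official validator: `lint_dialogue` and its regular expression are hypothetical, and they assume the bracketed tag layout used in the example prompts in this guide, where a character is introduced once as "[Character X: Name, tone]:" and later referred to as "[Name, tone]:". It covers only P1 (consistent labels) and P3 (tone labels); P2 and P4 depend on the surrounding prose and are not checked.

```python
import re

# Matches "[Character A: Name, tone]:" as well as "[Name, tone]:" and "[Name]:".
TAG = re.compile(
    r"\[(?:Character (?P<letter>[A-Z]): )?(?P<name>[^,\]]+)(?:,\s*(?P<tone>[^\]]+))?\]:"
)


def lint_dialogue(prompt):
    """Return a list of human-readable problems found in the dialogue tags."""
    problems = []
    introduced = set()
    for match in TAG.finditer(prompt):
        name = match.group("name").strip()
        if match.group("letter"):
            introduced.add(name)
        elif name not in introduced:
            # P1: a short tag is only unambiguous after a full introduction.
            problems.append(f"P1: '{name}' speaks before being introduced as a labeled character")
        if not match.group("tone"):
            # P3: every spoken line should carry a tone or emotion label.
            problems.append(f"P3: '{name}' has no tone or emotion label")
    return problems


prompt = (
    "[Character A: Lead Detective, controlled serious voice]: “Let’s stop pretending.”\n"
    "[Detective]: “Then explain why your fingerprints are here.”"
)
for problem in lint_dialogue(prompt):
    print(problem)
```

In the usage example, the second line uses "Detective" instead of the introduced label "Lead Detective" and omits a tone, so both checks report it.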

More Prompt Examples

Prompt

A busy kitchen in the morning.
Cereal pouring. Coffee machine buzzing.
Kids running footsteps. Backpack zippers.

A mother flips toast quickly, stressed.
[Character A: Mom, fast urgent voice]: “Shoes on! We’re leaving in five minutes!”

Immediately, a little girl whines from the hallway.
[Character B: Little Daughter, crying voice]: “I can’t find my sweater!”

The older brother groans dramatically.
[Character C: Older Brother, annoyed sarcastic tone]: “Because you never put it away.”

Mom sighs heavily.
[Mom, shouting louder]: “Nobody is fighting before 8 AM!”

The dad walks in calmly sipping coffee.
[Character D: Dad, sleepy amused voice]: “Good morning, team.”

Mom turns sharply.
[Mom, exhausted voice]: “Help.”

Prompt

Inside a parked car at night.
Rain tapping softly on the roof.
Low lo-fi music playing from the speakers.

A driver grips the steering wheel, nervous.
[Character A: Driver Friend, hesitant voice]: “So… are you mad at me?”

Immediately, the passenger stares out the window.
[Character B: Passenger Friend, quiet cold tone]: “I don’t know.”

The driver swallows.
[Driver Friend, softly speaking]: “That’s worse than yes.”

The passenger sighs deeply.
[Passenger Friend, tired voice]: “I just didn’t expect it from you.”

Prompt

A quiet park bench in the late afternoon.
Birds chirping. Wind through trees.
Soft acoustic guitar music.

Two old friends sit side by side.

One smiles softly.
[Character A: Old Friend 1, warm nostalgic voice]: “It’s been… what, ten years?”

Immediately, the other laughs quietly.
[Character B: Old Friend 2, emotional voice]: “Too long.”

Pause.

[Old Friend 1, softly speaking]: “I missed you.”

The other nods slowly.
[Old Friend 2, whispering]: “Me too.”


API Documentation

The Kling 3.0 and Kling o3 APIs are available exclusively on fal. To generate with Kling 3.0 for free, use the discount code "falkling3".