Tags: Veo 3, Veo prompts, AI video generation, Google DeepMind, video prompts with audio, 2026

50 Best Veo 3 Prompts in 2026: Copy-Paste Templates with Native Audio

50 copy-paste Veo 3 prompts that exploit native audio — clips with ambient sound, dialogue, foley, and music generated by the model. Organized by genre across 7 categories.

SurePrompts Team
May 6, 2026
49 min read

TL;DR

Fifty Veo 3 prompts engineered around the model's signature capability: native audio synthesis. Each prompt specifies what kind of ambient sound, foley, dialogue, or music should accompany the visual, so the output is a complete clip rather than silent video. Organized by genre across cinematic, character action, lifestyle, product, narrative, sports, and abstract.

Veo 3's real differentiator isn't resolution or clip length — it's that audio comes out of the same model that generates the video, synthesized together rather than layered on afterward. Most prompts ignore this entirely, treating Veo 3 like a silent film camera with a score pasted on top. These 50 copy-paste Veo 3 prompts don't do that — every single one directs the audio explicitly, because leaving it unspecified wastes the most important thing the model can do.

What Native Audio Changes

Many other AI video models either skip audio or leave you to bolt on a separate track. Veo 3 generates ambient sound, dialogue, foley, and music natively in the same pass as the visual. That changes how you write prompts.

Describe ambient sound explicitly. Don't write "a busy coffee shop" and assume the model will handle it. Tell it what the room sounds like: low murmur of a dozen conversations, an espresso machine cycling every thirty seconds, the scrape of a chair on tile, a door chime when someone enters. The model responds to specificity. "Urban ambient" is a label. "Wind channeling between glass-and-steel towers, distant cab horns, the hiss of a bus air brake" is a sound design note.

Foley is in-prompt. Any action that would make a sound in the real world — write it. Footstep on gravel, ice clinking in a glass, the snap of a product case, knuckles on a door, fabric rustling as someone pulls a jacket on — these are all foley moments, and the model generates them if you tell it to. If the subject picks up a ceramic mug, name the sound: the small clink of ceramic on a wooden surface. Name what the action sounds like, not just what it looks like.

Dialogue intent is a production note. Veo 3 can generate characters who speak. When your clip has dialogue, specify: who is speaking, what kind of exchange it is (casual conversation, a sales pitch, a whispered argument), whether it's mid-sentence or a complete thought, and the language. "English, casual mid-conversation tone, one character finishing a sentence while the other nods" gives the model far more to work with than "two people talking."

Music presence and absence both need direction. Silence can be a choice — an empty desert at noon with only wind deserves "no music." But when music belongs, name the instrumentation and mood: "sparse piano, minor key, building tension" or "lo-fi hip-hop beat, warm vinyl crackle, laid-back" or "cinematic strings swelling toward a release." "Epic music" means nothing. Sparse cinematic strings over a sub-bass drone means something.

Mix priority is post-production guidance in the prompt. When you have multiple audio layers — ambient, foley, dialogue, music — the model needs to know which sits forward. "Dialogue forward, ambient sound quiet underneath" tells the model to treat the conversation as the foreground element. "Foley prominent, no music, ambient natural" tells it this is a texture-and-sound clip. Mixing instructions belong in the prompt the same way camera movement instructions do. For a complete breakdown of structuring Veo 3 prompts from the ground up, read the Veo 3 Prompt Guide. For broader AI video prompting principles that apply across models, see the Complete AI Video Prompting Guide.
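
The five audio layers described above (ambient, foley, dialogue, music, mix) can be treated as required fields of a prompt. The Python sketch below is one way to assemble a prompt so that no layer is left unstated. It only builds text and does not call any Veo API. `AudioPlan` and `build_prompt` are hypothetical names introduced for illustration, and the output layout simply mirrors the templates in this post.

```python
from dataclasses import dataclass


@dataclass
class AudioPlan:
    """The five audio layers; unused layers default to an explicit 'no ...'."""
    ambient: str
    foley: str = "no foley"
    dialogue: str = "no dialogue"
    music: str = "no music"
    mix: str = "ambient forward"


def build_prompt(scene: str, camera: str, lighting: str,
                 audio: AudioPlan, aspect: str, seconds: int) -> str:
    """Render a prompt in the same layout the templates in this post use."""
    lines = [
        scene.strip(),
        f"Camera: {camera}",
        f"Lighting: {lighting}",
        "Audio:",
        f"  - Ambient: {audio.ambient}",
        f"  - Foley: {audio.foley}",
        f"  - Dialogue: {audio.dialogue}",
        f"  - Music: {audio.music}",
        f"  - Mix: {audio.mix}",
        f"Aspect: {aspect}. Duration: {seconds} seconds.",
    ]
    return "\n".join(lines)


# Example values are illustrative, borrowed from the coffee shop notes above.
prompt = build_prompt(
    scene="A busy coffee shop at mid-morning, seen from a corner table.",
    camera="static wide",
    lighting="soft window light",
    audio=AudioPlan(
        ambient="low murmur of a dozen conversations, an espresso machine "
                "cycling, a door chime when someone enters",
        foley="the small clink of ceramic on a wooden surface",
        mix="foley prominent, no music, ambient natural",
    ),
    aspect="16:9",
    seconds=10,
)
print(prompt)
```

Defaulting the unused layers to "no foley", "no dialogue", and "no music" keeps silence a stated choice instead of an omission, which is the habit the templates below follow.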

50 copy-paste Veo 3 prompts with explicit audio direction across 7 genres

Cinematic Wide & Establishing Prompts (1–8)

1. Neon City Night

code
An extreme wide establishing shot of a rain-soaked urban canyon at 2am,
neon signs reflecting on black asphalt, steam rising from subway grates,
one figure with an umbrella crossing the frame left-to-right.
Camera: locked off, absolutely still. Lens: wide angle, slight barrel distortion.
Lighting: practical neon — magenta, cyan, amber — no fill light, deep shadows.
Audio:
  - Ambient: rain drumming on metal awnings, distant traffic on wet pavement,
    the hiss of a passing cab through a puddle, low rumble of city infrastructure
  - Foley: umbrella handle shifting, single set of footsteps on wet concrete
  - Dialogue: no dialogue
  - Music: no music — let the city breathe
  - Mix: ambient fully forward, foley sits naturally in space
Aspect: 2.39:1. Duration: 10 seconds.
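
Prompt 1 labels all five audio layers, including the ones it switches off. When adapting a template, a small check can flag a draft that drops a layer by accident. The following Python sketch assumes layers appear as `- Label:` lines, as they do in the templates here. That layout is a convention of this post, not a documented Veo 3 requirement, and `missing_audio_layers` is a hypothetical helper name.

```python
import re

REQUIRED_LAYERS = ("Ambient", "Foley", "Dialogue", "Music", "Mix")


def missing_audio_layers(prompt: str) -> list[str]:
    """Return the audio layers a prompt text does not label explicitly."""
    # Collect labels from lines shaped like "  - Ambient: ...".
    found = set(re.findall(r"^\s*-\s*(\w+):", prompt, flags=re.MULTILINE))
    return [layer for layer in REQUIRED_LAYERS if layer not in found]


draft = """A rain-soaked street at 2am, one figure with an umbrella.
Audio:
  - Ambient: rain drumming on metal awnings
  - Dialogue: no dialogue
  - Mix: ambient fully forward
"""
print(missing_audio_layers(draft))  # prints ['Foley', 'Music']
```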

2. Vast Desert Wind

code
An ultra-wide static shot of terracotta sand dunes at high noon,
heat shimmer distorting the horizon, no shadow because the sun is directly overhead,
a lone dead tree in the right third of frame.
Camera: static, tripod locked. Lens: ultra-wide, maximum depth of field.
Lighting: harsh overhead noon sun, bleached sky, no color correction.
Audio:
  - Ambient: dry wind moving across sand — a low sustained whistle with
    gritty texture, occasional bursts of sand grains hitting the lens filter,
    profound silence underneath the wind
  - Foley: no foley — there is nothing to touch here
  - Dialogue: no dialogue
  - Music: sparse single sustained cello note, fades out after 3 seconds, then silence
  - Mix: ambient dominant, music briefly present then gone
Aspect: 2.39:1. Duration: 10 seconds.

3. Sci-Fi Establishing Shot

code
A high crane shot pulling back from a massive orbital station exterior,
panels of solar collectors and docking arms receding into frame,
a planet's curved limb visible below, stars fixed and cold.
Camera: slow crane pull-back, almost imperceptible drift. Lens: telephoto compression.
Lighting: hard single-source sun angle from camera right, deep shadow on left faces.
Audio:
  - Ambient: deep sub-bass structural hum of the station, barely below speech range,
    the subtle creak of thermal expansion as the station turns
  - Foley: low mechanical clank as a docking arm locks into position mid-shot
  - Dialogue: no dialogue
  - Music: minimal electronic drone, single held note, no rhythm, cold and vast
  - Mix: drone sits under structural hum; foley clank punctuates once, then fades
Aspect: 2.39:1. Duration: 12 seconds.

4. Period Drama Exterior

code
A wide master shot of a cobblestone market square in a northern European city,
circa 1890, overcast midday, merchants at canvas stalls, horse-drawn carts passing,
women in long coats, men in top hats, children near a fountain.
Camera: slow lateral dolly right-to-left on a low track. Lens: moderate wide.
Lighting: flat overcast diffusion, no hard shadows, muted earth tones.
Audio:
  - Ambient: iron-rimmed cart wheels on cobblestone, horse hooves steady clopping,
    crowd murmur in German, a church bell counting noon from off-screen left
  - Foley: splash of water as a child reaches into the fountain, canvas stall flap
    snapping in the wind
  - Dialogue: two merchants exchange a quick phrase in German — one laughs, brief
  - Music: no music — the period atmosphere carries it
  - Mix: ambient and foley balanced, dialogue audible but not featured
Aspect: 16:9. Duration: 12 seconds.

5. Snow Mountain Silence

code
A locked-off wide shot of a glacial valley in winter, mid-morning,
blue-white snow on a 4000-meter peak filling the upper two-thirds of frame,
a frozen river catching cold light in the foreground, not a single person.
Camera: completely static, no movement. Lens: telephoto, 135mm equivalent compression.
Lighting: low winter sun from hard left, long shadows on the snow, sky deep cobalt blue.
Audio:
  - Ambient: near-total silence — the high-pitched presence of elevation, occasional
    wind rush over the mic, a single distant avalanche rumble on the far range
  - Foley: no foley
  - Dialogue: no dialogue
  - Music: no music
  - Mix: pure silence with natural texture; the avalanche rumble is the only event
Aspect: 2.39:1. Duration: 10 seconds.

6. Crowded Market

code
A handheld wide shot moving through a covered spice market in Marrakech,
late afternoon, shafts of dusty light from roof gaps, color everywhere,
vendors calling, customers examining goods, motorbikes threading through.
Camera: handheld, shoulder-mounted energy, pushing through the crowd.
Lens: 24mm wide, slight lens breathing. Lighting: shafts of warm late-afternoon sun
cutting through dust haze, deep shadows in the stall interiors.
Audio:
  - Ambient: dense layered market audio — Arabic calls from vendors,
    motorbike two-stroke engines, metal pot on metal surface, cash changing hands,
    overlapping languages (Arabic, French, snippets of English)
  - Foley: camera operator brushing against a hanging fabric, footstep on
    packed-earth floor under the crowd noise
  - Dialogue: a vendor calls out in Moroccan Arabic (in translation: "come look,
    best price!"), directed at camera
  - Music: no music — the market is the music
  - Mix: ambient fully saturated, dialogue punches through naturally
Aspect: 16:9. Duration: 10 seconds.

7. Ocean Storm

code
A wide low-angle shot from a sea cliff in a full North Atlantic storm,
waves crashing into the cliff base twenty meters below, spray reaching the lens,
the horizon tilted and dark, no sky, only cloud and chaos.
Camera: locked to cliff rock, occasional organic shake from wave impact below.
Lens: wide, water drops on lens surface visible. Lighting: storm diffuse, grey-green.
Audio:
  - Ambient: roaring sustained wave crash building and releasing in 6-second cycles,
    deep sub-bass impact when each wave hits cliff base, howling wind in the mic
  - Foley: spray hitting rock, pebble rattle in the wave wash
  - Dialogue: no dialogue — impossible in this wind
  - Music: no music — the storm is the score
  - Mix: full dynamic range, bass impacts uncompressed, wind howl authentic
Aspect: 16:9. Duration: 12 seconds.

8. Golden-Hour Countryside

code
A slow aerial drift over patchwork English countryside farmland at golden hour,
hedgerows dividing green and gold fields, a narrow lane with a single cyclist,
long shadows reaching east as the sun settles west.
Camera: slow drone drift forward and slightly descending. Lens: wide, high altitude.
Lighting: golden hour, 20 minutes before sunset, warm amber from hard west,
long shadows, sky gradient from orange to pale blue overhead.
Audio:
  - Ambient: wind over the mic at altitude, birdsong rising from the hedgerows below,
    very distant tractor engine fading left
  - Foley: no foley at this altitude
  - Dialogue: no dialogue
  - Music: solo acoustic guitar, fingerpicked, simple melody in D major, unhurried
  - Mix: music and ambient share the space equally, guitar sits on top of wind texture
Aspect: 16:9. Duration: 12 seconds.

Character Action with Dialogue Prompts (9–15)

9. Two-Person Dialogue Exchange

code
A medium two-shot of two women sitting across a small cafe table,
one leaning forward making a point, the other listening with a slight smile,
coffee cups between them, afternoon light from a window left of frame.
Camera: static medium two-shot, subtle rack focus between the two.
Lens: 50mm. Lighting: soft window light from left, warm color temperature, gentle fill.
Audio:
  - Ambient: low cafe background — espresso machine in the distance, muffled other
    conversations, gentle acoustic music barely audible from the speakers
  - Foley: coffee cup placed down on saucer, the rustle of a jacket sleeve on the table
  - Dialogue: English, casual conversation — one woman says "But that's exactly my point,
    you keep treating it like a risk when it's actually an opportunity" — the other
    responds "I know, I know, you're right" with a small laugh
  - Music: cafe background acoustic music, barely present
  - Mix: dialogue fully forward and intelligible, ambient and foley recede,
    cafe music almost subliminal
Aspect: 16:9. Duration: 10 seconds.

10. Person on a Phone Call

code
A medium close-up of a man in his 30s walking fast through a glass-walled
office corridor, phone pressed to his ear, jaw set, clearly in the middle
of a difficult call, city skyline visible through the glass behind him.
Camera: handheld tracking forward alongside him. Lens: 35mm slight handheld drift.
Lighting: cool corporate fluorescent overhead, warm city glow from exterior glass.
Audio:
  - Ambient: corporate office ambient — ventilation hum, muffled keyboard sound
    through walls, elevator chime in the distance
  - Foley: his leather-soled shoes on polished concrete at a brisk pace,
    his hand adjusting the phone against his ear
  - Dialogue: English — he says "No, listen to me, that number needs to be confirmed
    before end of day, not tomorrow — today" — pause — "I understand that, but it
    doesn't change what I'm asking"
  - Music: no music
  - Mix: dialogue forward, footsteps audible, ambient low underneath
Aspect: 9:16. Duration: 10 seconds.

11. Narration Over Walking

code
A wide-to-medium shot tracking behind a young woman walking through
an autumn park, hands in jacket pockets, fallen leaves on the path,
bare trees lining the way, nobody else in sight.
Camera: slow tracking shot from behind, gradually closing distance.
Lens: 85mm. Lighting: diffuse overcast, warm amber from fallen leaves.
Audio:
  - Ambient: wind through bare branches, occasional leaf skittering across
    the path, very distant park sounds
  - Foley: her boot steps on the leaf-covered gravel path — a soft crunch
    with each step, consistent rhythm
  - Dialogue: voiceover, her own voice, introspective — "I keep thinking about
    what it means to actually start over. Not restart — start over. From nothing.
    And honestly? That part doesn't scare me anymore."
  - Music: sparse piano, single notes, no chord progression — just space
  - Mix: voiceover dominant, footstep foley present, piano barely there,
    ambient wind underneath everything
Aspect: 16:9. Duration: 12 seconds.

12. Child-and-Parent Moment

code
A medium close-up of a mother and her daughter (around 7 years old)
sitting on a wooden dock over a lake, feet hanging over the edge,
watching the water, late afternoon sun behind them creating warm backlight.
Camera: static medium, slightly low angle looking up at both of them.
Lens: 50mm. Lighting: golden backlight, warm lens flare from upper right.
Audio:
  - Ambient: lake lapping against dock pilings in a slow irregular rhythm,
    birdsong from the far shore, a distant motorboat fading away
  - Foley: the daughter's feet swinging and tapping against the dock side,
    the wood of the dock creaking slightly under their weight
  - Dialogue: English — daughter asks "Mom, do you think fish dream?" — mother laughs
    softly, thinks for a second, says "I don't see why not"
  - Music: no music — the lake and the conversation hold it
  - Mix: dialogue intelligible, lake ambient present, foley natural
Aspect: 16:9. Duration: 10 seconds.

13. Barista Taking an Order

code
An over-the-counter POV shot of a barista at a specialty coffee counter,
looking up from the espresso machine, engaged and attentive,
industrial wood-and-steel cafe behind, morning rush energy.
Camera: slight low angle from customer POV, static. Lens: 35mm.
Lighting: warm tungsten pendant lamps above the bar, cool daylight through front glass.
Audio:
  - Ambient: espresso machine cycling nearby — pump building pressure, steam wand
    hissing in milk, low conversation murmur of the cafe in full morning mode
  - Foley: the barista sets a ceramic cup down on the counter as she looks up,
    the espresso machine clicks off behind her
  - Dialogue: English — barista says with a warm smile "Good morning — what are we
    doing today?" — a brief pause — "The Ethiopia's really good this week
    if you want something bright"
  - Music: lo-fi background through the cafe speakers — vinyl warmth, barely audible
  - Mix: dialogue forward and clear, espresso machine foley prominent,
    cafe ambient full but not overwhelming, music subliminal
Aspect: 9:16. Duration: 10 seconds.

14. Classroom Moment

code
A wide shot of a university seminar room, about 20 students,
late afternoon, a professor mid-lecture at the front, one student's hand raised,
everyone else watching the exchange, windows showing a campus courtyard.
Camera: static wide from the back of the room, stable. Lens: wide 28mm.
Lighting: warm afternoon sun from the right-side windows, warm classroom overhead.
Audio:
  - Ambient: the subtle acoustic of a room full of quiet people — ambient breath
    and presence, pen on paper somewhere in frame, ventilation overhead
  - Foley: the student's notebook closing as they raise their hand
  - Dialogue: English — professor mid-sentence: "—which is why the data doesn't
    actually support that conclusion." The student responds: "But what if the
    sample size is the problem, not the interpretation?" A beat. The professor:
    "Now that is exactly the right question."
  - Music: no music
  - Mix: dialogue carried naturally in the room acoustic, ambient present,
    no artificial clarity — it should sound like a real room
Aspect: 16:9. Duration: 12 seconds.

15. Interview-Style Monologue

code
A medium close-up of a man in his 50s sitting in a warm living room,
documentary-style, looking just off-camera to the left (interviewer position),
considered and measured, pausing to think before speaking.
Camera: static medium close-up, shallow depth of field. Lens: 85mm.
Lighting: soft practical lamp from camera left, warm 3200K, gentle fill from right.
Audio:
  - Ambient: living room quiet — the tick of a clock off-screen,
    the distant sound of a street through a closed window
  - Foley: he shifts in his chair, the leather barely audible
  - Dialogue: English — deliberate, personal — "The hardest part wasn't the failure.
    The hardest part was the six months after, when I still thought I could fix it.
    When I hadn't accepted yet that it was actually over."
  - Music: no music — the silence earns its place
  - Mix: dialogue completely forward, ambient barely present, silence between
    sentences fully maintained
Aspect: 16:9. Duration: 12 seconds.

Product & Commercial Prompts (16–22)

16. Sparkling Water Pour

code
An extreme close-up of a cold glass of sparkling water being filled,
ice already in the glass, bubbles rising, condensation on the exterior,
clean white marble surface, black background, studio commercial aesthetic.
Camera: static macro close-up, locked off. Lens: macro, very shallow DOF.
Lighting: studio three-point — hard key from right, soft fill from left,
rim light catching the condensation on the glass.
Audio:
  - Ambient: absolute studio silence before the pour begins
  - Foley: the clink of ice against glass as the pour starts, the rushing sound
    of sparkling water hitting ice — that carbonated hiss distinct from still water,
    the fizzing and crackling of bubbles finding the ice surface,
    the liquid settling to fill level
  - Dialogue: no dialogue
  - Music: no music during pour; a single low piano note at the end as the glass settles
  - Mix: foley completely dominant — every sound of this pour is the product story
Aspect: 1:1. Duration: 8 seconds.

17. Tech Device Click and Beep

code
A medium close-up product shot of a premium wireless earbud being
removed from its charging case, the case held in one hand,
one earbud lifted out with two fingers, a subtle LED pulse.
Camera: static macro-ish medium, slight angle from above-right. Lens: 100mm macro.
Lighting: studio — cold key light from above, white reflector fill, black background.
Audio:
  - Ambient: pure studio silence
  - Foley: the soft magnetic click as the lid opens, the barely-there friction sound
    of the earbud being lifted from its magnetic seat, a delicate triple-beep
    connection tone from the earbud itself — clean, designed, modern
  - Dialogue: no dialogue
  - Music: no music
  - Mix: foley only — the click and the connection beep are the entire audio story,
    silence before and after them
Aspect: 1:1. Duration: 6 seconds.

18. Fashion Fabric Rustle

code
A medium shot of a model putting on a structured silk blazer,
shot from behind as she pulls it over her shoulders and it settles,
minimal studio — cream background, simple lighting, the garment is everything.
Camera: static medium from behind, slow motion 50% speed. Lens: 50mm.
Lighting: soft studio box light from above and left, subtle shadows defining the drape.
Audio:
  - Ambient: studio silence
  - Foley: the specific sound of heavy silk moving over a cotton shirt —
    a high-frequency whisper-rustle as the sleeves slide up the arms,
    the final settle and adjustment as the collar lands, the soft sound
    of her hand smoothing the back hem
  - Dialogue: no dialogue
  - Music: a single held note from a bowed cello, fades as the blazer settles
  - Mix: foley featured — the fabric sound is the point; music accent is minimal
Aspect: 9:16. Duration: 8 seconds.

19. Automotive Engine and Interior Cabin

code
A sequence starting exterior on a sports sedan at idle in an underground
garage, then cutting to an interior dashboard view as the engine revs.
Camera: starts exterior locked at grille height, cuts to interior dashboard close-up.
Lens: exterior 35mm; interior wide 24mm. Lighting: garage fluorescent overhead,
dashboard practical backlighting.
Audio:
  - Ambient: underground garage acoustics — concrete reverb, HVAC from the ceiling,
    very distant traffic ramp above
  - Foley: the resonant idle of the engine at start — a controlled, refined V6 burble
    through the exhaust; then the driver's hand taps the center console,
    leather creak as weight shifts in the seat; then a deliberate rev to 3500 RPM —
    the engine's voice filling the garage and bouncing off the concrete walls
  - Dialogue: no dialogue
  - Music: no music — the engine is the music
  - Mix: engine sound fully dynamic — don't compress the rev; let the room response sit
Aspect: 16:9. Duration: 10 seconds.

20. Food Being Prepared — Sizzle and Pour

code
A tight overhead shot of a cast-iron skillet on high heat,
a thick-cut salmon fillet being placed skin-side down,
oil already smoking, immediate violent sizzle on contact.
Camera: directly overhead, static macro. Lens: 50mm.
Lighting: strong overhead practical kitchen light, warm tungsten, steam-lit.
Audio:
  - Ambient: a range hood at low speed, general kitchen room tone,
    the tick of the cast iron before the fish goes in
  - Foley: the wet hiss and crackle of the salmon hitting the pan —
    a sustained loud sizzle that peaks immediately and slowly resolves,
    oil popping and spattering, steam rising audibly
  - Dialogue: no dialogue
  - Music: no music
  - Mix: foley dominant — the sizzle should feel physical; ambient range hood
    sits underneath it; nothing competes with that contact sound
Aspect: 1:1. Duration: 8 seconds.

21. Beauty Applicator — Delicate Sounds

code
An extreme close-up of a glass serum dropper being pressed once,
releasing a single amber drop onto fingertips, then gently pressed
onto the side of the neck, the skin surface catching soft beauty lighting.
Camera: locked macro close-up, two positions: dropper, then skin.
Lens: macro. Lighting: beauty dish from above, soft reflector fill, warm skin tones.
Audio:
  - Ambient: absolute silence — aspirational bathroom quiet
  - Foley: the soft squeeze of the rubber dropper bulb, the barely-audible drop
    hitting the fingertip skin, then the gentle pressing sound of fingertips
    on neck — delicate, deliberate, skin-on-skin softness
  - Dialogue: no dialogue
  - Music: a single piano note, very soft, fades immediately
  - Mix: foley is the complete audio — every sound is intentionally quiet,
    intimate, precise; the piano note is an accent not a statement
Aspect: 9:16. Duration: 8 seconds.

22. Packaging Unbox — Rip and Click

code
A medium close-up of hands opening a premium matte-black product box,
clean white table, the lid being lifted away from a magnetic base,
tissue paper inside, a product nestled in a foam insert.
Camera: slightly above, angled — not quite overhead but close. Lens: 50mm.
Lighting: soft diffused studio, clean key from above, minimal shadow.
Audio:
  - Ambient: studio quiet
  - Foley: the firm click-pull of the magnetic lid releasing — that resistance
    then pop that good packaging makes; the soft crinkle of the tissue paper
    as it's gently pushed aside; the product lifting from its foam
    with a quiet friction sound
  - Dialogue: no dialogue
  - Music: low pad, single sustained chord, builds softly as the product is revealed
  - Mix: foley featured — every tactile sound deliberate; music swells under the
    reveal moment only, then releases
Aspect: 1:1. Duration: 10 seconds.

Lifestyle & Documentary Prompts (23–29)

23. Coffee Shop Morning Ambient

code
A wide slow push-in of a specialty coffee shop at 8am opening rush,
baristas in motion behind the counter, a line of five people waiting,
morning light pouring through east-facing windows, dust motes in the beams.
Camera: slow dolly push-in from front door toward the counter. Lens: 28mm.
Lighting: strong natural morning light from the right, warm tungsten from pendants.
Audio:
  - Ambient: the full coffee shop morning sound — two espresso machines in overlapping
    cycles, steam wand bursts, the click of portafilter locking in, low conversation
    from the line, a jazz standard at low volume from the speaker system
  - Foley: a cup placed on the counter, the squeak of a sneaker on tile,
    a paper bag being shaken open
  - Dialogue: the barista calls out a name — "Oat latte for Marcus" — someone
    responds "that's me, thanks" — brief, natural, not featured
  - Music: light jazz from the cafe speakers, present but not prominent
  - Mix: all four layers simultaneously — this is an ambient scene, nothing is featured;
    the mix should feel like walking in, not watching a performance
Aspect: 16:9. Duration: 12 seconds.

24. Family Dinner Conversation

code
A medium wide shot of a family dinner table — four people, two parents,
two teenage kids — mid-meal, passing dishes, cross-talk, laughter,
warm dining room, evening light from a low hanging pendant over the table.
Camera: static medium wide from the end of the table. Lens: 35mm.
Lighting: warm tungsten pendant at 2800K, soft fill from room practical lights.
Audio:
  - Ambient: dining room tone, the low hum of a refrigerator from the adjacent
    kitchen, the faint tick of a still-warm oven cooling
  - Foley: cutlery on ceramic plates, a serving spoon in a bowl, a glass being set
    down, the sound of food being passed
  - Dialogue: English — overlapping and natural — one teenager says "wait, no, that's
    not how it happened at all" and starts laughing; the other protests "that is exactly
    how it happened!"; a parent says "can someone please just pass the bread"
  - Music: no music — the family is filling the space
  - Mix: dialogue present but overlapping and naturalistic — not a scripted exchange;
    foley grounded and prominent; room tone present
Aspect: 16:9. Duration: 10 seconds.

25. Working from Home — Keyboard and Focus

code
A medium close-up of a person typing at a standing desk in a home office,
plants in the background, afternoon window light, a focused expression,
a mug of tea steaming beside the keyboard.
Camera: static medium, slightly lateral. Lens: 50mm.
Lighting: soft window light from left, diffused afternoon sun, warm.
Audio:
  - Ambient: home office quiet — bird outside the window, a distant lawnmower
    two houses away, the baseline silence of a residential interior
  - Foley: mechanical keyboard typing — a mid-key clicky switch, the specific
    rhythm of someone composing (bursts of words, pauses, backspace, more typing),
    the ceramic mug being lifted and set back on the wooden desk
  - Dialogue: no dialogue
  - Music: lo-fi beat, very quiet — soft drums and warm bass, vinyl texture
  - Mix: keyboard foley prominent, music present as texture not feature,
    outdoor ambient barely there under the window
Aspect: 16:9. Duration: 10 seconds.

26. Gym Workout — Breathing and Weights

code
A medium shot of a woman in the final reps of a barbell squat set,
a real gym — chalk dust, iron smell if that were possible, mirrors,
other athletes present in soft focus behind.
Camera: static medium from the front, slightly low angle. Lens: 35mm.
Lighting: gym fluorescent mixed with some natural light from high windows.
Audio:
  - Ambient: gym ambient — ventilation, the distant clank of weights from
    another rack, low bass from the speakers playing something with rhythm
  - Foley: the breath pattern — controlled descent exhale, sharp exhale on
    the drive up, that specific sound of effort; the weight plates gently
    tapping as she reaches the top; chalk dust settling
  - Dialogue: no dialogue
  - Music: heavy hip-hop or electronic track through the gym speakers — present
    but distant, the kind of music that belongs in this space
  - Mix: breath foley forward and physical, weight sounds natural,
    gym music present as context not as score
Aspect: 9:16. Duration: 10 seconds.

27. Weekend Hike — Wind and Birds

code
A wide tracking shot following two hikers on a ridge trail, backs to camera,
a valley spreading out below them, cumulus clouds casting moving shadows
on the hillside, the trail barely visible in the long grass.
Camera: tracking behind them at 15 meters, handheld documentary feel.
Lens: 35mm. Lighting: bright midday cloud diffusion, occasional direct sun through gaps.
Audio:
  - Ambient: the ridge soundscape — wind moving through long grass (a sustained,
    shifting hiss), a kestrel calling overhead, sheep bells from the valley below,
    the distant sound of a stream not visible in frame
  - Foley: their footsteps in dry grass and loose stone — the specific crunch of
    a boot finding solid rock then long grass again
  - Dialogue: English — one hiker says to the other "how far to the top from here?"
    the other replies "maybe another forty minutes, but the view's worth it"
  - Music: acoustic guitar, fingerpicked, open tuning, unhurried — enters quietly
    under the ambient
  - Mix: ambient and foley share the foreground, dialogue natural and unforced,
    guitar present but underneath everything
Aspect: 16:9. Duration: 12 seconds.

28. Urban Commute — Traffic

code
A POV handheld shot of someone walking through a busy city intersection
during morning rush hour — pedestrians all moving with purpose,
buses pulling in and out, a jackhammer somewhere to the right.
Camera: first-person POV, moderate handheld movement in step with walking pace.
Lens: 24mm wide. Lighting: flat overcast urban morning, building shadows, grey sky.
Audio:
  - Ambient: dense layered urban morning — the bass of a bus engine accelerating
    from a stop, taxi horns, the sustained roar of traffic in all four directions,
    a jackhammer in the mid-distance (right channel, slightly off center),
    the underground ventilation grates exhaling warm air
  - Foley: the walker's footsteps on concrete crossing — confident, quick pace
  - Dialogue: no dialogue from our POV character; snatches of phone conversation
    from a passer-by — one word, then gone
  - Music: no music
  - Mix: the full urban layer is the complete audio story — rich, layered,
    unclean in a realistic way; foley footsteps audible within it
Aspect: 9:16. Duration: 10 seconds.

29. Restaurant Kitchen — Calls and Clatter

code
A wide shot of a professional restaurant kitchen mid-service,
multiple stations working simultaneously, fire and steam, plating happening
at the pass, expediter calling orders, controlled chaos.
Camera: locked off wide from the expediter position. Lens: 28mm wide.
Lighting: fluorescent overhead mixed with intense blue gas flame light.
Audio:
  - Ambient: the sustained din of full service — multiple burners at high,
    the exhaust hood roar, rapid-fire clatter of pans
  - Foley: a pan being shaken over high heat, a lid dropped briefly (clatter on steel),
    a squeeze bottle on a plate
  - Dialogue: the expediter in French and English calling: "Two salmon, one lamb, working
    on four — oui?" — voices from stations respond "oui, chef" — sharp, fast, no wasted
    syllable
  - Music: no music — this kitchen is its own rhythm section
  - Mix: ambient roar surrounds everything; dialogue cuts through it cleanly as
    it would in a real kitchen — loud, direct, competing with the noise
Aspect: 16:9. Duration: 10 seconds.

Narrative Short / Multi-Shot Prompts (30–36)

30. Two-Shot Dialogue with Cut and Audio Continuity

code
A cross-cut dialogue scene: medium close-up on a man speaking, cut to
medium close-up on a woman listening then responding, the conversation
continuous across the cut, a dim bar setting, late night.
Camera: alternating MCU, each static. Lens: 85mm on both. Lighting: practical bar
light, low-key, a candle on the table between them casting upward shadows.
Audio:
  - Ambient: bar ambient continuous across the cut — the same low background
    of a mostly empty bar, ice in a glass somewhere, a door opening briefly
  - Foley: the man sets his glass down before speaking, her fingers tap the table
    once before she answers
  - Dialogue: English — he says quietly, "I just need you to tell me the truth.
    Not what you think I want to hear." She holds the pause, then: "You already
    know the truth. That's why you're asking."
  - Music: slow piano jazz from the bar, continuous through the cut — same track,
    same progression — audio continuity proves the scene is one moment
  - Mix: dialogue forward on both sides of the cut; ambient and music consistent
    across the cut as a single unbroken soundscape
Aspect: 16:9. Duration: 12 seconds.

31. Action-Reaction with Sound Bridge

code
A two-shot sequence: first shot is a close-up of a hand pushing open
a heavy oak door — it begins to creak. Second shot cuts to a woman at a desk
looking up from her work, reacting to the sound.
Camera: shot one — static close on the door handle and door edge.
Shot two — medium wide of the desk, static. Lens: both 50mm.
Lighting: dark wood-paneled interior, one desk lamp as key, shadows everywhere.
Audio:
  - Ambient: a hushed office — the kind of quiet that means everyone is listening
  - Foley: THE sound bridge — the door creak begins in shot one and carries
    perfectly through the cut into shot two; the creak is the edit;
    the sound of the door fully opening as her head comes up
  - Dialogue: she looks up and says softly, "I was wondering when you'd come back."
  - Music: no music until after she speaks — then a single low string note begins
  - Mix: foley is the structural element — the creak bridges the cut;
    dialogue is intimate and quiet; string note is an afterthought
Aspect: 16:9. Duration: 10 seconds.

32. Character Entering with Door Creak

code
A wide shot of a long empty hallway in an old Victorian house,
at the far end a door — the only light source behind it.
The door slowly opens, a silhouette fills the frame, pauses, enters.
Camera: locked off wide, completely static, flat perspective on the hallway.
Lens: wide 28mm. Lighting: practical light from behind the door only,
everything else dark, floorboards catching a strip of light.
Audio:
  - Ambient: the specific quiet of an old house — the structure settling,
    a clock ticking in a nearby room, wind outside pressing on a window
  - Foley: the exact sequence of sounds — the knob turning, the resistance of
    old wood before it moves, the creak of the hinge (long, multi-tone, authentic),
    the floorboard underfoot as the figure steps in, the door pushed to behind them
  - Dialogue: no dialogue — the silence after the door closes is the punctuation
  - Music: no music until the door closes — then a single sustained low tone, held
  - Mix: foley precisely timed — the creak is the scene; every floorboard moment
    counts; the sustained tone after is quiet and unresolved
Aspect: 2.39:1. Duration: 12 seconds.

33. Found-Footage Style with Handheld Audio

code
A found-footage-aesthetic shot of two people running through a darkened
parking structure, one holding a camera or phone, their own flashlight
the only illumination, breathing hard, something off-screen behind them.
Camera: first-person handheld, jostling with the run, the camera holder's
breath affecting the shot. Lens: wide phone-lens equivalent.
Lighting: only the flashlight — sharp foreground, darkness behind.
Audio:
  - Ambient: parking structure echo — their footsteps doubling in the concrete reverb,
    the distant sound of whatever they're running from (indistinct, low, getting louder)
  - Foley: their footsteps on concrete at a sprint — heavy and desperate,
    one of them catches their breath, a metal railing struck accidentally in the dark
  - Dialogue: breathless English — one says "go go go, don't stop" — the other
    is too winded to answer, just a sharp exhale
  - Music: no music — found footage logic means no score
  - Mix: raw and dynamic — the reverb is a character; breathing and footsteps
    at full level; the thing behind them audible but not defined
Aspect: 9:16. Duration: 10 seconds.

34. Single-Take Walk with Ambient Progression

code
A continuous tracking shot following a woman as she walks from a busy
city sidewalk through a glass lobby entrance, across the lobby, and into
an elevator — three distinct acoustic environments in one take.
Camera: steadicam tracking from behind and slightly to the right,
through all three spaces. Lens: 35mm. Lighting: exterior daylight,
then lobby artificial warm, then elevator cool fluorescent.
Audio:
  - Ambient: three acoustic environments in sequence — outside: traffic, wind, city;
    lobby entrance: the door seal breaking open, the sudden indoor reverb, lobby music
    beginning; elevator: tight, close, the whir of the motor as doors close
  - Foley: her heels on three surfaces — city concrete, marble lobby floor
    (different resonance), elevator metal floor (tight, hard sound)
  - Dialogue: lobby security guard says "morning" as she passes, she replies
    "morning" without slowing — two words, natural
  - Music: a light lobby instrumental begins as the door opens, then gets cut
    abruptly as the elevator doors close
  - Mix: the three-environment acoustic shift IS the audio story; each transition
    should be clean and deliberate; foley changes character with each surface
Aspect: 9:16. Duration: 12 seconds.

35. Before-and-After with Audio Shift

code
A static shot of a living room — before: cluttered, gloomy, winter light,
a person slumped on the couch. Then a visual morph / dissolve transition
to the same room after: the same angle, same framing, but transformed —
light, ordered, the person upright and reading.
Camera: completely locked off, identical framing in both states. Lens: 35mm.
Lighting: before — blue-grey overcast through dirty windows; after — warm
afternoon sun, clean windows, plants visible.
Audio:
  - Ambient: before — the heavy silence of a depressed room: muffled outside noise,
    no warmth, the person's slow breathing; after — birds through open windows,
    the tick of a clock that was silent before, movement and life
  - Foley: before — nothing; after — the sound of a page turning
  - Dialogue: no dialogue
  - Music: before — no music; during the transition, a rising tone that resolves,
    in the after state, into a single warm guitar chord, sustained, then fading
  - Mix: the audio transition mirrors the visual one — before is audio-grey;
    after has warmth and presence; the transition chord is the emotional pivot
Aspect: 16:9. Duration: 12 seconds.

36. Single-Room Conversation with Reverb

code
A medium two-shot in a large, sparse concrete-floored apartment —
an industrial loft with high ceilings and hard surfaces everywhere,
two men standing rather than sitting, a tense conversation.
Camera: static medium wide, neither subject fully comfortable in frame.
Lens: 50mm. Lighting: a single work lamp, harsh shadows on the concrete.
Audio:
  - Ambient: the acoustic of a large hard-surface room — every sound reflects;
    their presence in the room is audible even before they speak; distant street below
  - Foley: one of them steps backward; in this room the sole of his shoe on
    concrete is sharp and loud, and the reverb tail is long
  - Dialogue: English — one says in a flat voice "So what happens now?" — the other
    waits, then: "I don't know. But we can't stay here." Their voices ring in the room.
  - Music: no music — the room's reverb is doing the emotional work
  - Mix: the room acoustic is the production design; dialogue slightly reverbed
    as it would genuinely be; footstep reverb tail fully present; silence between
    lines carries its own weight
Aspect: 16:9. Duration: 12 seconds.

Sports & Motion with Foley Prompts (37–43)

37. Basketball — Squeaks and Crowd

code
A low-angle tracking shot following a point guard penetrating to the rim
in a college gym, the defense scrambling, the shot going up, the
moment of hang time before contact.
Camera: low steadicam tracking at hip height alongside the drive.
Lens: 24mm, slight distortion from speed. Lighting: gym overhead fluorescent,
occasional lens flare from the court lights.
Audio:
  - Ambient: the live crowd — 3,000 people in a tight gym, roar building as the
    drive develops, one sustained crescendo
  - Foley: shoes on hardwood — the high-frequency squeak of a hard cut, the
    specific multiple-squeak of a crossover dribble in traffic, the deep thud
    of the ball on the hardwood, the smack of a shot going up
  - Dialogue: no dialogue
  - Music: no music — crowd energy carries everything
  - Mix: foley (ball and shoes) prominent over crowd; the squeak is physical
    and sharp; crowd roar fills the rest; no score needed here
Aspect: 9:16. Duration: 8 seconds.

38. Skateboard Line — Grinds

code
A wide tracking shot following a skateboarder hitting a ledge at a
plaza — approach, ollie, grind, pop out, roll away — all in one line,
golden-hour concrete, the city visible behind the plaza.
Camera: tracking wide alongside, capturing the whole line. Lens: 35mm.
Lighting: golden hour from the right, long shadows on the concrete plaza.
Audio:
  - Ambient: urban plaza ambient — wind, distant traffic, a few spectators
    present but quiet, city baseline
  - Foley: each distinct sound of the line — wheels rolling on concrete (fast, smooth),
    the wood thwack of the ollie pop, the metallic grind of trucks on concrete ledge
    (that specific drawn-out grrrind), the pop off the end of the ledge, wheels
    landing and rolling away
  - Dialogue: no dialogue
  - Music: no music — let the sounds be the soundtrack
  - Mix: foley is the primary audio; each sound event timed to the action;
    ambient city fills the spaces between moments
Aspect: 16:9. Duration: 10 seconds.

39. Soccer — Kicks and Cheers

code
A medium telephoto shot from behind the goal: a striker receiving the ball
on the turn, 18 yards out, one touch and a struck shot — top corner.
Camera: static telephoto from behind and above the goal. Lens: 200mm equivalent.
Lighting: late afternoon stadium floodlights, warm golden cast on green grass.
Audio:
  - Ambient: stadium ambient — 40,000 people in baseline hum before the shot,
    then eruption of sound after it goes in; stadium echo and reverb
  - Foley: the specific sound of a clean struck ball — the firm leather thud of
    boot-on-ball, the whoosh of the ball through air, the snap of the net catching it
  - Dialogue: no dialogue
  - Music: no music — the stadium provides it
  - Mix: foley precisely placed — the boot contact sound first, brief silence of
    flight, net snap, then the crowd eruption floods in; the eruption is everything
Aspect: 16:9. Duration: 8 seconds.

40. Surfing — Wave Crash and Wind

code
A wide-to-medium shot of a longboarder dropping into a head-high
point break wave at first light, the wave a clean turquoise wall,
the surfer walking to the nose, the horizon line behind perfectly clean.
Camera: tracking from a ski boat alongside, low angle. Lens: 35mm.
Lighting: first light, sky pink and grey behind, the wave catching the
early sun and lit from within.
Audio:
  - Ambient: open ocean morning — the deep-chest boom of a set breaking on the
    reef 200 meters to the right, constant wind at 15 knots, seabirds
  - Foley: the leading edge of the wave breaking and peeling — that sustained
    white-water rush and roar; the longboard finding trim; wind over the mic
    as the boat tracks alongside
  - Dialogue: no dialogue
  - Music: no music during the ride; a single acoustic guitar chord at the cutoff
  - Mix: ocean sound dominant and physical; foley of the wave breaking present
    and spatially correct; guitar only at the very end as the wave closes out
Aspect: 16:9. Duration: 12 seconds.

41. Slow-Motion Water with Splash and Drone Music

code
An ultra-slow-motion (240fps equivalent) extreme close-up of a single
water droplet hitting a perfectly still water surface, the crown splash
forming in symmetrical detail, the ripples expanding outward.
Camera: macro static, high speed. Lens: macro.
Lighting: hard single key light from above-right, dark background, the water lit.
Audio:
  - Ambient: silence — at this speed, no environmental ambient
  - Foley: the sound of the impact time-stretched to match the slow motion —
    a deep, resonant plunge sound, stretched and pitched down, alien and beautiful,
    the crown splash's water movement as a slow rushing texture
  - Dialogue: no dialogue
  - Music: a sustained electronic drone, single note, sub-bass, rising slowly
    as the crown reaches maximum height, then fading as the ripples expand
  - Mix: time-stretched foley and drone together; neither dominates; the
    combination creates something that doesn't exist in real time
Aspect: 1:1. Duration: 8 seconds.

42. Particle Motion — Electronic Hum

code
A wide shot of a sprinter leaving the starting blocks in an outdoor
stadium, shot from a low track-level camera, the crowd blurred behind,
dust and track-rubber particles kicked up and visible in the
compression of the push-off.
Camera: static extreme low angle, track level. Lens: 24mm wide, low.
Lighting: overcast stadium diffusion, no hard shadows, clean light.
Audio:
  - Ambient: stadium outdoor ambient — wind, distant crowd, the acoustic openness
    of an outdoor track
  - Foley: the compression and creak of the starting block, the explosive push
    of the start — the exact crack of starting blocks releasing under full power,
    the breath on the drive, shoe spike on synthetic track surface
  - Dialogue: no dialogue
  - Music: a building electronic pulse — sparse, rhythmic, starting a single beat
    before the gun and building with the acceleration
  - Mix: foley (block release, breath, spikes) forward; electronic pulse sits under
    and builds; stadium ambient wide and present
Aspect: 16:9. Duration: 8 seconds.

43. Drone-Style Flight — Wind and Motors

code
An aerial FPV-style shot flying low and fast through a canyon system,
barely meters above the river below, the sandstone walls blurring past,
a sharp banking right turn around a cliff face, then pulling up.
Camera: FPV drone perspective, high-speed, tilt and bank with the flight path.
Lens: ultra-wide fisheye equivalent, full distortion. Lighting: canyon midday —
high walls in shadow, the river lit from above.
Audio:
  - Ambient: the rushing wind of speed — a sustained high-speed air roar
    that changes pitch and intensity with each bank and pull
  - Foley: the drone motor whine — present but under the wind, varies with
    throttle changes (higher pitch on the pull-up, lower on the straight);
    the echo of the motors bouncing off the canyon walls briefly on the straight
  - Dialogue: no dialogue
  - Music: no music — the wind and motor audio is the score
  - Mix: wind fully forward and physical; motor pitch is the detail layer;
    the echo bounce is a spatial audio moment — brief, architectural
Aspect: 16:9. Duration: 10 seconds.

Abstract & Music-Forward Prompts (44–50)

44. Abstract Liquid — Ambient Pad

code
An extreme close-up of ink diffusing in water — deep cobalt blue ink
entering from the top of frame, spreading through clear water in slow
rolling cloud formations, backlit from below through a glass tank.
Camera: static macro, directly frontal. Lens: macro.
Lighting: single LED panel beneath the glass tank — cool white, diffused.
Audio:
  - Ambient: no environmental ambient — pure studio silence before the sound begins
  - Foley: the sound of the ink entering the water — a delicate soft pour, then
    the visual silence of diffusion given a sonic equivalent: a slow textural
    swirling tone, barely there
  - Dialogue: no dialogue
  - Music: a long, slow ambient pad — two notes a fifth apart, rising slightly in
    volume as the ink cloud expands, sustaining through the full clip
  - Mix: ambient pad is the primary audio; foley entry sound is a brief accent
    at the start; then the pad alone holds the space
Aspect: 1:1. Duration: 10 seconds.

45. Geometric Kaleidoscope — Rhythmic Music

code
A kaleidoscopic abstract animation — mirrored geometric shapes in black,
white, and gold, rotating with mechanical precision, each rotation
revealing a new symmetrical pattern, hypnotic and exact.
Camera: locked off frontal. Lens: n/a — abstract visual.
Lighting: internal — shape-generated light.
Audio:
  - Ambient: no environmental ambient
  - Foley: each geometric shape snapping into its mirror position — a series of
    crisp, dry clicks, perfectly timed to the rotation points
  - Dialogue: no dialogue
  - Music: a minimalist rhythmic piece — tabla or frame drum at a steady 120bpm,
    a single melodic line on a plucked instrument (koto or sitar) that follows
    the rotation intervals; music and visual rotation in exact sync
  - Mix: click foley and drum are at the same level; the melodic line sits just
    above them; they are all part of one designed system
Aspect: 1:1. Duration: 10 seconds.

46. Particle Simulation — Hum and Clicks

code
An abstract particle system — thousands of white points on black, initially
scattered, then slowly drawn toward each other by invisible gravity until
they form a human silhouette, then scatter again.
Camera: static frontal wide. Lens: n/a — abstract.
Lighting: particles self-emit against black.
Audio:
  - Ambient: a deep electronic hum — the sound of the simulation itself, a low
    120Hz drone that represents the field attracting the particles
  - Foley: individual particle movement as occasional clicks and micro-taps —
    not every particle has a sound, but at key formation moments clusters of
    particles clicking into the silhouette edge create brief textural bursts
  - Dialogue: no dialogue
  - Music: the drone IS the music — it rises in pitch as particles converge,
    reaches a held note when the silhouette is complete, then descends as
    the scatter begins
  - Mix: drone dominant from start to finish; click texture is the detail layer;
    the pitch of the drone is the narrative arc
Aspect: 16:9. Duration: 12 seconds.

47. Dreamlike Morph — Reverb Tones

code
A visual morph sequence — a woman's face dissolves into a forest canopy,
then into a night sky, then back to her closed eyes, the transitions
liquid and impossible, each state perfect before it changes.
Camera: medium close-up throughout, frontal. Lens: 85mm.
Lighting: each state has its own light — face: soft warm practical;
forest: dappled green; night sky: dark blue with star light.
Audio:
  - Ambient: each state has its own ambient: face — room tone; forest — wind and
    leaves; night sky — true silence with high-frequency presence; back to face — room tone
  - Foley: no foley — the morphs are not physical
  - Dialogue: no dialogue
  - Music: a three-note melodic phrase on a treated piano, each note stretched
    with long reverb tails so they overlap — the reverb from each note carries
    into the next visual state; the music creates the transitions
  - Mix: music and ambient cross-fade with each visual morph; the reverb tails
    of each piano note are the transition texture; it should feel continuous
    but shifting
Aspect: 16:9. Duration: 12 seconds.

48. Color-Field with Drone Music

code
A slow abstract color-field — a single deep red field on screen that very
slowly bleeds into amber at the right edge, then gold, the gradient moving
across the frame at barely perceptible speed, nothing else.
Camera: static, frontal. Lens: n/a.
Lighting: self-generated color gradient.
Audio:
  - Ambient: no environmental ambient
  - Foley: no foley — there is no object, no action
  - Dialogue: no dialogue
  - Music: a sustained tonal drone — two notes: a root and a flat seventh, held
    continuously, processed with a long reverb tail; the drone should feel as large
    as the color field; as the gradient shifts from red to gold, the upper note
    of the drone rises by one semitone, imperceptibly
  - Mix: pure music — the drone is the only audio; it should feel as though the
    sound and the color are the same substance
Aspect: 16:9. Duration: 15 seconds.

49. Slow-Motion Pour — ASMR

code
An extreme slow-motion close-up of honey being poured from a spoon onto
a dark wooden surface — the strand of honey connecting spoon to puddle,
the surface contact spreading in slow detail, warm amber catching the light.
Camera: macro static, side angle slightly elevated. Lens: macro.
Lighting: single warm tungsten key from the right, surface light bounce, dark background.
Audio:
  - Ambient: absolute silence — ASMR logic means the room doesn't exist
  - Foley: the sound of the honey strand's surface tension breaking as it first
    contacts the wood — a very soft, viscous drip; then the continuous slow
    pour sound: thick, rich, almost musical in its sustained quality;
    the surface spreading is a barely-audible wet presence
  - Dialogue: no dialogue
  - Music: no music — ASMR principles mean nothing competes with the foley
  - Mix: foley is the entire audio design; it should be mixed at high detail —
    every texture in the pour audible; silence surrounds each sound event
Aspect: 1:1. Duration: 10 seconds.

50. Time-Lapse with Rising Score

code
A compressed time-lapse of a city square over a full day — dawn to night
in one continuous locked-off shot, people flowing like water, light
raking from east to west, shadows sweeping, lights coming on at dusk.
Camera: locked off wide on a tripod, identical framing dawn to night.
Lens: 28mm. Lighting: natural light progression — warm dawn, harsh noon,
golden afternoon, blue-hour dusk, artificial night.
Audio:
  - Ambient: time-lapse logic means the ambient is compressed too — a layered
    acoustic texture that represents the full day: dawn birds, midday noise,
    evening wind, night city, all present simultaneously as a rich composite
  - Foley: no specific foley — this is too compressed for individual sounds
  - Dialogue: no dialogue
  - Music: a rising orchestral score — beginning with sparse strings at dawn,
    building gradually through the day, reaching a full orchestral swell at
    golden hour, then resolving to a quieter piano theme as night falls;
    the score tracks the light, not the clock
  - Mix: music is the primary audio — it narrates the day; compressed ambient
    sits below it as texture; by night the music leads alone
Aspect: 16:9. Duration: 15 seconds.

Veo 3 Power Tips

1. Audio is the differentiator: direct it in every prompt. Veo 3 synthesizes audio natively alongside the video. If you don't specify what the audio should be, you're leaving the model's most distinctive capability to chance. Every prompt should have an Audio block.

2. Treat ambient, foley, dialogue, and music as four separate channels. They operate differently, sit differently in a mix, and need separate instructions. A single "add audio" note gives the model nothing to work with; four labeled lines are specific enough for it to act on.

3. Describe the sound, don't just label it. "Footsteps audio" is a label. "Crisp footsteps on dry gravel, the crunch resolving into a light echo off a stone wall" is a sound design note. "Coffee shop sounds" is a label. "Espresso machine cycling every 30 seconds, steam wand bursts, ceramic on wood, low conversation murmur" is direction.

4. State your mix priority. "Dialogue forward, ambient quiet under" gives the model post-production guidance. "Foley prominent, no music" sets the texture of the scene. "Music is the primary audio, ambient below it" tells the model what the clip is for. Without a mix priority, the model has to guess, and it often guesses wrong.

5. Duration constrains dialogue. An 8-second clip cannot hold a four-line conversation, so match dialogue length to clip length. For clips of 10 seconds or less, use one short exchange or a single sentence of voiceover. For 12 to 15 second clips, a short two-person exchange is workable. Don't write dialogue that doesn't fit.

6. For music, name mood and instrumentation together. "Epic music" can be interpreted a thousand ways. "Sparse cinematic strings, mid-tempo, building from a single cello line to a full string section, no percussion, unresolved ending" leaves no room for a generic interpretation. The more specific the instrumentation and the progression, the more the output matches what you're hearing in your head.
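The point about duration and dialogue can be turned into a rough pre-flight check. The sketch below counts the words inside straight double quotes in a prompt and estimates whether they can be spoken within the clip. The speaking rate (2.5 words per second, roughly 150 words per minute) and the one-second pause allowance are assumptions for illustration, not Veo 3 parameters, and the function names are hypothetical.

```python
import re

# Assumed conversational speaking rate; not a Veo 3 setting. Tune to taste.
WORDS_PER_SECOND = 2.5


def spoken_words(prompt_text: str) -> int:
    """Count the words inside straight double quotes (the spoken lines)."""
    quoted = re.findall(r'"([^"]*)"', prompt_text)
    return sum(len(line.split()) for line in quoted)


def dialogue_fits(prompt_text: str, duration_s: float, pause_s: float = 1.0) -> bool:
    """Estimate whether the quoted dialogue fits in the clip.

    pause_s reserves time for beats between lines; it is a guess.
    """
    needed = spoken_words(prompt_text) / WORDS_PER_SECOND + pause_s
    return needed <= duration_s


# Dialogue line from prompt 30 (the bar scene).
bar_dialogue = '''he says quietly, "I just need you to tell me the truth.
Not what you think I want to hear." She holds the pause, then: "You already
know the truth. That's why you're asking."'''

print(spoken_words(bar_dialogue))       # 26
print(dialogue_fits(bar_dialogue, 12))  # True
print(dialogue_fits(bar_dialogue, 8))   # False
```

The estimate is deliberately crude: it ignores delivery notes such as "quietly" or "holds the pause", so treat a borderline result as a prompt to trim a line rather than as a guarantee.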

Before

Make a video of two people talking in a cafe.

After

A medium two-shot of two women at a small cafe table, afternoon window light from the left.
Camera: static, soft rack focus between them. Lens: 50mm. Lighting: warm natural fill.
Audio:
  - Ambient: low cafe background — espresso machine in the distance, muffled
    conversations, a chair scraping tile
  - Foley: coffee cup placed on saucer, jacket sleeve shifting on the table
  - Dialogue: English, casual — one says "But that's my point, you keep treating
    it like a risk when it's actually an opportunity" — the other responds with
    a small laugh "I know, I know, you're right."
  - Music: barely audible cafe acoustic from the speakers
  - Mix: dialogue fully forward and clear, ambient and foley recede, music almost
    subliminal
Aspect: 16:9. Duration: 10 seconds.
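If you generate prompts in bulk, the After prompt above can be assembled from structured fields so that no audio channel is ever left blank. This is a minimal sketch: the `Audio` and `Shot` classes and the `render` function are hypothetical names invented for illustration, and the output is plain prompt text to paste into whatever Veo 3 interface you use, not a call to any API. The dialogue is shortened here for readability.

```python
from dataclasses import dataclass, fields


@dataclass
class Audio:
    """The four audio channels plus the mix note; none may be blank."""
    ambient: str
    foley: str
    dialogue: str
    music: str
    mix: str

    def __post_init__(self) -> None:
        for f in fields(self):
            if not getattr(self, f.name).strip():
                # Write "no music" or "no dialogue" instead of leaving a gap.
                raise ValueError(f"audio channel '{f.name}' is empty")


@dataclass
class Shot:
    scene: str
    camera: str
    lens: str
    lighting: str
    audio: Audio
    aspect: str
    duration_s: int


def render(shot: Shot) -> str:
    """Render a Shot in the labeled-line layout used by the templates."""
    a = shot.audio
    return "\n".join([
        shot.scene,
        f"Camera: {shot.camera}. Lens: {shot.lens}. Lighting: {shot.lighting}.",
        "Audio:",
        f"  - Ambient: {a.ambient}",
        f"  - Foley: {a.foley}",
        f"  - Dialogue: {a.dialogue}",
        f"  - Music: {a.music}",
        f"  - Mix: {a.mix}",
        f"Aspect: {shot.aspect}. Duration: {shot.duration_s} seconds.",
    ])


cafe = Shot(
    scene="A medium two-shot of two women at a small cafe table, "
          "afternoon window light from the left.",
    camera="static, soft rack focus between them",
    lens="50mm",
    lighting="warm natural fill",
    audio=Audio(
        ambient="low cafe background, espresso machine in the distance, "
                "muffled conversations, a chair scraping tile",
        foley="coffee cup placed on saucer, jacket sleeve shifting on the table",
        dialogue='English, casual, one says "But that\'s my point" and the '
                 'other responds with a small laugh "I know, I know, you\'re right."',
        music="barely audible cafe acoustic from the speakers",
        mix="dialogue fully forward and clear, ambient and foley recede, "
            "music almost subliminal",
    ),
    aspect="16:9",
    duration_s=10,
)

print(render(cafe))
```

Raising an error on an empty channel is a design choice: it forces an explicit "no music" or "no dialogue" line, which is what the templates in this post do.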

Start Building Better Veo 3 Prompts

These 50 prompts work because they tell Veo 3 what to do with all four audio channels, not just the visual. The model's native audio synthesis is only useful when you give it something to synthesize.
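For prompts you have already written or edited by hand, a quick lint can confirm that every channel is addressed before you generate. The sketch below assumes the labeled-line layout used in this post (`- Ambient:`, `- Foley:`, and so on); the function name and label list are illustrative, not part of any Veo 3 tooling.

```python
import re

# Labels the templates in this post use inside the Audio block.
REQUIRED_LABELS = ("Ambient", "Foley", "Dialogue", "Music", "Mix")


def missing_audio_labels(prompt_text: str) -> list[str]:
    """Return the audio labels that have no '- Label:' line in the prompt."""
    found = set(re.findall(r"^\s*-\s*(\w+):", prompt_text, flags=re.MULTILINE))
    return [label for label in REQUIRED_LABELS if label not in found]


draft = """A wide shot of a quiet harbor at dawn.
Audio:
  - Ambient: water lapping against hulls, a distant gull
  - Music: no music
Aspect: 16:9. Duration: 8 seconds."""

print(missing_audio_labels(draft))  # ['Foley', 'Dialogue', 'Mix']
```

An empty list means all four channels and the mix note are present. It does not judge whether the descriptions are specific enough; that part is still on you.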

Use the AI prompt generator to build structured Veo 3 prompts in seconds — describe your scene and it outputs a formatted prompt with camera, lighting, and audio direction included. For a full breakdown of the prompting structure behind these templates, the Veo 3 Prompt Guide covers every parameter in depth. If you're choosing between Veo 3, Sora 2, and Runway for a specific project, the Veo 3 vs Sora 2 vs Runway comparison will tell you which model to use for which type of clip. And for a complete framework that applies across all AI video tools, the Complete AI Video Prompting Guide is the place to start.
