Most Veo 3 prompts are written for cinema. Wide establishing shots, sweeping crane moves, cinematic color grades — beautiful, and completely wrong for a 9:16 vertical feed. Short-form has its own grammar: hook in the first 1.5 seconds, native audio baked in from the start, platform pacing that rewards watch-throughs and loops. These 30 prompts are written for that grammar, not for a film reel.
What Short-Form Vertical Video Needs
Aspect ratio is non-negotiable. Veo 3 can generate widescreen or vertical, but Shorts and TikTok display 9:16 as the native format. Anything else gets cropped, letterboxed, or just looks wrong in the feed. Every prompt here locks Aspect: 9:16 explicitly — don't leave this to default.
The first 1.5 seconds is the hook. This is not a creative preference — it's a platform mechanic. Both Shorts and TikTok measure early drop-off, and the algorithm rewards clips that hold viewers past the first two seconds. That means your prompt needs to describe a strong opening visual or opening audio beat before anything else. A sudden sound, an unusual composition, a face mid-expression, a movement already in progress — something that makes scrolling past feel like a mistake. See the Veo 3 prompt guide for more on structuring first-second hooks.
Native audio carries short-form. Veo 3's signature capability is generating ambient sound, foley, dialogue, and music alongside the video — not as a separate layer but synthesized as part of the clip. On TikTok and Shorts, audio is often the reason people stop scrolling. A crisp keyboard click, a coffee pour, a creator speaking directly to camera — these are native audio directions, not things you add in post. Prompt them explicitly. The complete AI video prompting guide covers audio direction in detail.
Duration: think 6–15 seconds for most Shorts and 7–21 for TikTok. Shorter clips with high watch-through rates outperform longer clips with drop-off. For pure hooks or aesthetic loops, 6–9 seconds is the sweet spot. For story arcs or demos, 12–15 seconds for Shorts and up to 21 for TikTok gives you enough room without losing completion rate.
Genre shapes structure. A POV clip has different pacing physics than a talking-head. A transformation reveal lives on the cut between before and after. A listicle needs fast visual punctuation between items. Prompting generically produces generic clips. These 30 prompts are sorted by genre because the structural logic of each genre is different, and Veo 3 responds to that specificity.
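The requirements above (explicit aspect ratio, a first-1.5-second hook, four separate audio layers, a stated duration and pacing) can be kept consistent by filling in a small template rather than writing each prompt freehand. The following is a minimal sketch in Python. The class and field names (`ShortFormPrompt`, `AudioPlan`, `render`) are hypothetical helpers invented for illustration; they are not part of any Veo 3 API, and the output is just plain prompt text in the layout used by the prompts in this article.

```python
from dataclasses import dataclass, field


@dataclass
class AudioPlan:
    """The four audio layers, each directed separately (hypothetical helper)."""
    ambient: str = "none"
    foley: str = "none"
    dialogue: str = "none"
    music: str = "none"


@dataclass
class ShortFormPrompt:
    """Hypothetical container for one vertical-video prompt, not a Veo 3 type."""
    scene: str
    hook: str
    camera: str
    lighting: str
    duration_s: int
    pacing: str
    audio: AudioPlan = field(default_factory=AudioPlan)
    aspect: str = "9:16"  # always written into the prompt, never left to default

    def render(self) -> str:
        """Return the prompt as plain text, one element per line."""
        return "\n".join([
            self.scene,
            f"Hook (first 1.5 seconds): {self.hook}",
            f"Camera: {self.camera}",
            f"Lighting: {self.lighting}",
            "Audio:",
            f"- Ambient: {self.audio.ambient}",
            f"- Foley: {self.audio.foley}",
            f"- Dialogue: {self.audio.dialogue}",
            f"- Music: {self.audio.music}",
            f"Aspect: {self.aspect}. Duration: {self.duration_s} seconds. "
            f"Pacing: {self.pacing}",
        ])


# Example: a shortened version of a POV cooking clip.
cooking = ShortFormPrompt(
    scene="First-person POV looking down at hands pressing buttered bread "
          "onto a hot cast iron pan.",
    hook="the first sound is the immediate sizzle as bread hits pan.",
    camera="locked-off top-down POV with very slight handheld drift.",
    lighting="warm overhead kitchen light, soft steam rising.",
    duration_s=9,
    pacing="single take, no cuts.",
    audio=AudioPlan(
        ambient="low kitchen hum",
        foley="bread sizzle, butter pop, spatula scrape",
        music="no music, foley carries the audio",
    ),
)
print(cooking.render())
```

Because `aspect` defaults to 9:16 and every audio layer defaults to "none", a prompt built this way cannot silently omit the aspect ratio or leave an audio layer unspecified.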
POV & First-Person Prompts (1–5)
1. POV Neon City Night Walk
First-person POV walking through a rain-slicked neon-lit city street at night,
puddles reflecting purple and cyan signage, crowd noise bleeding in from the sides.
Hook (first 1.5 seconds): POV lurches forward as feet step into a neon reflection —
the first sound is a sharp wet footstep and a distant synth note.
Camera: handheld first-person, slight shoulder-roll motion, subtle lens flare as
neon signs pass. Lighting: high-contrast neon mixed light, warm storefront glow against
cool street darkness.
Audio:
- Ambient: rain mist, distant traffic, muffled crowd chatter
- Foley: each wet footstep sharp and distinct, umbrella brushing past
- Dialogue: none
- Music: moody lo-fi synthwave, low in the mix, building gradually
Aspect: 9:16. Duration: 12 seconds. Pacing: continuous single take, slow push forward.
2. POV Cooking Close-Up
First-person POV looking down at hands pressing a golden-butter-soaked piece of
bread onto a hot cast iron pan, kitchen counter in soft focus behind.
Hook (first 1.5 seconds): the first sound is the immediate sizzle as bread hits pan —
visually the butter bubbles in real time at the bottom of frame.
Camera: locked-off top-down POV with very slight handheld drift, close enough that
the pan fills 70% of frame. Lighting: warm overhead kitchen light, soft steam rising.
Audio:
- Ambient: low kitchen hum, ventilation fan in background
- Foley: aggressive bread-sizzle, butter pop, spatula scrape
- Dialogue: none
- Music: no music — foley carries the audio
Aspect: 9:16. Duration: 9 seconds. Pacing: single take, no cuts.
3. POV Gaming Setup Desk
First-person POV settling into a gaming chair, hands reaching forward to grab a
mechanical keyboard on a glowing RGB desk setup, dual monitors in background.
Hook (first 1.5 seconds): first sound is a satisfying clack of a single key press —
RGB lighting pulses once as hands enter frame.
Camera: handheld first-person settling into seat, slight forward lean, then locked.
Lighting: cool RGB ambient from desk, room otherwise dark, monitor glow on hands.
Audio:
- Ambient: low PC fan hum, subtle room silence
- Foley: chair settle creak, keyboard clack, mouse click
- Dialogue: VO, casual: "This is the setup. Let's get into it."
- Music: punchy lo-fi beat, short 8-second loop
Aspect: 9:16. Duration: 10 seconds. Pacing: single take with one audio cut at dialogue.
4. POV Morning Routine
First-person POV of a morning routine: alarm dismissed, feet hitting hardwood, window
blinds drawn open to flood the room with early golden sunlight.
Hook (first 1.5 seconds): alarm sound cuts abruptly — first visual is a hand
swiping the phone screen as golden light begins creeping from the side.
Camera: handheld first-person moving through space, natural motion. Lighting:
transition from warm bedroom lamplight to bright natural window light.
Audio:
- Ambient: birds outside window, faint neighborhood sounds after blinds open
- Foley: phone swipe tap, bare feet on wood, blinds mechanical rattle, light fixture click
- Dialogue: VO, warm: "Every day starts here. Make it count."
- Music: gentle upbeat acoustic, fades in after dialogue
Aspect: 9:16. Duration: 12 seconds. Pacing: 3 natural movements in 12 seconds, no hard cuts.
5. POV Outdoor Adventure Trail Run
First-person POV running a dirt trail through pine forest, dappled light through
canopy, roots and rocks visible underfoot, breath audible.
Hook (first 1.5 seconds): starts mid-stride — the sound is already the crunch of
boots on gravel and heavy rhythmic breathing before any context is established.
Camera: GoPro-style handheld first-person with natural bounce, looking down at
trail then lifting to show the path ahead through trees. Lighting: natural forest
morning light, dappled, high contrast patches.
Audio:
- Ambient: birdsong, wind through pine, distant creek
- Foley: footfall crunch, breathing rhythm, branch brush on arm
- Dialogue: none
- Music: energetic indie-electronic, driving 4/4 tempo matching footfall
Aspect: 9:16. Duration: 11 seconds. Pacing: continuous motion, single take.
Talking-Head Prompts (6–10)
6. To-Camera Quick Tip
Medium close-up of a creator sitting at a clean desk, speaking directly to camera
with natural energy, confident and conversational, daylight from window left.
Hook (first 1.5 seconds): creator is already mid-sentence, leaning slightly forward —
audio starts with them saying "Most people get this wrong—" before any B-roll.
Camera: locked-off medium close-up, subject centered, slight negative space above
head for top-third captions. Lighting: natural window key light, soft reflector fill.
Audio:
- Ambient: quiet room tone, air system low hum
- Foley: none prominent
- Dialogue: "Most people get this wrong — here's the one thing that actually changes
the result. [pause] It's not what you think." Casual, direct, not scripted-sounding.
- Music: no music — voice is the focus
Aspect: 9:16. Duration: 10 seconds. Pacing: single take, no cuts.
7. To-Camera Storytelling Hook
Tight close-up of a creator's face, visible surprise or disbelief expression,
speaking directly to camera in a low-lit room with a single practical lamp behind them.
Hook (first 1.5 seconds): creator says "I cannot believe this actually happened—"
at full energy, expression already mid-reaction before the clip even starts.
Camera: tight close-up, slight handheld micro-movement for intimacy, creator fills
80% of vertical frame. Lighting: single warm practical lamp behind left shoulder,
slight underlit dramatic quality.
Audio:
- Ambient: night-quiet room, near silence
- Foley: none
- Dialogue: "I cannot believe this actually happened— [beat] — so I need to tell you
exactly what went down, because this changes everything." Urgent whisper escalating
to full voice.
- Music: tense underscore, very low, almost subliminal
Aspect: 9:16. Duration: 9 seconds. Pacing: single take.
8. To-Camera Reaction
Medium close-up of a creator reacting to something off-screen (phone or monitor),
genuine surprise turning to excitement, hand gesture punctuating the reaction.
Hook (first 1.5 seconds): a short sharp gasp or "Wait—" before the camera settles —
creator's eyes go wide, hand comes up.
Camera: handheld medium close-up with slight reactive shake on the gasp, then
settles. Lighting: bright ring light front key, neutral background blurred.
Audio:
- Ambient: quiet room
- Foley: phone notification ping off-screen triggers the reaction
- Dialogue: "Wait — wait, wait, wait. Did that just happen?" Genuine, conversational,
not performed.
- Music: brief punchy sound effect on the gasp, then no music
Aspect: 9:16. Duration: 8 seconds. Pacing: single take, reaction carries the pacing.
9. To-Camera "How I Did X"
Medium shot of a creator at a desk, slightly more formal than casual vlog — this
is a results-reveal setup, small smile already on face, speaking directly to camera.
Hook (first 1.5 seconds): creator says "Here's exactly how I did it—" while a quick
visual flash of the result appears in the top-left corner (picture-in-picture style).
Camera: medium shot, locked-off, creator slightly right of center with result visual
framed top-left. Lighting: soft box key left, clean neutral background.
Audio:
- Ambient: soft room tone
- Foley: none
- Dialogue: "Here's exactly how I did it — no fluff, just the three steps that
actually mattered." Confident, like explaining to a friend.
- Music: light upbeat background track, low in mix
Aspect: 9:16. Duration: 10 seconds. Pacing: single take.
10. To-Camera Before/Result Reveal
Split-tone talking-head: first half creator looks uncertain or tired (the "before"),
then a sharp visual cut matches a shift in posture and expression to confident
and energized (the "after") — both shots to-camera, same framing.
Hook (first 1.5 seconds): first shot starts already in the "before" expression —
slightly slouched, dull room lighting, creator says "Three weeks ago—" at low energy.
Camera: locked-off medium close-up, same framing both halves for maximum visual contrast.
Lighting: before = cooler, slightly underexposed; after = warmer, brighter key light.
Audio:
- Ambient: subtle room tone both halves
- Foley: sharp cut sound (single audio snap) on the transition between halves
- Dialogue: Before: "Three weeks ago I had no idea what I was doing." After: "Now?
This is what changed." Tone shift is dramatic but natural.
- Music: before = low ambient drone; after = punchy upbeat beat enters on the cut
Aspect: 9:16. Duration: 12 seconds. Pacing: 6s before + hard cut + 6s after.
Lifestyle & Aesthetic Prompts (11–15)
11. Coffee Shop Morning Aesthetic
Slow push-in on a ceramic coffee mug on a wooden café table, steam rising, open
notebook beside it, soft morning light through large window, background café blur.
Hook (first 1.5 seconds): the ambient sound hits first — espresso machine steaming,
low café chatter — before the visual fully resolves from a soft blur into focus.
Camera: slow cinematic push-in, vertical frame filling with the mug and notebook,
handheld micro-movement suggesting presence. Lighting: warm golden-hour window light,
soft fill from right, very low contrast, pastel color temperature.
Audio:
- Ambient: espresso machine, low café conversation, rain on window faint
- Foley: ceramic mug set down softly, pencil on paper
- Dialogue: none
- Music: warm lo-fi hip-hop, single soft piano loop, very low
Aspect: 9:16. Duration: 10 seconds. Pacing: continuous push-in, no cuts.
12. Study/Work Session Aesthetic
Top-down flat-lay slowly animating into life: hand enters frame to open a notebook,
pen begins writing, a second hand moves a laptop trackpad, coffee beside everything.
Hook (first 1.5 seconds): first sound is pen touching paper — a clean soft scratch —
before the hand fully enters the vertical frame.
Camera: strict overhead top-down, vertical 9:16 with desk objects arranged within frame,
slight handheld warmth. Lighting: diffused natural overhead, clean shadows, neutral tones.
Audio:
- Ambient: keyboard ambient, page turn, low AC hum
- Foley: pen writing, laptop trackpad click, coffee cup set down
- Dialogue: VO, quiet: "Locked in. No distractions. Let's go."
- Music: focused lo-fi beats, minimal and clean
Aspect: 9:16. Duration: 12 seconds. Pacing: continuous motion, no hard cuts.
13. Gym Workout Aesthetic
Medium shot of a person gripping the knurling of a barbell in front of a clean, minimal
home gym setup, chalk dust catching the light, focused expression, controlled breath visible.
Hook (first 1.5 seconds): first sound is the hard metallic clink of weight plates
settling — creator exhales audibly, hands tighten on bar.
Camera: low-angle medium shot looking slightly up, vertical frame, creator fills the
frame with deliberate physicality. Lighting: hard directional single-source light,
dramatic side shadows, industrial feel.
Audio:
- Ambient: gym echo, ventilation
- Foley: weight plate clink, chalk dust puff, controlled breathing
- Dialogue: none
- Music: hard-hitting trap beat with a punchy drop on the pull
Aspect: 9:16. Duration: 9 seconds. Pacing: single take, motion peaks at second 6.
14. Fashion Outfit-of-the-Day
Full-length fashion reveal: starts at feet on an urban sidewalk, camera tilts slowly up
the outfit in one smooth vertical movement, ending on a confident face forward.
Hook (first 1.5 seconds): shoes hit the pavement with a sharp clack — the first sound
grounds the clip before the camera starts its upward journey.
Camera: slow vertical tilt from ground to face, smooth and deliberate, 9:16 perfectly
suited for the full-length reveal. Lighting: natural overcast outdoor light, soft
and flattering, slight warm grade.
Audio:
- Ambient: urban street sound, light breeze
- Foley: heel click, fabric movement
- Dialogue: VO at the end: "Today's look. All linked below."
- Music: confident mid-tempo R&B or pop, enters on the tilt
Aspect: 9:16. Duration: 10 seconds. Pacing: single continuous tilt, no cuts.
15. Weekend Routine Aesthetic
A loose three-shot montage: cozy bedroom curtains opening, then cut to coffee
poured, then cut to window-seat book open — three lifestyle vignettes, each 3 seconds.
Hook (first 1.5 seconds): curtains pull open and morning light floods in — first
sound is the fabric swoosh of the curtain and immediate birdsong outside.
Camera: handheld warm and slightly imperfect each shot — intentionally human, not
polished studio. Lighting: all natural light, golden morning warmth.
Audio:
- Ambient: birds, neighborhood morning, coffee maker, page turn
- Foley: curtain pull, coffee liquid pour, book spine open
- Dialogue: none
- Music: gentle acoustic indie, warm and unhurried
Aspect: 9:16. Duration: 9 seconds. Pacing: 3 cuts at 3 seconds each, soft transitions.
Transformation & Reveal Prompts (16–20)
16. Skincare Before/After
Side-by-side style single 9:16 frame: left half shows dry, dull skin close-up in
flat cool light; a wipe transition splits the frame and the right half reveals glowing,
hydrated skin in warm soft light. Both halves are the same face, same angle.
Hook (first 1.5 seconds): the transition wipe sound — a clean sharp swipe — happens
at second 1, earlier than expected, immediately revealing the contrast.
Camera: locked-off tight close-up on face, centered then split by the wipe.
Lighting: before = cool flat overhead, slightly harsh; after = warm diffused beauty light.
Audio:
- Ambient: quiet bathroom, running water faint
- Foley: the wipe transition has a satisfying swish sound effect
- Dialogue: VO: "Same face. Different routine. This is what 30 days did."
- Music: light airy pop background, enters on the after half
Aspect: 9:16. Duration: 10 seconds. Pacing: 1s before + 1s wipe + 8s after.
17. Room Makeover Reveal
A dramatic room reveal: first 5 seconds show the "before" room in dull light, slightly
messy; the creator steps through the doorway at second 5 and turns on a light to reveal
the same room completely transformed — clean, styled, warm.
Hook (first 1.5 seconds): a hand appears on the doorframe immediately — creator says
"Okay. Are you ready to see this?" already building anticipation before the reveal.
Camera: medium wide looking from hallway into room, creator in frame, door opens inward.
Lighting: before = dull ambient, slightly underexposed; after = warm lamp light,
styled throw blanket, plants visible.
Audio:
- Ambient: quiet before; after has soft warm ambience
- Foley: light switch click, door swing, gasp from creator
- Dialogue: Before: "Okay. Are you ready to see this?" After: "This is the same room.
I am not kidding."
- Music: before = low atmospheric; after = upbeat reveal sting that resolves warm
Aspect: 9:16. Duration: 14 seconds. Pacing: 5s build + 1s switch + 8s reveal walk-through.
18. Hair Transformation
Tight close-up on hair — before shot shows natural unstyled hair texture, creator
turns away from camera, then spins back with transformed styled hair, same tight framing.
Hook (first 1.5 seconds): first frame is already the "before" in tight close-up with a
voice saying "Give me 20 minutes—" as creator turns away.
Camera: locked tight close-up on hair and face, turn-away is a natural pivot spin.
Lighting: consistent bright beauty light both sides for maximum texture visibility.
Audio:
- Ambient: bathroom, hair tool hum in the middle
- Foley: hair tool click, spray hiss implied in the turn
- Dialogue: Before: "Give me 20 minutes—" [spin away] After: [spin back] "Done."
Single-word reveal, delivered with confidence.
- Music: lo-fi chill before the turn; punchy beat lands on the reveal spin
Aspect: 9:16. Duration: 11 seconds. Pacing: 3s before + 2s turn + 6s reveal hold.
19. Organization Reveal
Tight POV-style closeup of a chaotic junk drawer or messy shelf — then a
quick-cut series of 3 one-second organizing shots — then a final 4-second reveal
of the same space perfectly organized, slow push-in.
Hook (first 1.5 seconds): opens directly on the chaos close-up, creator says
"I finally did it. This took two hours." Voice already done with the process.
Camera: handheld first 3 cuts, slightly disorienting intentionally; final reveal
is a smooth slow push-in on the organized result. Lighting: bright practical kitchen
or closet light, natural and honest.
Audio:
- Ambient: quiet domestic space
- Foley: drawer open, items sorted, container lids snapping
- Dialogue: "I finally did it. This took two hours. [cut series] But look at it now."
Warm satisfaction in the final line.
- Music: neutral during chaos; satisfying warm chord on the final reveal
Aspect: 9:16. Duration: 12 seconds. Pacing: 5s chaos close-up with opening line + 3 quick cuts (1s each) + 4s reveal push-in.
20. Fitness 30-Day Result
Side-by-side vertical split: left shows day-one selfie-style clip (plain background,
casual, relaxed posture); right shows day-30 in the same frame with visibly different
energy, posture, and presentation. Creator addresses camera directly.
Hook (first 1.5 seconds): split frame is already visible — creator in both halves says
in unison "30 days apart. Same person." Uncanny timing on the unison delivery.
Camera: locked-off selfie medium close-up, both clips same distance from camera.
Lighting: before = flat neutral; after = slightly warmer and better lit, intentional upgrade.
Audio:
- Ambient: quiet room both sides
- Foley: none
- Dialogue: "30 days apart. Same person. [pause] Here's what actually changed."
Both halves speaking the opening line, then only the "after" continues.
- Music: motivational lo-fi hip-hop, tempo builds slightly through the clip
Aspect: 9:16. Duration: 12 seconds. Pacing: unison opening 3s + single-voice 9s.
Listicle & Education Prompts (21–25)
21. "3 Things You Didn't Know"
Talking-head with fast visual punctuation: creator to-camera delivers one line,
then a sharp cut to a graphic/text overlay on a clean background for 2 seconds,
then back to creator for the next line — 3 items, 3 cuts.
Hook (first 1.5 seconds): creator opens with "Three things about [topic] that
nobody tells you—" leaning forward, one finger raised before any overlay appears.
Camera: medium close-up on creator, locked-off; graphic inserts are full-frame.
Lighting: bright, clean, creator key-lit from left.
Audio:
- Ambient: quiet neutral room
- Foley: text-reveal sound effect on each graphic insert (clean tap)
- Dialogue: "Three things about [topic] that nobody tells you— [cut] Number one:
[item]. [cut] Number two: [item]. [cut] Number three: [item]. Save this one."
Fast, confident, no filler between items.
- Music: upbeat background track, stays consistent through cuts
Aspect: 9:16. Duration: 15 seconds. Pacing: 3s intro + 3 × (1s creator line + 2s graphic) + 3s close.
22. "Ranking [X]" Overlay Format
Tight product/object shots with bold rank overlays appearing in frame —
creator's hands interact with each item as the rank animation slides in.
Hook (first 1.5 seconds): a hand drops the first item into frame from above with
a satisfying thud, and the rank "4" overlay (last place) slides in from right immediately.
Camera: overhead flat-lay or tabletop close-up, items entering from the edges of
the vertical frame. Consistent clean background throughout.
Lighting: flat, even, product-photography style, no harsh shadows.
Audio:
- Ambient: quiet neutral
- Foley: each item placed has a distinct appropriate sound (thud, clink, rustle)
- Dialogue: VO, confident: "Ranking these from worst to best — no sponsorships,
just my honest take." One line per item as ranks reveal.
- Music: building lo-fi beat that gets more confident as rankings go higher
Aspect: 9:16. Duration: 12 seconds. Pacing: 4 items × 3 seconds each with rank overlay.
23. "How To" 5-Step Demo
Close-up demo of hands performing a skill or technique — 5 distinct steps, each
captured in a tight shot, with creator VO walking through each step cleanly.
Hook (first 1.5 seconds): hands already in motion on step 1, creator's VO starts
mid-action: "Step one — you do this while the [thing] is still warm."
Camera: close-up on hands and subject matter, handheld with slight follow.
Occasional cut back to creator face for steps that need context. Lighting: bright,
practical, hands visible with no harsh shadows.
Audio:
- Ambient: appropriate environment (kitchen, desk, workshop)
- Foley: distinct sounds for each step — no step should be silent
- Dialogue: VO for all 5 steps: numbered, direct, no filler. Total under 10 seconds
of speech.
- Music: light background lo-fi, consistent across all cuts
Aspect: 9:16. Duration: 14 seconds. Pacing: 5 cuts, roughly 2-3 seconds per step.
24. Myth-Busting One-Liners
Creator to-camera delivers myth busts in fast succession — each myth appears as
a text overlay being crossed out, followed by creator's correction.
Hook (first 1.5 seconds): first myth overlay appears instantly and creator already
says "Wrong." — no intro, no buildup, starts in the middle.
Camera: medium close-up, locked-off, creator slightly right of center so text
overlays can anchor to the left of the vertical frame.
Lighting: clean bright studio-style or window light, no distractions.
Audio:
- Ambient: nearly silent — sharpness of silence makes the delivery hit harder
- Foley: strike-through sound on each myth text, crisp
- Dialogue: myth 1: "Wrong. [correction one-liner]." myth 2: "Also wrong.
[correction one-liner]." myth 3: "Completely wrong. Here's what's actually true."
Delivered with dry confidence, not aggressively.
- Music: no music — silence serves the format
Aspect: 9:16. Duration: 13 seconds. Pacing: 3 myths × ~4 seconds each, tight cuts.
25. Comparison Side-by-Side
Vertical frame split cleanly into two halves for a direct comparison — Option A
on left, Option B on right — creator narrates above both with VO.
Hook (first 1.5 seconds): both halves appear simultaneously in the first frame
with a sharp split-screen sound — creator says "These two things are not the same."
Camera: each half has its own shot (product close-up, app screen, food, etc.) —
cuts within each half happen in sync. No camera movement — locked off both sides.
Lighting: matched between sides for fair comparison — no visual bias built into the light.
Audio:
- Ambient: neutral
- Foley: side-specific sounds — if one side is better, it sounds better too
- Dialogue: VO: "These two things are not the same. [left side detail]. [right side
detail]. The difference is [key point]. One of these is worth it. One isn't."
- Music: neutral, consistent track under both sides
Aspect: 9:16. Duration: 13 seconds. Pacing: 2-second simultaneous reveal + 11 seconds split narrative.
Story-Arc / Hook Prompts (26–30)
26. Surprising Hook to Reveal
Opens on an unexpected, slightly confusing visual — no context given — then pulls
back or cuts to reveal what the visual actually is. The reveal recontextualizes
the opening completely.
Hook (first 1.5 seconds): an extreme close-up of something unidentifiable,
strange texture or color — creator says "Guess what this is." Camera very tight,
almost abstract.
Camera: starts extreme close-up (macro-style), pulls back to a medium wide revealing
the ordinary object this came from. The reveal is the whole joke.
Lighting: consistent through both shots, natural or practical.
Audio:
- Ambient: appropriate for the revealed object
- Foley: a satisfying reveal sound — zipper, lid pop, or just natural audio
- Dialogue: "Guess what this is. [3 seconds of visual suspense] Yeah. That's a
[ordinary object]. I'm sorry." Deadpan delivery.
- Music: tense mini-riff during the close-up; comedic resolution note on the reveal
Aspect: 9:16. Duration: 11 seconds. Pacing: 3s mystery + 1s pull-back + 7s reveal.
27. Problem → Solution → CTA
Three-beat structure: first beat is creator naming a relatable problem (3s),
second beat is the solution delivered with specificity (6s), third beat is
a direct to-camera CTA (3s). All talking-head, all to camera.
Hook (first 1.5 seconds): creator opens mid-frustration face and says
"You're doing it backwards. Here's why nothing's working—"
Camera: locked-off medium close-up through all three beats — no cuts,
no B-roll. The verbal structure carries the pacing.
Lighting: consistent clean beauty/key light throughout.
Audio:
- Ambient: quiet room
- Foley: none
- Dialogue: Problem: "You're doing it backwards. Here's why nothing's working—
[names specific problem]." Solution: "The fix is [specific action]. Do it before
you do anything else." CTA: "Try it tonight and tell me what happened."
Conversational, urgent, specific.
- Music: no music — voice is the full track
Aspect: 9:16. Duration: 12 seconds. Pacing: 3s problem + 6s solution + 3s CTA, no cuts.
28. "Wait Until You See This" Mid-Reveal
Creator builds suspense mid-clip — showing partial progress toward a result that
isn't yet visible, using pacing and audio to prevent skip — then the reveal hits
at second 10 of a 13-second clip.
Hook (first 1.5 seconds): creator holds something partially out of frame and says
"Wait, wait — wait until you see this." Object not yet revealed, tension in voice.
Camera: handheld medium, creator fills frame, object obscured by the edge of the
vertical frame. Then camera pulls wide or object enters frame fully on reveal.
Lighting: warm practical light, slight mystery in the shadows before reveal.
Audio:
- Ambient: quiet with slight tension
- Foley: rustling, footsteps building to reveal
- Dialogue: "Wait, wait — wait until you see this. [6 seconds of build-up narration]
Okay. Look." Final word is calm after the energy build. The contrast is the payoff.
- Music: subtle tension build resolving to a warm sting on reveal
Aspect: 9:16. Duration: 13 seconds. Pacing: 2s hook + 8s build + 3s reveal.
29. Narrative One-Take
A single uninterrupted take in which the creator walks through a small physical space,
telling a micro-story as they move — the movement mirrors the narrative arc.
Hook (first 1.5 seconds): creator is already walking, already mid-sentence — we
join the story in progress, which creates instant momentum.
Camera: handheld following the creator, sometimes slightly ahead, sometimes
slightly behind — natural documentary feel, vertical.
Lighting: practical room lighting throughout — lamps, windows, whatever exists in
the space. Imperfection is intentional.
Audio:
- Ambient: layered — each room has its own ambient that changes as creator moves
- Foley: footsteps, door opening, object picked up and set down
- Dialogue: "—and that's when I realized it. [pause, opens a drawer] This was the
problem the whole time. [holds up object] One thing. That's it. One thing."
Measured, present, not rushed.
- Music: no music — ambient and dialogue only
Aspect: 9:16. Duration: 14 seconds. Pacing: single take, movement drives the edit.
30. Plot-Twist Micro-Story
A three-act micro-narrative in 12 seconds: setup (creator states a confident
assumption, 4s), apparent confirmation (something seems to prove them right, 4s),
twist (the opposite is true, 4s). Clips are tight cuts, each 4 seconds.
Hook (first 1.5 seconds): the first shot is already the confident assumption delivered
with certainty — creator says "I was right about this one. 100%." No hesitation.
Camera: shot 1 = medium close-up to-camera; shot 2 = close-up on evidence;
shot 3 = back to creator, expression collapsed into disbelief.
Lighting: shots 1 and 2 are warm and certain; shot 3 is fractionally cooler to
support the emotional shift.
Audio:
- Ambient: neutral room
- Foley: a single notification or alert sound on the twist cut
- Dialogue: shot 1: "I was right about this one. 100%." shot 2: [VO] "Look at this —
exactly what I predicted." shot 3: "I was so, so wrong. Don't be me."
Final delivery is self-aware, not defeated.
- Music: confident lo-fi during shots 1-2; comedic deflation note on shot 3
Aspect: 9:16. Duration: 12 seconds. Pacing: 3 cuts × 4 seconds each, sharp transitions.
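Many of the prompts above express pacing as timed segments, such as "3s problem + 6s solution + 3s CTA" for a 12-second clip. When adapting them, it is easy to change one segment and forget the total. The sketch below is one possible sanity check in Python, assuming pacing is written as plain additive segments in the form "<number>s". The function names are hypothetical, and multiplied forms such as "3 quick cuts (1s each)" or durations spelled out as "seconds" are deliberately not handled.

```python
import re

# Matches segment lengths written like "3s" or "1.5s".
_SEGMENT = re.compile(r"(\d+(?:\.\d+)?)s\b")


def pacing_total(pacing: str) -> float:
    """Sum every '<number>s' segment found in a pacing string."""
    return sum((float(value) for value in _SEGMENT.findall(pacing)), 0.0)


def pacing_matches(duration_s: float, pacing: str, tolerance_s: float = 0.5) -> bool:
    """True when the summed segments are within tolerance of the stated duration."""
    return abs(pacing_total(pacing) - duration_s) <= tolerance_s


# Pacing strings in the style used by the prompts above.
print(pacing_matches(12, "3s problem + 6s solution + 3s CTA, no cuts."))
print(pacing_matches(13, "2s hook + 8s build + 3s reveal."))

# A made-up mismatch: segments add up to 9 seconds, but the clip is 12.
print(pacing_matches(12, "2s intro + 3s middle + 4s outro"))
```

A mismatch does not necessarily mean the prompt is wrong, but it is worth resolving before generation so the stated duration and the described beats agree.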
Shorts & TikTok Power Tips
Lock 9:16 in every prompt. Don't let it default. Veo 3 can generate multiple aspect ratios — if you don't specify Aspect: 9:16 explicitly, you may get widescreen output that gets cropped by the platform or displayed with black bars. Every prompt should include it.
Describe the first shot AND the first sound. The hook in the first 1.5 seconds is a two-channel event: visual and audio. A loud sizzle, a sharp voice line already mid-sentence, a bass hit — describe both. The algorithm tracks early drop-off, and Veo 3's native audio synthesis means the audio hook is part of the prompt, not an afterthought.
Direct audio explicitly — ambient, foley, dialogue, music each separately. Veo 3 synthesizes all four audio layers as part of the generation. "Add some music" is too vague. Specify: ambient room tone, foley sounds for specific actions, exact dialogue in casual creator language, and music genre/vibe/tempo. Prompting each layer separately produces richer results.
Target 6–15 seconds for Shorts, 7–21 for TikTok. Both platforms reward high completion rates over length. A 6-second clip watched 100% through outperforms a 30-second clip abandoned at second 10. For hook formats, target 6–9 seconds. For story-arcs and demos, 12–15 seconds for Shorts, up to 21 for TikTok.
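The duration targets in this tip can be written down as a small lookup, which is handy when batch-producing prompts. This is a sketch only: the ranges are copied from the guidance in this article and are editorial targets, not platform-enforced limits, and the names `TARGET_SECONDS` and `duration_in_target` are hypothetical.

```python
# (platform, format) -> (min_seconds, max_seconds), taken from the tip above.
TARGET_SECONDS = {
    ("shorts", "general"): (6, 15),
    ("tiktok", "general"): (7, 21),
    ("shorts", "hook"): (6, 9),
    ("tiktok", "hook"): (6, 9),
    ("shorts", "story"): (12, 15),
    ("tiktok", "story"): (12, 21),
}


def duration_in_target(platform: str, seconds: float, fmt: str = "general") -> bool:
    """Check a planned clip length against the article's suggested range."""
    low, high = TARGET_SECONDS[(platform.lower(), fmt.lower())]
    return low <= seconds <= high


print(duration_in_target("Shorts", 9, "hook"))    # within the 6-9 second hook range
print(duration_in_target("Shorts", 18, "story"))  # longer than the Shorts story range
print(duration_in_target("TikTok", 18, "story"))  # TikTok story range extends to 21
```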
Use creator-voice dialogue, not ad-voice. There is a clear difference between "Discover the transformative power of our solution" and "I cannot believe this actually worked." Platform audiences are calibrated to skip ad-language instantly. Dialogue in your prompts should read like a real person talking, casual and direct, with natural pauses and imperfect word choices.
End with a loop hook or a CTA. Shorts and TikTok reward repeat views. A clip that ends with a visual or audio beat that makes sense as a beginning again (loop-friendly ending) or ends with a direct CTA — "Try it and tell me what happened," "Watch this again and you'll see it" — performs better than a clip that just stops.
Compare a vague prompt such as "Make a TikTok video about morning routines." with a structured version of the same idea:
First-person POV morning routine, 9:16 vertical. Hook (first 1.5 seconds): alarm dismissed with a hand swipe — first sound is the phone tap followed immediately by birdsong from outside. Camera: handheld first-person moving through bedroom to window, natural shoulder motion. Lighting: warm bedroom lamp transitioning to natural morning light as blinds open. Audio: Ambient: birds outside, faint neighborhood noise. Foley: phone screen tap, bare feet on hardwood, blind cord pull. Dialogue: VO, warm: "Every day starts here. Make it count." Music: gentle acoustic, enters softly after VO line. Aspect: 9:16. Duration: 12 seconds. Pacing: continuous single take, three natural movements.
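The gap between a one-line request and a fully directed prompt can be checked mechanically before anything is sent for generation. The following Python sketch looks for the labels used throughout this article (hook, camera, lighting, the four audio layers, aspect, duration, pacing). It assumes prompts follow this article's labeling exactly; the marker list and the function name `missing_elements` are hypothetical, and a plain substring check says nothing about the quality of what follows each label.

```python
# Labels used by the prompts in this article; adjust if your template differs.
REQUIRED_MARKERS = [
    "Hook (first 1.5 seconds):",
    "Camera:",
    "Lighting:",
    "Ambient:",
    "Foley:",
    "Dialogue:",
    "Music:",
    "Aspect: 9:16",
    "Duration:",
    "Pacing:",
]


def missing_elements(prompt: str) -> list[str]:
    """Return the required labels that do not appear in the prompt text."""
    return [marker for marker in REQUIRED_MARKERS if marker not in prompt]


vague = "Make a TikTok video about morning routines."

# Shortened, illustrative version of a structured prompt.
structured = (
    "First-person POV morning routine. "
    "Hook (first 1.5 seconds): alarm dismissed with a hand swipe. "
    "Camera: handheld first-person. Lighting: lamp to natural morning light. "
    "Audio: Ambient: birds outside. Foley: phone tap, bare feet on hardwood. "
    'Dialogue: VO, warm: "Every day starts here." Music: gentle acoustic. '
    "Aspect: 9:16. Duration: 12 seconds. Pacing: continuous single take."
)

print(missing_elements(vague))       # every label is missing
print(missing_elements(structured))  # nothing is missing
```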
Start Creating Short-Form Video With Veo 3
The clips that win on Shorts and TikTok are not the most polished — they're the ones that hook in the first second, hold through the audio, and give the viewer a reason to loop or save. These 30 prompts are engineered for that mechanic, not for a film festival.
Use the AI prompt generator to build custom Veo 3 prompts for your specific content, audience, and platform. For more Veo 3 prompts across cinematic, documentary, and commercial formats, see the best Veo 3 prompts collection. For a full breakdown of how to structure audio direction, camera control, and character consistency in Veo 3 prompts, read the Veo 3 prompt guide and the complete AI video prompting guide.
Note on disclosure: Both YouTube Shorts and TikTok have policies around AI-generated content labeling. Requirements vary by region and content type — check current platform guidelines before publishing AI-generated clips, especially for content that could be mistaken for real footage.