Seedance 2.0 is more literal than Sora 2 or Veo 3.1. It will give you exactly what you described — and exactly what you forgot to describe. After running thousands of Seedance generations through UGC Copilot's production pipeline, we've collected the prompt patterns that consistently produce conversion-grade UGC, and the ones that consistently waste credits. This guide is the prompting playbook: 14 templates organized by mode, plus the small wording shifts that take output from "obviously AI" to "I'd scroll past this thinking it was real."
If you haven't yet read our hands-on review of Seedance 2.0, start there for the architecture. This post assumes you already understand the three modes (image-to-video, reference-to-video, text-to-video) and the face policy on image inputs.
Why Seedance 2.0 Prompts Are Different
Four things distinguish Seedance prompting from prompting Sora 2, Veo 3.1, or Kling O3:
- It is more literal. Sora bakes in cinematic defaults — soft lighting, shallow depth of field, slow camera moves. Seedance does not. If you don't specify mood, it produces something flat and lifeless. If you do specify it, it generally nails it on the first attempt.
- The mode you pick changes how prompts read. A text-to-video prompt has to describe the character; a reference-to-video prompt should not (the reference image carries that load). Mixing the two — describing a character in detail when you've also passed a reference image — produces drift between scenes.
- Native audio means dialogue cues matter. With `generate_audio: true`, Seedance synthesizes a soundtrack from your prompt. Vague audio language ("upbeat") produces stock-music-flavored output. Specific audio language ("single-take phone recording, ambient room tone, no music") produces UGC-grade audio.
- 15-second duration changes pacing. A 5s clip survives a thin prompt. A 15s clip exposes every gap. The longer your duration, the more beats you have to write into the prompt, or the model fills the silence with awkward looping motion.
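To make the audio and duration points concrete, here is a hypothetical job payload. The field names and the one-beat-per-four-seconds heuristic are our own illustration, not a documented Seedance API shape:

```python
# Hypothetical Seedance job payload -- field names are illustrative,
# not taken from any official SDK.
job = {
    "mode": "text_to_video",
    "duration_seconds": 15,      # longer clips need more scripted beats
    "generate_audio": True,      # native soundtrack synthesized from the prompt
    "prompt": (
        "Mid-twenties woman in a softly lit kitchen at golden hour. "
        # Specific audio language beats vague mood words like "upbeat":
        "Single-take phone recording, ambient room tone, no music."
    ),
}

# Crude pacing heuristic (our assumption): roughly one scripted beat
# (a line, a gesture, a look) per ~4 seconds of duration.
beats_needed = max(1, job["duration_seconds"] // 4)
```

At 15 seconds this suggests writing at least three distinct beats into the prompt, which matches how the before/after template below scripts its timeline.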
The Universal Seedance Prompt Structure
Every prompt template below follows the same six-part skeleton. Use it as a checklist when you're writing from scratch:
- Character or subject — Who or what is on screen. Skip in reference-to-video mode.
- Wardrobe / styling / packaging — Specific, not adjectival. "Cream-colored crewneck sweatshirt" beats "casual outfit."
- Environment — Where the scene is and what time of day. Tight environments outperform vague ones.
- Action and dialogue — What happens, in what order, with what spoken lines.
- Camera and framing — Lens type, distance, movement, and orientation. "Phone front camera selfie framing" is the cheat code for UGC.
- Mood, audio, anti-defects — Emotional register, sound cues, and explicit exclusions ("exactly two arms, five fingers per hand").
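The skeleton is easy to enforce mechanically. A minimal sketch of a checklist assembler, assuming Seedance ultimately takes a plain prompt string (the function and field names are our own, not part of any Seedance tooling):

```python
# Sketch: assemble a Seedance prompt from the six-part skeleton.
# All names here are illustrative; the model just receives the joined string.

SKELETON_FIELDS = [
    "subject",      # character or subject (skip in reference-to-video mode)
    "wardrobe",     # wardrobe / styling / packaging -- specific, not adjectival
    "environment",  # where the scene is and what time of day
    "action",       # action and dialogue, in order
    "camera",       # lens, distance, movement, orientation
    "mood_audio",   # mood, audio cues, anti-defect lines
]

def build_prompt(parts: dict) -> str:
    """Join the filled skeleton fields into one prompt string,
    raising if a field is missing or empty."""
    missing = [f for f in SKELETON_FIELDS
               if f not in parts or not parts[f].strip()]
    if missing:
        raise ValueError(f"skeleton fields missing: {missing}")
    return " ".join(parts[f].strip() for f in SKELETON_FIELDS)

prompt = build_prompt({
    "subject": "Mid-twenties woman with shoulder-length brown hair,",
    "wardrobe": "wearing a cream crewneck sweatshirt,",
    "environment": "standing in a softly lit kitchen at golden hour.",
    "action": "She looks at the lens and says, 'You have to see this.'",
    "camera": "Phone front camera selfie framing, slight handheld micro-movement.",
    "mood_audio": "Ambient room tone, no music. Exactly two arms, five fingers per hand.",
})
```

For reference-to-video mode you would relax the check on `subject`; the point is the checklist discipline, not the helper itself.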
Text-to-Video Templates (Character-Driven UGC)
These are the workhorses of AI UGC ads. Text-to-video is the only Seedance mode that supports human characters reliably (the image and reference modes trigger ByteDance's face policy). Each template below is one we've validated in production.
1. The UGC Talking-Head Hook
For the first 5–8 seconds of an ad. The job is to stop the scroll.
"Mid-twenties woman with shoulder-length brown hair, no makeup, wearing a cream crewneck sweatshirt, standing in a softly lit kitchen at golden hour. She looks slightly off-camera then directly at the lens and says with a half-smile, 'Okay I have to tell you about this — I genuinely cannot believe how well this worked.' Phone front camera selfie framing, slight handheld micro-movement, ambient room tone, no music. Exactly two arms, five fingers per hand. Single-take recording, no cuts."
Why it works: Specific wardrobe and lighting kill the "stock model" look. The half-second look-off-camera before address mimics the pacing of real phone recordings. The "no music" instruction is critical — without it Seedance often drops in default ambient music that immediately reads as commercial.
2. The Pain-Point Confession
For ads opening on a problem the viewer recognizes in themselves.
"Late-twenties man with short dark hair, beard stubble, wearing a charcoal hoodie, sitting on a low couch in a dim apartment with one warm lamp on. He sighs, rubs his face once, looks at the camera and says quietly, 'I tried everything for my back — chiropractor, stretches, that twenty dollar foam roller — nothing worked.' Eye-level phone framing, slow handheld drift, faint apartment ambient hum, no music. Exactly two arms, five fingers per hand."
Why it works: The single sigh and face-rub before speaking is the body-language anchor that Seedance interprets as "vulnerable, real." The dim lighting tells the model "this is a confession, not a pitch." Reference these patterns alongside our breakdown of viral hooks that convert for the full hook-writing framework.
3. The Unboxing Reveal
For ads where the product is the surprise. Seedance handles this well at 10–15s durations.
"Early-thirties woman with curly auburn hair, wearing a fitted gray t-shirt, sitting at a sunlit white desk. A small kraft-paper box sits in front of her. She slides a finger under the seal, pulls it open, lifts out [PRODUCT], turns it slowly in her hands and laughs once, looking up at the camera. 'Okay this is way nicer than I thought it would be.' Phone front camera selfie framing tilted slightly down. Ambient morning room sound, no music. Exactly two arms, five fingers per hand."
Why it works: The single laugh anchors a real micro-expression and gives Seedance a beat to lock onto. The downward camera tilt mimics the natural angle when you're filming yourself with one hand and unboxing with the other.
4. The Before / After Demonstration
For 15-second clips where the same character appears in two states.
"Late-twenties woman, blonde hair pulled back, no makeup, wearing a navy tank top, in a bright bathroom mirror. First seven seconds: tired posture, slight frown, voice flat: 'My skin used to look like this every morning.' Then she briefly looks down. Last eight seconds: she straightens, brushes hair from her face, smiles warmly into the mirror, voice brighter: 'Now I actually look forward to the mirror.' Phone front camera framing held in left hand. Bathroom ambient sound, no music. Exactly two arms, five fingers per hand. One continuous take."
Why it works: Seedance natively handles temporal beats when you write them as a literal timeline ("first seven seconds... last eight seconds..."). Most other video models lose state transitions inside a single generation; Seedance preserves them surprisingly well at 15s.
5. The Reaction-Style Endorsement
The "I just had to share this" moment that powers most TikTok UGC.
"Early-twenties man with messy dark hair, wearing a white t-shirt, sitting on the edge of a bed with morning light coming through a window. He's holding [PRODUCT], looking down at it with raised eyebrows. He looks up at the camera, half-laughs, and says, 'Wait — wait. I have to actually show you this. This is genuinely insane.' He turns the product slightly toward the lens. Phone front camera selfie framing held at arm's length. Ambient bedroom morning sound, no music. Exactly two arms, five fingers per hand."
Why it works: The "wait — wait" interrupted opener mimics a specific TikTok creator pattern that the language model has clearly trained on, and it produces the most authentic-feeling delivery cadence we've measured in side-by-side tests.
Image-to-Video Templates (Faceless / Product Content)
These templates use a still image as the literal first frame. Per ByteDance's content policy, the image cannot contain a human face — so this mode is reserved for product shots, environments, food, and objects. For background on the policy, see our Seedance 2.0 review.
6. The Product Hero Orbit
Input image: clean product photo on a neutral background.
"Camera slowly orbits 90 degrees around [PRODUCT], maintaining the product centered in frame. Soft studio key light from the upper left, gentle fill from the right. Background remains a clean off-white seamless. Subtle shadow movement on the surface beneath the product as the camera passes. Quiet ambient room tone, no music."
Why it works: Specifying the orbit angle (90 degrees, not 360) prevents the over-spin that makes AI orbits look stock-3D. The "subtle shadow movement on the surface beneath" is what locks the product to a real space rather than letting it float.
7. The Pour / Application Shot
For liquid, cream, powder, or anything with a textural moment. Seedance is exceptional at this.
"Slow-motion pour of [PRODUCT] from the bottle into a small ceramic cup on a marble countertop. Soft window light from the left creates highlights on the liquid stream. Steam rises gently from the cup. Camera holds steady at 45-degree angle. Audible pour and slight clink as the bottle taps the cup. No music."
Why it works: Seedance's physics simulation for fluid is one of its strongest features. The native audio adds the pour sound and clink without needing post-production foley work.
8. The Texture Close-Up Macro
For products where surface detail is the value prop — fabric, food, skincare, packaging.
"Macro shot moving slowly across the surface of [PRODUCT], revealing texture detail. Shallow depth of field with focus shifting from foreground to background as the camera drifts left to right over 8 seconds. Soft daylight, no harsh shadows. No music, faint ambient hum."
Why it works: Macro work plays to Seedance's strength on subtle motion. The explicit focus-shift instruction is what produces the hand-camera macro feel rather than a flat, even-focus product render.
9. The In-Use Lifestyle Shot (Faceless)
For showing the product in context without including a face.
"Hands (visible from elbows down only) holding [PRODUCT] over a wooden kitchen counter. Hands rotate the product gently, then set it down beside a coffee mug and a notebook. Morning window light from camera-left. Ambient kitchen sound, faint coffee maker hum, no music. Exactly two hands with five fingers each."
Why it works: Cropping to elbows-down keeps the scene human without triggering the face policy. Always include the hand-anatomy disclaimer when hands are on screen — it cuts six-finger artifacts dramatically.
Reference-to-Video Templates (Aesthetic Matching)
Reference-to-video uses an image as a style reference rather than a first frame. The model extracts aesthetic qualities — color palette, environment style, lighting — and generates a new video inspired by them. This mode triggers the face policy if the reference contains a person, so use it for environments and product aesthetics only.
10. The Branded Environment Walkthrough
Input image: a reference photo of a branded space, color palette, or design system.
"Slow camera dolly forward through a clean modern workspace matching the reference aesthetic. Natural light through floor-to-ceiling windows. Camera moves at walking pace for 10 seconds, gently revealing different areas of the space. Ambient office sound, distant typing, no music."
Why it works: The reference image carries the visual signature; the prompt only has to describe motion and pacing. Don't over-describe colors or styling in the prompt — that produces conflict with the reference.
11. The Aesthetic-Match Lifestyle Scene
Input image: a moodboard reference of the lifestyle world your brand wants to live in.
"Lifestyle scene matching the reference aesthetic — same color palette, same lighting style, same level of polish. [PRODUCT] sits prominently on the central surface. Subtle ambient motion: leaves moving outside a window, steam rising from a nearby mug, fabric shifting in a light breeze. 8-second hold with minimal camera drift. Ambient natural sound, no music."
Why it works: "Subtle ambient motion" prevents the static-photo feel that plagues reference-mode generations. Three small motion sources is the sweet spot — fewer feels frozen, more feels chaotic.
12. The Color-Palette Replication Shot
For brand-consistent environments where the color story is non-negotiable.
"Camera pans slowly across a tabletop arrangement matching the reference color palette exactly. [PRODUCT] is the focal point in the center third of the frame. Other elements (a folded cloth, a small plant, a notebook) populate the edges in coordinating tones. Soft overhead light, gentle shadow movement. 6-second pan, no music."
Why it works: Reference-to-video is significantly better at color replication than text descriptions of color. If your brand has a strict palette, this is the right mode for B-roll work.
Multi-Scene Connector Prompts
Most production UGC ads stitch 2–3 Seedance scenes together. The two prompts below are written as scene-to-scene bridges — they explicitly reference the previous scene's ending state to keep continuity.
13. The Hook → Body Continuity Bridge
For Scene 2 of an ad, where Scene 1 was a hook.
"Same character as previous scene (mid-twenties woman, brown hair, cream crewneck), now sitting at the same kitchen counter holding [PRODUCT]. She has just finished the previous sentence and continues smoothly: 'So here's exactly what it does for me…' She turns the product toward the camera, demonstrating its key feature. Same warm kitchen lighting, same ambient sound profile. Phone front camera selfie framing held in left hand. Single-take feel, no cuts. Exactly two arms, five fingers per hand."
Why it works: Repeating the character description and lighting cues anchors continuity. The "she has just finished the previous sentence" framing keeps Seedance's voice cadence aligned with where Scene 1 left off.
14. The Body → CTA Bridge
For the final scene of an ad. Closes with the call to action.
"Same character, same environment, holding [PRODUCT] now slightly closer to the camera. She smiles, exhales once, and says, 'Honestly, I cannot recommend this enough — link in my bio, you'll thank me.' She gives a small wave. Phone front camera selfie framing. Ambient room tone, no music. Slight handheld micro-movement. Exactly two arms, five fingers per hand. End with her smiling and holding the pose for the final second."
Why it works: The "end with her smiling and holding the pose" instruction is what gives you the clean trailing freeze-frame that Seedance otherwise drops. Without it, the clip ends mid-motion and feels abruptly cut.
Five Phrases That Transform Seedance Output
These are the highest-leverage phrases we've identified across thousands of generations. Drop them into any prompt and quality jumps measurably:
- "Phone front camera selfie framing" — Flattens depth of field, tightens crop, kills the "polished e-commerce model" look that destroys UGC authenticity.
- "Slight handheld micro-movement" — Adds the 1–2px-per-frame drift that distinguishes a phone recording from a tripod shot. Without it, talking-head Seedance clips feel like Zoom recordings.
- "Single-take recording, no cuts" — Tells the model to maintain continuity within the clip. Without it, Seedance occasionally introduces internal cuts that break realism.
- "Ambient room tone, no music" — Critical when `generate_audio: true`. Default audio leans toward stock music; this phrase steers it to natural ambient sound that reads as authentic.
- "Exactly two arms, five fingers per hand" — The single highest-impact anti-defect instruction. Cuts hand artifacts by roughly 70% in our internal A/B tests. Always include when hands are on screen.
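Because these five phrases are pure string additions, they can be enforced with a pre-submission guard. A sketch (the helper is our own convention, not a Seedance feature; the phrase list comes straight from this section):

```python
# Sketch: append any of the five high-leverage phrases a prompt is missing.
HIGH_LEVERAGE = [
    "phone front camera selfie framing",
    "slight handheld micro-movement",
    "single-take recording, no cuts",
    "ambient room tone, no music",
    "exactly two arms, five fingers per hand",
]

def inject_phrases(prompt: str, phrases=HIGH_LEVERAGE) -> str:
    """Append each phrase not already present (case-insensitive check)."""
    lower = prompt.lower()
    missing = [p for p in phrases if p not in lower]
    if missing:
        prompt = (prompt.rstrip(". ") + ". "
                  + ". ".join(p.capitalize() for p in missing) + ".")
    return prompt

draft = "Woman at a kitchen counter holds the product and smiles."
final = inject_phrases(draft)
```

Running it a second time is a no-op, so it is safe to apply to templates that already contain some of the phrases.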
Phrases That Wreck Seedance Output
Equally important — the patterns that consistently produce bad output:
- Adjective stacking. "Beautiful, stunning, gorgeous, breathtaking woman in an amazing, incredible kitchen" produces flat output. Seedance reads adjective stacks as low-information; replace with concrete specifics.
- Vague mood words. "Cinematic," "epic," "stunning," "premium" — these mean different things to different models and produce inconsistent output. Substitute concrete specifics: lighting direction, color temperature, camera distance.
- Conflicting style cues. "Cinematic high-end commercial feel with raw authentic phone-recorded UGC look" pulls the model in two directions. Pick one register and commit.
- The `negative_prompt` parameter. The Seedance API silently fails when `negative_prompt` is supplied — the job submits, status reports COMPLETED, then the result fetch returns 422. Bake avoidance language directly into the main prompt instead. We covered this in detail in our hands-on review.
- Over-describing characters in reference-to-video mode. The reference image carries the visual load. If you also describe the character in detail, you'll get scene-to-scene drift as the prompt and reference compete.
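Since supplying negatives as a parameter triggers that delayed 422, one workaround is to fold them into the main prompt as explicit exclusions. A sketch of the conversion (the helper and its wording are ours; adjust the phrasing to taste):

```python
# Workaround sketch: rather than passing a negative_prompt parameter
# (which fails late with a 422 on Seedance), bake exclusions into the
# main prompt text as "No X, no Y" language.

def fold_negatives(prompt: str, negatives: list[str]) -> str:
    """Append negative cues as an explicit in-prompt exclusion sentence."""
    if not negatives:
        return prompt
    exclusions = ", no ".join(negatives)
    return f"{prompt.rstrip('. ')}. No {exclusions}."

p = fold_negatives(
    "Slow orbit around the bottle on a white seamless background",
    ["music", "text overlays", "extra hands"],
)
```

The same pattern generalizes the "exactly two arms, five fingers per hand" trick: state the constraint positively or as an explicit exclusion inside the prompt, never as API metadata.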
The Production Iteration Loop
One generation is never the final answer. The loop we use in production:
- Write the prompt to the six-part skeleton above. Include all anti-defect lines. Set duration based on dialogue length: ~10s for one full sentence with breath, 15s for two sentences with a beat.
- Generate two variants in parallel with the same prompt. Seedance is deterministic enough that minor differences come from sampling, not prompt; this gives you a real A/B without a second prompt-write.
- Score on three axes: motion realism, voice cadence, and hand/anatomy artifacts. Pick the better take.
- If both fail on the same axis, fix the prompt: add specificity to the failed area. If motion is stiff, add "slight handheld micro-movement" or describe a specific physical action. If voice feels flat, rewrite the dialogue to include a beat (a sigh, a half-laugh, a "wait").
- If both fail in different ways, regenerate twice more. The prompt is fine; you got unlucky on sampling.
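The loop above can be sketched as code. `generate` and `score` are stand-ins for your actual Seedance call and human (or automated) review step; nothing here is a real Seedance API:

```python
# Sketch of the production iteration loop. generate() and score() are
# placeholders: score() returns a pass/fail mark per axis.

AXES = ("motion", "voice", "anatomy")

def pick_take(prompt, generate, score, max_rounds=2):
    """Generate pairs of variants. Return the winning take, or flag the
    axis that needs a prompt fix when both variants fail the same way."""
    for _ in range(max_rounds):
        takes = [generate(prompt) for _ in range(2)]   # two parallel variants
        scored = [(score(t), t) for t in takes]
        passing = [t for s, t in scored if all(s[a] for a in AXES)]
        if passing:
            return {"take": passing[0]}
        # Both failed on the same axis -> the prompt is the problem.
        shared = [a for a in AXES
                  if not scored[0][0][a] and not scored[1][0][a]]
        if shared:
            return {"fix_prompt_axis": shared[0]}
        # Failures differ -> unlucky sampling; loop and regenerate.
    return {"take": None}
```

The key decision encoded here is the article's rule: a shared failure means rewrite the prompt, divergent failures mean regenerate.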
Budget roughly 30–40% regeneration rate when prompting Seedance from scratch, and roughly 10–15% once you've validated a template against a specific product. This regeneration tax is the single biggest argument for batching variants — see our breakdown of scaling TikTok ads with AI UGC for the full economics.
How UGC Copilot Generates Optimal Seedance Prompts Automatically
The 14 templates above are what we baked into UGC Copilot's prompt builder. The platform handles four things you'd otherwise do manually:
- Mode routing. If your scene is faceless, UGC Copilot generates an image-to-video prompt with a hero-orbit or pour-shot template. If it's character-driven, it generates a text-to-video prompt with the appropriate UGC template. If a face policy 422 fires, it auto-falls-back to text-to-video without you noticing.
- Persona consistency. When you use an AI Twin, the same character description, wardrobe, voice, and styling get injected into every scene prompt — so a 3-scene ad doesn't drift between takes.
- Anti-defect injection. The hand-anatomy line, the "no music" instruction, the "phone front camera selfie framing" cheat code — all baked in by default. You can override but you don't have to remember them.
- Multi-engine prompt translation. If you switch a scene from Seedance to Sora 2 or Veo 3.1, UGC Copilot rewrites the prompt to that engine's preferred grammar. Sora prefers more explicit camera language; Veo prefers tighter mood descriptors. The same scene script produces an engine-appropriate prompt automatically.
Conclusion
Seedance 2.0 rewards specificity. The prompts that fail are vague; the prompts that succeed read like a director's shot list. The 14 templates above cover roughly 90% of the UGC ad scenarios we see in production — start with the closest template, swap in your product and persona, layer in the five high-leverage phrases, and the regeneration rate drops fast. For the broader workflow context, the Seedance 2.0 UGC ad workflow tutorial shows how these prompts fit into a full multi-tool stack.
Frequently Asked Questions
How long should a Seedance 2.0 prompt be?
Roughly 60–120 words for a single scene. Shorter prompts (under 40 words) produce generic output; longer prompts (over 200 words) introduce internal contradiction. The six-part skeleton naturally lands in this range.
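That 60–120-word sweet spot (and the 40/200 failure thresholds) is easy to lint for before spending a credit. A trivial check — the thresholds come from the guidance above, not from any API limit:

```python
# Sketch: flag prompts outside the 60-120 word sweet spot.
# Thresholds are editorial guidance, not hard API limits.

def prompt_length_check(prompt: str) -> str:
    n = len(prompt.split())
    if n < 40:
        return "too short: likely generic output"
    if n > 200:
        return "too long: risks internal contradiction"
    if 60 <= n <= 120:
        return "ok"
    return "borderline: consider tightening"

status = prompt_length_check("word " * 80)  # an 80-word prompt
```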
Should I include the dialogue inside the prompt or just describe what's said?
Include the literal dialogue in quotes. Seedance synchronizes lip movement and voice cadence to the actual line, so vague descriptions ("she pitches the product enthusiastically") produce mouth movement that doesn't match real speech patterns.
Can Seedance 2.0 do multiple characters in one prompt?
Yes, but it's the highest-failure scenario. Two characters in one frame triggers anatomy issues — limbs blending, face swapping during turns. If you need two characters interacting, generate them in separate clips and stitch in CapCut, or use a different engine for that specific shot.
Does Seedance respect named brands or trademarked products in the prompt?
It generally respects them visually but will not output recognizable logos with full fidelity. For branded packaging, use image-to-video mode with a real product photo as the first frame — the model preserves logo geometry far better when the brand is in the input image rather than the prompt text.
What's the single biggest mistake new Seedance prompters make?
Forgetting the audio instruction. With `generate_audio: true` as the default, Seedance defaults toward adding light background music, which kills the UGC authenticity signal. Always explicitly write "ambient room tone, no music" unless music is genuinely what you want.