A Workflow for Generating Seedance 2 JSON Prompts

Every Seedance 2 project tends to invent its own workflow. One team checks references first, another writes the global block last, a third forgets to pre-flight face-heavy stills until the content filter trips on submission. Under deadline, the steps that get dropped are usually the cheap-but-easy-to-skip ones — image pre-flight, character budget audit, scene-mapping coherence — and they cost the most when they fail late in the pipeline.

What follows is a documented version of that workflow, so the steps don't depend on which project the operator happened to do last.

The shape is end-to-end: scenario plus reference stills go in, a production-ready Seedance 2 JSON prompt comes out. It's the companion to the seedance-2-cinematic-video-prompt-engineer system prompt — the prompt does the heavy lifting; this workflow surrounds it with the discipline (image pre-flight, scene mapping, character budget, final validation) so the prompt's output lands clean every time.

Inputs

  1. Scenario — text description of the commercial / video (any language). May be loose prose or a pre-broken-down shot list.
  2. Reference images — numbered stills representing key frames. Filenames like img1, 1.png, shot_01.png.
  3. (Optional) overrides — custom char limit, target shot count, locked wardrobe / location notes.

Output

A single raw JSON object (no fences, no prose) conforming to the structure documented in seedance-2-cinematic-video-prompt-engineer:

{"refs":[...],"g":"...","s":[{"id":"1","c":"...","p":"..."}]}

Process

0. Image pre-flight (censorship / face filter)

Before anything else, inspect every reference image for trigger content:

  • Real faces, portrait placards, photo walls, dense crowds with visible faces → apply Layer 1 grid overlay per seedance-2-censorship-bypass (white 100%-opaque grid, 6×6 @ 12px default; escalate to 10×10 @ 8px after a flag).
  • Reference already carries a grid-like overlay (production chroma placards with + tracking marks)? Verify it qualifies as a white opaque grid; if not, lay a proper one on top.
  • Sensitive scenario (memorial, war, protest, historical-political)? Apply the grid even when faces are secondary.
  • Flag in refs[].r that the image is gridded so the model is not confused — e.g. "white 6×6 grid overlay for detector bypass; ignore in output".

Pre-commit to Layer 2 hygiene (applied in step 7):

  • No age-signal words, no emotions or backstory, no named public figures.
  • Every p is visual facts + scene context + production language + role-over-age.
  • If a grid overlay was applied, scene prompts include no grid lines, no overlay, no mesh, clean skin, smooth image.

1. Check inputs

  • Scenario present? If missing, ask for it.
  • Images attached? Count them and note filenames as provided.
  • Clarify any override (char cap, shot count, specific VFX language) before generating.

2. Load the system prompt

Use seedance-2-cinematic-video-prompt-engineer verbatim as the system layer for the generation step.

3. Analyze the scenario

  • Identify scenes by camera setup change — a new scene starts on a cut or a discrete camera move.
  • Flag: freeze moments, VFX states, composited overlays (text/UI/logos), loops.
  • Lock: wardrobe, location, character descriptions.

4. Analyze every reference image

For each image, extract:

  • Wardrobe (colors, cuts, accessories)
  • Interior / location elements
  • Hand positions, props, gestures
  • VFX style if present (grid / pixel / wireframe / particle, color, coverage)
  • Camera angle and framing
  • Color grade / lighting mood

Pick the PRIMARY reference (usually the character + location anchor) — it maps to the most scenes.

5. Map images → scenes

Build refs[] first. Each image gets:

  • img — exact filename
  • s — CSV of scene IDs it applies to
  • r — ≤ 80-char match descriptor
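The mapping step can be sketched in Python as follows. The helper names `build_refs` and `unknown_scene_ids` are hypothetical, as are the sample filenames and descriptors; the sketch only assumes the three fields listed above and counts characters with `len()`.

```python
def build_refs(mapping, max_desc=80):
    """Build refs[] from (filename, scene_ids, descriptor) tuples."""
    refs = []
    for img, scene_ids, desc in mapping:
        if len(desc) > max_desc:
            raise ValueError(f"{img}: descriptor is {len(desc)} chars, limit {max_desc}")
        # s is stored as a CSV string of scene IDs.
        refs.append({"img": img, "s": ",".join(str(i) for i in scene_ids), "r": desc})
    return refs


def unknown_scene_ids(refs, scenes):
    """Return scene IDs that refs point at but that no scene defines."""
    known = {str(scene["id"]) for scene in scenes}
    return sorted({sid for ref in refs for sid in ref["s"].split(",") if sid not in known})


refs = build_refs([
    ("img1", [1, 2, 4], "hero, red coat, loft interior"),
    ("shot_02.png", [3], "product close-up on marble counter"),
])
scenes = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
print(unknown_scene_ids(refs, scenes))  # ['4']
```

Running the coherence check once scenes are written catches references to scenes that were later merged or dropped.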

6. Write global (g)

≤ 300 chars. Must cover: composited elements, wardrobe lock, location lock, VFX rules.

7. Write scenes (s[])

Per scene:

  • c — camera-only shorthand, ≤ 80 chars. Use the full verb palette (ROCKET, whip, CRASH stop, orbit, corkscrew, bullet-time, …).
  • p — visual frame content only, ≤ 250 chars. No camera repetition. Explicit freeze / VFX scoping.
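The per-field limits from steps 5 to 7 are easy to check mechanically. A minimal sketch, assuming limits are counted with Python's `len()` (Unicode code points); the function name `field_violations` and the sample data are illustrative only.

```python
# Limits taken from steps 5-7: r <= 80, g <= 300, c <= 80, p <= 250.
LIMITS = {"r": 80, "g": 300, "c": 80, "p": 250}


def field_violations(prompt):
    """Return (location, length, limit) for every field over its limit."""
    checks = [("g", prompt.get("g", ""), LIMITS["g"])]
    for ref in prompt.get("refs", []):
        checks.append((f"refs[{ref.get('img')}].r", ref.get("r", ""), LIMITS["r"]))
    for scene in prompt.get("s", []):
        for key in ("c", "p"):
            checks.append((f"s[{scene.get('id')}].{key}", scene.get(key, ""), LIMITS[key]))
    return [(where, len(text), limit) for where, text, limit in checks if len(text) > limit]


sample = {
    "refs": [{"img": "img1", "s": "1", "r": "hero, red coat"}],
    "g": "Red coat locked.",
    "s": [{"id": "1", "c": "orbit", "p": "x" * 251}],
}
print(field_violations(sample))  # [('s[1].p', 251, 250)]
```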

8. Count characters

If total JSON > cap (default 3500):

  • Compress p fields first (dense abbreviations).
  • Merge adjacent scenes with similar camera + content.
  • Shorten g (but never remove the composited-elements note).
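A small Python sketch of the budget check, assuming the cap applies to the compact serialized string and that characters are counted with `len()`. The name `budget_report` and the report keys are hypothetical; the breakdown simply shows how much of the total sits in p and g, the fields the steps above say to trim.

```python
import json

DEFAULT_CAP = 3500  # default cap from this step


def budget_report(prompt, cap=DEFAULT_CAP):
    """Measure the serialized prompt against the character cap."""
    raw = json.dumps(prompt, ensure_ascii=False, separators=(",", ":"))
    total = len(raw)
    return {
        "total": total,
        "cap": cap,
        "over_by": max(0, total - cap),
        # Characters held by the fields that get compressed first.
        "p_chars": sum(len(scene.get("p", "")) for scene in prompt.get("s", [])),
        "g_chars": len(prompt.get("g", "")),
    }


sample = {"refs": [], "g": "x" * 10, "s": [{"id": "1", "c": "pan", "p": "y" * 20}]}
report = budget_report(sample)
print(report["over_by"], report["p_chars"], report["g_chars"])  # 0 20 10
```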

9. Final pass

  • English throughout (translate from any source language).
  • Proper nouns / brands preserved.
  • No markdown, no fences, no commentary.
  • Valid JSON (closed brackets, escaped quotes).
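The mechanical parts of this pass (no fences, no surrounding prose, valid JSON) can be sketched with the standard `json` module. The function name `final_pass_errors` and the error messages are illustrative; the language and proper-noun checks still need a human read.

```python
import json


def final_pass_errors(raw):
    """Return a list of problems found in the raw deliverable string."""
    errors = []
    text = raw.strip()
    if "```" in raw:
        errors.append("contains a code fence")
    if not (text.startswith("{") and text.endswith("}")):
        errors.append("has text outside the JSON object")
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        # Covers unclosed brackets and unescaped quotes.
        errors.append(f"invalid JSON: {exc.msg} at char {exc.pos}")
        return errors
    if not isinstance(obj, dict):
        errors.append("top level is not a JSON object")
    return errors


print(final_pass_errors('{"refs":[],"g":"","s":[]}'))  # []
```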

10. Deliver

Paste raw JSON only. Offer: "Want me to iterate on any shot, tighten the cap, or add a scene?"

Common variations

  • Loop video — the last scene's p must state "final frame matches shot 1 first frame" and the camera in c should reverse the opening move.
  • Freeze sequences — every frozen scene states pose + "Zero movement."
  • Text-overlay heavy: g explicitly lists "empty comp spaces"; scenes note where those spaces live in frame.
  • Multi-character — each character gets one locked wardrobe line in g; refs with multiple people map to most scenes as PRIMARY.
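The loop and freeze conventions above can be verified with a simple substring check. A sketch in Python, assuming the exact phrases quoted in the list are used verbatim; `variation_issues`, `is_loop`, and `frozen_ids` are hypothetical names, and which scenes count as frozen has to be supplied by the operator.

```python
LOOP_PHRASE = "final frame matches shot 1 first frame"
FREEZE_PHRASE = "Zero movement."


def variation_issues(scenes, is_loop=False, frozen_ids=()):
    """Check loop and freeze wording in scene p fields."""
    issues = []
    if is_loop and scenes and LOOP_PHRASE not in scenes[-1].get("p", ""):
        issues.append(f"scene {scenes[-1].get('id')}: missing loop statement")
    frozen = {str(i) for i in frozen_ids}
    for scene in scenes:
        if str(scene.get("id")) in frozen and FREEZE_PHRASE not in scene.get("p", ""):
            issues.append(f"scene {scene.get('id')}: frozen but missing '{FREEZE_PHRASE}'")
    return issues


scenes = [
    {"id": "1", "c": "push-in", "p": "Hero mid-stride, arm raised. Zero movement."},
    {"id": "2", "c": "pull-out", "p": "Hero at window."},
]
print(variation_issues(scenes, is_loop=True, frozen_ids=["1", "2"]))
# ['scene 2: missing loop statement', "scene 2: frozen but missing 'Zero movement.'"]
```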

Anti-patterns to avoid

  • Describing camera motion inside p — put it in c.
  • Writing the text content of composited titles — never.
  • Adding "type":"cut" or other schema extensions — not part of the contract.
  • Letting VFX bleed onto characters without explicit "chars CLEAN" scoping.
  • Forgetting to mark the PRIMARY reference.
  • Skipping image pre-flight on face-heavy refs — most filter fails are preventable at upload time (see seedance-2-censorship-bypass).
  • Age words in p: young, child, elderly, etc. raise scrutiny on the entire prompt.
  • Emotional framing: "remembering", "sad", "hopeful" and similar words should be replaced with visual facts.
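Of the anti-patterns above, schema extensions are the easiest to catch automatically. A minimal Python sketch, assuming the only permitted keys are the ones named in this workflow (refs, g, s at the top level; img, s, r per reference; id, c, p per scene); `unexpected_keys` is a hypothetical helper name.

```python
TOP_KEYS = {"refs", "g", "s"}
REF_KEYS = {"img", "s", "r"}
SCENE_KEYS = {"id", "c", "p"}


def unexpected_keys(prompt):
    """List keys that are not part of the documented structure."""
    found = [f"top-level: {k}" for k in sorted(set(prompt) - TOP_KEYS)]
    for i, ref in enumerate(prompt.get("refs", [])):
        found += [f"refs[{i}]: {k}" for k in sorted(set(ref) - REF_KEYS)]
    for scene in prompt.get("s", []):
        found += [f"scene {scene.get('id')}: {k}" for k in sorted(set(scene) - SCENE_KEYS)]
    return found


sample = {
    "refs": [{"img": "img1", "s": "1", "r": "hero, red coat"}],
    "g": "Red coat locked.",
    "s": [{"id": "1", "c": "orbit", "p": "Hero at window.", "type": "cut"}],
}
print(unexpected_keys(sample))  # ['scene 1: type']
```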

Pairs with