
Why Storyboards Come Before Renders

In a generative-video pipeline that has to produce many shots that hang together as one piece, storyboards are not decoration. They are the cheap visual lock on composition and camera language — a stage that catches expensive mistakes before the renders happen, and whose outputs then serve as visual references for those renders.

The storyboard style constant

Every storyboard frame across every project uses the same style prefix:

"traditional hand-drawn storyboard frame on white paper, clean black ink line art with selective red accent lines for key action and focal points, minimal shading using hatching, professional film storyboard sketch style, no color except black white and red"

The prefix is prepended to every storyboard prompt in both fresh-generation and refinement modes. It's the canonical case of the style-prefix architecture: a fixed prefix turns a class of outputs into a neutral visual vocabulary downstream stages can read for composition, not style.
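As a minimal sketch of that prepending step: only the prefix text comes from this article. The constant name, the helper `with_style_prefix`, and the comma separator are assumptions made for illustration.

```python
# The prefix text is quoted from the article; all names here are hypothetical.
STORYBOARD_STYLE_PREFIX = (
    "traditional hand-drawn storyboard frame on white paper, "
    "clean black ink line art with selective red accent lines for key action "
    "and focal points, minimal shading using hatching, "
    "professional film storyboard sketch style, "
    "no color except black white and red"
)


def with_style_prefix(shot_prompt: str) -> str:
    """Prepend the fixed style prefix exactly once."""
    body = shot_prompt.strip()
    if body.startswith(STORYBOARD_STYLE_PREFIX):
        # Already prefixed, e.g. a prompt coming back in for refinement.
        return body
    # Joining with ", " is an assumption; the article does not give a separator.
    return f"{STORYBOARD_STYLE_PREFIX}, {body}"
```

Making the helper idempotent means the same function can serve both fresh generation and refinement without stacking the prefix twice.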

Fresh generation (batch)

Input: a full shot list with seven-element descriptions (subject + action, location, framing, angle, movement, lens, lighting), plus optional reference images from earlier frames in the same project, plus the scenario for narrative cohesion.

Process:

  1. If reference frames are attached, the LLM analyzes their line weight, hatching direction, and selected camera angles.
  2. It then intentionally chooses different angles for new shots to force variety — no two consecutive shots share identical framing.
  3. Line style, red-accent placement, and hatching density are held constant across the series.

Output is one storyboard prompt per shot, keyed by shot name (1A, 1B, 2A, …), each rendered into a hand-drawn frame.
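The input and output shapes described above could be modeled roughly as follows. This is a sketch of the data only: in the pipeline an LLM writes each prompt, whereas here the seven elements are simply joined. The `Shot` class, its field names, and `batch_prompts` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Shot:
    """One shot-list entry carrying the seven descriptive elements."""
    name: str  # shot key such as "1A"
    subject_action: str
    location: str
    framing: str
    angle: str
    movement: str
    lens: str
    lighting: str

    def describe(self) -> str:
        # Flatten the seven elements into one comma-separated description.
        return ", ".join([
            self.subject_action, self.location, self.framing, self.angle,
            self.movement, self.lens, self.lighting,
        ])


def batch_prompts(shots: Iterable[Shot], style_prefix: str) -> Dict[str, str]:
    """Return one storyboard prompt per shot, keyed by shot name."""
    prompts: Dict[str, str] = {}
    for shot in shots:
        if shot.name in prompts:
            raise ValueError(f"duplicate shot name: {shot.name}")
        prompts[shot.name] = f"{style_prefix}, {shot.describe()}"
    return prompts
```

Keying the result by shot name keeps the batch output addressable, so a later single-shot refinement can look up its baseline by the same key.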

Refinement (single shot)

The single-shot refinement mode follows an edit-preservation discipline: the LLM applies only the user's requested change, preserves the existing composition, and keeps the style prefix prepended. Because the baseline frame already exists, refining one shot is far cheaper than regenerating the batch, so refinement is the default iteration mode, not the exception.
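One way to picture the refinement discipline is as a request that bundles the baseline prompt, the single requested change, and the preservation rules. The sketch below assumes that shape. `RefinementRequest`, `build_refinement`, `accept_refined`, and the wording of the rules are all hypothetical, not the pipeline's actual system prompt.

```python
from dataclasses import dataclass

# Hypothetical wording; the real system prompt is not given in the article.
EDIT_PRESERVATION_RULES = (
    "Apply only the requested change. "
    "Keep composition, framing, and camera angle as in the baseline. "
    "Keep the style prefix at the start of the prompt."
)


@dataclass(frozen=True)
class RefinementRequest:
    shot_name: str
    rules: str
    baseline_prompt: str
    requested_change: str


def build_refinement(shot_name: str, baseline_prompt: str,
                     requested_change: str, style_prefix: str) -> RefinementRequest:
    """Package one shot's baseline and one change for the LLM."""
    change = requested_change.strip()
    if not change:
        raise ValueError("a refinement needs a requested change")
    baseline = baseline_prompt.strip()
    if not baseline.startswith(style_prefix):
        # Restore the prefix if an earlier edit dropped it.
        baseline = f"{style_prefix}, {baseline}"
    return RefinementRequest(shot_name, EDIT_PRESERVATION_RULES, baseline, change)


def accept_refined(refined_prompt: str, style_prefix: str) -> bool:
    """Cheap post-check: the refined prompt must still start with the prefix."""
    return refined_prompt.strip().startswith(style_prefix)
```

The post-check covers only the style prefix. Whether composition was actually preserved still needs a visual review of the re-rendered frame.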

Why storyboards come before stills

This is the architecturally important decision. Storyboards are a cheap visual lock on composition. They run as a batch for the whole project — far cheaper than full renders. They render fast and consistently in a style the model handles well. And they serve as vision references into the next stage: full-color photographic renders read the storyboard for camera framing, angle, and composition, then produce a final image with the same composition in a different visual register.

A mistake caught at the storyboard stage avoids a far more expensive mistake at the render stage. The same logic applies in any pipeline where a fast, cheap intermediate artifact can stand in for an expensive final one: wireframes before UI design, blocking before animation, scratch tracks before scoring.

Scene-series continuity

Shots within a scene (1A, 1B, 1C) form a series that must feel continuous. The storyboard stage enforces series continuity by attaching earlier-rendered storyboard frames as references when generating later ones. Two constraints operate at once:

  1. Style continuity — same line weight, hatching, red-accent treatment across the series.
  2. Angle variety: no two consecutive shots in a series share the same camera distance and angle.

Both constraints are encoded in the system prompt; the reference attachments let the model see them in earlier frames rather than only be told about them.
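The two constraints can also be checked or prepared outside the model. The sketch below assumes that a shot's scene is the leading digits of its name ("1A" belongs to scene "1") and that references are limited to earlier frames of the same scene. Both assumptions, and all function names, are illustrative.

```python
import re
from typing import Dict, List, Sequence, Tuple


def scene_of(shot_name: str) -> str:
    """Return the scene part of a shot name, e.g. "1" for "1A"."""
    match = re.match(r"\d+", shot_name)
    if match is None:
        raise ValueError(f"shot name has no scene number: {shot_name!r}")
    return match.group()


def repeated_framing(series: Sequence[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Find consecutive same-scene shots that share camera distance and angle.

    Each entry in `series` is (shot_name, camera_distance, angle), in shot order.
    """
    clashes = []
    for prev, curr in zip(series, series[1:]):
        same_scene = scene_of(prev[0]) == scene_of(curr[0])
        if same_scene and prev[1:] == curr[1:]:
            clashes.append((prev[0], curr[0]))
    return clashes


def earlier_frames(shot_name: str, rendered: Dict[str, str]) -> List[str]:
    """Collect same-scene frames rendered before `shot_name` as references.

    `rendered` maps shot name to image path, in render order.
    """
    scene = scene_of(shot_name)
    refs = []
    for name, path in rendered.items():
        if name == shot_name:
            break
        if scene_of(name) == scene:
            refs.append(path)
    return refs
```

A check like `repeated_framing` only flags violations of the angle-variety rule after the fact. Style continuity is harder to verify mechanically, which is why the earlier frames are attached as references.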

What storyboards do not do

  • They do not specify color — final color comes from moodboard references at render time.
  • They do not specify final lighting behavior — only rough directional shading via hatching.
  • They do not lock final subject appearance — character design is locked via reference inheritance at the next stage.

The storyboard's job is exactly three things: composition, camera language, and scene-series continuity. Everything else is handled later.

The pattern beyond storyboards

The general shape is: when a downstream stage is expensive, expose a cheap upstream stage whose outputs can be reviewed, rejected, and re-used as references. The storyboard pipeline is one instance. Others include wireframes before high-fidelity UI design, blocking poses before animation, and rough mixes before mastering. The discipline is the same in each case: a cheap lock that catches expensive mistakes and contributes to the artifact it precedes, rather than getting thrown away.
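Stripped of the storyboard specifics, the pattern might be sketched as a gate: run the cheap stage for every item, review the draft, and pass only approved drafts on to the expensive stage as references. The function `run_gated` and its callback signatures are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_gated(
    items: Iterable[str],
    cheap_stage: Callable[[str], str],
    approve: Callable[[str, str], bool],
    expensive_stage: Callable[[str, str], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Run the expensive stage only for items whose cheap draft was approved.

    Returns (finals keyed by item, list of rejected items).
    """
    finals: Dict[str, str] = {}
    rejected: List[str] = []
    for item in items:
        draft = cheap_stage(item)  # e.g. a storyboard frame
        if approve(item, draft):   # human or automated review
            # The draft is passed along as a reference, not thrown away.
            finals[item] = expensive_stage(item, draft)
        else:
            rejected.append(item)  # fix upstream before paying for a render
    return finals, rejected
```

Rejected items never reach the expensive stage, which is where the saving comes from.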

The brief-to-scenario stage feeds shot lists into storyboard generation; storyboard generation feeds visual references into final rendering. The chain works because each stage is more expensive to redo than the one before it, so mistakes are best caught as early as possible.