Seven-Element Shot Description
A director writes shot descriptions the way they would on set: "medium shot, woman at the table, soft window light." Some descriptions mention framing, some mention lens, none mention angle. Downstream stages — storyboard, stills, video — try to extract camera and lighting info and miss half the time. So you switch the pipeline to strict JSON shot objects with seventeen nullable fields, and the directors revolt: half the fields get left blank or stuffed with "default." Free prose loses dimensions; JSON kills authorship.
The middle path is to keep the prose — but make a small, fixed, mandatory set of dimensions non-negotiable in every description.
The seven-element shot description is a prompt architecture for cut-list authoring where every shot description — LLM-generated or human-written — must contain exactly seven named elements, in prose form. Mandatory coverage, free expression. It bridges the too-rigid of JSON schemas and the too-loose of free prose.
The seven elements
- Subject + action — what the camera sees and what it's doing
- Location — where the scene sits
- Framing — wide / medium / close / detail
- Angle — eye-level / high / low / top-down
- Movement — static, pan, push-in, orbit, etc.
- Lens — wide / normal / portrait / macro equivalent
- Lighting — directional / soft / hard / practical / natural state
Every shot description — at every stage of the pipeline — contains all seven.
Why seven, and why prose
Seven is the largest working-memory-friendly set that covers the bases. Fewer collapses dimensions (framing implies lens); more starts to feel like a form with blanks to fill.
Prose, not JSON, because:
- Downstream stages consume the description in multiple ways — storyboards read framing+angle+movement; stills read subject+location; video reads movement+action. A JSON object would force every consumer to map named fields back to LLM-consumable prose.
- Prose lets dimensions interact naturally ("wide-angle lens at low angle, pushing in slowly") where JSON would have three disconnected fields.
- LLMs generate better prose than they generate structured fields — the dimensions are all present, but the sentence form reads naturally.
The structure is seven named elements, not seven JSON keys. The LLM is instructed to include all seven; the output is prose.
Why it's a distinct architecture
Free-prose shot descriptions miss dimensions — a writer forgets to specify lens or lighting. Rigid schemas force writers to fill fields they don't care about (duration: null, color_palette: unspecified). The seven-element pattern is a middle path: mandatory coverage, free expression.
It is also the interface between human-authored and LLM-authored shots. A director who knows the seven elements can hand-write a shot the pipeline consumes cleanly; an LLM that emits a seven-element description produces something downstream stages — and the director — can read and edit.
Where the elements go downstream
| Downstream stage | Elements consumed | |---|---| | Storyboard generation | Framing, angle, movement (composition) | | Stills generation | Subject, location, lens (scene content) | | Video generation | Movement, subject+action (motion) |
Downstream stages do not re-extract from free prose. The LLM is instructed to label dimensions clearly in the sentence ("Framing: medium close. Angle: eye-level. Lens: 50mm equivalent.") so each consumer reads the named dimensions directly.
Failure modes
- Partial descriptions. The model skips lens or lighting because the user didn't specify. Counter: the system prompt requires all seven; if a dimension isn't given, the model chooses a sensible default and states it explicitly.
- Dimension bleed. Lighting ends up inside the "subject" element ("woman lit by warm light…"). Counter: anti-examples in the system prompt.
- Over-compression. The description collapses to one sentence that mentions all seven without giving each room. Counter: enforce one sentence per element, or a minimum word count per element.
Generalises beyond shot lists
The pattern is a fixed, small, mandatory set of named dimensions in prose form, covering the task's decision space without collapsing into a form. It applies anywhere you author lists of structured-but-readable items:
- Test case authoring — seven mandatory fields per test (input, expected output, assumptions, setup, teardown, edge case covered, author).
- Task decomposition — seven elements per sub-task in an agentic plan (goal, preconditions, action, expected state, rollback, metric, timeout).
- Character sheets — seven elements per character per scene (presence, costume, motivation, movement, dialogue mood, expression, off-screen note).
Any cut-list-shaped authoring task benefits from the same trick.