Skip to content
CONCEPTS
3 min readЧитать на русском

Seven-Element Shot Description

A director writes shot descriptions the way they would on set: "medium shot, woman at the table, soft window light." Some descriptions mention framing, some mention lens, none mention angle. Downstream stages — storyboard, stills, video — try to extract camera and lighting info and miss half the time. So you switch the pipeline to strict JSON shot objects with seventeen nullable fields, and the directors revolt: half the fields get left blank or stuffed with "default." Free prose loses dimensions; JSON kills authorship.

The middle path is to keep the prose — but make a small, fixed, mandatory set of dimensions non-negotiable in every description.

The seven-element shot description is a prompt architecture for cut-list authoring where every shot description — LLM-generated or human-written — must contain exactly seven named elements, in prose form. Mandatory coverage, free expression. It bridges the too-rigid of JSON schemas and the too-loose of free prose.

The seven elements

  1. Subject + action — what the camera sees and what it's doing
  2. Location — where the scene sits
  3. Framing — wide / medium / close / detail
  4. Angle — eye-level / high / low / top-down
  5. Movement — static, pan, push-in, orbit, etc.
  6. Lens — wide / normal / portrait / macro equivalent
  7. Lighting — directional / soft / hard / practical / natural state

Every shot description — at every stage of the pipeline — contains all seven.

Why seven, and why prose

Seven is the largest working-memory-friendly set that covers the bases. Fewer collapses dimensions (framing implies lens); more starts to feel like a form with blanks to fill.

Prose, not JSON, because:

  • Downstream stages consume the description in multiple ways — storyboards read framing+angle+movement; stills read subject+location; video reads movement+action. A JSON object would force every consumer to map named fields back to LLM-consumable prose.
  • Prose lets dimensions interact naturally ("wide-angle lens at low angle, pushing in slowly") where JSON would have three disconnected fields.
  • LLMs generate better prose than they generate structured fields — the dimensions are all present, but the sentence form reads naturally.

The structure is seven named elements, not seven JSON keys. The LLM is instructed to include all seven; the output is prose.

Why it's a distinct architecture

Free-prose shot descriptions miss dimensions — a writer forgets to specify lens or lighting. Rigid schemas force writers to fill fields they don't care about (duration: null, color_palette: unspecified). The seven-element pattern is a middle path: mandatory coverage, free expression.

It is also the interface between human-authored and LLM-authored shots. A director who knows the seven elements can hand-write a shot the pipeline consumes cleanly; an LLM that emits a seven-element description produces something downstream stages — and the director — can read and edit.

Where the elements go downstream

| Downstream stage | Elements consumed | |---|---| | Storyboard generation | Framing, angle, movement (composition) | | Stills generation | Subject, location, lens (scene content) | | Video generation | Movement, subject+action (motion) |

Downstream stages do not re-extract from free prose. The LLM is instructed to label dimensions clearly in the sentence ("Framing: medium close. Angle: eye-level. Lens: 50mm equivalent.") so each consumer reads the named dimensions directly.

Failure modes

  • Partial descriptions. The model skips lens or lighting because the user didn't specify. Counter: the system prompt requires all seven; if a dimension isn't given, the model chooses a sensible default and states it explicitly.
  • Dimension bleed. Lighting ends up inside the "subject" element ("woman lit by warm light…"). Counter: anti-examples in the system prompt.
  • Over-compression. The description collapses to one sentence that mentions all seven without giving each room. Counter: enforce one sentence per element, or a minimum word count per element.

Generalises beyond shot lists

The pattern is a fixed, small, mandatory set of named dimensions in prose form, covering the task's decision space without collapsing into a form. It applies anywhere you author lists of structured-but-readable items:

  • Test case authoring — seven mandatory fields per test (input, expected output, assumptions, setup, teardown, edge case covered, author).
  • Task decomposition — seven elements per sub-task in an agentic plan (goal, preconditions, action, expected state, rollback, metric, timeout).
  • Character sheets — seven elements per character per scene (presence, costume, motivation, movement, dialogue mood, expression, off-screen note).

Any cut-list-shaped authoring task benefits from the same trick.