April 20, 2026

The Two-Stage Architect Pattern

Separating Creative Intent from Technical Generation in LLM Pipelines

Author: Alex Nix
Status: Working draft, for public release

Abstract

Most LLM-driven creative pipelines use a single prompt to produce generator-ready output — one system prompt, one temperature, one role. For small tasks this is fine. For production tasks — multi-shot campaigns, narrative storyboards, consistent-series generation — the single-prompt approach causes predictable failures: creative monotony, technical malformation, debug opacity. None of these are model-quality problems; they are architecture problems.

This paper describes the Two-Stage Architect Pattern: split the work into a Creative Director LLM (prose concepts, high temperature, explorative) and a Technical Architect LLM (structured prompts, lower temperature, compliant). The pattern is a high-level idea (separate what from how) with specific technical properties that make it reliable in production. The paper names those properties, the failure modes they prevent, and the open questions that remain.

1. Motivation

A single LLM asked to produce N generator-ready prompts for a multi-shot task has four jobs:

Generate N distinct creative concepts
Decide which dimensions unify the set (environment, subject, mood)
Format each concept as a generator-ready prompt (structure, reference numbering, constraints)
Preserve format discipline across all N outputs

Each job wants different LLM behavior:

Job 1 wants creative breadth: high temperature, exploratory prompting.
Job 2 wants coherence: temperature matters less here, but the unifying dimensions must be stated up front, not rediscovered per shot.
Job 3 wants compliance: low temperature, strict format.
Job 4 wants determinism: lowest temperature, explicit output contract.

A single LLM call can do any one of these well. Doing all four in one call is a compromise — creative concepts come out repetitive, prompts come out malformed, or both.

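The Two-Stage Architect Pattern splits the work into two calls that each do one side well: the Director handles concepts and their unification (jobs 1 and 2), the Architect handles formatting and format discipline (jobs 3 and 4).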

2. The pattern

Input: descriptors, references, user intent
   │
   ▼
Stage A — Creative Director (LLM)
   Role:       editorial / creative director
   Temp:       0.6–0.8
   Output:     N prose concepts in ONE shared environment
   │
   ▼
Stage B — Technical Architect (LLM)
   Role:       technical prompt engineer (e.g. A.O.C. architect)
   Temp:       0.3–0.5
   Input:      Stage A's prose + structured metadata
               (image positions, descriptor constraints, reference refTypes)
   Output:     N generator-ready prompts with explicit structure

The two stages communicate through prose, not through a shared data structure. The Director's output is narrative; the Architect's input is that narrative plus the structural metadata the Architect needs to emit format.
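A minimal Python sketch of the handoff, assuming a generic `complete(system, user, temperature)` callable that stands in for whatever LLM client the pipeline uses. The callable, the `StageConfig` type, and the prompt wording are hypothetical and illustrative, not taken from a specific implementation. The point is structural: Stage B receives Stage A's prose verbatim plus the structural metadata, and the two stages share nothing else.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical LLM call: (system_prompt, user_prompt, temperature) -> completion text.
Complete = Callable[[str, str, float], str]


@dataclass(frozen=True)
class StageConfig:
    system_prompt: str
    temperature: float


# Illustrative prompts; real system prompts would be longer and task-specific.
DIRECTOR = StageConfig(
    system_prompt=(
        "You are an editorial creative director. Write {n} distinct prose "
        "concepts that all take place in ONE shared environment."
    ),
    temperature=0.7,  # inside the 0.6-0.8 band
)

ARCHITECT = StageConfig(
    system_prompt=(
        "You are a technical prompt architect. Turn each concept into a "
        "generator-ready prompt. Put a line containing only ^ between prompts. "
        "No preamble, no markdown."
    ),
    temperature=0.5,  # top of the 0.3-0.5 band
)


def run_two_stage(complete: Complete, intent: str, metadata: str, n: int) -> str:
    # Stage A: user intent in, N prose concepts in one shared environment out.
    prose = complete(DIRECTOR.system_prompt.format(n=n), intent, DIRECTOR.temperature)
    # Stage B: the Director's prose, passed through verbatim, plus structural
    # metadata (image positions, descriptor constraints, reference refTypes).
    architect_input = f"CREATIVE DIRECTION:\n{prose}\n\nMETADATA:\n{metadata}"
    return complete(ARCHITECT.system_prompt, architect_input, ARCHITECT.temperature)
```

In tests, `complete` can be a stub that records its arguments, which is also a cheap way to confirm that each stage ran with its intended role and temperature.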

Two properties make the split worth it

Role clarity. "Be a creative director" and "be a technical prompt architect" pull for opposite LLM behaviors. The first wants exploration; the second wants compliance. A system prompt asking for both reads as "be creative but also be rigid," which is a confused charter. LLMs produce their best work when the role is narrow.

Temperature match. Creative generation wants 0.6–0.8 — enough variance for five concepts to differ meaningfully. Structured emission wants 0.3–0.5 — low enough that format discipline holds across all N outputs. A single LLM call at one temperature compromises both ends.

3. Why the Director is load-bearing for consistency

The Director's specific job is to produce N concepts inside one shared environment. This single-environment constraint is what makes multi-shot consistency structural rather than aspirational.

The Architect stage downstream receives the one environment description and writes all N shots inside it. Each shot varies camera, framing, and subject action; none of them varies environment, lighting state, or time of day. A continuity rule that would be fragile as a per-shot instruction becomes load-bearing because it is established once, upstream, in prose the Architect is asked to inherit rather than reinterpret.

If the Director were skipped and the Architect were asked to generate N shots directly from structured inputs, the Architect would re-roll the environment per shot — the LLM's default behavior when asked to write five distinct shots is to differentiate on every dimension available, including the ones that were supposed to be shared.

The Director is the mechanism that removes "differentiate on every dimension" from the Architect's decision space.
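Because the environment is fixed upstream, divergence becomes checkable after the fact. The sketch below assumes the Director's output has already been split into one environment paragraph and N per-shot variations; the `DirectorOutput` shape and the `divergent_shots` helper are hypothetical. It flags any Architect prompt that does not carry the shared environment text verbatim (ignoring whitespace differences).

```python
from dataclasses import dataclass


@dataclass
class DirectorOutput:
    # Hypothetical shape of the Director's result.
    environment: str   # established once, upstream
    shots: list[str]   # camera, framing, subject action; passed to the Architect


def divergent_shots(direction: DirectorOutput, prompts: list[str]) -> list[int]:
    """Return indices of Architect prompts that do not contain the shared
    environment text verbatim, after normalising whitespace."""
    def norm(s: str) -> str:
        return " ".join(s.split())

    env = norm(direction.environment)
    return [i for i, p in enumerate(prompts) if env not in norm(p)]
```

A verbatim check is deliberately strict; a pipeline that lets the Architect paraphrase the environment would need a looser comparison, at the cost of letting some drift through.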

4. Why the Architect is load-bearing for format

The Architect's specific job is to emit N prompts in a strict output format — blocks of a specified structure (e.g., Anchor / Optics / Chemistry; see aoc-framework-paper), separated by a low-entropy delimiter, with explicit reference-image numbering, no preamble, no markdown.

A Creative Director asked to also do this emits roughly 70% well-formed output. Blocks drift in length, delimiters vary, and reference numbers get creative. The LLM reverts to its conversational instinct, because that is what the "creative director" role pulls for.

The Architect's lower temperature and narrow system prompt produce roughly 98% well-formed output. The remaining 2% is caught by downstream parsing and retried.

Format compliance is what makes an automated pipeline possible: a 70% parse rate yields a fragile pipeline, a 98% parse rate a reliable one.
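A back-of-the-envelope calculation shows why that gap matters once malformed output is retried. This sketch assumes each Architect call parses independently with probability p, which is an idealisation (real failures may be correlated), and that a failed call is simply repeated.

```python
def expected_calls(p: float) -> float:
    # Mean of a geometric distribution: attempts until the first parseable output.
    return 1.0 / p


def still_failing(p: float, attempts: int) -> float:
    # Probability that every one of `attempts` independent tries is malformed.
    return (1.0 - p) ** attempts


for p in (0.70, 0.98):
    print(f"p={p:.2f}  expected calls={expected_calls(p):.3f}  "
          f"unrecovered after 3 tries={still_failing(p, 3):.6f}")
```

Under these assumptions, a 70% rate averages about 1.43 calls per batch and still leaves 2.7% of batches broken after three tries; a 98% rate averages about 1.02 calls and leaves roughly 8 in a million.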

5. Design properties that make the pattern reliable

Separator discipline

The Architect's output is N blocks separated by a low-entropy delimiter: a character or sequence that cannot plausibly appear inside a block. A literal ^ on its own line is one serviceable choice; ---, ###, or JSON-array formatting are alternatives. The discipline is that the LLM must have no reason to produce the delimiter inside a block, so parsing reduces to a split.
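A minimal parser, assuming the `^`-on-its-own-line convention and a known expected count N. The `split_blocks` name, the `MalformedOutput` exception, and the choice to reject the whole batch on a count mismatch are illustrative assumptions.

```python
class MalformedOutput(ValueError):
    """Raised when the Architect's output does not match the block contract."""


def split_blocks(raw: str, expected: int, delimiter: str = "^") -> list[str]:
    blocks: list[list[str]] = [[]]
    for line in raw.splitlines():
        # Only a line consisting solely of the delimiter separates blocks,
        # so a stray "^" inside a sentence is left alone.
        if line.strip() == delimiter:
            blocks.append([])
        else:
            blocks[-1].append(line)
    texts = ["\n".join(b).strip() for b in blocks]
    # Tolerate a leading or trailing delimiter, but not empty blocks in between.
    while texts and not texts[0]:
        texts.pop(0)
    while texts and not texts[-1]:
        texts.pop()
    if len(texts) != expected or any(not t for t in texts):
        raise MalformedOutput(f"expected {expected} non-empty blocks, got {len(texts)}")
    return texts
```

Raising on any mismatch, rather than salvaging partial output, keeps the retry decision simple: the batch either honours the contract or goes back to the Architect.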

Explicit image/reference positioning

When references are attached, the Architect's system prompt includes an IMAGE ORDER (or equivalent) section listing the references by position. The output prompt must reference them by those positions. The pattern avoids having the Architect invent reference labels that downstream stages have to re-map.
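A sketch of both halves, assuming references arrive as an ordered list of (refType, description) pairs and that prompts cite them as "Image 1", "Image 2", and so on. The section wording, the citation pattern, and both helper names are assumptions rather than a fixed format.

```python
import re

IMAGE_REF = re.compile(r"\bImage\s+(\d+)\b")  # assumed citation style


def image_order_section(refs: list[tuple[str, str]]) -> str:
    # Positions are 1-based and fixed here, so the Architect never invents labels.
    lines = ["IMAGE ORDER:"]
    for pos, (ref_type, desc) in enumerate(refs, start=1):
        lines.append(f"Image {pos}: {ref_type} ({desc})")
    return "\n".join(lines)


def hallucinated_refs(prompt: str, n_images: int) -> list[int]:
    # Any cited position outside 1..n_images points at an image that does not exist.
    cited = {int(m) for m in IMAGE_REF.findall(prompt)}
    return sorted(p for p in cited if not 1 <= p <= n_images)
```

The same positions feed both the Architect's system prompt and the post-check, so "Image 4 when only 3 exist" is caught by construction rather than by eye.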

Declared fallbacks

When a reference expected by the Architect is missing (no background reference provided, no style reference provided), the Architect must declare the fallback inline in every prompt it emits. Silent substitution (the LLM fills in a plausible environment without saying so) breaks downstream debugging. Declared fallbacks ("no background reference provided; environment drawn from the Creative Direction") preserve the trace of where each element came from.
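One way to make fallbacks mechanical rather than discretionary, assuming the pipeline knows which reference roles it expects: compute the declarations before calling the Architect, include them in its instructions, and afterwards flag any prompt that omits them. The role table, the declaration wording, and the helper names are hypothetical.

```python
# Hypothetical: expected reference role -> what the Architect falls back on.
FALLBACK_SOURCE = {
    "background": "environment drawn from the Creative Direction",
    "style": "style drawn from the Creative Direction",
}


def fallback_declarations(provided_roles: set[str]) -> list[str]:
    # One explicit, greppable sentence per missing reference.
    return [
        f"No {role} reference provided; {source}."
        for role, source in FALLBACK_SOURCE.items()
        if role not in provided_roles
    ]


def missing_declarations(prompts: list[str], declarations: list[str]) -> dict[int, list[str]]:
    # Map prompt index -> declarations it failed to repeat (a silent substitution).
    gaps = {}
    for i, prompt in enumerate(prompts):
        absent = [d for d in declarations if d not in prompt]
        if absent:
            gaps[i] = absent
    return gaps
```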

Temperature sweet spot at 0.5

Observationally, 0.5 works best for the Architect stage, at the top of the 0.3–0.5 range given above. Higher temperatures produce undisciplined output; lower ones produce N near-identical prompts. The Creative Director stage upstream runs at 0.6–0.8 to provide the variety that the Architect then formalises.
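The low-temperature failure (N near-identical prompts) can be flagged automatically while tuning. A rough sketch using the standard library's `difflib.SequenceMatcher`; the 0.9 threshold is an arbitrary assumption, and character-level similarity is only a proxy for creative sameness.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(prompts: list[str], threshold: float = 0.9) -> list[tuple[int, int, float]]:
    """Return (i, j, ratio) for every pair of prompts whose character-level
    similarity ratio is at or above `threshold`."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(prompts), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((i, j, round(ratio, 3)))
    return pairs
```

A batch that keeps producing such pairs is a hint that the Architect temperature is too low, or that the Director's concepts were not varied enough to begin with.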

6. Failure modes the pattern prevents

| Failure | Single-stage symptom | Two-stage resolution |
|---|---|---|
| Creative monotony | All N shots describe slight variations of the same scene | Director produces varied concepts upstream; Architect only formalises |
| Format drift | N prompts follow N different structures | Architect's narrow role + low temperature + explicit contract |
| Environment divergence | N shots live in N slightly different locations | Director produces one environment; Architect inherits it verbatim |
| Reference hallucination | Prompts reference Image 4 when only 3 exist | Architect's IMAGE ORDER section + compliance temperature |
| Debug opacity | Bad output has no clear layer of origin | Failures isolate to Director (creative) or Architect (format) |
| Long system prompt | One system prompt must cover all concerns | Each stage's system prompt is short and focused |

7. When not to use the pattern

Single-shot generation. The pattern's benefit is spreading one creative intent across many technical outputs. One output doesn't need two calls.
Tasks with tightly-coupled creative-and-technical decisions. Some generation tasks don't cleanly separate "what" from "how": terse prompts, short-form creative writing, some code-generation cases. For these, the pattern adds overhead without benefit.
Early prototyping. Splitting the pipeline into two stages adds operational complexity (two LLM calls, two system prompts to maintain, more points at which a run can fail). For early exploration, a single prompt is fine.

8. Generalisations beyond image prompts

The pattern — Director emits intent, Architect emits structure — applies to any multi-output LLM pipeline where creativity and compliance are in tension:

Test-case generation. Director: "emit 20 diverse test scenarios across this input surface." Architect: "reformat each into the test-framework's structured representation."
Variant synthesis for UI. Director: "generate 8 distinct layout concepts that satisfy these constraints." Architect: "emit each as valid component markup."
Documentation generation. Director: "outline 5 narrative angles on this feature." Architect: "expand each into the docs-site's markdown schema."
Code-refactor planning. Director: "describe 3 refactor approaches in prose." Architect: "emit each as a structured diff plan."
Agentic action-planning. Director: "consider 3 possible next-action strategies." Architect: "emit each as a concrete tool-call sequence."
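As an illustration outside image prompts, here is a sketch of the Architect-side contract for the test-case example. It assumes the Architect is asked to emit a JSON array (one of the delimiter alternatives named in section 5) and that each test case needs `name`, `input`, and `expected` fields; the schema and the `parse_test_cases` helper are hypothetical.

```python
import json

REQUIRED_FIELDS = {"name", "input", "expected"}  # hypothetical test-case schema


def parse_test_cases(raw: str, expected_count: int) -> list[dict]:
    """Validate the Architect's JSON-array output against the output contract."""
    cases = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(cases, list) or len(cases) != expected_count:
        raise ValueError(f"expected a JSON array of {expected_count} test cases")
    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            raise ValueError(f"case {i} is not an object")
        missing = REQUIRED_FIELDS - case.keys()
        if missing:
            raise ValueError(f"case {i} is missing {sorted(missing)}")
    return cases
```

The Director's 20 scenarios stay free-form prose; only this contract has to be rigid, which mirrors the split in the image-prompt case.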

The common shape: N varied outputs, consistent in some dimensions, disciplined in format. Whenever the work splits cleanly along that axis, splitting the LLM call pays off.

9. Open research questions

Three-stage variants. Is there value in a middle stage (Producer? Editor?) between Director and Architect, e.g., reconciling the Director's N concepts against a consistency budget before they reach the Architect? Production implementations typically do this inside the Architect; a formal middle stage might isolate the concern.
Architect-only retries. When the Architect emits malformed output, retrying the Architect alone (not re-invoking the Director) is the obvious recovery. But does that recovery introduce creative bias over time? Untested.
Director-temperature calibration. 0.6–0.8 is observational. A controlled study of concept variety × rater quality across temperatures for the Director stage specifically would be useful.
Single-LLM emulation. Can a single LLM at a single temperature emulate the two-stage pattern given sufficiently structured chain-of-thought? Informally: yes, poorly. Formally: worth measuring.
Cross-task generalisation. The pattern works well for image prompts and well-structured generation tasks. Does it generalise to highly creative tasks (fiction, poetry) or is "creative / technical" too coarse a split there?
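The Architect-only retry raised above is simple to express. A sketch, assuming hypothetical `direct()` and `architect(prose)` callables and a `parse` function that raises `ValueError` on malformed output: the Director's prose is produced once and reused, and only the Architect is re-invoked. Whether this introduces creative bias over time is the open question; the code only shows where the boundary sits.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_architect_retries(
    direct: Callable[[], str],
    architect: Callable[[str], str],
    parse: Callable[[str], T],
    max_attempts: int = 3,
) -> T:
    prose = direct()  # Stage A runs exactly once; its output is reused.
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        raw = architect(prose)  # only Stage B is retried
        try:
            return parse(raw)
        except ValueError as err:  # malformed output: retry the Architect alone
            last_error = err
    raise RuntimeError(f"Architect failed {max_attempts} times") from last_error
```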

10. Conclusion

The Two-Stage Architect Pattern is a small architectural move that buys a lot of reliability. Splitting creative intent and technical emission into two LLM calls — with matched temperatures, matched roles, and a prose handoff between them — removes four categorical failure modes (creative monotony, format drift, environment divergence, debug opacity) that a single-stage pipeline persistently produces.

It is not a silver bullet. It adds operational complexity and is wasteful for single-output tasks. But for production pipelines that must produce many outputs with shared creative intent and rigid format compliance, the pattern is the smallest commitment that reliably works.

Citation

Nix, A. (2026). The Two-Stage Architect Pattern — Separating Creative Intent from Technical Generation in LLM Pipelines. Working paper.