Generative UI in Agentic Chat
A user asks your agent "make me a poster." It asks back, "what style?" They reply, "you know, modern, but not too modern, maybe like that one Wong Kar-wai movie." Your agent parses that into a real generation prompt — badly. The user retypes. Iteration is slow because every step is free-text-in and free-text-out, and the agent keeps guessing what the user meant. An idea-card with three concrete previews would settle the same question in one click.
What follows is the pattern: let the agent emit typed UI components that render in-stream, not just text. This is generative UI: situational, agent-emitted UI elements interleaved with assistant turns and rendered as cards, questionnaires, previews, or forms. The user's next "message" becomes a click inside one of those components rather than a sentence, and the agent receives a structured event instead of a paragraph it has to parse.
When to apply
- The conversation has branching decision points better expressed as buttons / forms than as "type one of: A, B, C."
- The agent has partial output worth showing now (an idea card, a draft prompt, an early image) so the user can intervene before the next stage.
- The runtime LLM does not support native tool / function calling, but you still need structured intermediate states.
Mechanism
- Component vocabulary. Define a closed set of typed UI elements: IdeaList, Questionnaire, PromptPreview, ArtworkCard, PublishForm, DoneCard. Each has a JSON shape and a renderer.
- JSON-emulated function calling. The LLM emits a fenced JSON block (e.g. ```tool\n{"name":"show_ideas","args":{...}}\n```). The server parses these blocks out of the stream and converts them into UI events. Native tool-use API not required.
- Streamed SSE. The agent response is a Server-Sent-Events stream of mixed text deltas and tool-call objects. The client appends text to the current bubble until it sees a tool event, then renders the component and starts a fresh bubble.
- Click-as-message. Every interactive element on a rendered component posts a structured message back to the agent endpoint (e.g. {"action":"pick_idea","idea_id":"…"}). The server treats this as the next user turn, but with a known schema, so the LLM doesn't have to parse intent from free text.
- State carryforward. The agent store (per the four-tier-prompt-source-hierarchy) keeps the artifacts produced by past components (chosen idea, draft prompt, generated image URL) so later tool calls receive them as structured args, not as chat-history scraps.
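The parsing and SSE steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pattern's reference implementation: the tool names other than show_ideas, the TOOL_TO_COMPONENT mapping, and the event dictionary shape are all assumptions made for the example. It works on a completed assistant turn; splitting across stream chunks is a separate concern.

```python
import json
import re
from typing import Iterator

# Hypothetical tool-name -> component mapping. Only show_ideas appears in
# the text; the other tool names are made up for illustration.
TOOL_TO_COMPONENT = {
    "show_ideas": "IdeaList",
    "ask_questions": "Questionnaire",
    "preview_prompt": "PromptPreview",
}

TICKS = "`" * 3  # built programmatically so this listing has no nested fence
FENCE = re.compile(
    re.escape(TICKS) + r"tool\n(.*?)\n" + re.escape(TICKS), re.DOTALL
)


def split_events(text: str) -> Iterator[dict]:
    """Yield text and tool events in the order they appear in `text`."""
    pos = 0
    for match in FENCE.finditer(text):
        before = text[pos:match.start()]
        if before.strip():
            yield {"type": "text", "delta": before}
        pos = match.end()
        try:
            call = json.loads(match.group(1))
            component = TOOL_TO_COMPONENT[call["name"]]
        except (json.JSONDecodeError, KeyError, TypeError):
            # Malformed JSON or a tool outside the closed vocabulary:
            # degrade to plain text instead of failing the whole turn.
            yield {"type": "text", "delta": match.group(0)}
            continue
        yield {"type": "tool", "component": component, "args": call.get("args", {})}
    rest = text[pos:]
    if rest.strip():
        yield {"type": "text", "delta": rest}


def to_sse(event: dict) -> str:
    """Frame one event as a Server-Sent Events message."""
    # json.dumps emits a single line, so one data: field is enough.
    return f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"
```

Rejecting unknown tool names at this layer is what keeps the vocabulary closed: the client only ever sees component types it has a renderer for.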
Why JSON emulation, not native tool-use
- Works with any LLM, including ones routed via third-party hosts or locally-deployed models where native tool-use APIs aren't exposed.
- Easier to debug — the raw stream is human-readable.
- The same fenced format works for streaming partial JSON: text-deltas appear inside the fence, the parser tolerates incomplete objects until the closing fence arrives.
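To make the last point concrete, here is one way a chunk-tolerant parser might look. It is a sketch under assumptions: the class name FenceStreamParser, its feed/flush methods, and the event shapes are invented for this example, and it assumes the JSON body never contains a newline followed by three backticks.

```python
import json

OPEN = "`" * 3 + "tool\n"   # opening fence, built without a literal fence
CLOSE = "\n" + "`" * 3      # closing fence


def _partial_suffix(buf: str, marker: str) -> int:
    """Length of the longest proper prefix of `marker` that ends `buf`."""
    for k in range(min(len(buf), len(marker) - 1), 0, -1):
        if buf.endswith(marker[:k]):
            return k
    return 0


class FenceStreamParser:
    """Hypothetical incremental parser for text deltas with tool fences."""

    def __init__(self) -> None:
        self.buf = ""
        self.in_fence = False

    def feed(self, chunk: str) -> list:
        out = []
        self.buf += chunk
        while True:
            if self.in_fence:
                end = self.buf.find(CLOSE)
                if end == -1:
                    return out  # incomplete object: wait for more chunks
                body = self.buf[:end]
                self.buf = self.buf[end + len(CLOSE):]
                self.in_fence = False
                try:
                    out.append({"type": "tool", "call": json.loads(body)})
                except json.JSONDecodeError:
                    out.append({"type": "text", "delta": body})
                continue
            start = self.buf.find(OPEN)
            if start != -1:
                if start:
                    out.append({"type": "text", "delta": self.buf[:start]})
                self.buf = self.buf[start + len(OPEN):]
                self.in_fence = True
                continue
            # Hold back a tail that could be the start of an opening fence.
            keep = _partial_suffix(self.buf, OPEN)
            emit = self.buf[:len(self.buf) - keep]
            self.buf = self.buf[len(self.buf) - keep:]
            if emit:
                out.append({"type": "text", "delta": emit})
            return out

    def flush(self) -> list:
        """Call at end of stream; releases any held-back text."""
        rest, self.buf = self.buf, ""
        return [{"type": "text", "delta": rest}] if rest else []
```

The held-back tail is the part that matters: without it, a fence split across two chunks would leak its first backticks into the visible bubble.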
A worked component vocabulary
IdeaList → PromptQuestionnaire → PromptPreview → ArtworkCard → PublishForm → DoneCard. The agent narrates the flow in natural language and emits one of these components at each decision point. User picks an idea → the store's chosenIdea updates → the questionnaire renders → a craft_prompt tool fires with the full four-tier context attached.
The conversation reads like chat, but the data shape underneath is structured.
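A rough sketch of that structured layer, covering the first two steps of the flow: a click arrives as a schema-checked message, the store is updated, and the next step receives stored artifacts as arguments. The AgentStore fields, the submit_answers action, and the return shape are assumptions for illustration; only pick_idea, idea_id, chosenIdea, and craft_prompt come from the text, and the four-tier context is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical click schemas: action name -> required fields.
CLICK_SCHEMAS = {
    "pick_idea": {"idea_id"},
    "submit_answers": {"answers"},
}


@dataclass
class AgentStore:
    """Artifacts carried between component-driven turns (illustrative)."""
    ideas: dict = field(default_factory=dict)    # idea_id -> idea text
    chosen_idea: Optional[str] = None
    answers: dict = field(default_factory=dict)


def handle_click(store: AgentStore, msg: dict) -> dict:
    """Validate a click message, update the store, return the next step."""
    action = msg.get("action")
    if not isinstance(action, str) or action not in CLICK_SCHEMAS:
        raise ValueError(f"unknown action: {action!r}")
    missing = CLICK_SCHEMAS[action] - set(msg)
    if missing:
        raise ValueError(f"{action} is missing fields: {sorted(missing)}")

    if action == "pick_idea":
        if msg["idea_id"] not in store.ideas:
            raise ValueError(f"unknown idea_id: {msg['idea_id']!r}")
        store.chosen_idea = store.ideas[msg["idea_id"]]
        # Next component in the flow: the questionnaire for the chosen idea.
        return {"next": "PromptQuestionnaire", "args": {"idea": store.chosen_idea}}

    # submit_answers: the prompt-crafting tool gets stored artifacts as
    # structured args rather than re-reading them from chat history.
    store.answers = msg["answers"]
    return {
        "next": "craft_prompt",
        "args": {"idea": store.chosen_idea, "answers": store.answers},
    }
```

Because every click is checked against a known schema before it touches the store, a malformed or stale click fails loudly at the endpoint instead of becoming a confusing model input.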
Pitfalls
- Scroll-jank on hover tooltips. Anything anchored to a chat bubble must re-measure on scroll. Capture-phase scroll listeners that null out hover state on every wheel tick will self-destruct under auto-scroll. Portal to document.body and track via rAF + getBoundingClientRect.
- Component-state leakage. When the same component type renders twice in a session (the user goes back and picks a different idea), each instance needs a stable key tied to the agent-turn ID, not the component-type alone.
- Fallback text. Every tool event needs a text equivalent for screen readers and for stream replay where the UI didn't render.
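The last two pitfalls can both be handled server-side by decorating each tool event before it is sent. The sketch below is one possible shape, not a prescribed one: the key format, the fallback wording, and the decorate helper are assumptions, as are the "ideas" and "prompt" argument names.

```python
from typing import Iterable


def component_key(turn_id: str, index: int, component: str) -> str:
    """Key an instance by the turn that produced it, not by type alone."""
    return f"{turn_id}:{index}:{component}"


def fallback_text(component: str, args: dict) -> str:
    """Plain-text equivalent for screen readers and stream replay."""
    if component == "IdeaList":
        ideas = args.get("ideas", [])
        listed = "; ".join(f"{n}. {idea}" for n, idea in enumerate(ideas, 1))
        return f"Ideas to choose from: {listed}"
    if component == "PromptPreview":
        return f"Draft prompt: {args.get('prompt', '')}"
    # Generic fallback for components without a tailored description.
    return f"[{component}]"


def decorate(turn_id: str, events: Iterable[dict]) -> list:
    """Attach a stable key and fallback text to every tool event."""
    out = []
    index = 0  # position of the tool event within this turn
    for event in events:
        if event.get("type") != "tool":
            out.append(event)
            continue
        component = event["component"]
        out.append({
            **event,
            "key": component_key(turn_id, index, component),
            "fallback": fallback_text(component, event.get("args", {})),
        })
        index += 1
    return out
```

Two IdeaList instances from different turns then get different keys, so a client that keys its component state on this value will not carry a previous selection over when the user goes back.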
Pairs with
- four-tier-prompt-source-hierarchy — what the agent carries between component-driven turns.
- template-dispatch-prompts — server-side, which prompt template to use for each click-message.
- two-stage-architect-pattern — generative UI is the natural surface for the Creative-Director step (cards / options) before the Technical-Architect step (commit).