May 15, 2026

Generative UI in Agentic Chat

A user asks your agent "make me a poster." It asks back, "what style?" They reply, "you know, modern, but not too modern, maybe like that one Wong Kar-wai movie." Your agent parses that into a real generation prompt — badly. The user retypes. Iteration is slow because every step is free-text-in and free-text-out, and the agent keeps guessing what the user meant. An idea-card with three concrete previews would settle the same question in one click.

What follows is the pattern: let the agent emit not just text, but typed UI components that render in-stream. The user's next "message" becomes a click on one of those components, not a sentence.

This is generative UI: situational, agent-emitted UI elements (cards, questionnaires, previews, forms) interleaved with assistant turns. Because the user replies with a click rather than free text, the agent receives a structured event instead of a paragraph it has to parse.

When to apply

The conversation has branching decision points better expressed as buttons / forms than as "type one of: A, B, C."
The agent has partial output worth showing now (an idea card, a draft prompt, an early image) so the user can intervene before the next stage.
The runtime LLM does not support native tool / function calling, but you still need structured intermediate states.

Mechanism

Component vocabulary. Define a closed set of typed UI elements: IdeaList, Questionnaire, PromptPreview, ArtworkCard, PublishForm, DoneCard. Each has a JSON shape and a renderer.
JSON-emulated function calling. The LLM emits a fenced JSON block (e.g. ```tool\n{"name":"show_ideas","args":{...}}\n```). The server parses these blocks out of the stream and converts them into UI events. Native tool-use API not required.
Streamed SSE. The agent response is a Server-Sent Events (SSE) stream that mixes text deltas with tool-call objects. The client appends text to the current bubble until a tool event arrives, then renders the component and starts a fresh bubble.
Click-as-message. Every interactive element on a rendered component posts a structured message back to the agent endpoint (e.g. {"action":"pick_idea","idea_id":"…"}). The server treats this as the next user turn, but with a known schema, so the LLM doesn't have to parse intent from free text.
State carryforward. The agent store (per the four-tier-prompt-source-hierarchy) keeps the artifacts produced by past components (chosen idea, draft prompt, generated image URL) so later tool calls receive them as structured args, not as chat-history scraps.
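
The parsing and SSE steps above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the fence is literally ```tool as in the example above, and the function names `split_events` and `to_sse` and the event dictionary shape are hypothetical. It handles a finished model output only, not a live stream.

```python
import json
import re

# Assumed fence format, taken from the example above: ```tool ... ```
FENCE = re.compile(r"```tool\s*\n(.*?)\n```", re.DOTALL)


def split_events(model_output: str) -> list:
    """Split a finished model output into ordered text and tool events."""
    events = []
    cursor = 0
    for match in FENCE.finditer(model_output):
        text = model_output[cursor:match.start()].strip()
        if text:
            events.append({"type": "text", "text": text})
        try:
            call = json.loads(match.group(1))
            events.append(
                {"type": "tool", "name": call["name"], "args": call.get("args", {})}
            )
        except (json.JSONDecodeError, KeyError, TypeError):
            # Malformed block: surface it as text rather than dropping it.
            events.append({"type": "text", "text": match.group(0)})
        cursor = match.end()
    tail = model_output[cursor:].strip()
    if tail:
        events.append({"type": "text", "text": tail})
    return events


def to_sse(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame (event + data lines)."""
    return f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"
```

Because `json.dumps` escapes newlines, each frame keeps its payload on a single `data:` line, which keeps the client-side SSE handling simple.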

Why JSON emulation, not native tool-use

Works with any LLM, including ones routed via third-party hosts or locally-deployed models where native tool-use APIs aren't exposed.
Easier to debug — the raw stream is human-readable.
The same fenced format works for streaming partial JSON: text deltas arrive inside the fence, and the parser buffers the incomplete object until the closing fence arrives.
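
To make the streaming point concrete, here is a rough sketch of an incremental parser that buffers an incomplete object until the closing fence shows up. The class name `FenceStreamParser`, the exact fence strings, and the `(kind, payload)` event tuples are assumptions for illustration; error handling for malformed JSON is left out.

```python
import json

# Assumed delimiters, matching the ```tool example earlier in the post.
OPEN, CLOSE = "```tool\n", "\n```"


def _partial_suffix(text: str, marker: str) -> int:
    """Length of the longest suffix of text that is a proper prefix of marker."""
    for size in range(min(len(text), len(marker) - 1), 0, -1):
        if text.endswith(marker[:size]):
            return size
    return 0


class FenceStreamParser:
    """Incrementally split streamed chunks into text deltas and tool calls."""

    def __init__(self):
        self.buffer = ""
        self.in_fence = False

    def feed(self, chunk: str) -> list:
        self.buffer += chunk
        events = []
        while True:
            if not self.in_fence:
                start = self.buffer.find(OPEN)
                if start == -1:
                    # Hold back a tail that might be the start of a fence.
                    keep = _partial_suffix(self.buffer, OPEN)
                    cut = len(self.buffer) - keep
                    if cut:
                        events.append(("text", self.buffer[:cut]))
                    self.buffer = self.buffer[cut:]
                    return events
                if start:
                    events.append(("text", self.buffer[:start]))
                self.buffer = self.buffer[start + len(OPEN):]
                self.in_fence = True
            else:
                end = self.buffer.find(CLOSE)
                if end == -1:
                    return events  # incomplete JSON: keep buffering
                events.append(("tool", json.loads(self.buffer[:end])))
                self.buffer = self.buffer[end + len(CLOSE):]
                self.in_fence = False
```

Feeding it chunk by chunk, for example `parser.feed(delta)` inside the loop that reads model output, yields text events immediately and a tool event only once the whole object has arrived, even when a fence marker is split across two chunks.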

A worked component vocabulary

IdeaList → PromptQuestionnaire → PromptPreview → ArtworkCard → PublishForm → DoneCard. The agent narrates the flow in natural language and emits one of these components at each decision point. User picks an idea → the store's chosenIdea updates → the questionnaire renders → a craft_prompt tool fires with the full four-tier context attached.

The conversation reads like chat, but the data shape underneath is structured.
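
A small sketch of how a click message might drive that flow on the server. Only `pick_idea`, `idea_id`, `craft_prompt`, and the chosen-idea field come from the text above; the `submit_answers` action, the `show_questionnaire` tool name, the `AgentStore` fields, and the schema table are hypothetical, and the four-tier context is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schemas: required fields per click action.
ACTION_SCHEMAS = {
    "pick_idea": {"idea_id"},
    "submit_answers": {"answers"},
}


@dataclass
class AgentStore:
    """Artifacts carried between component-driven turns."""
    ideas: dict = field(default_factory=dict)      # idea_id -> idea text
    chosen_idea: Optional[str] = None
    answers: dict = field(default_factory=dict)


def handle_click(store: AgentStore, message: dict) -> dict:
    """Validate a click message, update the store, return the next tool call."""
    action = message.get("action")
    required = ACTION_SCHEMAS.get(action)
    if required is None:
        raise ValueError(f"unknown action: {action!r}")
    missing = required - set(message)
    if missing:
        raise ValueError(f"{action} is missing fields: {sorted(missing)}")

    if action == "pick_idea":
        if message["idea_id"] not in store.ideas:
            raise ValueError("unknown idea_id")
        store.chosen_idea = message["idea_id"]
        idea = store.ideas[store.chosen_idea]
        return {"name": "show_questionnaire", "args": {"idea": idea}}

    # action == "submit_answers": needs an idea picked in an earlier turn.
    if store.chosen_idea is None:
        raise ValueError("no idea chosen yet")
    store.answers = message["answers"]
    return {
        "name": "craft_prompt",
        "args": {"idea": store.ideas[store.chosen_idea], "answers": store.answers},
    }
```

The later tool call receives the chosen idea and the answers as structured arguments from the store, so the model never has to recover them from chat history.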

Pitfalls

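Scroll jank on hover tooltips. Anything anchored to a chat bubble must re-measure on scroll. A capture-phase scroll listener that clears hover state on every wheel tick breaks under auto-scroll, because the streaming chat keeps scrolling the view and the tooltip is dismissed as soon as it opens. Instead, portal the tooltip to document.body and track its anchor with requestAnimationFrame and getBoundingClientRect.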
Component-state leakage. When the same component type renders twice in a session (the user goes back and picks a different idea), each instance needs a stable key tied to the agent-turn ID, not the component-type alone.
Fallback text. Every tool event needs a text equivalent for screen readers and for stream replay where the UI didn't render.
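
The last two pitfalls can be addressed server-side before the event is sent. The sketch below is one possible approach, with hypothetical names throughout: `component_key`, `with_fallback`, the `show_prompt_preview` tool name, and the fallback templates are assumptions, not part of the pattern as described.

```python
def component_key(turn_id: str, index: int, name: str) -> str:
    """Key a component instance by the turn that produced it, not by type alone."""
    return f"{turn_id}:{index}:{name}"


# Hypothetical fallback templates, one per tool name.
FALLBACKS = {
    "show_ideas": lambda args: "Ideas: " + "; ".join(args.get("ideas", [])),
    "show_prompt_preview": lambda args: "Draft prompt: " + args.get("prompt", ""),
}


def with_fallback(event: dict, turn_id: str, index: int) -> dict:
    """Attach a stable key and a text equivalent to a tool event."""
    # Unknown tools still get a minimal placeholder so nothing is silent.
    render = FALLBACKS.get(event["name"], lambda args: f"[{event['name']}]")
    return {
        **event,
        "key": component_key(turn_id, index, event["name"]),
        "fallback_text": render(event.get("args", {})),
    }
```

Two IdeaList components emitted in different turns get different keys, so the client can keep their state apart, and a replay or screen reader can use `fallback_text` when the component itself is not rendered.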

Pairs with

four-tier-prompt-source-hierarchy — what the agent carries between component-driven turns.
template-dispatch-prompts — server-side, which prompt template to use for each click-message.
two-stage-architect-pattern — generative UI is the natural surface for the Creative-Director step (cards / options) before the Technical-Architect step (commit).