
Generative UI in Agentic Chat

A user asks your agent "make me a poster." It asks back, "what style?" They reply, "you know, modern, but not too modern, maybe like that one Wong Kar-wai movie." Your agent parses that into a real generation prompt — badly. The user retypes. Iteration is slow because every step is free-text-in and free-text-out, and the agent keeps guessing what the user meant. An idea-card with three concrete previews would settle the same question in one click.

The pattern that fixes this is generative UI: situational, agent-emitted UI elements interleaved with assistant turns. Instead of emitting only text, the agent emits typed components that render in-stream as cards, questionnaires, previews, or forms. The user's next "message" becomes a click inside one of those components rather than a sentence, and the agent receives a structured event instead of a paragraph it has to parse.

When to apply

  • The conversation has branching decision points better expressed as buttons / forms than as "type one of: A, B, C."
  • The agent has partial output worth showing now (an idea card, a draft prompt, an early image) so the user can intervene before the next stage.
  • The runtime LLM does not support native tool / function calling, but you still need structured intermediate states.

Mechanism

  1. Component vocabulary. Define a closed set of typed UI elements: IdeaList, Questionnaire, PromptPreview, ArtworkCard, PublishForm, DoneCard. Each has a JSON shape and a renderer.
  2. JSON-emulated function calling. The LLM emits a fenced JSON block (e.g. ```tool\n{"name":"show_ideas","args":{...}}\n```). The server parses these blocks out of the stream and converts them into UI events. A native tool-use API is not required.
  3. SSE streaming. The agent response is a Server-Sent Events stream of mixed text deltas and tool-call objects. The client appends text to the current bubble until it sees a tool event, then renders the component and starts a fresh bubble.
  4. Click-as-message. Every interactive element on a rendered component posts a structured message back to the agent endpoint (e.g. {"action":"pick_idea","idea_id":"…"}). The server treats this as the next user turn, but with a known schema, so the LLM doesn't have to parse intent from free text.
  5. State carryforward. The agent store (per the four-tier-prompt-source-hierarchy) keeps the artifacts produced by past components (chosen idea, draft prompt, generated image URL) so later tool calls receive them as structured args, not as chat-history scraps.
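
Steps 1 to 3 can be sketched on the server side as follows. This is a minimal illustration, not a reference implementation: it assumes the model's turn has already been collected into one string, that the fence tag is `tool` as in the example above, and that events are plain dicts. Only `show_ideas` comes from the text; the other tool names, the event shape, and the helper names `split_events` and `to_sse` are hypothetical.

```python
import json
import re
from typing import Any

TICKS = "`" * 3  # built indirectly so the example can sit inside a fenced block

# Closed vocabulary: tool name -> component. Only "show_ideas" appears in the
# text above; the other two tool names are hypothetical.
TOOL_TO_COMPONENT = {
    "show_ideas": "IdeaList",
    "show_prompt_preview": "PromptPreview",
    "show_artwork": "ArtworkCard",
}

FENCE = re.compile(
    re.escape(TICKS + "tool\n") + r"(.*?)" + re.escape("\n" + TICKS), re.DOTALL
)


def _parse_call(body: str) -> dict[str, Any]:
    """Validate one fenced block against the vocabulary."""
    try:
        call = json.loads(body)
        component = TOOL_TO_COMPONENT[call["name"]]
        props = call.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        # Malformed JSON or a tool outside the vocabulary: degrade to text.
        return {"type": "text", "text": body}
    return {"type": "tool", "component": component, "props": props}


def split_events(raw: str) -> list[dict[str, Any]]:
    """Turn raw model output into an ordered list of text and tool events."""
    events: list[dict[str, Any]] = []
    pos = 0
    for match in FENCE.finditer(raw):
        text = raw[pos:match.start()].strip()
        if text:
            events.append({"type": "text", "text": text})
        events.append(_parse_call(match.group(1)))
        pos = match.end()
    tail = raw[pos:].strip()
    if tail:
        events.append({"type": "text", "text": tail})
    return events


def to_sse(event: dict[str, Any]) -> str:
    """Frame one event in the Server-Sent Events wire format."""
    return f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"
```

Falling back to a text event for unknown or malformed calls is one possible design choice: the user still sees something, and the client never has to render a component it has no renderer for.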

Why JSON emulation, not native tool-use

  • Works with any LLM that can follow an output-format instruction, including ones routed via third-party hosts or locally deployed models where native tool-use APIs aren't exposed.
  • Easier to debug, because the raw stream is human-readable.
  • The same fenced format works for streaming partial JSON: text deltas arrive inside the fence, and the parser tolerates incomplete objects until the closing fence arrives.
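
The last point, tolerating incomplete objects, can be sketched as a small incremental parser. This is an illustration under assumptions: the class name `FenceStreamParser`, its event dicts, and the choice to buffer the whole fenced body before decoding are all hypothetical, and the fence markers may be split across deltas at any character.

```python
import json

TICKS = "`" * 3  # built indirectly so the example can sit inside a fenced block
OPEN, CLOSE = TICKS + "tool\n", "\n" + TICKS


def _partial_suffix(text: str, marker: str) -> int:
    """Length of the longest suffix of text that is a proper prefix of marker."""
    for n in range(min(len(text), len(marker) - 1), 0, -1):
        if text.endswith(marker[:n]):
            return n
    return 0


class FenceStreamParser:
    """Feed text deltas in; get text and tool events out."""

    def __init__(self) -> None:
        self.buf = ""
        self.in_fence = False

    def feed(self, delta: str) -> list[dict]:
        self.buf += delta
        events: list[dict] = []
        while True:
            if not self.in_fence:
                start = self.buf.find(OPEN)
                if start == -1:
                    # Hold back a tail that might be the start of a fence.
                    keep = _partial_suffix(self.buf, OPEN)
                    cut = len(self.buf) - keep
                    text, self.buf = self.buf[:cut], self.buf[cut:]
                    if text:
                        events.append({"type": "text", "text": text})
                    return events
                if start:
                    events.append({"type": "text", "text": self.buf[:start]})
                self.buf = self.buf[start + len(OPEN):]
                self.in_fence = True
            else:
                end = self.buf.find(CLOSE)
                if end == -1:
                    return events  # incomplete object: keep buffering
                body = self.buf[:end]
                self.buf = self.buf[end + len(CLOSE):]
                self.in_fence = False
                try:
                    events.append({"type": "tool", "call": json.loads(body)})
                except json.JSONDecodeError:
                    events.append({"type": "text", "text": body})

    def flush(self) -> list[dict]:
        """Call at end of stream to release any held-back text."""
        text, self.buf = self.buf, ""
        self.in_fence = False
        return [{"type": "text", "text": text}] if text else []
```

Text outside a fence is forwarded as soon as it cannot be part of an opening marker, so the chat bubble keeps streaming; only the fenced body is delayed until it is complete.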

A worked component vocabulary

IdeaList → PromptQuestionnaire → PromptPreview → ArtworkCard → PublishForm → DoneCard. The agent narrates the flow in natural language and emits one of these components at each decision point. User picks an idea → the store's chosenIdea updates → the questionnaire renders → a craft_prompt tool fires with the full four-tier context attached.

The conversation reads like chat, but the data shape underneath is structured.
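
To make the "structured underneath" part concrete, here is a rough sketch of the first two transitions: a click message updates the store, and the stored artifact is passed on as a structured argument. The `pick_idea` action and `craft_prompt` tool are named in the text; `AgentStore`, `handle_action`, the `submit_answers` action, and all field names are hypothetical, and the four-tier context is left out because its shape is not described here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AgentStore:
    """Artifacts carried forward between components (hypothetical shape)."""
    ideas: dict[str, str] = field(default_factory=dict)  # idea_id -> title
    chosen_idea: Optional[str] = None


def handle_action(store: AgentStore, msg: dict[str, Any]) -> dict[str, Any]:
    """Apply one click message to the store and return what happens next."""
    action = msg.get("action")

    if action == "pick_idea":
        idea_id = msg.get("idea_id")
        if not isinstance(idea_id, str) or idea_id not in store.ideas:
            raise ValueError(f"unknown idea_id: {idea_id!r}")
        store.chosen_idea = idea_id
        # Next decision point: render the questionnaire for the chosen idea.
        return {
            "type": "tool",
            "component": "PromptQuestionnaire",
            "props": {"idea": store.ideas[idea_id]},
        }

    if action == "submit_answers":  # hypothetical action name
        if store.chosen_idea is None:
            raise ValueError("answers submitted before an idea was picked")
        # craft_prompt receives stored artifacts as structured args,
        # not as text recovered from the chat history.
        return {
            "type": "invoke",
            "name": "craft_prompt",
            "args": {
                "idea": store.ideas[store.chosen_idea],
                "answers": msg.get("answers", {}),
            },
        }

    raise ValueError(f"unknown action: {action!r}")
```

Because every click arrives with a known `action` field, the handler can reject anything outside the schema before the LLM is involved at all.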

Pitfalls

  • Scroll-jank on hover tooltips. Anything anchored to a chat bubble must re-measure on scroll. Capture-phase scroll listeners that clear hover state on every scroll event will keep dismissing the tooltip while the chat auto-scrolls. Instead, portal the tooltip to document.body and track the anchor's position with requestAnimationFrame and getBoundingClientRect.
  • Component-state leakage. When the same component type renders twice in a session (the user goes back and picks a different idea), each instance needs a stable key tied to the agent-turn ID, not to the component type alone.
  • Fallback text. Every tool event needs a text equivalent for screen readers and for stream replay where the UI didn't render.
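
The last two pitfalls lend themselves to small server-side helpers. The sketch below is only one way to do it: `component_key`, `fallback_text`, the event shape, and the prop names (`ideas`, `title`, `prompt`) are assumptions made for illustration.

```python
from typing import Any


def component_key(turn_id: str, component: str) -> str:
    """Stable render key: unique per agent turn, not per component type."""
    return f"{turn_id}:{component}"


def fallback_text(event: dict[str, Any]) -> str:
    """Plain-text equivalent of a tool event, for screen readers and replay."""
    component = event["component"]
    props = event.get("props", {})
    if component == "IdeaList":
        titles = [idea["title"] for idea in props.get("ideas", [])]
        return "Ideas: " + "; ".join(f"{n}. {t}" for n, t in enumerate(titles, 1))
    if component == "PromptPreview":
        return "Draft prompt: " + props.get("prompt", "")
    # Components without a dedicated summary still announce themselves.
    return f"[{component}]"
```

Two IdeaList instances from different turns then get different keys, so picking a new idea after going back does not inherit the selection state of the earlier card.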

Pairs with