Let’s be honest, most of our interactions with AI still feel a bit… static. We type in a box, and text comes back. It's cool, but it’s basically a conversation. What if AI could do more than just talk? What if it could build things for us, right on the screen, exactly when we need them?
Imagine asking an AI to "show me the project status," and instead of a block of text, it instantly builds a clean dashboard with progress bars, team member avatars, and buttons for key actions. Or you ask to book a flight, and it generates a complete form, pre-filled with your details.
This isn't science fiction. This is the world of "Agentic UI" and "Generative UI," and it's where things are heading, fast. It’s the shift from AI as a conversationalist to AI as a dynamic, interface-building partner. Today, we're going to pull back the curtain and build the entire thing from the ground up. No black boxes, no magic frameworks. Just a clear, step-by-step look at how this all works.
The Secret Language: How an AI Agent and a UI Actually Talk
Before an AI can build a UI, it needs a way to communicate with the front end. You can't just have the AI randomly shouting commands. You need a structured, real-time communication channel.
This is where the Agentic UI (AG-UI) protocol comes in. Think of it as a live, play-by-play commentary of everything the agent is thinking and doing. It’s not just one big data dump at the end; it's a constant stream of small, specific events.
We're talking about events like:
- RUN_STARTED: "Okay, I've heard the user's request and I'm starting to work on it."
- TEXT_MESSAGE_CONTENT: "Here's the next word of the sentence I'm forming..." (This is how you get that cool, token-by-token streaming effect.)
- TOOL_CALL_START: "I need to use one of my special tools, like querying a database."
- TOOL_CALL_RESULT: "Okay, the database gave me back this data."
- RUN_FINISHED: "Phew, all done!"
By breaking down the agent's process into these tiny, discrete events, the UI can react instantly. It can show a loading spinner, stream in text as it's generated, or display a "thinking" status. The user is never left staring at a blank screen, wondering if the AI is still alive. This event stream is the foundational nervous system for any truly interactive agentic experience.
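To make that concrete, here's a minimal sketch of an agent that emits these events and a toy front end that reacts to each one as it arrives. The event names come from the list above; the payload fields (`delta`, `tool`, `content`), the `query_database` tool, and the `ChatView` class are simplified assumptions for illustration, not the exact AG-UI wire format.

```python
from dataclasses import dataclass
from typing import Iterator


def run_agent(reply: str) -> Iterator[dict]:
    """Emit one small event per step instead of a single final payload."""
    yield {"type": "RUN_STARTED"}
    yield {"type": "TOOL_CALL_START", "tool": "query_database"}  # hypothetical tool
    yield {"type": "TOOL_CALL_RESULT", "content": "3 open tasks"}
    for token in reply.split(" "):
        # One event per token gives the streaming effect.
        yield {"type": "TEXT_MESSAGE_CONTENT", "delta": token + " "}
    yield {"type": "RUN_FINISHED"}


@dataclass
class ChatView:
    """Toy front end: tracks a status label and the text streamed so far."""
    status: str = "idle"
    text: str = ""

    def handle(self, event: dict) -> None:
        kind = event["type"]
        if kind == "RUN_STARTED":
            self.status = "thinking"
        elif kind == "TOOL_CALL_START":
            self.status = f"running {event['tool']}"
        elif kind == "TEXT_MESSAGE_CONTENT":
            self.status = "streaming"
            self.text += event["delta"]
        elif kind == "RUN_FINISHED":
            self.status = "done"


view = ChatView()
statuses = []
for event in run_agent("You have 3 open tasks"):
    view.handle(event)
    statuses.append(view.status)

print(statuses)
print(view.text.strip())
```

The point of the design is that the UI never has to guess: every status change on screen maps to exactly one event from the agent.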
LEGO Instructions for UIs: Building with A2UI
So, the agent can now "talk" to the UI. But what does it say when it wants to build something?
You might think it would send over a chunk of code, like HTML or JavaScript. But that's a massive security nightmare. You can't just let an LLM generate and run executable code on the fly.
Instead, we use a clever system called A2UI (a spec from Google). Think of it like this: the AI agent sends a set of LEGO instructions, not the finished LEGO model. These instructions are just simple, structured JSON data. They say things like:
- "I need a card with the ID 'root'."
- "Inside 'root', I need a text-field with the ID 'c1' and the label 'Name'."
- "Then, I need a button with the ID 'c2' and the text 'Submit'."
The instructions are just a flat list of components and their relationships. It’s safe, simple, and easy for an LLM to generate.
The real magic happens on the client side (the app or website). It has what’s called a Widget Registry. This registry is like your box of actual LEGO bricks. When it receives the instruction "create a button," it knows how to render its own, native, pre-approved button component. The AI provides the blueprint; the UI does the safe, trusted building. This decouples the agent's logic from the UI's presentation, which is a brilliant and secure way to build.
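Here's a rough sketch of that split. The component list mirrors the three instructions above, and the registry maps each component type to a trusted renderer that the client owns. The field names (`children`, `label`, `text`) and the text-based renderers are assumptions chosen for brevity; the actual A2UI message format carries more structure than this.

```python
import textwrap

# What the agent sends: plain data, a flat list linked by IDs.
components = [
    {"id": "root", "type": "card", "children": ["c1", "c2"]},
    {"id": "c1", "type": "text-field", "label": "Name"},
    {"id": "c2", "type": "button", "text": "Submit"},
]


# What the client owns: pre-approved renderers (here they just return text).
def render_card(node, children):
    return "[card]\n" + textwrap.indent("\n".join(children), "  ")


def render_text_field(node, children):
    return f"[input: {node['label']}]"


def render_button(node, children):
    return f"[button: {node['text']}]"


WIDGET_REGISTRY = {
    "card": render_card,
    "text-field": render_text_field,
    "button": render_button,
}


def render(component_id, by_id):
    """Resolve a component by ID and render it with a registered widget."""
    node = by_id[component_id]
    renderer = WIDGET_REGISTRY.get(node["type"])
    if renderer is None:
        # Anything outside the registry is refused, never executed.
        raise ValueError(f"unknown component type: {node['type']}")
    children = [render(child_id, by_id) for child_id in node.get("children", [])]
    return renderer(node, children)


by_id = {c["id"]: c for c in components}
output = render("root", by_id)
print(output)
```

Notice that an unknown type such as "script" simply fails the registry lookup. The agent can only ask for bricks that are already in the box.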
The "Wow" Moment: When the LLM Becomes a UI Designer
Okay, we have the communication channel (AG-UI) and the LEGO instructions (A2UI). Now for the really cool part: teaching the LLM to write those instructions.
This is what "Generative UI" is all about. We give the LLM a special prompt that essentially teaches it to be a UI designer. We tell it about all the available components (cards, buttons, progress bars, etc.) and explain the rules for how to structure the A2UI JSON.
Then, we give it a user query.
- User: "I need to create a new user account."
- LLM: (Thinks for a second) "Aha! That sounds like a form. I'll generate a UI with a card, some text fields for name and email, and a 'Create Account' button."
It then outputs the perfect A2UI JSON to represent that form.
- User: "Show me the status of my Q3 sales dashboard."
- LLM: "Okay, that's a dashboard. I'll use a progress bar for the completion percentage, some 'chip' components for tags, and a data table for the sales numbers."
The same agent can generate wildly different UIs because it's not relying on pre-built templates. It's reasoning about the user's intent and choosing the best UI pattern for the job, in real time. This is a huge leap beyond just spitting out text.
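In practice this comes down to two pieces: a prompt that describes the component catalog, and a validation step that checks whatever the model sends back before it reaches the renderer. The sketch below assumes a tiny hypothetical catalog and uses a hard-coded string (`fake_reply`) in place of a real model call, so the names and the prompt wording are illustrative only.

```python
import json

# Hypothetical catalog: component type -> props the model may set.
CATALOG = {
    "card": ["children"],
    "text-field": ["label"],
    "button": ["text"],
    "progress-bar": ["value"],
}


def build_system_prompt(catalog):
    """Describe the available components and the output rules to the model."""
    lines = [
        "You design user interfaces.",
        "Reply with a JSON array of components and nothing else.",
        "Every component needs a unique 'id' and a 'type' from this list:",
    ]
    for name, props in catalog.items():
        lines.append(f"- {name} (props: {', '.join(props)})")
    return "\n".join(lines)


def validate(raw_reply, catalog):
    """Parse the model's reply and reject unknown types or dangling child IDs."""
    components = json.loads(raw_reply)
    ids = {c["id"] for c in components}
    for c in components:
        if c["type"] not in catalog:
            raise ValueError(f"type not in catalog: {c['type']}")
        for child_id in c.get("children", []):
            if child_id not in ids:
                raise ValueError(f"unknown child id: {child_id}")
    return components


# Stand-in for the model's answer to "I need to create a new user account."
fake_reply = """[
  {"id": "root", "type": "card", "children": ["name", "email", "submit"]},
  {"id": "name", "type": "text-field", "label": "Name"},
  {"id": "email", "type": "text-field", "label": "Email"},
  {"id": "submit", "type": "button", "text": "Create Account"}
]"""

prompt = build_system_prompt(CATALOG)
form = validate(fake_reply, CATALOG)
print(len(form), "components accepted")
```

Validating the reply matters because a model can always produce something outside the rules it was given; the catalog check is what keeps a bad generation from reaching the screen.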
Keeping Everyone on the Same Page: State Synchronization
In a dynamic system like this, the agent and the UI both need to know the current "state of the world." What stage is the process in? What data is being displayed?
You could just send the entire state back and forth on every little change, but that's incredibly inefficient. Instead, we use a system that feels a lot like a shared Google Doc.
- The Snapshot: When the session starts, the agent sends a STATE_SNAPSHOT event. This is the full, initial document, letting the UI know the complete starting state.
- The Deltas: From then on, the agent only sends tiny STATE_DELTA events. These deltas use a format called JSON Patch, which is a super-efficient way to describe changes. It sends instructions like, "change the value at /pipeline/progress to 0.75" or "add this new item to the list at /feedback/2."
The UI receives these tiny patches and applies them, ensuring its version of the state is always perfectly in sync with the agent's, without wasting bandwidth. It’s fast, efficient, and essential for making the UI feel responsive.
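Here's a small sketch of the client side of that exchange, using the two example instructions above. It hand-rolls a deliberately tiny subset of JSON Patch (only `add`, `replace`, and `remove`) so the mechanics are visible; a real client would more likely use an existing JSON Patch library. The function names and the sample state are assumptions for illustration.

```python
import copy


def _resolve_parent(doc, path):
    """Walk a JSON Pointer such as '/pipeline/progress'.

    Returns the container that holds the final key, plus that key.
    """
    parts = [p.replace("~1", "/").replace("~0", "~") for p in path.split("/")[1:]]
    target = doc
    for part in parts[:-1]:
        target = target[int(part)] if isinstance(target, list) else target[part]
    return target, parts[-1]


def apply_patch(state, patch):
    """Apply a list of patch operations and return the new state."""
    new_state = copy.deepcopy(state)  # leave the previous state untouched
    for op in patch:
        parent, key = _resolve_parent(new_state, op["path"])
        kind = op["op"]
        if isinstance(parent, list):
            index = len(parent) if key == "-" else int(key)
            if kind == "add":
                parent.insert(index, op["value"])
            elif kind == "replace":
                parent[index] = op["value"]
            elif kind == "remove":
                del parent[index]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "replace" and key not in parent:
                raise KeyError(key)
            if kind in ("add", "replace"):
                parent[key] = op["value"]
            elif kind == "remove":
                del parent[key]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return new_state


# Payload of the initial STATE_SNAPSHOT (sample data).
snapshot = {"pipeline": {"progress": 0.5}, "feedback": ["too slow", "nice UI"]}

# Payload of a later STATE_DELTA: two small operations, not the whole state.
delta = [
    {"op": "replace", "path": "/pipeline/progress", "value": 0.75},
    {"op": "add", "path": "/feedback/2", "value": "love the charts"},
]

state = apply_patch(snapshot, delta)
print(state)
```

The delta here is two short operations regardless of how large the full state grows, which is where the bandwidth savings come from.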
Adult Supervision: The "Are You Sure?" Button for AI
What happens when an agent wants to do something... risky? Like, "delete the entire user database" or "send an email to all 50,000 customers." You probably don't want the AI doing that on its own.
This is where the Human-in-the-Loop (HITL) pattern comes in, powered by a special INTERRUPT event.
We can define rules for the agent, telling it which actions are high-risk. When it's about to perform one of these actions, it doesn't just do it. Instead, it fires an INTERRUPT event and pauses its execution.
This event tells the UI, "Hey, I'm about to do something big. Please show the user a confirmation dialog with these options: Approve, Reject, or Modify."
The whole system waits. The user can then look at the proposed action and make a decision. Only after the human gives the green light does the agent resume its work. It’s a critical safety mechanism that keeps the human in control of high-stakes decisions.
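One way to sketch that pause-and-resume behavior is with a Python generator: the agent yields an INTERRUPT event and literally stops until the caller sends a decision back in. The tool names, the set of high-risk tools, and the shape of the decision object are all hypothetical; this shows the pattern, not a specific framework's API.

```python
HIGH_RISK_TOOLS = {"delete_database", "send_bulk_email"}  # hypothetical rule set


def run_agent(planned_calls):
    """Yield events; pause at INTERRUPT until send() delivers a decision."""
    yield {"type": "RUN_STARTED"}
    for call in planned_calls:
        if call["tool"] in HIGH_RISK_TOOLS:
            decision = yield {
                "type": "INTERRUPT",
                "proposed": call,
                "options": ["approve", "reject", "modify"],
            }
            if decision["choice"] == "reject":
                continue  # skip the risky action entirely
            if decision["choice"] == "modify":
                call = {**call, "args": decision["args"]}
        yield {"type": "TOOL_CALL_START", "tool": call["tool"], "args": call["args"]}
    yield {"type": "RUN_FINISHED"}


def drive(agent, decide):
    """Toy UI loop: collect events and answer interrupts via `decide`."""
    log = []
    event = next(agent)
    while True:
        log.append(event)
        try:
            if event["type"] == "INTERRUPT":
                event = agent.send(decide(event))  # resume with the human's answer
            else:
                event = next(agent)
        except StopIteration:
            return log


plan = [
    {"tool": "lookup_customers", "args": {}},
    {"tool": "send_bulk_email", "args": {"recipients": 50000}},
]

rejected = drive(run_agent(plan), lambda event: {"choice": "reject"})
approved = drive(run_agent(plan), lambda event: {"choice": "approve"})

print([e["type"] for e in rejected])
print([e["type"] for e in approved])
```

In the rejected run the bulk email never produces a TOOL_CALL_START at all. The risky call is gated on the human's answer rather than undone after the fact.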
Putting It All Together: The Full Pipeline in Action
Now, let's see the whole symphony come together. A user types a query, and this is what happens in a matter of seconds:
- Routing: An initial LLM call analyzes the query to understand the user's intent. Is this a request for a form, a dashboard, or a simple confirmation?
- UI Generation: Based on that intent, the agent generates the appropriate A2UI component tree. All of this is communicated via the AG-UI event stream, so the user sees messages like "Okay, building a dashboard for you..."
- Human Approval (if needed): If the generated UI is for a critical action, an INTERRUPT is fired, asking the user to confirm before rendering.
- Rendering: The A2UI JSON is sent to the client, where the Widget Registry translates it into the final, interactive UI on the user's screen.
- State Sync: As the user interacts with the UI or the agent continues its work, STATE_DELTA events keep everything perfectly synchronized.
This entire pipeline, from a simple text query to a fully rendered, interactive interface, is the core of a modern agentic system.
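The five stages can be strung together in a few dozen lines. To keep the sketch runnable, both LLM calls are replaced with stubs: `route` is a keyword match and `generate_ui` returns canned component lists, which is exactly what a real generative system would not do. The `UI_RENDER` event name and the `approve` callback are also assumptions made up for this example.

```python
def route(query):
    """Stand-in for the routing LLM call: a keyword match, illustrative only."""
    q = query.lower()
    if "delete" in q or "send" in q:
        return "confirmation"
    if "status" in q or "dashboard" in q:
        return "dashboard"
    return "form"


def generate_ui(intent):
    """Stand-in for the generation LLM call: canned output per intent."""
    canned = {
        "dashboard": [{"id": "root", "type": "progress-bar", "value": 0.5}],
        "form": [{"id": "root", "type": "text-field", "label": "Name"}],
        "confirmation": [{"id": "root", "type": "button", "text": "Confirm"}],
    }
    return canned[intent]


def pipeline(query, approve=lambda ui: True):
    """Run routing, generation, optional approval, rendering, and state sync."""
    yield {"type": "RUN_STARTED"}

    intent = route(query)                                   # 1. Routing
    yield {"type": "TEXT_MESSAGE_CONTENT",
           "delta": f"Okay, building a {intent} for you..."}

    ui = generate_ui(intent)                                # 2. UI generation

    if intent == "confirmation":                            # 3. Human approval
        yield {"type": "INTERRUPT", "proposed": ui}
        if not approve(ui):
            yield {"type": "RUN_FINISHED"}
            return

    yield {"type": "UI_RENDER", "components": ui}           # 4. Rendering (hypothetical event)
    yield {"type": "STATE_SNAPSHOT",                        # 5. State sync starts here
           "snapshot": {"intent": intent}}
    yield {"type": "RUN_FINISHED"}


dashboard_run = [e["type"] for e in pipeline("Show me the project status")]
blocked_run = [e["type"] for e in pipeline("Delete the archive", approve=lambda ui: False)]

print(dashboard_run)
print(blocked_run)
```

Swapping the two stubs for real model calls leaves the event flow unchanged, which is the useful property of routing everything through one stream.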
The Final Touch: Editing a Live UI
What if the UI is already on the screen, and the agent needs to change something? For example, a task on a sprint board moves from "In Progress" to "Done."
You don't want to regenerate the entire UI for such a small change. Because A2UI is so structured, the agent can send incremental update messages. It can send a command to:
- Update a component: "Find the component with ID 'task-123' and change its properties."
- Add a component: "Create this new 'chip' component and add it as a child of the 'board' component."
- Remove a component: "Delete the component with ID 'task-456'."
This allows for incredibly dynamic and collaborative experiences where the UI can evolve and change in real time based on the agent's work, without any jarring reloads.
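As a rough sketch, the three commands above can be modeled as small messages applied to a dictionary of components keyed by ID. The message shapes (`op`, `id`, `props`, `parent`, `component`) and the sample sprint board are assumptions for illustration, not the exact A2UI update format.

```python
import copy


def apply_update(components, update):
    """Apply one incremental update and return the new component map."""
    new = copy.deepcopy(components)
    op = update["op"]
    if op == "update":
        new[update["id"]].update(update["props"])
    elif op == "add":
        component = copy.deepcopy(update["component"])
        new[component["id"]] = component
        new[update["parent"]].setdefault("children", []).append(component["id"])
    elif op == "remove":
        del new[update["id"]]
        # Also detach the removed ID from whichever parent listed it.
        for other in new.values():
            if update["id"] in other.get("children", []):
                other["children"].remove(update["id"])
    else:
        raise ValueError(f"unsupported op: {op}")
    return new


board = {
    "board": {"type": "card", "children": ["task-123", "task-456"]},
    "task-123": {"type": "card", "title": "Fix login", "status": "In Progress"},
    "task-456": {"type": "card", "title": "Old spike", "status": "In Progress"},
}

updates = [
    {"op": "update", "id": "task-123", "props": {"status": "Done"}},
    {"op": "add", "parent": "board",
     "component": {"id": "chip-1", "type": "chip", "text": "Sprint 12"}},
    {"op": "remove", "id": "task-456"},
]

current = board
for update in updates:
    current = apply_update(current, update)

print(current["task-123"]["status"])
print(current["board"]["children"])
```

Because every component has a stable ID, each message touches only the piece that changed and the rest of the rendered UI stays exactly where it was.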
So, What Have We Built?
When you step back, you realize we've created something far more powerful than a chatbot. We've designed a system where an AI can understand a request, reason about the best way to present information and actions, and then build a custom, secure, and interactive interface to match.
This stack—with AG-UI for real-time communication, A2UI for safe, declarative UIs, and patterns for state management and human oversight—is a blueprint for the next generation of human-computer interaction. It’s a future where the software we use adapts to us, not the other way around. And now, you know exactly how it’s built.




