The standard agent interface is text in, text out. Users type messages, agents respond with text. This works for many cases but ignores that some information is better conveyed through structured interfaces - forms for collecting input, charts for visualizing data, cards for presenting options, interactive elements for taking action. Generative UI changes the model: agents can produce interface elements, not just text. The result is interactions that feel like applications, not just conversations.
Beyond Text Responses
Consider what happens when you ask an agent to help you book a meeting. In a text-only interface, the conversation might go:
"I need to schedule a meeting with the product team."
"What date and time works for you?"
"Maybe next Tuesday afternoon."
"I see openings at 2pm, 3pm, and 4pm. Which do you prefer?"
"3pm works."
"How long should the meeting be?"
"An hour."
"What's the meeting title?"
Each exchange is a round trip. Information trickles in one piece at a time. The agent asks, the user answers, the agent asks again. It works but feels inefficient - the kind of interaction that a web form would handle in seconds.
Now consider the same task with generative UI. The agent understands you want to book a meeting and generates a scheduling widget - a calendar showing available slots, a duration selector, a title field, and a submit button. You fill it out in one interaction. The entire booking happens in the time the text conversation would take to establish the date.
This is not about making things prettier. It is about using the right representation for the information being exchanged. Some interactions are naturally conversational. Others are naturally form-based, choice-based, or visual. Generative UI lets agents choose the appropriate representation rather than forcing everything into text.
What Generative UI Means
Generative UI refers to agents producing structured interface elements as part of their responses. Instead of returning only text, the agent returns a specification for an interface component that the chat interface renders appropriately.
The agent might generate a form with specific fields. The chat interface receives this specification and renders an actual form - text inputs, dropdowns, date pickers, whatever the specification calls for. The user interacts with the form. The completed form data goes back to the agent as structured input.
Or the agent might generate a set of options presented as cards. Each card shows information and an action button. The user selects one. The selection goes back to the agent as a clear choice, not ambiguous text.
Or the agent might generate a data visualization - a chart showing trends, a table summarizing results, a diagram illustrating relationships. The visualization conveys information that would be tedious or unclear as text.
The key is that the agent decides what to generate based on what the interaction needs. The system handles rendering whatever the agent produces. This separation - agent generates specification, system renders interface - is what makes generative UI work at scale.
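To make this concrete, a widget specification can be modeled as a small set of typed variants. The sketch below is in TypeScript; the type names, widget kinds, and fields are illustrative assumptions, not a fixed standard.

```typescript
// A sketch of a widget specification format. The widget kinds and field
// names here are illustrative assumptions, not a fixed standard.
type WidgetSpec =
  | { kind: "form"; title: string; fields: FormField[] }
  | { kind: "select"; prompt: string; options: { id: string; label: string }[] }
  | { kind: "confirm"; message: string; confirmLabel: string; cancelLabel: string }
  | { kind: "chart"; chartType: "line" | "bar"; series: { name: string; points: number[] }[] }
  | { kind: "buttons"; actions: { id: string; label: string }[] };

interface FormField {
  name: string;                     // key used in the submitted data
  label: string;                    // text shown to the user
  type: "text" | "number" | "date" | "dropdown";
  required?: boolean;
  defaultValue?: string;
  options?: string[];               // choices, for dropdown fields
}
```

The agent emits values of this shape; everything downstream is the rendering system's concern.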
Common Widget Types
Several widget categories cover most generative UI needs.
Input forms collect structured information. A form might include text fields, dropdowns, checkboxes, date pickers, file uploads - whatever inputs the task requires. The agent specifies what fields to show, their types, labels, validation rules, and default values. The user fills out the form; the agent receives structured data rather than parsing free text.
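As an example, the scheduling form from the earlier meeting scenario might be specified like this, reusing the hypothetical WidgetSpec type sketched above:

```typescript
// Hypothetical specification for the meeting-scheduling form.
const meetingForm: WidgetSpec = {
  kind: "form",
  title: "Schedule a meeting",
  fields: [
    { name: "date", label: "Date", type: "date", required: true },
    { name: "time", label: "Start time", type: "dropdown", required: true,
      options: ["2:00 PM", "3:00 PM", "4:00 PM"] },
    { name: "duration", label: "Duration (minutes)", type: "number",
      defaultValue: "60" },                 // sensible default, user can change
    { name: "title", label: "Meeting title", type: "text", required: true },
  ],
};
```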
Selection interfaces present options for the user to choose from. Cards showing different products, buttons offering different actions, lists with selectable items. The agent specifies the options and what information to show for each. The user selects; the agent receives a clear choice identifier.
Confirmation dialogs verify before consequential actions. Show what will happen, offer confirm and cancel buttons. The agent specifies what to display and what actions the buttons represent. The user confirms or cancels; the agent proceeds accordingly.
Data displays present information visually. Charts for numerical data, tables for structured records, progress indicators for ongoing operations. The agent specifies data and display type; the system renders appropriately.
Action buttons offer specific operations inline. Rather than asking the user what to do next, present buttons for likely options. The agent specifies available actions and their labels. The user clicks; the agent receives the action choice directly.
These widget types compose to handle complex interactions. A response might include text explanation, a data chart, and action buttons together. The interface becomes a mini-application tailored to the current moment in the conversation.
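A composed response could bundle text and several widgets together. The response shape below is again an assumption, not a fixed format:

```typescript
// Hypothetical agent response combining text with multiple widgets.
interface AgentResponse {
  text: string;
  widgets: WidgetSpec[];
}

const response: AgentResponse = {
  text: "Sales rose 12% this quarter. Here is the trend and what you can do next.",
  widgets: [
    { kind: "chart", chartType: "line",
      series: [{ name: "Revenue", points: [110, 118, 121, 134] }] },
    { kind: "buttons", actions: [
      { id: "export_report", label: "Export full report" },
      { id: "compare_prev", label: "Compare to last year" },
    ]},
  ],
};
```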
How Generation Works
The mechanics involve the agent producing structured specifications that the rendering system interprets.
When an agent determines that a widget would serve the interaction better than plain text, it generates a specification in a defined format. This specification describes the widget type, its configuration, and any data it should display. For a form, the specification lists fields, types, and labels. For a chart, it includes data series and visualization type. For buttons, it lists options and their identifiers.
The chat interface receives this specification as part of the agent's response. Rendering logic interprets the specification and produces the appropriate visual interface. The user sees and interacts with actual interface elements, not text descriptions of what elements would look like.
When the user interacts - submitting a form, clicking a button, making a selection - the result goes back to the agent as structured data. The agent receives exactly what the user chose or entered, in a format that requires no parsing or interpretation. This structured data becomes input for the agent's next decision.
The agent does not need to know how rendering works. It produces specifications; the system handles rendering. This abstraction lets agents focus on what interface would be helpful rather than implementation details.
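A minimal rendering sketch, assuming the hypothetical WidgetSpec type from earlier: the interface dispatches on the spec's kind and hands the user's input back to the agent as structured data. The render functions are declared stubs standing in for real components.

```typescript
// What the agent gets back after the user interacts: structured data,
// never free text that needs parsing.
type InteractionResult =
  | { widget: "form"; values: Record<string, string> }
  | { widget: "select" | "buttons" | "confirm"; choiceId: string };

// Stubs standing in for the interface's real components (illustrative only).
declare function renderForm(
  spec: Extract<WidgetSpec, { kind: "form" }>,
  onSubmit: (values: Record<string, string>) => void,
): void;
declare function renderChoice(
  spec: Extract<WidgetSpec, { kind: "select" | "buttons" | "confirm" }>,
  onChoose: (choiceId: string) => void,
): void;
declare function renderChart(spec: Extract<WidgetSpec, { kind: "chart" }>): void;

// The dispatch the agent never sees: spec in, rendered interface out,
// structured result back into the agent loop.
function renderWidget(
  spec: WidgetSpec,
  onResult: (result: InteractionResult) => void,
): void {
  switch (spec.kind) {
    case "form":
      renderForm(spec, (values) => onResult({ widget: "form", values }));
      break;
    case "select":
    case "buttons":
    case "confirm":
      renderChoice(spec, (id) => onResult({ widget: spec.kind, choiceId: id }));
      break;
    case "chart":
      renderChart(spec); // display-only; nothing goes back
      break;
  }
}
```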
When to Generate UI
Not every agent response benefits from generated UI. Text remains appropriate for many interactions. Knowing when each approach fits improves the overall experience.
Structured data collection clearly benefits from forms. When you need specific pieces of information in specific formats, a form communicates what is needed and validates input before submission. Collecting the same information as free text is error-prone and tedious.
Choice among discrete options benefits from explicit selection interfaces. When the user must pick one of several alternatives, showing those alternatives as selectable cards or buttons is clearer than listing them in text and asking the user to type their choice.
Data-heavy responses benefit from visualization. When the answer includes numerical trends, comparisons, or structured records, charts and tables communicate more effectively than text descriptions of numbers.
Consequential confirmations benefit from explicit dialogs. When an action has significant effects, a confirmation dialog with clear confirm/cancel buttons is safer than asking "are you sure?" in text and parsing the response.
Exploratory conversation often works better as text. When the interaction is open-ended, when the user is thinking aloud, when the direction is not yet clear - text conversation provides flexibility that structured interfaces do not.
Simple questions often do not need widgets. If the agent needs a yes or no, asking in text is fine. Generating a two-button widget for trivial questions adds overhead without benefit.
The agent should consider both what information is being exchanged and what representation makes the exchange clearest. This decision is part of the agent's reasoning, informed by the current context and interaction history.
Designing Effective Widgets
Generated widgets should follow interface design principles even though they are generated dynamically.
Clarity means the widget's purpose is immediately obvious. Labels should be descriptive. Options should be distinguishable. The user should understand what to do without explanation.
Economy means including only what is needed. Extra fields, unnecessary options, and redundant information create confusion. Generate the minimal widget that accomplishes the interaction goal.
Defaults speed common cases. When a sensible default exists, pre-fill it. The user can change it if needed but does not have to specify obvious values.
Validation catches errors before submission. Required fields should be marked. Input types should be appropriate (date picker for dates, number input for numbers). Invalid states should be clear.
Consistency across generated widgets helps users learn the patterns. Similar interactions should produce similar widgets. Consistent styling, consistent placement, consistent behavior.
System prompts can guide agents toward effective widget generation by explaining these principles and showing examples of well-designed widgets for common scenarios.
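A fragment of such guidance might look like the following; the wording and widget names are illustrative:

```typescript
// Illustrative system-prompt fragment guiding widget generation.
const widgetGuidance = `
When collecting two or more pieces of structured input, emit a "form" widget
instead of asking questions one at a time. Mark required fields, pre-fill
sensible defaults, and keep the form to the minimum fields the task needs.
When the user must pick among discrete options, emit a "select" widget with
one card per option. For open-ended discussion, respond in plain text.
`;
```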
Security Considerations
Generated UI introduces new attack surface that requires attention.
Rendered content should be sanitized. If agents can generate HTML or similar markup, ensure rendering cannot execute scripts or access resources inappropriately. Sandboxed rendering prevents agents from generating malicious content that the interface would execute.
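One common approach, if widget specifications can carry markup such as rich-text labels, is to sanitize it before it reaches the DOM. A sketch using the DOMPurify library; the renderRichLabel helper and the allowlist are assumptions:

```typescript
import DOMPurify from "dompurify";

// Sanitize any agent-supplied markup before it touches the DOM, so a
// malicious or confused agent cannot smuggle in scripts or event handlers.
function renderRichLabel(container: HTMLElement, agentHtml: string): void {
  container.innerHTML = DOMPurify.sanitize(agentHtml, {
    ALLOWED_TAGS: ["b", "i", "em", "strong", "a", "ul", "li", "p"],
    ALLOWED_ATTR: ["href"],
  });
}
```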
Action handling should verify authorization. When a user clicks an action button, the resulting action should go through normal permission checks. The widget is a convenience for invoking actions, not a bypass of access control.
Data display should respect visibility rules. If an agent can generate widgets showing data, ensure it only shows data the user is authorized to see. Widget generation should not be a vector for information disclosure.
Input validation should happen on submission, not just client-side. Widget-specified validation helps users enter correct data, but server-side validation ensures correctness even if client validation is bypassed.
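A sketch of server-side validation using the zod library; the schema mirrors the hypothetical meeting form from earlier, and the field names are assumptions:

```typescript
import { z } from "zod";

// Server-side schema for the meeting form submission. Field names mirror
// the hypothetical form spec earlier; a real system defines its own.
const meetingSubmission = z.object({
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  time: z.enum(["2:00 PM", "3:00 PM", "4:00 PM"]),
  duration: z.coerce.number().int().min(15).max(480),
  title: z.string().min(1).max(200),
});

function handleSubmission(raw: unknown) {
  // Validate on the server even though the widget validated client-side.
  const parsed = meetingSubmission.safeParse(raw);
  if (!parsed.success) {
    return { ok: false, errors: parsed.error.issues };
  }
  // ...authorization check, then proceed with the validated data
  return { ok: true, data: parsed.data };
}
```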
These considerations parallel standard web application security. Generative UI is essentially dynamic UI generation, with similar security requirements to any dynamic content system.
For teams building agent experiences that go beyond text, inference.sh supports generative UI through dynamic widgets that agents can include in their responses. Agents generate specifications; the interface renders appropriate components. The interaction feels like an application built for the specific moment rather than a generic chat forced to handle every scenario through text.
Generative UI expands what agent interactions can be. Some tasks are conversations. Others are form fills, data explorations, or action selections. Agents that can produce the right interface for each moment provide better experiences than agents limited to text alone.
FAQ
Do agents need special training to generate UI effectively?
Agents need guidance through system prompts rather than special training. The prompt should explain what widget types are available, when each is appropriate, and how to specify them. Examples of good widget specifications help the agent understand the format and conventions. Most capable models can generate structured widget specifications given clear instructions. The key is prompt engineering that teaches widget generation as a skill, not model fine-tuning.
How do I handle widgets on interfaces that cannot render them?
Graceful degradation ensures usability across interfaces. When a widget-incapable interface receives a response with widget specifications, it should fall back to a text representation. A form becomes a list of questions. A selection becomes numbered options with instructions to reply with a number. A chart becomes a text summary of the data. Design widget specifications to include fallback text, or implement automatic fallback generation. The experience is degraded but functional.
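A fallback renderer along these lines might look like the following sketch, assuming the hypothetical WidgetSpec type from earlier:

```typescript
// Fallback: turn a widget spec into plain text for interfaces that
// cannot render widgets. Assumes the WidgetSpec type sketched earlier.
function widgetToText(spec: WidgetSpec): string {
  switch (spec.kind) {
    case "form":
      return spec.fields
        .map((f, i) => `${i + 1}. ${f.label}${f.required ? " (required)" : ""}`)
        .join("\n");
    case "select":
      return [
        spec.prompt,
        ...spec.options.map((o, i) => `${i + 1}) ${o.label}`),
        "Reply with the number of your choice.",
      ].join("\n");
    case "confirm":
      return `${spec.message}\nReply "confirm" or "cancel".`;
    case "buttons":
      return spec.actions.map((a, i) => `${i + 1}) ${a.label}`).join("\n");
    case "chart":
      return spec.series
        .map((s) => `${s.name}: ${s.points.join(", ")}`)
        .join("\n");
  }
}
```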
Can users request specific interface types, or do agents decide?
Both approaches work. Agents can decide based on context, generating widgets when they judge it appropriate. Users can also express preferences - "show me a chart" or "just tell me in text" - that the agent respects. Some implementations let users configure default preferences (prefer forms vs. prefer conversational input). The most flexible approach combines agent judgment with user preference, letting users override when they prefer something different from what the agent chose.