Applied AI, UX Lead at Google
GenUX/UI · LLM Evals · Autoraters
When an AI answer IS a UI, not text about a UI. Component-vs-text decision tree, GenUX layout primitives, validation gates for model-composed components, and graceful fallback to text when composition fails.
$ npx skills add darrenhead/skills --skill genux-patterns
Generative UX (GenUX) is the pattern where the answer is a UI, not text about a UI. Instead of streaming prose that describes your order history, the model returns an <OrdersTable> populated with rows. Instead of explaining a chart in words, it returns the chart. The public landscape has converged on this idea from four angles: Vercel AI SDK's streamUI (the model picks a React component to render, validated by zod), Anthropic's Artifacts (Claude opens a dedicated surface for substantial output), ChatGPT Canvas (a side-by-side editable artefact), and OpenAI's Apps SDK (MCP servers return widgets that ChatGPT renders inline). Different stacks, same idea: for some questions, the right answer shape isn't a paragraph.
This skill covers the decision rules for when an answer should be a component versus text, the small primitive set that covers most cases, the disclosure ladder, and, most importantly, the validation gate and the fallback path. Model-composed UI fails in ways model-composed text doesn't, so you need belt and braces.
streamUI vs streamText for a given tool
Run the candidate answer through these gates in order. Stop at the first "yes": that's your answer shape.
Gate 1: Is the answer a list of comparable items? If the user is going to scan, compare, or sort across multiple instances of the same shape (orders, files, bookings, search hits), it's a list or table, not prose. For tabular data, readers get through a table in roughly a third of the time they need for the equivalent paragraphs, so composing a table is worth the overhead.
Gate 2: Does the answer have an immediate action affordance? If the right next step is "click to open / approve / reschedule / pay", the answer needs a button, not a sentence saying "you can do X". Action affordances belong in the UI, not the prose. This is the single highest-leverage GenUX promotion — prose with "click here" links is worse than a component with a real button.
Gate 3: Does the answer have a single canonical visual representation? Time series → line chart. Geographic data → map. Categorical comparison → bar chart. Code → syntax-highlighted block with copy button. If the visual representation is obvious and standard, render it. Don't make the user picture it from prose.
Gate 4: Is the answer two things being compared? "X vs Y" answers belong in side-by-side comparators, not in a paragraph that alternates "X does this, but Y does that." The visual structure carries the meaning.
Gate 5: Will the user re-read or come back to this answer? Reference content (a generated summary, a checklist, a config block) gets re-read. Components with stable structure are easier to re-scan than prose. Artifacts and Canvas exist for exactly this reason.
Gate 6: Does the answer need to be edited or refined in place? If the user's next move is "tweak this" rather than "ask a follow-up", you want an editable surface (Canvas, Artifact, inline form), not a chat bubble they have to copy out of.
Default: text. If none of the gates fire, ship prose. Streaming text is faster, cheaper, more forgiving, and doesn't require a validation layer. The bar for promoting an answer to a component is "structure is load-bearing" — not "this would look cool as a card."
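The gates above can be sketched as a plain function. This is a minimal illustration, not a production router: the AnswerSignals fields are hypothetical, and in practice they might come from the model's tool choice, a lightweight classifier, or product rules. What it captures is the ordering: the first gate that fires decides the shape, and text is the default.

```typescript
// Hypothetical signals describing a candidate answer.
interface AnswerSignals {
  scannableItems: boolean;   // Gate 1: many items of the same shape
  primaryAction: boolean;    // Gate 2: next step is open / approve / pay
  canonicalVisual: boolean;  // Gate 3: time series, map, bar chart, code
  twoWayComparison: boolean; // Gate 4: "X vs Y"
  reReadLikely: boolean;     // Gate 5: reference content
  editInPlace: boolean;      // Gate 6: next move is "tweak this"
}

type AnswerShape =
  | "list-or-table"
  | "action-component"
  | "canonical-visual"
  | "comparator"
  | "reference-surface"
  | "editable-surface"
  | "text";

// Ordered gates: the first one that fires decides the shape.
const GATES: ReadonlyArray<[keyof AnswerSignals, AnswerShape]> = [
  ["scannableItems", "list-or-table"],
  ["primaryAction", "action-component"],
  ["canonicalVisual", "canonical-visual"],
  ["twoWayComparison", "comparator"],
  ["reReadLikely", "reference-surface"],
  ["editInPlace", "editable-surface"],
];

function chooseAnswerShape(signals: AnswerSignals): AnswerShape {
  for (const [signal, shape] of GATES) {
    if (signals[signal]) return shape;
  }
  return "text"; // Default: ship prose.
}
```

Keeping the order in one array makes it explicit and reviewable: reordering two gates is a one-line diff.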
You want a small, opinionated set of primitives the model can compose into. Letting the model invent bespoke layouts per answer creates inconsistency, breaks design system constraints, and makes validation impossible. Eight primitives cover ~80% of real cases:
Rules for the primitive set:
GenUX components are easy to over-stuff. The user asked one question; resist composing an entire dashboard. Progressive disclosure rules:
<details> element or a "More" button is the right surface.
The mental model: the first paint is the answer; subsequent interaction is the exploration of the answer.
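One way to enforce "the first paint is the answer" is to split every collection into a first-paint slice and a remainder that only appears behind a "More" control or a <details> element. A minimal sketch; the helper name and the default of five items are assumptions, not rules from this skill.

```typescript
// Hypothetical helper: what renders on first paint vs. behind "More".
interface Disclosure<T> {
  firstPaint: T[];     // the answer
  more: T[];           // the exploration, rendered collapsed
  hiddenCount: number; // drives a label such as "Show 12 more"
}

function splitForDisclosure<T>(items: readonly T[], firstPaintLimit = 5): Disclosure<T> {
  // Guard against negative or fractional limits.
  const limit = Math.max(0, Math.floor(firstPaintLimit));
  const firstPaint = items.slice(0, limit);
  const more = items.slice(limit);
  return { firstPaint, more, hiddenCount: more.length };
}
```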
This is the section to read twice. Model-composed UI fails in ways model-composed text doesn't, and the failures are silent.
Always validate model output against a schema before rendering. Zod is the canonical choice in the Vercel AI SDK; pydantic or equivalent in other stacks. The tool call's output schema is the contract; if the model returns something that doesn't match, the component never mounts.
import { z } from "zod";

const orderListSchema = z.object({
  orders: z
    .array(
      z.object({
        id: z.string(),
        date: z.string().datetime(),
        total_cents: z.number().int().nonnegative(),
        status: z.enum(["pending", "shipped", "delivered", "cancelled"]),
      })
    )
    .max(20), // hard cap at the schema level
});
Sanitize anything that becomes HTML. If the model returns Markdown that you render, run it through a sanitizer (DOMPurify, rehype-sanitize). If the model returns strings that get interpolated into href / src / onClick — stop. Treat model output as untrusted input, because it is. Prompt-injected content can ride the model's response into your DOM.
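For href and src, one conservative approach is an allowlist: accept only same-origin relative paths and https URLs on hosts you control, and drop everything else, including javascript: and data: URLs. A sketch; ALLOWED_HOSTS and its example.com entry are placeholders. Model output should never become an event handler at all, and Markdown-to-HTML still belongs to a real sanitizer such as the ones named above.

```typescript
// Placeholder allowlist: replace with the hosts your app actually links to.
const ALLOWED_HOSTS = new Set(["example.com"]);

// Returns a URL that is safe to put in href/src, or null to drop the link.
function safeHref(raw: string): string | null {
  const trimmed = raw.trim();
  // Browsers can treat backslashes like slashes ("/\evil" acts like "//evil").
  if (trimmed.includes("\\")) return null;
  // Same-origin relative path such as "/orders/ORD-123456",
  // but not a protocol-relative "//host" URL.
  if (trimmed.startsWith("/") && !trimmed.startsWith("//")) {
    return trimmed;
  }
  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return null; // blocks javascript:, data:, http:
  if (!ALLOWED_HOSTS.has(url.hostname)) return null;
  return url.toString();
}
```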
Bound everything that's a count. Rows, columns, list items, chart points, button group size. The model will occasionally return 500 rows when you wanted 10. The schema enforces 10.
Type-narrow before render. Don't use an as cast. Parse with the schema, branch on the parse result, and render only when the parse succeeds.
Log validation failures. A silent "the component didn't render" is the worst possible failure mode. You need to see when the model is returning malformed output so you can fix the prompt or the schema.
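The type-narrowing and logging rules combine naturally into one small helper. A sketch assuming zod; the helper name, the Validated type and the console.warn default are illustrative stand-ins for your own telemetry.

```typescript
import { z } from "zod";

type Validated<T> = { ok: true; data: T } | { ok: false; issues: string[] };

// Parse, branch on the result, and log every rejection. No casts involved:
// callers can only reach `data` after checking `ok`.
function validateToolOutput<S extends z.ZodType>(
  tool: string,
  schema: S,
  output: unknown,
  onReject: (tool: string, issues: string[]) => void = (t, i) =>
    console.warn(`[genux] ${t} output rejected:`, i),
): Validated<z.infer<S>> {
  const result = schema.safeParse(output);
  if (result.success) {
    return { ok: true, data: result.data };
  }
  // Flatten issues into "path: message" strings that are easy to aggregate.
  const issues = result.error.issues.map(
    (issue) => `${issue.path.map(String).join(".") || "(root)"}: ${issue.message}`,
  );
  onReject(tool, issues);
  return { ok: false, issues };
}
```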
Test the schema with adversarial inputs. Empty arrays, null fields, wrong enum values, oversized payloads. The validator is your firewall; pen-test it.
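A sketch of such a test, assuming zod and the order schema's shape (redeclared so the block stands alone). Every case in the table must be rejected. The empty-array case is printed rather than asserted because this schema accepts it; whether an empty list should render a component or fall back to text is a product decision worth making explicitly.

```typescript
import { z } from "zod";

const orderListSchema = z.object({
  orders: z
    .array(
      z.object({
        id: z.string(),
        date: z.string().datetime(),
        total_cents: z.number().int().nonnegative(),
        status: z.enum(["pending", "shipped", "delivered", "cancelled"]),
      }),
    )
    .max(20),
});

const validOrder = {
  id: "ORD-000001",
  date: "2025-01-05T10:00:00Z",
  total_cents: 1299,
  status: "shipped",
};

// Each payload below must fail validation.
const adversarialCases: Record<string, unknown> = {
  nullField: { orders: [{ ...validOrder, id: null }] },
  wrongEnum: { orders: [{ ...validOrder, status: "in_transit" }] },
  negativeTotal: { orders: [{ ...validOrder, total_cents: -1 }] },
  fractionalCents: { orders: [{ ...validOrder, total_cents: 12.5 }] },
  badDate: { orders: [{ ...validOrder, date: "last Tuesday" }] },
  oversized: { orders: Array.from({ length: 500 }, () => validOrder) },
  missingKey: {},
};

for (const [name, payload] of Object.entries(adversarialCases)) {
  if (orderListSchema.safeParse(payload).success) {
    throw new Error(`schema accepted adversarial case: ${name}`);
  }
}

// This schema accepts an empty list; decide on purpose whether that is OK.
const empty = orderListSchema.safeParse({ orders: [] });
console.log("empty array accepted:", empty.success);
```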
When the validation gate rejects model output, do not show a broken component. Do not show "Sorry, something went wrong." Show the same answer as text.
The flow:
The user should never know the component path failed. They get an answer either way. The component is the upgrade; the text is the floor.
This is also the answer to "what if the model can't decide between two primitives?" — it doesn't have to. If the structured tool call doesn't fire cleanly, you fall back to text. The model picks UI when UI is obviously right; everything else stays prose.
Concretely: every GenUX tool ships with a paired text-generation prompt that answers the same question without the component. The fallback is not "show error UI" — it's "stream the prose version."
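The flow can be sketched without any framework. All names here are hypothetical: parse would typically be a schema's safeParse, proseAnswer stands in for the paired text-generation prompt (a real implementation would stream it rather than await the whole string), and logFailure is whatever telemetry hook you use.

```typescript
type Rendered<T> =
  | { kind: "component"; data: T }
  | { kind: "text"; text: string };

async function answerWithFallback<T>(
  rawOutput: unknown,
  parse: (raw: unknown) => { success: true; data: T } | { success: false },
  proseAnswer: () => Promise<string>,
  logFailure: (raw: unknown) => void,
): Promise<Rendered<T>> {
  const parsed = parse(rawOutput);
  if (parsed.success) {
    return { kind: "component", data: parsed.data }; // the upgrade
  }
  logFailure(rawOutput);            // never fail silently
  const text = await proseAnswer(); // same question, answered as prose
  return { kind: "text", text };    // the floor: the user still gets an answer
}
```

Note there is no error branch in the return type: the caller can only ever render a component or text.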
streamUI (Vercel) commits to a component shape the moment the model picks a tool. The component skeleton renders immediately; data fills in as the generator yields. This is great UX when the model picks correctly and slow / janky when it doesn't.
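A framework-free sketch of the "commit to a shape, then fill it in" behaviour. This is not the Vercel API; UIState, componentStates and the fetchData parameter are hypothetical. It shows why a wrong pick is costly: the first yield has already put the component's skeleton on screen before any data exists.

```typescript
// Hypothetical UI states for a component committed before its data arrives.
type UIState<T> =
  | { phase: "skeleton"; placeholderRows: number }
  | { phase: "ready"; data: T };

async function* componentStates<T>(
  fetchData: () => Promise<T>,
  placeholderRows = 5,
): AsyncGenerator<UIState<T>> {
  // First yield: the shape is on screen before any data exists.
  yield { phase: "skeleton", placeholderRows };
  // Later yields only fill in that shape; this generator never switches
  // to a different component.
  const data = await fetchData();
  yield { phase: "ready", data };
}
```

If the model picked the wrong component, the only way out is to replace what is already on screen, which is where the jank comes from.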
Tradeoffs:
Practical defaults:
Scenario: a chat surface for a logistics company. A user asks: "What's the status of my orders from last week?"
Step 1 — Run the decision tree.
Decision: table. Multiple orders, scannable columns (id, date, status, total). Lists are for when items are richer than a row; this is rows.
Step 2 — Pick the primitive and write the schema.
import { z } from "zod";

// createTool, db and userId are assumed to come from the surrounding app
// (tool-definition helper, data layer, authenticated user).
const orderTableTool = createTool({
  description: "Show recent orders as a sortable table. Use when the user asks about order history, status, or recent purchases.",
  inputSchema: z.object({
    range_days: z.number().int().min(1).max(90),
  }),
  outputSchema: z.object({
    orders: z
      .array(
        z.object({
          id: z.string().regex(/^ORD-\d{6,}$/),
          placed_at: z.string().datetime(),
          status: z.enum(["pending", "shipped", "delivered", "cancelled"]),
          total_cents: z.number().int().nonnegative(),
        })
      )
      .max(20), // never more rows than the table shows
    truncated: z.boolean(), // tells the UI to offer "Show all orders"
  }),
  execute: async ({ range_days }) => {
    const rows = await db.orders.recent(userId, range_days);
    return {
      orders: rows.slice(0, 20), // cap server-side to match the schema
      truncated: rows.length > 20,
    };
  },
});
Step 3 — Render with disclosure.
// formatDate, formatCents and StatusBadge are app helpers assumed to exist.
function OrderTable({ data }: { data: z.infer<typeof orderTableTool.outputSchema> }) {
  return (
    <div className="rounded-lg border">
      <table>
        <thead>
          <tr><th>Order</th><th>Placed</th><th>Status</th><th className="text-right">Total</th></tr>
        </thead>
        <tbody>
          {data.orders.map((o) => (
            <tr key={o.id}>
              {/* Safe to interpolate: id matched /^ORD-\d{6,}$/ at parse time. */}
              <td><a href={`/orders/${o.id}`}>{o.id}</a></td>
              <td>{formatDate(o.placed_at)}</td>
              <td><StatusBadge status={o.status} /></td>
              <td className="text-right">{formatCents(o.total_cents)}</td>
            </tr>
          ))}
        </tbody>
      </table>
      {data.truncated && (
        <a href="/orders" className="block p-3 text-sm">Show all orders →</a>
      )}
    </div>
  );
}
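OrderTable leans on formatDate and formatCents, which the example leaves undefined. Minimal sketches using the standard Intl APIs; the en-US locale, USD currency and UTC time zone are assumptions to replace with the user's own settings.

```typescript
// Assumed locale and currency; swap in the user's preferences.
const currencyFormat = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
});

// Money stays in integer cents everywhere else; divide only at display time.
function formatCents(cents: number): string {
  return currencyFormat.format(cents / 100);
}

// UTC keeps the rendered date stable regardless of the server's time zone.
const dateFormat = new Intl.DateTimeFormat("en-US", {
  year: "numeric",
  month: "short",
  day: "numeric",
  timeZone: "UTC",
});

function formatDate(iso: string): string {
  return dateFormat.format(new Date(iso));
}
```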
Notice:
truncated: true triggers the "show all" link.
Step 4 — Validation gate.
The schema parses the model's tool output. If status comes back as "in_transit" (not in the enum), parse fails, fallback fires.
Step 5 — What happens on malformed response.
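import { streamText } from "ai";

// Runs inside the tool's render path. modelOutput, model, userMessage and
// logValidationFailure are assumed to be in scope.
const result = orderTableTool.outputSchema.safeParse(modelOutput);
if (!result.success) {
  // Never fail silently: record which tool broke the contract and why.
  logValidationFailure({ tool: "orderTable", error: result.error });
  // Re-prompt for a prose answer to the same question.
  return streamText({
    model,
    system: "Answer the user's question about their orders in 2-3 sentences. Mention order IDs and status. No markdown tables.",
    prompt: userMessage,
  });
}
return <OrderTable data={result.data} />;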
The user sees either a clean table or a clean paragraph. They never see a half-broken component, an error toast, or "something went wrong." The fallback is the text answer to the same question — which is still useful.
What to instrument: rejection rate per tool, time-to-first-paint for the component path, and fallback rate. When the rejection rate spikes after a prompt change, you know which change broke the contract before users notice.
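A minimal in-memory sketch of those three signals per tool. The class and method names are hypothetical; in production they would map to counters and histograms in your metrics backend. The p50 here uses the nearest-rank method.

```typescript
class GenuxMetrics {
  private attempts = new Map<string, number>();
  private rejections = new Map<string, number>();
  private fallbacks = new Map<string, number>();
  private firstPaintMs = new Map<string, number[]>();

  recordAttempt(tool: string): void {
    this.bump(this.attempts, tool);
  }

  recordRejection(tool: string): void {
    this.bump(this.rejections, tool);
  }

  recordFallback(tool: string): void {
    this.bump(this.fallbacks, tool);
  }

  recordFirstPaint(tool: string, ms: number): void {
    const samples = this.firstPaintMs.get(tool) ?? [];
    samples.push(ms);
    this.firstPaintMs.set(tool, samples);
  }

  // Share of component attempts whose output failed schema validation.
  rejectionRate(tool: string): number {
    return this.ratio(this.rejections, tool);
  }

  // Share of component attempts that ended up as prose.
  fallbackRate(tool: string): number {
    return this.ratio(this.fallbacks, tool);
  }

  // Median time-to-first-paint (nearest-rank), or undefined with no samples.
  p50FirstPaint(tool: string): number | undefined {
    const samples = this.firstPaintMs.get(tool);
    if (!samples || samples.length === 0) return undefined;
    const sorted = [...samples].sort((a, b) => a - b);
    return sorted[Math.ceil(sorted.length * 0.5) - 1];
  }

  private ratio(counter: Map<string, number>, tool: string): number {
    const attempts = this.attempts.get(tool) ?? 0;
    return attempts === 0 ? 0 : (counter.get(tool) ?? 0) / attempts;
  }

  private bump(counter: Map<string, number>, tool: string): void {
    counter.set(tool, (counter.get(tool) ?? 0) + 1);
  }
}
```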
This is the v0.1. The patterns above are the public surface — the decision tree, the primitive set, the validation gate, the fallback path — synthesised from the publicly-documented landscape (Vercel AI SDK, Anthropic Artifacts, ChatGPT Canvas, OpenAI Apps SDK) and from GenUX principles that hold across stacks.
The production examples from the capybara project will deepen the layout primitives, streaming considerations, and worked example sections in v0.2 — with real schemas, real telemetry numbers (rejection rates, p50 stream timing, fallback rate), and the actual primitive inventory that shipped. The <!-- TODO(capybara): ... --> markers in the source mark the seven places where production content will land.
Until then, treat v0.1 as a decision framework, not a cookbook.
useChat integration, tool part rendering
streamUI, the model-as-router mental model, generator-yielded intermediate states
_meta.ui.resourceUri, setWidgetState, callTool