Darren HeadPortfolio
Darren HeadPortfolio
Google

Applied AI, UX Lead at Google

GenUX/UI · LLM Evals · Autoraters

All skills

Generative UX patterns

GenUX

When an AI answer IS a UI, not text about a UI. Component-vs-text decision tree, GenUX layout primitives, validation gates for model-composed components, and graceful fallback to text when composition fails.

Install

$ npx skills add darrenhead/skills --skill genux-patterns

When to use this skill

  • You're designing a chat or agent feature and the answer is structured data (orders, events, files, search results, comparison)
  • You're using Vercel AI SDK and weighing `streamUI` vs `streamText` for a given tool
  • You're shipping an MCP server with the OpenAI Apps SDK or rendering Anthropic Artifacts and need to decide what to compose
  • You're building an internal LLM tool and tempted to render the raw model output as HTML (don't — read the validation section)
  • You've shipped a GenUX surface and components are silently rendering broken or missing data

SKILL.md

View on GitHub

Generative UX patterns

Generative UX (GenUX) is the pattern where the answer is a UI — not text about a UI. Instead of streaming prose that describes your order history, the model returns a <OrdersTable> populated with rows. Instead of explaining a chart in words, it returns the chart. The public landscape has converged on this idea from three angles: Vercel AI SDK's streamUI (model picks a React component to render, validated by zod), Anthropic's Artifacts (Claude opens a dedicated surface for substantial output), ChatGPT Canvas (side-by-side editable artefact), and OpenAI's Apps SDK (MCP servers return widgets that ChatGPT renders inline). Different stacks, same idea: for some questions, the right answer shape isn't a paragraph.

This skill is the decision rules for when an answer should be a component versus text, the small primitives set that covers most cases, the disclosure ladder, and — most importantly — the validation gate and the fallback path. Model-composed UI fails in ways model-composed text doesn't. You need both belt and braces.

When to use this skill

  • You're designing a chat or agent feature and the answer is structured data (orders, events, files, search results, comparison)
  • You're using Vercel AI SDK and weighing streamUI vs streamText for a given tool
  • You're shipping an MCP server with the OpenAI Apps SDK or rendering Anthropic Artifacts and need to decide what to compose
  • You're building an internal LLM tool and tempted to render the raw model output as HTML (don't — read the validation section)
  • You've shipped a GenUX surface and components are silently rendering broken or missing data
  • The question "should this answer be UI or text?" is open and you want a decision tree, not vibes

When NOT to use this skill

  • Most answers should still be text. A one-paragraph response doesn't need a component shell around it. GenUX is for answers where structure is load-bearing.
  • The user asked a conversational question ("what do you think about X?") — that's prose. Don't wrap an opinion in a card.
  • You're early in prototyping and don't yet know what shapes the model returns — get text working first, then promote the recurring shapes to components.
  • You don't have a validation layer in front of model output. Without zod (or equivalent) plus a fallback path, GenUX will ship broken UI to users. Build the gate before the surface.
  • Latency is critical (sub-second target) and your component requires a second model call to compose — the round trip kills the experience.

The component-vs-text decision tree

Run the candidate answer through these gates in order. Stop at the first "yes" — that's your answer shape.

Gate 1: Is the answer a list of comparable items? If the user is going to scan, compare, or sort across multiple instances of the same shape (orders, files, bookings, search hits), it's a list or table, not prose. Humans read tables in ~3x less time than paragraphs for tabular data. Composing a table is worth the overhead.

Gate 2: Does the answer have an immediate action affordance? If the right next step is "click to open / approve / reschedule / pay", the answer needs a button, not a sentence saying "you can do X". Action affordances belong in the UI, not the prose. This is the single highest-leverage GenUX promotion — prose with "click here" links is worse than a component with a real button.

Gate 3: Does the answer have a single canonical visual representation? Time series → line chart. Geographic data → map. Categorical comparison → bar chart. Code → syntax-highlighted block with copy button. If the visual representation is obvious and standard, render it. Don't make the user picture it from prose.

Gate 4: Is the answer two things being compared? "X vs Y" answers belong in side-by-side comparators, not in a paragraph that alternates "X does this, but Y does that." The visual structure carries the meaning.

Gate 5: Will the user re-read or come back to this answer? Reference content (a generated summary, a checklist, a config block) gets re-read. Components with stable structure are easier to re-scan than prose. Artifacts and Canvas exist for exactly this reason.

Gate 6: Does the answer need to be edited or refined in place? If the user's next move is "tweak this" rather than "ask a follow-up", you want an editable surface (Canvas, Artifact, inline form), not a chat bubble they have to copy out of.

Default: text. If none of the gates fire, ship prose. Streaming text is faster, cheaper, more forgiving, and doesn't require a validation layer. The bar for promoting an answer to a component is "structure is load-bearing" — not "this would look cool as a card."

GenUX layout primitives

You want a small, opinionated set of primitives the model can compose into. Letting the model invent bespoke layouts per answer creates inconsistency, breaks design system constraints, and makes validation impossible. Eight primitives cover ~80% of real cases:

  1. Card — one thing with a headline, a few key fields, optional action. The default container for a single result.
  2. List — N cards stacked vertically. For 3–20 items where each item is more than a row of data.
  3. Table — N rows × M columns. For >5 items where the columns are the same shape and the user will scan vertically. Bound the row count.
  4. Form — labelled inputs + submit. For when the next move is structured user input (booking, filter, edit). Validate the schema both ways.
  5. Chart — time series, bar, pie. One chart type per answer. Don't compose dashboards from a single turn.
  6. Code block — language-tagged, syntax-highlighted, copyable. For code, config, commands, structured data the user will paste elsewhere.
  7. Side-by-side comparator — two cards or two tables aligned. For X-vs-Y answers. Same field order on both sides.
  8. Action button group — 1–3 buttons with clear verbs. For when the model's answer is "here are your options." Not a substitute for the answer; an affordance on top of it.

Rules for the primitive set:

  • Closed set, not open. Add a ninth primitive only when an existing one provably can't carry the case. Drift creates inconsistency.
  • Composable, not nestable. A list of cards is fine. A card inside a form inside a comparator inside a card is a bug.
  • Every primitive ships with a text fallback. If the component can't render, the same content streams as prose. (See: graceful fallback.)
  • The model picks the primitive, not the layout. It chooses "table" — you control column widths, padding, dark mode, accessibility. The model never writes CSS.

Disclosure rules

GenUX components are easy to over-stuff. The user asked one question; resist composing an entire dashboard. Progressive disclosure rules:

  • Headline first. The component renders the answer at a glance. A single card with the key fact. A table with the top 5 rows. A chart with the headline metric. The user shouldn't scroll to find the answer.
  • Expand on demand. Detail panels, "show all rows", drill-downs — these are user-initiated, not default-open. A <details> element or a "More" button is the right surface.
  • One primary action. If you ship a button group, exactly one button is the visually primary action. Three equally-weighted buttons is a decision tree pretending to be UI.
  • Cap list and table sizes. Hard limit at the primitive level: tables ≤ 20 rows by default, lists ≤ 10 items. Beyond that, truncate and offer "show all." Unbounded model output → unbounded render cost → blown layouts.
  • Sticky context, scrollable detail. If the component is tall, the title and primary action stay visible while body content scrolls. Standard mobile pattern; works for GenUX too.

The mental model: the first paint is the answer; subsequent interaction is the exploration of the answer.

Validation gates for model-composed UI

This is the section to read twice. Model-composed UI fails in ways model-composed text doesn't, and the failures are silent.

Always validate model output against a schema before rendering. Zod is the canonical choice in the Vercel AI SDK; pydantic or equivalent in other stacks. The tool call's output schema is the contract; if the model returns something that doesn't match, the component never mounts.

const orderListSchema = z.object({
  orders: z
    .array(
      z.object({
        id: z.string(),
        date: z.string().datetime(),
        total_cents: z.number().int().nonnegative(),
        status: z.enum(["pending", "shipped", "delivered", "cancelled"]),
      })
    )
    .max(20), // hard cap at the schema level
});

Sanitize anything that becomes HTML. If the model returns Markdown that you render, run it through a sanitizer (DOMPurify, rehype-sanitize). If the model returns strings that get interpolated into href / src / onClick — stop. Treat model output as untrusted input, because it is. Prompt-injected content can ride the model's response into your DOM.

Bound everything that's a count. Rows, columns, list items, chart points, button group size. The model will occasionally return 500 rows when you wanted 10. The schema enforces 10.

Type-narrow before render. Don't as cast. Parse with the schema, branch on the parse result, render only when valid.

Log validation failures. A silent "the component didn't render" is the worst possible failure mode. You need to see when the model is returning malformed output so you can fix the prompt or the schema.

Test the schema with adversarial inputs. Empty arrays, null fields, wrong enum values, oversized payloads. The validator is your firewall; pen-test it.

Graceful fallback

When the validation gate rejects model output, do not show a broken component. Do not show "Sorry, something went wrong." Show the same answer as text.

The flow:

  1. Model returns tool call with structured payload
  2. Schema validates payload
  3. On success: render the component
  4. On failure: re-prompt the model (one retry) to answer in prose, stream the text response, log the failure

The user should never know the component path failed. They get an answer either way. The component is the upgrade; the text is the floor.

This is also the answer to "what if the model can't decide between two primitives?" — it doesn't have to. If the structured tool call doesn't fire cleanly, you fall back to text. The model picks UI when UI is obviously right; everything else stays prose.

Concretely: every GenUX tool ships with a paired text-generation prompt that answers the same question without the component. The fallback is not "show error UI" — it's "stream the prose version."

Streaming considerations

streamUI (Vercel) commits to a component shape the moment the model picks a tool. The component skeleton renders immediately; data fills in as the generator yields. This is great UX when the model picks correctly and slow / janky when it doesn't.

Tradeoffs:

  • Commit early (streamUI default): instant component shell, perceived-fast, but if the tool call fails mid-stream, you've already shown a half-rendered card. Recovery is awkward.
  • Commit late (text-first with promotion): stream text until you have enough signal that a component is warranted, then swap. More forgiving, less crisp. The pattern OpenAI's Apps SDK uses — the model can return text and then attach a widget via metadata.
  • Optimistic skeletons: render the primitive's skeleton (table headers, card outline) before data lands. Cheap perceived-perf win. Works for both early and late commit.

Practical defaults:

  • Tools whose outputs are deterministic and bounded (a query result, a status lookup): commit early. The shape is known; the data is the only variable.
  • Tools that may not return data at all (search, recommendation): commit late, fall back to text when empty.
  • Anything that requires a second model call to compose: avoid streamUI for it. The latency kills the win.

Anti-patterns

  • Composing UI for every answer. Most answers are prose. If your portfolio of tools is 100% component-returning, you're forcing structure where structure isn't load-bearing — and inflating latency and failure surface for no UX win.
  • Rendering model output as raw HTML. "The model is smart, it'll output clean HTML" is how you get prompt-injection XSS. Schema-validate or sanitize, always.
  • No fallback path. A GenUX surface without a text fallback is a surface that silently breaks. Build the fallback before the component; ship them together.
  • Bespoke component per answer. Each new layout the model can compose is a new validation schema, a new design QA pass, a new mobile breakpoint, a new accessibility audit. Keep the primitive set small and stable.
  • Unbounded lists and tables. "The model will return a reasonable number of rows" — until it returns 1,000 and blows your layout. Cap at the schema.
  • Treating GenUX as "render the JSON the LLM returned." That's not GenUX; that's a data leak with extra steps. The component layer is yours; the model picks which one and with what data. It doesn't write the markup.
  • Hiding the text answer when a component renders. Users sometimes want to copy the answer as prose. Either keep a text version available (collapsed, alongside) or make the component itself copyable as structured text.
  • Letting the model invent affordances. "Click here to do X" inside the component, where the model wrote the handler — no. Affordances are wired by your code against your action set. The model picks from the action set; it doesn't define it.

Worked example

Scenario: a chat surface for a logistics company. A user asks: "What's the status of my orders from last week?"

Step 1 — Run the decision tree.

  • Gate 1 (list of comparable items)? Yes — multiple orders, same shape each. → Promote to a list-or-table primitive.

Decision: table. Multiple orders, scannable columns (id, date, status, total). Lists are for when items are richer than a row; this is rows.

Step 2 — Pick the primitive and write the schema.

const orderTableTool = createTool({
  description: "Show recent orders as a sortable table. Use when the user asks about order history, status, or recent purchases.",
  inputSchema: z.object({
    range_days: z.number().int().min(1).max(90),
  }),
  outputSchema: z.object({
    orders: z
      .array(
        z.object({
          id: z.string().regex(/^ORD-\d{6,}$/),
          placed_at: z.string().datetime(),
          status: z.enum(["pending", "shipped", "delivered", "cancelled"]),
          total_cents: z.number().int().nonnegative(),
        })
      )
      .max(20),
    truncated: z.boolean(),
  }),
  execute: async ({ range_days }) => {
    const rows = await db.orders.recent(userId, range_days);
    return {
      orders: rows.slice(0, 20),
      truncated: rows.length > 20,
    };
  },
});

Step 3 — Render with disclosure.

function OrderTable({ data }: { data: z.infer<typeof orderTableTool.outputSchema> }) {
  return (
    <div className="rounded-lg border">
      <table>
        <thead>
          <tr><th>Order</th><th>Placed</th><th>Status</th><th className="text-right">Total</th></tr>
        </thead>
        <tbody>
          {data.orders.map((o) => (
            <tr key={o.id}>
              <td><a href={`/orders/${o.id}`}>{o.id}</a></td>
              <td>{formatDate(o.placed_at)}</td>
              <td><StatusBadge status={o.status} /></td>
              <td className="text-right">{formatCents(o.total_cents)}</td>
            </tr>
          ))}
        </tbody>
      </table>
      {data.truncated && (
        <a href="/orders" className="block p-3 text-sm">Show all orders →</a>
      )}
    </div>
  );
}

Notice:

  • Headline first: the table is the answer. No preamble paragraph.
  • Capped at 20 at the schema level. If there are more, truncated: true triggers the "show all" link.
  • Action affordances are real links, not prose ("click here"). Each row links to the canonical order page.
  • Primary action is implicit (click a row). No button group competing with the table.

Step 4 — Validation gate.

The schema parses the model's tool output. If status comes back as "in_transit" (not in the enum), parse fails, fallback fires.

Step 5 — What happens on malformed response.

const result = orderTableTool.outputSchema.safeParse(modelOutput);
if (!result.success) {
  logValidationFailure({ tool: "orderTable", error: result.error });
  // Re-prompt for a prose answer to the same question
  return streamText({
    model,
    system: "Answer the user's question about their orders in 2-3 sentences. Mention order IDs and status. No markdown tables.",
    prompt: userMessage,
  });
}
return <OrderTable data={result.data} />;

The user sees either a clean table or a clean paragraph. They never see a half-broken component, an error toast, or "something went wrong." The fallback is the text answer to the same question — which is still useful.

What you instrumented: rejection rate per tool, time-to-first-paint for the component path, fallback rate. When the rejection rate spikes after a prompt change, you know which prompt change broke the contract before users notice.

v0.1 — what's coming

This is the v0.1. The patterns above are the public surface — the decision tree, the primitive set, the validation gate, the fallback path — synthesised from the publicly-documented landscape (Vercel AI SDK, Anthropic Artifacts, ChatGPT Canvas, OpenAI Apps SDK) and from GenUX principles that hold across stacks.

The production examples from the capybara project will deepen the layout primitives, streaming considerations, and worked example sections in v0.2 — with real schemas, real telemetry numbers (rejection rates, p50 stream timing, fallback rate), and the actual primitive inventory that shipped. The <!-- TODO(capybara): ... --> markers in the source mark the seven places where production content will land.

Until then, treat v0.1 as a decision framework, not a cookbook.

Further reading

  • Vercel AI SDK — Generative User Interfaces — the canonical tools + zod schema pattern, useChat integration, tool part rendering
  • Vercel AI SDK — Streaming React Components (RSC) — streamUI, the model-as-router mental model, generator-yielded intermediate states
  • Anthropic — Introducing Artifacts — the dedicated-surface pattern, when Claude promotes output to an Artifact vs inline
  • OpenAI Apps SDK examples — MCP servers returning widgets to ChatGPT, _meta.ui.resourceUri, setWidgetState, callTool
  • patterns.dev — AI UI Patterns — chat surface patterns, streaming, error handling
  • CopilotKit — Developer's Guide to Generative UI — the static / declarative / open-ended taxonomy of GenUI implementations
  • Vercel Academy — Multi-Step & Generative UI — multi-step tool calling that composes into GenUI surfaces
Repositorydarrenhead/skills
SourceSKILL.md
First publishedMay 23, 2026

Related skills

#SkillTags
  • 1
    SaaS startersaas-starter
    • SaaS
    • Next.js
    • Supabase
    • Starter
    • SaaS
    • Next.js
    • Supabase
    • Starter
  • 2
    Autorater rubricautorater-rubric
  • LLM-as-judge
  • Evals
  • Calibration
  • Methodology
  • LLM-as-judge
  • Evals
  • Calibration
  • Methodology
  • 3
    Multimodal structured extractionmultimodal-structured-extraction
    • Multimodal
    • Gemini
    • Zod
    • Extraction
    • Multimodal
    • Gemini
    • Zod
    • Extraction
  • 4
    Persona-aware disclosurepersona-aware-disclosure
    • Prompting
    • UX
    • Adaptive
    • System prompt
    • Prompting
    • UX
    • Adaptive
    • System prompt
  • 5
    Reddit pain miningreddit-pain-mining
    • Product discovery
    • Reddit
    • Validation
    • Research
    • Product discovery
    • Reddit
    • Validation
    • Research