Applied AI, UX Lead at Google
GenUX/UI · LLM Evals · Autoraters
AI agents browse a real page with a goal, then rate the experience against UX dimensions. Each report is a video session replay: what the agent did, what it was thinking, and how it scored the experience.
AI Expense Tracking: Denpyo vs. Freee
Watch the replay
Usability and Design Eval: denpyo.com
Watch the replay
Google vs. ChatGPT: Finding Ramen in Shibuya
Watch the replay
Wikipedia — search & navigate task (cursor motion test)
Watch the replay
Stripe — Pricing task (cursor motion test)
Watch the replay
Stripe Landing Page Evaluation
Watch the replay