Applied AI, UX Lead at Google
GenUX/UI · LLM Evals · Autoraters
Adapt the same factual answer for recruiter / engineer / executive audiences. Detect signals, branch the system prompt, preserve truth across reframings.
$ npx skills add darrenhead/skills --skill persona-aware-disclosure
One answer rarely fits everyone. A recruiter scanning your portfolio in 90 seconds between meetings does not want the same paragraph as a staff engineer pulling apart your architecture, who in turn does not want the same paragraph as a VP deciding whether to fund the work. The facts are identical. The shape (vocabulary, depth, citation density, what you ask them to do next) is not.
Persona-aware disclosure is the pattern: detect which audience you are talking to, then route through a system prompt that wraps the same retrieved facts in audience-appropriate framing. It is what darrenhead.com does on its chat surface — the README lists it as one of the demonstrated capabilities ("Recruiter vs. technical reader get different shapes of answer"). The principle generalises to any AI app whose users span more than one role.
If you serve a single homogeneous audience, skip this skill. If your audience is split, not branching is a choice — and it usually means half your users get an answer that is either condescending or overwhelming.
These are the dials. You rarely move all of them. The skill is knowing which ones matter for the gap between your personas.
Register. Vocabulary, jargon density, sentence length. A recruiter benefits from "we shipped an evaluation harness that grades model output". An engineer wants "autorater suite with rubric-based scoring, gated in CI". An executive wants "automated quality gate; cut release risk in half". Same fact. Three registers. The wrong register reads as either smug or patronising.
Depth. How many levels down do you go before stopping? Recruiter answers top out at what and why-it-matters. Engineer answers go to how and what-broke-when-we-tried-X. Executive answers compress to outcome and risk. Picking the wrong depth is the most common failure: engineers given a 60-word summary feel hand-waved at; recruiters given a 600-word architecture deep-dive bounce.
Citation density. Engineers want links: to the repo, the ADR, the PR, the eval result. Recruiters want one or two anchor citations and prose. Executives want zero inline citations and an offer of details on request. Over-citing reads as defensive; under-citing reads as bluster. The retrieval layer fetches the same sources every time; the persona decides how many surface in the rendered answer.
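A minimal TypeScript sketch of that split, assuming retrieval returns scored sources and that the caps mirror the persona prompts in the branching code below (recruiters at most two inline citations, executives none, engineers all). `Source`, `CITATION_CAP`, and `selectCitations` are hypothetical names, not part of any library.

```typescript
type Persona = "recruiter" | "engineer" | "executive"

// A retrieved source. Retrieval ranks these identically for every persona.
interface Source {
  title: string
  url: string
  score: number // retrieval relevance, higher is better
}

// Hypothetical caps on how many sources surface inline per persona.
const CITATION_CAP: Record<Persona, number> = {
  recruiter: 2, // one or two anchor citations
  engineer: Number.POSITIVE_INFINITY, // a link for every load-bearing claim
  executive: 0, // no inline citations; details on request
}

// Same evidence in, a persona-specific number of citations out.
function selectCitations(sources: Source[], persona: Persona): Source[] {
  const ranked = [...sources].sort((a, b) => b.score - a.score)
  // slice(0, Infinity) returns the whole array; slice(0, 0) returns [].
  return ranked.slice(0, CITATION_CAP[persona])
}
```

Because the cap is applied after ranking, the strongest source leads in every branch that shows any citations; only the tail is trimmed.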
Call to action. The follow-up question, the button, the link. Recruiter answers end with "book a call" or "see the case study". Engineer answers end with "read the ADR" or "clone the repo". Executive answers end with "here is the one-pager" or "who else has used this". The CTA is where the persona model actually pays for itself: it is what converts an answer into a next action.
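One way to enforce the single-CTA rule, sketched under two assumptions: CTAs render as a final line prefixed with "→ " (as in the worked example further down), and the option lists simply transcribe the examples above. `CTA_OPTIONS` and `withSingleCta` are illustrative, not an existing API.

```typescript
type Persona = "recruiter" | "engineer" | "executive"

// CTA wording transcribed from the prose; which option applies is a product decision.
const CTA_OPTIONS: Record<Persona, readonly string[]> = {
  recruiter: ["Book a call?", "Want to see the case study?"],
  engineer: ["Read the ADR?", "Clone the repo?"],
  executive: ["Want the one-pager?", "Who else has used this?"],
}

const CTA_PREFIX = "→ " // assumed marker for the CTA line

// Drop any CTA lines the model wrote, then append exactly one chosen CTA.
function withSingleCta(answer: string, persona: Persona, choice = 0): string {
  const body = answer
    .split("\n")
    .filter((line) => !line.startsWith(CTA_PREFIX))
    .join("\n")
    .trimEnd()
  const options = CTA_OPTIONS[persona]
  // Clamp the choice so an out-of-range index falls back to the last option.
  const cta = options[Math.min(choice, options.length - 1)]
  return `${body}\n${CTA_PREFIX}${cta}`
}
```

Stripping model-written CTA lines before appending one keeps the next step a product decision rather than whatever the model improvises.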
Signal integrity. What you do not change across personas: names, numbers, dates, causal claims, anything load-bearing. Personas reshape; they never distort. If the engineer answer says "we cut p95 by 40%", the executive answer says "we cut p95 by 40%" (possibly without the "p95"), but the 40% is not negotiable. This is the axis anti-patterns most often violate.
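A cheap, partial guard for the numeric part of that rule, assuming facts and answers are plain strings: every number in a persona's answer must already appear in the shared facts. Dropping a number passes (compression is allowed); changing one fails. This is a heuristic sketch with hypothetical helper names. It will not catch reworded causal claims, and "40 percent" versus "40%" would be flagged as a mismatch.

```typescript
// Collect numeric tokens such as "40", "3.5", "40%". The leading group requires
// start-of-string or a non-alphanumeric, non-dot character, so the "95" in
// "p95" and the "3" in "Q3" are not counted as standalone numbers.
function numbersIn(text: string): Set<string> {
  const found = new Set<string>()
  const re = /(?:^|[^A-Za-z0-9.])(\d+(?:\.\d+)?%?)/g
  let m: RegExpExecArray | null
  while ((m = re.exec(text)) !== null) found.add(m[1])
  return found
}

// Numbers present in the answer but absent from the shared facts.
// An empty result means no number was invented or altered.
function inventedNumbers(facts: string, answer: string): string[] {
  const allowed = numbersIn(facts)
  return Array.from(numbersIn(answer)).filter((n) => !allowed.has(n))
}
```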
To detect which persona you are talking to, you have two clean options and one good hybrid.
Explicit (ask). An onboarding question, a role-selector chip, a sign-up field. High confidence, zero inference cost. Downsides: friction on the first interaction, and users sometimes lie or pick whatever lets them in fastest. Best when the persona is durable across sessions (B2B SaaS where each account has a known role).
Implicit (infer). Classify from the first message: vocabulary, question shape, referrer, account metadata. Zero friction. Downsides: classifier error means some users get the wrong branch, and the cost of being wrong is high (an engineer routed to the recruiter branch will disengage immediately). Best when you have strong priors (referrer = LinkedIn → likely recruiter; referrer = GitHub → likely engineer).
Hybrid (recommended). Infer a default from available signals, then offer a one-tap correction in the UI — a chip, a toggle, a follow-up question if confidence is low. This is what most mature implementations land on. You get zero-friction defaults and a cheap out when the model gets it wrong. Treat the correction as training data for the next iteration of the classifier.
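A sketch of the implicit and hybrid options together, with loudly hypothetical inputs: the referrer priors, keyword lists, and the 0.6 ask threshold are placeholders, and a production classifier would likely be a model rather than keyword counts. `inferPersona`, `resolvePersona`, and `recordCorrection` are illustrative names.

```typescript
type Persona = "recruiter" | "engineer" | "executive"

interface Signals {
  referrer?: string // e.g. document.referrer or a utm_source value
  firstMessage: string
}

interface Inference {
  persona: Persona
  confidence: number // 0..1: the winning persona's share of all signal hits
}

// Hypothetical priors and keyword lists; replace them with your own data.
const REFERRER_PRIOR: Array<[RegExp, Persona]> = [
  [/linkedin\.com/i, "recruiter"],
  [/github\.com/i, "engineer"],
]
const KEYWORDS: Record<Persona, string[]> = {
  recruiter: ["role", "hiring", "candidate", "resume", "experience"],
  engineer: ["latency", "embedding", "schema", "api", "eval", "repo"],
  executive: ["roi", "cost", "budget", "headcount", "risk", "timeline"],
}
const PERSONAS: Persona[] = ["recruiter", "engineer", "executive"]

// Implicit: score each persona from the referrer and first-message vocabulary.
function inferPersona({ referrer = "", firstMessage }: Signals): Inference {
  const scores: Record<Persona, number> = { recruiter: 0, engineer: 0, executive: 0 }
  for (const [pattern, persona] of REFERRER_PRIOR) {
    if (pattern.test(referrer)) scores[persona] += 2 // a referrer hit counts double
  }
  const words = firstMessage.toLowerCase().match(/[a-z0-9]+/g) ?? []
  for (const p of PERSONAS) {
    scores[p] += words.filter((w) => KEYWORDS[p].includes(w)).length
  }
  const best = PERSONAS.reduce((a, b) => (scores[b] > scores[a] ? b : a)) // ties keep the earlier persona
  const total = PERSONAS.reduce((sum, p) => sum + scores[p], 0)
  return { persona: best, confidence: total === 0 ? 0 : scores[best] / total }
}

// Hybrid: use the default when confident, ask when not; always allow correction.
type Resolution =
  | { kind: "use"; persona: Persona; showCorrectionChip: true }
  | { kind: "ask"; suggested: Persona }

const ASK_BELOW = 0.6 // hypothetical threshold; tune it against logged corrections

function resolvePersona(inf: Inference): Resolution {
  if (inf.confidence < ASK_BELOW) return { kind: "ask", suggested: inf.persona }
  return { kind: "use", persona: inf.persona, showCorrectionChip: true }
}

// Every correction is a labelled example for the next classifier iteration.
const corrections: Array<{ inferred: Persona; chosen: Persona; confidence: number }> = []

function recordCorrection(inf: Inference, chosen: Persona): void {
  if (chosen !== inf.persona) {
    corrections.push({ inferred: inf.persona, chosen, confidence: inf.confidence })
  }
}
```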
Whatever you pick, persist the choice for the session and surface it back to the user. A persona that silently changes mid-conversation is a bug.
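A minimal per-session store that makes the persona sticky and visible, assuming one object per conversation. `PersonaSession` and its label text are hypothetical; the point is that later inference cannot overwrite the choice, and only an explicit user action can.

```typescript
type Persona = "recruiter" | "engineer" | "executive"

class PersonaSession {
  private persona: Persona | null = null

  // Set once at session start (inferred or asked). Later calls are ignored,
  // so a re-run of the classifier cannot silently change the branch.
  init(persona: Persona): Persona {
    if (this.persona === null) this.persona = persona
    return this.persona
  }

  // The only mid-conversation change: a visible user action (chip, toggle).
  switchTo(persona: Persona): void {
    this.persona = persona
  }

  current(): Persona {
    if (this.persona === null) throw new Error("persona not initialised")
    return this.persona
  }

  // Text surfaced back in the UI so the active persona is never hidden.
  label(): string {
    return `Answering as: ${this.current()} (change)`
  }
}
```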
The factual content — what you retrieved, what you computed, what you know — is shared. Only the wrapping prompt branches.
```typescript
type Persona = "recruiter" | "engineer" | "executive"

const PERSONA_PROMPTS: Record<Persona, string> = {
  recruiter: `You are answering a recruiter or hiring manager. They have
~90 seconds and want signal, not depth. Lead with the outcome and the
role-relevant proof point. Use plain English; expand acronyms on first
use. Keep paragraphs to 2 sentences. Cite at most 2 sources inline,
preferably a case study or shipped product. End with ONE next-step CTA
inviting a conversation ("Want to see the case study?" / "Book a call?").
Do not go below the surface unless asked.`,
  engineer: `You are answering a technical reader (staff/senior eng,
ML researcher, architect). They want the how, not the what. Use precise
technical vocabulary; do not expand standard acronyms. Show the
trade-off, not just the choice. Include links to the repo, ADR, or PR
for every load-bearing claim. Code blocks welcome. End with ONE next-step
CTA pointing to deeper material ("Read ADR-0007?" / "See the eval rig?").`,
  executive: `You are answering a senior decision-maker (VP, director,
founder). They want outcome, risk, and cost. Compress aggressively:
3 short paragraphs max. No inline citations; offer a one-pager or
follow-up on request. Frame everything in business terms (revenue,
risk, time-to-ship, headcount). End with ONE next-step CTA inviting
a meeting or a written brief.`,
}

function buildPrompt(persona: Persona, retrievedFacts: string): string {
  return `${PERSONA_PROMPTS[persona]}
# Facts to draw from
${retrievedFacts}
# Hard rules (all personas)
- Names, numbers, dates, and causal claims are identical across personas.
- Never invent a citation. If you have no source, say so.
- If the user's follow-up implies a different persona, surface the
mismatch and offer to switch.`
}
```
Two things to notice. First, the retrieval layer runs before the branch — every persona sees the same evidence. Second, the hard rules at the bottom are persona-invariant. That is where you protect signal integrity from rewrite drift.
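That ordering can be made structural rather than a convention. In this sketch, `retrieve` and `generate` are hypothetical injected functions standing in for your retrieval layer and model client, and `buildPrompt` can be the function above. The persona is never passed to retrieval, so no branch can see different evidence.

```typescript
type Persona = "recruiter" | "engineer" | "executive"

// Hypothetical dependency shapes; adapt them to your own stack.
type Retrieve = (query: string) => Promise<string[]>
type Generate = (systemPrompt: string, userMessage: string) => Promise<string>
type BuildPrompt = (persona: Persona, retrievedFacts: string) => string

async function answer(
  query: string,
  persona: Persona,
  deps: { retrieve: Retrieve; generate: Generate; buildPrompt: BuildPrompt },
): Promise<string> {
  // 1. Retrieval is persona-blind: it receives only the query.
  const facts = await deps.retrieve(query)
  // 2. The persona enters only here, when the shared facts are wrapped.
  const systemPrompt = deps.buildPrompt(persona, facts.join("\n"))
  return deps.generate(systemPrompt, query)
}
```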
The question: "What's the difference between fine-tuning and RAG?"
Recruiter branch.
Both are ways to make a general LLM useful for a specific job. RAG (retrieval-augmented generation) lets the model look things up in your documents at answer time — fast to ship, easy to update. Fine-tuning bakes new behaviour into the model itself — slower, more expensive, but better for changing how the model writes or thinks, not just what it knows. Most production systems start with RAG and only fine-tune when they have to.
→ Want to see how I built the RAG layer for this site?
Engineer branch.
RAG and fine-tuning solve different problems and compose well.
RAG is inference-time augmentation: embed the corpus, retrieve top-k chunks per query, stuff them into the context window, let the model generate over the union of its weights and the retrieved evidence. Cost lives in the embedding pipeline + vector store + the extra tokens per request. Wins: knowledge is hot-swappable, citations are trivial, factual drift is bounded by retrieval quality.
Fine-tuning mutates the weights — typically LoRA/QLoRA against a task-specific dataset. Cost lives in dataset curation + the training run + an eval rig to confirm you did not regress general capability. Wins: format adherence, tone, structured-output reliability, latency (no retrieval round-trip).
Heuristic: if the problem is "the model does not know X", reach for RAG. If it is "the model knows X but will not say it the way we need", reach for fine-tuning. They stack — fine-tune a small model on your domain's shape, then RAG against your live corpus for the content. See ADR-0003 for how I picked retrieval over fine-tuning for the portfolio chat layer.
→ Want to walk through the pgvector + Gemini embedding setup?
Executive branch.
Two ways to make an LLM useful for your business. Retrieval (RAG) plugs your documents in at query time — cheap, fast to update, low-risk. Fine-tuning rebuilds the model around your data — more expensive, slower to iterate, but stronger control over how the model behaves.
Most teams start with retrieval and only fine-tune when retrieval hits a ceiling. Time-to-first-value is weeks for retrieval, months for fine-tuning.
→ Want a one-pager on which to pick for your use case?
Same five facts in all three. Three different shapes. Each one ends with a CTA the audience actually wants to click.
Related: autorater-rubric skill. Once you branch by persona, you need separate eval rubrics per branch. They ship together.
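Per-branch rubrics can start with the countable parts of each shape before an autorater grades tone and accuracy. A rule-based sketch, assuming CTA lines start with "→ " and links are bare URLs; the word budgets are placeholders, not figures from this skill, and this is not presented as the autorater-rubric skill's method.

```typescript
type Persona = "recruiter" | "engineer" | "executive"

// Placeholder budgets: tune per product. Link caps follow the persona prompts.
interface Rubric {
  minWords: number
  maxWords: number
  maxLinks: number
}

const RUBRICS: Record<Persona, Rubric> = {
  recruiter: { minWords: 0, maxWords: 150, maxLinks: 2 }, // "cite at most 2 sources inline"
  engineer: { minWords: 100, maxWords: 800, maxLinks: Number.POSITIVE_INFINITY },
  executive: { minWords: 0, maxWords: 120, maxLinks: 0 }, // "no inline citations"
}

const CTA_PREFIX = "→ " // assumed CTA marker, matching the worked example

// Returns the failed checks; an empty array means the answer fits its branch.
function grade(answer: string, persona: Persona): string[] {
  const r = RUBRICS[persona]
  const failures: string[] = []
  const words = answer.split(/\s+/).filter(Boolean).length
  const links = (answer.match(/https?:\/\/\S+/g) ?? []).length
  const ctas = answer.split("\n").filter((line) => line.startsWith(CTA_PREFIX)).length
  if (words < r.minWords) failures.push(`too short: ${words} < ${r.minWords} words`)
  if (words > r.maxWords) failures.push(`too long: ${words} > ${r.maxWords} words`)
  if (links > r.maxLinks) failures.push(`too many links: ${links} > ${r.maxLinks}`)
  if (ctas !== 1) failures.push(`expected exactly one CTA, found ${ctas}`)
  return failures
}
```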