Building user trust in AI-assisted expense management.
Two rounds of user testing proved the supportive tone was right. What they didn't prove was durability. Users liked the first interaction — but trust over time requires something more structural than good intentions.
Expanion is an AI assistant embedded in the expense submission form. It watches the user type, analyzes the entry in near real time, and responds with suggestions — categories, mismatches, evidence from past expenses.
Two test sessions surfaced a consistent gap: users wanted to see the reasoning, not just the outcome. And a confidence indicator labeled "high / medium / low" raised more questions than it answered.
"Confidence in what, exactly?"
Test user — Part 2 session. The observation that drove every change in Part 3.

The revision mapped every user observation to a design principle, and every principle to a concrete UI element. Nothing changed unless it solved something.
Usability Testing: Wizard of Oz + hi-fi prototype, 2 sessions
Synthesis: Mapped failures to trust dimensions
Wireframing: Card template, modal, and variant flows
High Fidelity: Full interface, 3 variants, dark + light
Design Review: Each change traced to course principles
Card template — consistent across all severity levels
Transparency modal — every suggestion can be fully inspected
Two-panel layout — form left, AI right, 600 ms debounce
Control / automation dial — three distinct variants
Trust over time requires that a system's reasoning be inspectable, its behaviour consistent, its deference calibrated, and its learning visible. Each mechanism below maps to a principle and to a specific part of the interface.
The "Why this suggestion?" link on each card launches a modal showing exactly what Expanion read, how it ran the check step by step, and a confidence breakdown per factor — replacing the single opaque "high/medium/low" label from Part 2.
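A rough sketch of how that factored breakdown might look as data; every type and field name here is illustrative, not Expanion's actual API:

```ts
// Hypothetical data shape behind the "Why this suggestion?" modal.
// All names are illustrative stand-ins, not Expanion's real API.
interface ConfidenceFactor {
  label: string;      // e.g. "Category match against past expenses"
  score: number;      // 0 to 1, rendered as a per-factor bar
  evidence: string[]; // the literal inputs this factor was computed from
}

interface SuggestionExplanation {
  inputRead: string;           // exactly what Expanion parsed from the form
  steps: string[];             // the check, step by step, in plain language
  factors: ConfidenceFactor[]; // the breakdown replacing "high/medium/low"
}
```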
Every card uses the same template: status stripe, title, body, action. The AI check fires 600 ms after the last keystroke — not on every character. Green always means confirm, yellow always means nudge. The user learns the grammar once and reads faster from then on.
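A minimal sketch of that debounce in TypeScript, assuming a plain input handler; runCheck is a stand-in for whatever analysis Expanion actually runs:

```ts
// Placeholder for the real analysis that produces the suggestion cards.
function runCheck(entryText: string): void {
  console.log(`Analyzing: ${entryText}`);
}

// Generic debounce: the wrapped function fires once, delayMs after the
// last call, instead of on every keystroke.
function debounce<T extends (...args: any[]) => void>(fn: T, delayMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Parameters<T>): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// The check fires 600 ms after the last keystroke.
const scheduleCheck = debounce(runCheck, 600);
// Wire-up: call scheduleCheck(input.value) from the field's input handler.
```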
Expanion can prefill fields, but only the user can press Submit. When confidence on a specific factor is low, Expanion says so in the form of a question, not a statement. The system earns its keep by knowing when to step back.
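One way to sketch that deference: below an assumed confidence threshold, the card copy switches from statement to question. The 0.6 cutoff and all wording here are illustrative:

```ts
// Calibrated deference, assuming a 0-to-1 confidence score per factor.
function cardCopy(factorLabel: string, confidence: number): string {
  if (confidence < 0.6) {
    // Low confidence: defer to the user by asking, not asserting.
    return `Does "${factorLabel}" look right for this expense?`;
  }
  // High confidence: state the finding, but Submit stays with the user.
  return `${factorLabel} matches your past expenses.`;
}
```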
When a user disagrees with a flag and taps "Not a mistake?", that correction is stored and shown back the next time a similar entry would have been flagged — "last week you told me to stop flagging this." Trust builds when users see the system learning from them.
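A sketch of that loop, assuming an in-memory store keyed by a simple entry fingerprint; both are stand-ins for whatever Expanion actually persists:

```ts
// Feedback memory: "Not a mistake?" dismissals are stored and consulted
// before a similar entry is flagged again. The fingerprint (e.g. category
// plus normalized merchant) and the storage are illustrative.
const dismissals = new Map<string, Date>(); // fingerprint -> when dismissed

function recordDismissal(fingerprint: string): void {
  dismissals.set(fingerprint, new Date());
}

function flagCopy(fingerprint: string): string {
  const dismissed = dismissals.get(fingerprint);
  if (dismissed !== undefined) {
    // Show the learning back instead of re-raising the same flag.
    return `Last time you told me this isn't a mistake, so I won't flag it (${dismissed.toLocaleDateString()}).`;
  }
  return "This entry doesn't match similar past expenses.";
}
```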
Each screen translates one of the four trust principles into a specific interaction. The design uses the same visual grammar throughout so users learn to read it once.
The modal replaces the single "high / medium / low" label from Part 2 with a factored breakdown. Users now see exactly what Expanion is confident about — and where it's asking them to contribute.
The step-by-step section shows the literal evidence: the keywords parsed, the past expenses consulted, the approval rates. Users don't have to trust the system blindly — they can inspect its reasoning on any individual call.
"Confidence in what, exactly?"
This question, from a test user in Part 2, drove the entire modal design.

Three changes from Part 2: Submit is disabled (not just warned), the mismatch comparison is side-by-side instead of buried in prose, and the fix is a separate action card so the user can accept it without re-reading the evidence.
The happy path is intentionally quiet. When everything checks out, Expanion confirms briefly and gets out of the way. The green card appears, the Submit button is active, and the user continues in one tap.
The nudge and evidence cards are present but subordinate — not alarms, just context. The warm gradient submit button signals readiness without demanding attention.
The visual grammar the user learned from the catch state now pays off: green stripe = safe, yellow = nudge, blue = evidence. No re-learning needed.
There is no single right point on the control / automation dial. Making it adjustable is itself a trust mechanism: it says the system knows you're more comfortable in some places than others. A brief sketch of the three modes, as configuration, follows the variant descriptions below.
The user fills every field. Expanion watches but never touches the form — it shows small hints indicating what category teammates typically pick for similar descriptions. Every decision is explicitly human.
Best for: new hires learning policy, audit-sensitive categories.
Expanion prefills the category when confidence is high, then flags the auto-decision in a green card. Every auto-fill is labeled as auto, surfaced, and reversible. The AI acts — then invites verification.
The 90% case. Transparent delegation with no penalty for overriding.
Drop a receipt. Expanion reads the merchant, amount, and category from the image and offers a one-click submit. Only runs when there is photographic evidence to reason from. Every output is undoable.
Best for: receipt-based expenses on the go. Risk: ambiguous receipts may go unreviewed.
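The sketch referenced above: the dial expressed as configuration. The mode names are illustrative, but the behaviors mirror the three variants:

```ts
// The control/automation dial as configuration. Mode names are invented
// for this sketch; only the behaviors come from the variant descriptions.
type AutomationMode = "manual" | "assisted" | "receipt";

interface ModeBehavior {
  prefillsFields: boolean;       // may Expanion write into the form?
  offersOneClickSubmit: boolean; // still user-initiated, receipt mode only
  requiresReceiptImage: boolean; // only run with photographic evidence
}

const MODES: Record<AutomationMode, ModeBehavior> = {
  manual:   { prefillsFields: false, offersOneClickSubmit: false, requiresReceiptImage: false },
  assisted: { prefillsFields: true,  offersOneClickSubmit: false, requiresReceiptImage: false },
  receipt:  { prefillsFields: true,  offersOneClickSubmit: true,  requiresReceiptImage: true  },
};
```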
The revision proved that inspectability, consistency, and visible learning are not optional features. They are what separates a tool users adopt from one they tolerate.
The most important lesson was about the relationship between emotional direction and functional trust. Part 2 proved users liked the supportive tone — they responded positively to an AI that sounded like a helpful colleague. What it didn't prove was durability.
Trust over time requires something more structural than good intentions. The four mechanisms — transparency, predictability, appropriate reliance, and feedback loops — give the supportive tone something to stand on.
The three-variant design acknowledges a second truth: there is no single right point on the control/automation dial. Some users, some categories, and some moments call for more human control — not less.
Making the dial adjustable is itself a trust mechanism. It tells the user: we know you're more comfortable in some places than in others, and the choice stays with you. That's the real answer to the instructor's note about strengthening user confidence, in both how the system communicates and how reliable its outputs are.