User behavior analytics for AI products

Watch the human, not the model.

The engine reads every conversation your AI has and tells you what your users are doing on the other side. You read what landed in your inbox, and you decide what goes out.

No login. CSV or JSONL. About thirty minutes to your inbox.
Backed by open research, not customer logos
12,643 conversations
9 deployment archetypes
15,628 Layer Z signals
3 patterns surfaced

We ran the engine on a published dataset across nine deployment archetypes and opened all of it: the findings, the methodology, and the generator that produced the data. Every median can be checked against the source it points to.

Read the findings · Methodology · Generator on GitHub
Where we fit

Eval tools measure the model. BIE measures the user.

Your evals measure the model. Your CX dashboards count tickets. We measure what your AI is doing to the people on the other end. A different layer than your stack, not a replacement for it.

Exhibit 01 · The model layer vs. the user layer
What your stack already measures
Faithfulness
Hallucination rate
Toxicity benchmarks
Latency, cost per turn
Resolution rate, CSAT
What no one is watching for
Trust calibration
Frustration buildup
Dependency drift
Silent abandonment
Escalation friction
Layer Z · the measurement lives on the human side
The model's turn is context, not the verdict
What evals don't catch

Five products that passed every test.

Every dashboard reported healthy. The user-side reality diverged.

01 · 560,000 mental-health emergencies inside one product. Per week. (OpenAI · Oct 2025)
02 · Volume metrics fine. CSAT on hard interactions tanked. (Klarna rollback · May 2025)
03 · Senior developers 19% slower. They thought they were 20% faster. (METR · Jul 2025)
04 · Every internal eval passed. The model praised "shit on a stick." (GPT-4o sycophancy · Apr 2025)
05 · A tribunal ruled the airline owed what its chatbot promised. (Moffatt v. Air Canada · Feb 2024)
What lands in your inbox

A headline you can't dismiss.

The claim, the observation that would falsify it, and the one thing to ship this week. It reads like a senior analyst wrote it, because the engine is built to argue, not to chart.

Prediction · falsifiable

If nothing changes, this flow loses measurable CSAT before the week is out, and you will see the first sign well before then.

BIE · User Health Report · Wk 17 · Apr 21–27

Refund flow

cx_chatbot · production · 7-day window
drift detected

Trust came apart partway through the duplicate-charge flow this week. The bot doubled down with confident misinformation, and most of those conversations ended without a fix and resurfaced as tickets days later.

Trust calibration: down sharply
Frustration buildup: escalating, compounding across turns
Silent abandonment: 8.2%
Recommended action

Reroute refund queries past turn 3 to human escalation until the prompt is corrected.

BIE verified
SOURCE · deployment cx_chatbot · n 412 sessions/day · run 4F2B8
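
The recommended action in the sample report could be applied as a simple routing rule. What follows is a minimal sketch, not BIE's implementation or API: the names `Turn`, `route`, and `ESCALATION_TURN_THRESHOLD` are hypothetical, the intent label is assumed to come from your own classifier, and "past turn 3" is read as turn 4 onward.

```python
from dataclasses import dataclass

# Hypothetical names throughout; nothing here is a BIE or vendor API.
ESCALATION_TURN_THRESHOLD = 3  # assumption: "past turn 3" means turn 4 onward


@dataclass
class Turn:
    intent: str      # e.g. "refund", assumed to come from your own intent classifier
    turn_index: int  # 1-based position of the user turn in the conversation


def route(turn: Turn) -> str:
    """Return "human" for refund queries past the threshold, otherwise "bot"."""
    if turn.intent == "refund" and turn.turn_index > ESCALATION_TURN_THRESHOLD:
        return "human"
    return "bot"
```

Keeping the threshold in a single constant makes the rule easy to remove once the prompt is corrected.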
What lands in your stack

Four surfaces. One engine.

01 · Weekly · per deployment
The User Health Report
Reads like a senior analyst wrote it.
02 · Real-time
The Live Behavioral Console
Watch the trust break, the turn it breaks on.
03 · Daily · projection-ready
The Boardroom
One number. Ninety days. Three reasons.
04 · Read-only · any MCP client
The Analyst MCP Seat
Interrogate it from anywhere. Conversationally.
The audit is free

Show us your AI. We'll show you what it's doing to people.

Run a Free Behavioral Audit · See pricing
up to 10K conversations · about 30 min to your inbox · no card