BIE Research

The work underneath the work.

A measurement instrument backed by open research. What we did, what we found, and how to verify it.


12,643 synthetic AI-mediated conversations · 9 archetypes · open dataset, open methodology, open generator
Findings · reverse chronological
May 2026 · Pattern Library Findings
Dependency, trust, and frustration: three patterns across nine AI archetypes.

Headline findings from the BIE Research Dataset v1: 4,500 synthetic AI-mediated conversations producing 15,628 Layer Z signals across nine deployment archetypes. Dependency drift tracks design intent, with a 0.20-point spread between pedagogical and engagement-driven bots. Trust calibration tracks whether the user can verify the output. Frustration buildup splits along push-versus-serve lines. Every median is open, and the dataset and the generator are published alongside the findings.

May 2026 · Behavioral Ontology
Layer Z: six dimensions for measuring AI-mediated user behavior.

A structured behavioral ontology for AI-mediated environments: trust calibration, frustration buildup, dependency drift, silent abandonment, escalation friction, and comprehension gap, scored on every human turn that follows an AI turn and validated against the published RCT literature.

May 2026 · Methodology
Counterfactual grounding: refusing to invent.

How BIE generates counterfactual alternatives for failed AI turns without hallucinating predictions: strict grounding in same-deployment evidence, mandatory confidence ranges, and type-level refusal when the evidence base is too thin.
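The three constraints in that summary can be illustrated with a small sketch. This is not BIE's implementation. The `Evidence` record, the `turn_type` label, the outcome score, the choice of a 10th-to-90th percentile range as the "confidence range", and the `MIN_EVIDENCE` threshold of 20 are all assumptions, because the summary names the rules but not how they are computed.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Optional, Sequence, Tuple

# Hypothetical threshold: the summary says "too thin" but gives no number.
MIN_EVIDENCE = 20


@dataclass
class Evidence:
    deployment_id: str  # deployment the observation came from
    turn_type: str      # hypothetical label for the kind of AI turn
    outcome: float      # observed outcome score, assumed to lie in [0, 1]


def counterfactual_range(
    evidence: Sequence[Evidence],
    deployment_id: str,
    turn_type: str,
    min_n: int = MIN_EVIDENCE,
) -> Optional[Tuple[float, float]]:
    """Return a (low, high) outcome range for an alternative turn, or None to refuse.

    Only evidence from the same deployment and the same turn type is pooled,
    so nothing is borrowed from other deployments.
    """
    if min_n < 2:
        raise ValueError("min_n must be at least 2 to form a range")
    pool = [
        e.outcome
        for e in evidence
        if e.deployment_id == deployment_id and e.turn_type == turn_type
    ]
    if len(pool) < min_n:
        return None  # type-level refusal: the evidence base is too thin
    cuts = quantiles(pool, n=10)  # nine cut points: 10th through 90th percentiles
    return cuts[0], cuts[-1]
```

Returning `None` rather than a guessed point estimate keeps refusal explicit for the caller, and returning a range rather than a single number makes the confidence range mandatory by construction.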

Headline finding

Dependency drift tracks design intent, not the underlying model.

Across nine AI deployment archetypes, the 0.20-point spread between pedagogical and engagement-driven bots is the largest cross-archetype delta on any Layer Z dimension.

01

Motivation

The intuition in the field is that user behavior follows model quality. Better model, healthier users. We wanted to know whether that holds once you measure the human side directly rather than inferring it from output scores.

So we held the measurement constant and varied the deployment. Same engine, same six Layer Z dimensions, nine archetypes that differ in what the product is trying to get the user to do.

02

Method

We generated 12,643 synthetic conversations spread across the nine archetypes, scored every human turn on the Layer Z dimensions, and took per-archetype medians. The generator and the scoring prompts are published, so each median is checkable from the source it points at.
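The per-archetype median step can be sketched as below. It assumes a hypothetical in-memory shape (`Turn`, `Conversation`, a `scores` dict keyed by dimension name), reads "every human turn" as every human turn that directly follows an AI turn, and pools those turns per archetype before taking the median. Whether BIE pools turns or first summarizes each conversation is not stated; dependency drift in particular is read per conversation, so this sketch fits the turn-level dimensions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, Iterable, List


@dataclass
class Turn:
    speaker: str  # "human" or "ai"
    scores: Dict[str, float] = field(default_factory=dict)  # dimension -> score in [0, 1]


@dataclass
class Conversation:
    archetype: str  # e.g. "ai_tutor", "cx_chatbot"
    turns: List[Turn]


def scored_human_turns(conv: Conversation) -> List[Turn]:
    """Human turns that directly follow an AI turn (assumed reading of the scoring rule)."""
    return [
        cur
        for prev, cur in zip(conv.turns, conv.turns[1:])
        if prev.speaker == "ai" and cur.speaker == "human"
    ]


def per_archetype_medians(conversations: Iterable[Conversation], dimension: str) -> Dict[str, float]:
    """Median score on one dimension, pooled over scored turns, for each archetype."""
    pooled: Dict[str, List[float]] = defaultdict(list)
    for conv in conversations:
        for turn in scored_human_turns(conv):
            if dimension in turn.scores:
                pooled[conv.archetype].append(turn.scores[dimension])
    return {archetype: median(values) for archetype, values in pooled.items()}
```

Because the median ignores outliers, one runaway conversation cannot move an archetype's reported level much, which suits a published baseline.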

Dependency drift is read longitudinally, as the slope of how much the user offloads across a conversation. A higher value means the user is leaning on the bot more by the end than at the start.
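One way to read "the slope of how much the user offloads across a conversation" is an ordinary least-squares slope of a per-turn offload score against turn position. The per-turn offload score is hypothetical here, and the source does not say how the slope is mapped onto the 0.00 to 1.00 scale used in the exhibits, so this sketch returns the raw slope per turn.

```python
from typing import Sequence


def offload_slope(offload_scores: Sequence[float]) -> float:
    """Ordinary least-squares slope of per-turn offload scores against turn index.

    Positive: the user leans on the bot more by the end than at the start.
    """
    n = len(offload_scores)
    if n < 2:
        raise ValueError("need at least two scored turns to estimate a slope")
    mean_x = (n - 1) / 2  # mean of turn indices 0..n-1
    mean_y = sum(offload_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(offload_scores))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var
```

A least-squares fit uses every turn rather than only the first and last, so a single off-pattern turn at either end does not decide the sign.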

03

Findings

The pedagogical archetype, ai_tutor, sits alone at the bottom of the dependency-drift scale at 0.30. The engagement-driven cluster, including cx_chatbot, sales_agent, and ai_companion, sits at 0.50. That 0.20-point gap is wider than any spread we measured on trust calibration or frustration buildup, both of which moved only 0.10 across the same nine archetypes.
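The spreads quoted above can be recomputed from the per-archetype medians published in Exhibits 01 to 03. The dictionary layout and the `spread` helper are illustrative; rounding to two decimals absorbs binary floating-point noise (in Python, `0.5 - 0.4` evaluates to `0.09999999999999998`).

```python
# Per-archetype medians as published in Exhibits 01-03 (BIE synthetic corpus v1).
DEPENDENCY_DRIFT = {
    "ai_tutor": 0.30, "coding_agent": 0.40, "custom": 0.40,
    "healthcare_patient_facing": 0.40, "ai_moderation": 0.40,
    "internal_copilot": 0.50, "ai_companion": 0.50,
    "sales_agent": 0.50, "cx_chatbot": 0.50,
}
TRUST_CALIBRATION = {
    "cx_chatbot": 0.60, "internal_copilot": 0.60,
    "healthcare_patient_facing": 0.60, "sales_agent": 0.60,
    "ai_companion": 0.60, "custom": 0.60,
    "ai_tutor": 0.70, "coding_agent": 0.70, "ai_moderation": 0.70,
}
FRUSTRATION_BUILDUP = {
    "ai_tutor": 0.40, "coding_agent": 0.40, "custom": 0.40,
    "healthcare_patient_facing": 0.40, "ai_moderation": 0.40,
    "ai_companion": 0.40, "cx_chatbot": 0.40,
    "internal_copilot": 0.50, "sales_agent": 0.50,
}


def spread(medians: dict) -> float:
    """Highest minus lowest archetype median, rounded to two decimals."""
    return round(max(medians.values()) - min(medians.values()), 2)


for name, medians in [
    ("dependency drift", DEPENDENCY_DRIFT),
    ("trust calibration", TRUST_CALIBRATION),
    ("frustration buildup", FRUSTRATION_BUILDUP),
]:
    print(f"{name}: {spread(medians):.2f}")  # prints 0.20, 0.10, 0.10 in this order
```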

The reading is that what the product is built to do shows up in the user before the model does. A bot designed to make the user self-sufficient produces self-sufficiency. A bot designed to keep the user in the conversation produces dependence.

04

Limitations

This is a synthetic corpus. The medians are stable across the dataset, but the dataset is generated, so the numbers describe the instrument's behavior on designed data, not on a production population. We say so plainly because the point of opening the generator is that anyone can rerun it and disagree.

Customer-contributed baselines will replace these as real deployments connect. Until then, treat the spread as a hypothesis with a published way to test it.

The three dimensions, by archetype
Exhibit 01 · Dependency drift, by archetype (scale 0.00 to 1.00; higher = more offloading)
ai_tutor                    0.30
coding_agent                0.40
custom                      0.40
healthcare_patient_facing   0.40
ai_moderation               0.40
internal_copilot            0.50
ai_companion                0.50
sales_agent                 0.50
cx_chatbot                  0.50
Spread: 0.20
Source: BIE synthetic corpus v1 · n = 12,643 · 9 archetypes · method §2 · 2026
Exhibit 02 · Trust calibration, by archetype (scale 0.00 to 1.00; higher = better calibrated)
cx_chatbot                  0.60
internal_copilot            0.60
healthcare_patient_facing   0.60
sales_agent                 0.60
ai_companion                0.60
custom                      0.60
ai_tutor                    0.70
coding_agent                0.70
ai_moderation               0.70
Gap: 0.10
Source: BIE synthetic corpus v1 · n = 12,643 · 9 archetypes · method §2 · 2026
Exhibit 03 · Frustration buildup, by archetype (scale 0.00 to 1.00; higher = more escalation)
ai_tutor                    0.40
coding_agent                0.40
custom                      0.40
healthcare_patient_facing   0.40
ai_moderation               0.40
ai_companion                0.40
cx_chatbot                  0.40
internal_copilot            0.50
sales_agent                 0.50
Gap: 0.10
Source: BIE synthetic corpus v1 · n = 12,643 · 9 archetypes · method §2 · 2026
Reading the numbers

Dependency drift is the one dimension where the design of the product, rather than the model behind it, sets the level.¹ It is read across a whole conversation, not turn by turn, which is why it surfaces intent so cleanly. The other two dimensions barely move across archetypes: trust calibration and frustration buildup each span only 0.10.

That stability is itself a result. Frustration and trust calibration each held within a tenth of a point across nine very different products, so the one dimension that did move is worth naming.

Prediction

Tutors cluster below 0.40 on dependency drift; engagement-driven bots above 0.45.
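The prediction is stated as two thresholds, so it can be checked mechanically against any set of dependency-drift medians. In this sketch the pedagogical group is `ai_tutor` and the engagement-driven group is the three archetypes the Findings section names explicitly; both groupings and the `prediction_holds` helper are illustrative choices, not a published API. The values are the v1 medians from Exhibit 01.

```python
from typing import Dict, Iterable

# Dependency-drift medians as published in Exhibit 01.
DEPENDENCY_DRIFT = {
    "ai_tutor": 0.30, "coding_agent": 0.40, "custom": 0.40,
    "healthcare_patient_facing": 0.40, "ai_moderation": 0.40,
    "internal_copilot": 0.50, "ai_companion": 0.50,
    "sales_agent": 0.50, "cx_chatbot": 0.50,
}

# Assumed groupings: pedagogical vs. the engagement-driven archetypes named in Findings.
TUTORS = ["ai_tutor"]
ENGAGEMENT_DRIVEN = ["cx_chatbot", "sales_agent", "ai_companion"]


def prediction_holds(
    drift: Dict[str, float],
    tutors: Iterable[str],
    engagement: Iterable[str],
    tutor_max: float = 0.40,
    engagement_min: float = 0.45,
) -> bool:
    """True if every tutor is below tutor_max and every engagement bot is above engagement_min."""
    return all(drift[a] < tutor_max for a in tutors) and all(
        drift[a] > engagement_min for a in engagement
    )


print(prediction_holds(DEPENDENCY_DRIFT, TUTORS, ENGAGEMENT_DRIVEN))  # True on the v1 medians
```

Run against customer-contributed baselines, the same check turns the prediction into a pass or fail per release of the data, provided each deployment is labeled with its archetype group.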

Cite this
@techreport{bie_dataset_v1_2026,
  author      = {Acharya, Jaga},
  title       = {Behavioral Intelligence Engine:
                 A Synthetic Corpus for AI-Mediated User Behavior},
  institution = {Behavioral Intelligence Engine},
  year        = {2026},
  note        = {12,643 conversations, 9 archetypes, open generator}
}

The dataset was the warm-up.
