POST a conversation. Receive intelligence.
REST API. TypeScript SDK. Python SDK. OpenAPI 3.1. Webhook callbacks.
Send the whole conversation. BIE measures the human's side of it.
# ingest one conversation arc, one object per turn
curl https://api.bieintel.com/api/v1/ingest \
  -H "Authorization: Bearer $BIE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "deployment_id": "cx_chatbot_prod",
    "events": [ /* one object per turn, in order */ ],
    "webhook_url": "https://yourapp.com/hooks/bie"
  }'
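For readers who would rather not shell out to curl, here is a rough Python equivalent of the call above using only the standard library. It assumes the URL, headers, and field names shown in the curl example. `build_ingest_request` is a hypothetical helper for illustration, not part of the BIE SDK, and the request is built but not sent.

```python
import json
import urllib.request

# Taken from the curl example; adjust if your account uses a different host.
INGEST_URL = "https://api.bieintel.com/api/v1/ingest"


def build_ingest_request(api_key, deployment_id, events, webhook_url=None):
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper for illustration only.
    """
    body = {"deployment_id": deployment_id, "events": list(events)}
    if webhook_url is not None:
        body["webhook_url"] = webhook_url
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to inspect before anything leaves your machine.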
Human turns, in context.
BIE does not grade the model's answers. Your eval stack already does that. BIE measures what those answers did to the person reading them.
You send the whole arc, every turn in order. The engine scores the human turns in that context: trust calibration, frustration buildup, dependency drift, silent abandonment risk.
{
"session_id": "c_4471",
"archetype": "cx_chatbot",
# one representative scored dimension
"layer_z": {
"trust_calibration": "at risk · drops sharply mid-arc"
# the full dimension set and any derived
# risk indicators are documented in the
# authenticated reference
},
"key_finding": "Trust collapses on turn 4 of cancellation flow. The model offered policy when the user wanted action.",
"falsifiable": true,
"recommendation": "Rewrite the cancel-flow opener to acknowledge intent before explaining policy."
}

Every Layer Z dimension resolves to one calibrated number on one scale, so a healthy deployment and a struggling one never read the same. Each dimension also points in a fixed direction (higher is healthier for trust calibration, higher is worse for frustration buildup and dependency drift), so a 0.03 difference can't flip a label between reports.
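To make the fixed-direction rule concrete, the sketch below turns a score change into a plain verdict. It assumes scores arrive as floats and that dimensions are keyed by snake_case names; only `trust_calibration` appears in the sample response, so the other two keys are guesses. `health_change` is a hypothetical helper, not a BIE API.

```python
# Direction of each dimension as described in the text:
# +1 means higher is healthier, -1 means higher is worse.
# Key names other than "trust_calibration" are assumed for illustration.
DIRECTION = {
    "trust_calibration": +1,
    "frustration_buildup": -1,
    "dependency_drift": -1,
}


def health_change(dimension, before, after):
    """Classify a score change as 'improved', 'regressed', or 'unchanged'."""
    signed = (after - before) * DIRECTION[dimension]
    if signed > 0:
        return "improved"
    if signed < 0:
        return "regressed"
    return "unchanged"
```

Because the direction lives in one table, the same numeric rise reads as an improvement for trust calibration and a regression for frustration buildup, whichever report computes it.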
Same payload, your language.
curl https://api.bieintel.com/api/v1/ingest \
  -H "Authorization: Bearer $BIE_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "deployment_id": "cx_chatbot_prod", "events": [...] }'
Python: pip install bie · the TypeScript and Python SDKs both wrap the same OpenAPI 3.1 surface.
Three endpoints. One schema.
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/ingest | Ingest a conversation |
| GET | /v1/intelligence/:session_id | Retrieve scored intelligence |
| POST | /v1/drift | Compare two model versions |
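A small sketch of how the three paths might be turned into full URLs. The table lists paths under `/v1`, while the curl example posts to `https://api.bieintel.com/api/v1/ingest`, so the base URL below is an assumption inferred from that example. The function names are hypothetical.

```python
from urllib.parse import quote

# Assumption: base URL inferred from the curl example, not stated in the table.
BASE_URL = "https://api.bieintel.com/api"


def ingest_url():
    """URL for POST /v1/ingest."""
    return f"{BASE_URL}/v1/ingest"


def intelligence_url(session_id):
    """URL for GET /v1/intelligence/:session_id, with the id percent-encoded."""
    return f"{BASE_URL}/v1/intelligence/{quote(session_id, safe='')}"


def drift_url():
    """URL for POST /v1/drift."""
    return f"{BASE_URL}/v1/drift"
```

Percent-encoding the session id keeps an id containing `/` or spaces from changing the path.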
| Field | Type | Notes |
|---|---|---|
| actor_id | string | Stable per actor; not PII |
| conversation_id | string | Groups the multi-turn arc |
| content | string | The message text |
| timestamp | ISO 8601 | UTC |
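As an illustration of the event fields above, this sketch builds one event per turn and normalizes the timestamp to ISO 8601 in UTC. It uses only the four fields from the table. `make_event` is a hypothetical helper, and any further fields the real schema may require are not covered here.

```python
from datetime import datetime, timezone


def make_event(actor_id, conversation_id, content, when=None):
    """Build one event dict using the four fields from the table."""
    if when is None:
        when = datetime.now(timezone.utc)
    elif when.tzinfo is None:
        # A naive datetime carries no offset, so it cannot be safely converted to UTC.
        raise ValueError("timestamp must be timezone-aware")
    return {
        "actor_id": actor_id,  # stable per actor; should not contain PII
        "conversation_id": conversation_id,  # same value for every turn in the arc
        "content": content,
        "timestamp": when.astimezone(timezone.utc).isoformat(),
    }
```

Calling it once per turn, in order, with a shared `conversation_id` gives the list to place under `events` in the ingest payload.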
Did shipping v1.3 change behavior?
{
"before_label": "Model v1.2",
"after_label": "Model v1.3",
"question": "Did frustration buildup change after we shipped v1.3?"
}

Significant regression. Frustration buildup up 0.06 on the cancel flow specifically. Other flows unchanged.
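A minimal sketch of assembling the drift request body shown above. It assumes the three fields in the sample are all required non-empty strings, which the page does not state. `build_drift_body` is a hypothetical helper, not part of the SDK.

```python
import json


def build_drift_body(before_label, after_label, question):
    """Serialize a drift comparison request using the three sample fields."""
    fields = {
        "before_label": before_label,
        "after_label": after_label,
        "question": question,
    }
    for name, value in fields.items():
        # Assumed validation: every field must be a non-empty string.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    return json.dumps(fields)
```

The returned string is the body for a POST to the drift endpoint, sent with the same bearer token and `Content-Type: application/json` header used for ingest.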
Thin adapters for ingestion.
Pre-built connectors normalize source data into the same event schema. The engine is the product; connectors are how data gets in.
TypeScript and Python, both first-class, both generated against the same spec.
The full surface as a machine-readable contract. Generate your own client.
Scoped API keys, read-only or read-write. Query intelligence from your agent.