A measurement layer for AI agent persona consistency
Metryval is a framework-validated measurement instrument for discrete persona mapping. Built for teams shipping AI agents that need a human reference layer for sycophancy drift, persona collapse, and inter-rater agreement.
Metryval synthesizes 36 peer-reviewed psychological frameworks and cites 9 additional frameworks as methodology precedents. It operationalizes the human reference layer that Anthropic's Persona Vectors paper (August 2025) names as the missing structural component for persona evaluation at scale.
The problem persona-evaluation teams keep hitting
Persona drift is observable. Sycophancy is observable. Persona collapse under load is observable. What teams keep finding is that the language for describing what changed has no shared reference frame. Two engineers watch the same trace, agree the agent's persona drifted, and have no agreed-upon coordinates for naming the direction or the magnitude. Internal taxonomies grow per-product, and they fail to generalize across agents, organizations, or evaluation cohorts.
The field has converged on three signals: behavioral logs, embedding-space probes, and human-rater agreement. None of these resolve to a shared discrete reference frame against which two raters at two organizations can produce comparable measurements.
What Metryval provides
Metryval is a 19-dimension personality measurement instrument operating across five domains in the PAREN architecture (Perception · Agency · Responsiveness · Engagement · Navigation). The instrument is administered as a self-report battery and produces a coordinate in 19-dimensional space. That coordinate places the respondent against 53 discrete reference patterns — empirically located clusters that serve as a shared coordinate frame for cross-cohort agreement.
Reference patterns are not categorical types. The framing is structural: the instrument measures a continuous 19-dimensional space; the 53 reference patterns are theoretical placements within that space against which any measured coordinate can be compared; and an output report names the nearest reference pattern and its orbital patterns rather than asserting a categorical assignment. This is the structural difference between Metryval and consumer typology instruments (MBTI / Enneagram / DISC) that fail psychometric replication.
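The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Metryval's published implementation: the function names, the toy centroids, and the pattern names are hypothetical, and "orbital patterns" are assumed here to mean the next-closest reference patterns after the nearest one, which the text does not define.

```python
import math

N_DIMS = 19  # dimensionality stated in the text


def euclidean(a, b):
    """Straight-line distance between two equal-length coordinates."""
    if len(a) != len(b):
        raise ValueError("coordinates must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_and_orbitals(coord, reference_patterns, n_orbitals=3):
    """Rank reference patterns by distance to a measured coordinate.

    Returns (nearest, orbitals); every entry is a (name, distance) pair.
    ASSUMPTION: "orbital" patterns are the next-closest ones after the nearest.
    """
    ranked = sorted(
        ((name, euclidean(coord, centroid))
         for name, centroid in reference_patterns.items()),
        key=lambda pair: pair[1],
    )
    return ranked[0], ranked[1:1 + n_orbitals]


# Hypothetical toy data: three made-up patterns standing in for the real 53.
patterns = {
    "pattern_a": [0.0] * N_DIMS,
    "pattern_b": [1.0] * N_DIMS,
    "pattern_c": [0.0] * 18 + [3.0],
}
respondent = [0.1] * N_DIMS

nearest, orbitals = nearest_and_orbitals(respondent, patterns, n_orbitals=2)
print("nearest:", nearest)
print("orbitals:", orbitals)
```

Because the report carries distances alongside names, a reader can see how close the nearest pattern is relative to its neighbours instead of receiving a bare label.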
Methodology
Metryval synthesizes 36 peer-reviewed psychological frameworks into the 19-dimension PAREN measurement instrument and cites 9 additional frameworks as methodology precedents. Per-framework academic-reception status is tracked in the public Framework Reception Ledger, updated on a six-month cadence and within thirty days of any major framework critique event. Two recent examples: Polyvagal Theory was removed in May 2026 following the 39-expert consensus paper, and MFT MFQ-30 was retired in May 2026 with its items re-attributed to the Interpersonal Reactivity Index Empathic Concern subscale.
The instrument's scoring family is centroid-based distance in the 19-dimensional space: Euclidean at launch, Mahalanobis post-pilot once N ≥ 500 establishes the empirical covariance matrix. Scoring is deterministic and inspectable; no large language model sits in the scoring path. LLM assistance is used in narrative content generation only, with full transparency disclosure per California AB 2013.
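As a rough illustration of that scoring family, the sketch below switches from Euclidean to Mahalanobis distance once the respondent count reaches the stated threshold. Everything except the N ≥ 500 rule is an assumption made for the example: the function names, the `PILOT_THRESHOLD` constant, and the choice to solve a linear system rather than invert the covariance matrix are not taken from Metryval's methodology. The toy check uses 2 dimensions for readability; the same code applies unchanged to 19.

```python
import math

PILOT_THRESHOLD = 500  # N at which the text says the covariance is established


def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    y = [0.0] * n
    for row in range(n - 1, -1, -1):  # back-substitution
        tail = sum(aug[row][k] * y[k] for k in range(row + 1, n))
        y[row] = (aug[row][n] - tail) / aug[row][row]
    return y


def euclidean(coord, centroid):
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(coord, centroid)))


def mahalanobis(coord, centroid, covariance):
    """sqrt(d^T * inverse(covariance) * d), without forming the inverse."""
    diff = [x - c for x, c in zip(coord, centroid)]
    y = solve(covariance, diff)
    return math.sqrt(sum(d * v for d, v in zip(diff, y)))


def score_distance(coord, centroid, n_respondents, covariance=None):
    """Mahalanobis once the pilot sample is large enough, else Euclidean."""
    if covariance is not None and n_respondents >= PILOT_THRESHOLD:
        return mahalanobis(coord, centroid, covariance)
    return euclidean(coord, centroid)


# Toy 2-D check: variance 4 on the first axis, 1 on the second.
cov = [[4.0, 0.0], [0.0, 1.0]]
launch = score_distance([2.0, 0.0], [0.0, 0.0], n_respondents=120, covariance=cov)
post_pilot = score_distance([2.0, 0.0], [0.0, 0.0], n_respondents=500, covariance=cov)
print(launch, post_pilot)
```

The same offset scores as a smaller distance after the switch because Mahalanobis distance discounts movement along dimensions where respondents naturally vary more. Both branches are deterministic, so a given coordinate always produces the same score.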
Where this sits in the field
Existing AI-persona-evaluation tooling clusters into three lanes. The first is embedding-space introspection — useful for steering, weak for cross-cohort comparison. The second is behavioral telemetry — useful for drift detection, weak for naming what changed. The third is closed-source psychometric scoring at the individual level — useful when the underlying instrument is known, problematic when the framework synthesis is undisclosed.
Metryval is positioned as a fourth lane: a public-methodology, framework-validated, discrete-reference-pattern instrument that provides the shared coordinate frame the first three lanes cannot. The Framework Reception Ledger commitment is the structural differentiator: any team using Metryval as a reference layer can audit the framework canon at the level of individual scientific findings, and any framework critique that lands in the literature is reflected in the ledger within thirty days.
Use cases this is built for
Persona consistency evaluation across agent versions. Sycophancy drift measurement with a discrete reference frame. Inter-rater agreement on persona shifts across organizational boundaries. Pre-deployment persona-fit evaluation for agent personas authored against a target reference pattern. Post-deployment drift monitoring against the original target.
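For the drift-monitoring use cases, one plausible way to name both the direction and the magnitude of a shift is to difference two measured coordinates. The sketch below is illustrative only: the dimension names, the report fields, and the way an agent's coordinate is obtained are assumptions, since the text does not specify them.

```python
import math


def drift(before, after):
    """Return (vector, magnitude) of the move from one coordinate to another."""
    vector = [b - a for a, b in zip(before, after)]
    magnitude = math.sqrt(sum(v * v for v in vector))
    return vector, magnitude


def drift_report(target, before, after, dimension_names):
    """Summarise how a persona moved between two measurements."""
    vector, magnitude = drift(before, after)
    _, gap_before = drift(target, before)
    _, gap_after = drift(target, after)
    # Index of the dimension with the largest absolute change.
    top = max(range(len(vector)), key=lambda i: abs(vector[i]))
    return {
        "magnitude": magnitude,
        "largest_shift": (dimension_names[top], vector[top]),
        "moved_away_from_target": gap_after > gap_before,
    }


# Hypothetical placeholders: the text does not name the 19 dimensions.
names = [f"dim_{i}" for i in range(19)]
target = [0.5] * 19          # coordinate of the authored target pattern
version_1 = [0.5] * 19       # measured coordinate before an update
version_2 = list(version_1)  # measured coordinate after an update
version_2[4] = 1.0

report = drift_report(target, version_1, version_2, names)
print(report)
```

Two raters who hold the same pair of coordinates would produce the same report, which is the property a shared reference frame is meant to supply.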
Metryval is not built for, and is explicitly excluded from, any use case in which the instrument's output is consumed as input to a personnel-selection decision about the respondent (hiring, promotion, termination, performance review). See the Policy Decisions Ledger entry P-01 for the full scope of the exclusion, the rationale, and the conditions under which it would be reviewed.
Whitepaper
A detailed whitepaper covering the framework synthesis, the 19-dimension instrument architecture, the 53-reference-pattern derivation, the scoring family, and the cross-cultural scalar invariance commitment is in preparation. The whitepaper will be available as an ungated download from this page — no email gate, no contact form, per the anti-extractive product posture.
Design Partner Program
Metryval is opening a small Design Partner Program for AI builders working on persona consistency, sycophancy evaluation, or agent persona-fit measurement. Design Partners receive: early access to the discrete-persona mapping API ahead of public launch; quarterly methodology briefings with the research lead; first-look on framework canon updates; and a direct line for falsification-interview feedback that shapes the public-API surface.
Design Partners are vetted for fit: the program is explicitly closed to use cases that fall under the Policy Decisions Ledger P-01 exclusion. Reach out directly via the contact link below.