EVIDENCE
Paper 06 · For Sceptics, Finance & Procurement

Measuring What Matters

What empathy assessment still isn't capturing, and why the gap between what we measure and what actually changes behaviour can cost organisations much of the return on their empathy training investment.

6,000+ studies reviewed in a 2024 PLOS ONE systematic review of empathy measurement
0 instruments identified as both psychometrically robust and comprehensive
45+ years since the IRI was introduced; it remains the most widely used empathy measure in practice
Why Measurement Matters for Organisations

You can only measure what your instruments are capable of measuring

Organisations that commission empathy assessments, design training programmes, and evaluate outcomes can only see what their instruments are able to detect. If those instruments have significant gaps (and the evidence reviewed in this paper shows that they do), then the investment may be considerably less effective than the data suggests. Worse, it may be optimising for the wrong thing entirely.

When an organisation uses a pre/post empathy assessment to evaluate a training programme, it is asking: did this intervention produce a measurable change in empathy capability? If the instrument does not measure the dimension of empathy that the training was designed to develop, the answer it returns is meaningless. Not wrong — meaningless. It is measuring something real, just not the right thing.

The absence of a gold standard for empathy measurement is not a minor technical inconvenience. It means that every empathy training programme built on a self-report baseline is measuring the dimension of empathy that people can most easily misrepresent — and ignoring the dimension that most directly determines whether they behave differently when it matters.

Systematic reviews consistently find no gold standard, and researchers repeatedly identify the gap between what instruments measure and what empathy requires. The field knows the instruments are incomplete. It has continued to use them because the alternatives are harder, slower, and more expensive to develop. That situation is now commercially costly in a way it was not before.

What the Dominant Tools Measure

Three instruments dominate empathy assessment. Each has strengths. Each has documented limits.

A fair account of these tools requires acknowledging both. They are not bad instruments. They are incomplete ones — and that incompleteness has a direct and costly consequence for organisations that use them as the basis for investment decisions.

IRI · Davis, 1980
Interpersonal Reactivity Index
The most widely used general-purpose empathy measure in existence. Well-constructed, freely available, genuinely multidimensional. Its longevity is a mark of real quality — used in thousands of studies across psychology, medicine, and organisational research.
↳ Self-report only. No significant correlation with actual empathic accuracy. Physical dimension entirely absent. Measures what people believe about their own empathy, not what they do.
EQ · Baron-Cohen & Wheelwright, 2004
Empathy Quotient
Developed at Cambridge, originally to study empathy differences in adults with Asperger syndrome. Now widely used in occupational and clinical settings. Reasonable internal consistency and discriminant validity for its original purpose.
↳ Designed for clinical floors, not development ceilings. Self-report bias. Physical dimension absent. Insensitive at the higher end of the distribution where most development work operates.
RMET · Baron-Cohen et al., 2001
Reading the Mind in the Eyes Test
A performance-based measure rather than self-report — a genuine methodological advance. Participants identify mental states from photographs of the eye region of faces. Sidesteps the social desirability problem. Cited over 2,000 times.
↳ Poor internal consistency. Static images only. Measures recognition, not response. Nearly 25% of test items fail original validation criteria. Not designed as an organisational measure.
Instrument | What it measures | Dimension coverage | Key limitation
IRI (1980) | Self-reported cognitive and emotional empathic tendencies across four subscales | ✔ Cognitive ✔ Emotional ✘ Physical | Self-report only; no significant correlation with actual empathic accuracy; physical dimension entirely absent
EQ (2004) | Self-reported cognitive drive and affective response to others' emotional states | ✔ Cognitive ✔ Emotional ✘ Physical | Designed for clinical floors; self-report bias; poor sensitivity in the development range; physical dimension absent
RMET (2001) | Ability to identify mental states from photographs of the eye region of faces | ✔ Cognitive (recognition) ✘ Emotional ✘ Physical | Static images; poor internal consistency; measures recognition, not response; no ecological validity for real interaction
Perth Empathy Scale (2023) | Cognitive and affective empathy across positive and negative emotional valence | ✔ Cognitive ✔ Emotional ✘ Physical | Most recent and rigorous self-report scale; still self-report; physical dimension not addressed
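Internal consistency, cited above as a weakness of the RMET and a relative strength of the EQ, is conventionally summarised with Cronbach's alpha: α = k/(k-1) × (1 - Σ item variances / variance of total scores), where k is the number of items. The sketch below is a minimal illustration, assuming a respondents × items matrix of numeric scores; the function name and the three-respondent data are hypothetical and are not drawn from any of the instruments in the table.

```python
from statistics import pvariance

def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (one row per respondent)."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents. Population variance is used
    # throughout; alpha is unchanged as long as the same estimator is used
    # for both the item variances and the total-score variance.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of each respondent's total score across all items
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: three respondents, two items
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 2))
```

Values of roughly 0.7 or higher are often treated as acceptable, though that threshold is a convention rather than a rule, and alpha says nothing about whether a scale measures the right construct.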
The Physical Dimension Gap

The dimension that determines whether empathy changes behaviour is not being measured

What Is Being Measured
Conscious, above-threshold processes
All three dominant instruments address dimensions of empathy that are accessible to conscious reflection and self-report. They measure:
  • Cognitive inference — understanding someone's perspective
  • Emotional recognition — identifying affective states
  • Empathic concern — reported tendency toward compassion
  • Imaginative projection — placing oneself in another's situation
These are real dimensions of empathy. The instruments that measure them are not wrong — they are partial. They measure what can be accessed through conscious reflection. The problem is that the most foundational dimension of empathy operates before reflection is possible.
What Is Not Being Measured
The body's pre-conscious attunement
Physical empathy, the somatic attunement that operates before conscious interpretation, is systematically absent from every available measurement tool. A measure of it would need to assess:
  • Accuracy in reading postural and physical cues in real interaction
  • Ideomotor responsiveness: the body's automatic mirroring
  • Quality of physical co-presence in actual conditions
  • Change in each of these over the timeline of a training programme
No existing instrument measures this. The absence is not accidental — measuring physical empathy is genuinely difficult. It requires detecting somatic responses that operate below the threshold of conscious awareness, in real interaction, under real conditions.
"Most existing measures of empathy rely on self-reports of dispositional tendencies or assess subjective or physiological responses to static images; consequently, they fail to assess the ability to monitor rapidly changing social cues, a skill that is very important in navigating real-life social interactions."
Zaki & Ochsner, 2011 · cited in multiple systematic reviews
The Consequences for Training Design

Measurement shapes training — and incomplete measurement produces incomplete training

When organisations design empathy development programmes, they design them to produce measurable outcomes. What is measurable determines what is prioritised. What is prioritised determines what gets developed. If the available instruments do not measure the physical dimension, the physical dimension does not become a programme objective.

1
Cognitive-only training produces cognitive-only change
A 2024 systematic review and meta-analysis of 50 workplace empathy training interventions found that the training methods implemented focused overwhelmingly on cognitive-verbal techniques. Physical empathy methods were absent from the reviewed interventions. Participants who complete well-designed reflective practice programmes do become more consciously aware of others' perspectives. What they do not reliably develop is any new physical responsiveness.
2
The ROI measurement problem compounds the training problem
If the primary outcome instrument does not capture the dimension of empathy that the training was designed to develop, ROI calculations are built on incomplete data. A programme that produces significant change in physical empathy capability but shows modest change on an IRI pre/post assessment will appear to have limited impact — generating precisely the scepticism that leads organisations to deprioritise investment in the dimension of development most likely to produce durable behaviour change.
3
The self-report ceiling cannot be solved by better items
Physical attunement is a bodily process that precedes cognitive interpretation. Self-report instruments, by definition, can only access what is available to conscious reflection. No matter how carefully a self-report scale is constructed, it cannot measure what happens in the body before the mind has time to engage. Research on empathic accuracy is unambiguous: self-report empathy scores show weak or absent correlation with performance on actual empathic accuracy tasks. In general, people have little insight into their own empathic ability.
What Stuart Nolan Consulting Is Developing

Three lines of methodological development — and how to get involved at an early stage

This gap became evident through the training work itself. Participants in physical empathy programmes consistently report changes they cannot fully articulate — a shift in how they are with other people, a new quality of attunement, a change in what they notice and how they respond. These are real changes. The neural circuits underlying physical attunement are plastic. The problem is not that the training doesn't work. The problem is that no instrument currently exists to measure what it produces.

Development Line 01
A performance-based assessment for physical empathy
Distinct from self-report, it is designed to assess the physical attunement dimension of the Threefold Model. It measures the ability to read postural and physical cues in real interaction, ideomotor responsiveness to another's physical state, and the quality of physical co-presence. Because it is performance-based, it largely sidesteps the social desirability problem. The validation approach uses populations who have and have not completed physical empathy training as the criterion groups.
Status: Active development
Development Line 02
An organisational Empathy Audit
An organisational diagnostic that aggregates individual empathy profiles into a team and organisational capability map. It gives L&D directors and HR leads a dimensional picture of empathy capability: not an average score, but a map showing where in the Threefold Model different parts of the organisation are strong and where development is most needed. It supports targeted programme design rather than generic empathy awareness training.
Status: Available now as early-partner engagement · from £4,500
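The difference between an average score and a dimensional map can be illustrated with a small sketch. It assumes, hypothetically, that each individual profile holds one score per Threefold dimension (cognitive, emotional, physical) on a common 0 to 100 scale, tagged with a team name. The `Profile` fields, the scale, the team data and the "lowest-scoring dimension is the development priority" rule are all illustrative assumptions, not the Empathy Audit's actual methodology.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The three dimensions of the Threefold Model
DIMENSIONS = ("cognitive", "emotional", "physical")

@dataclass
class Profile:
    # Hypothetical individual profile: one score per dimension, 0 to 100 scale
    team: str
    cognitive: float
    emotional: float
    physical: float

def capability_map(profiles: list[Profile]) -> dict[str, dict[str, float]]:
    """Mean score per dimension for each team, instead of one overall average."""
    by_team: dict[str, list[Profile]] = defaultdict(list)
    for p in profiles:
        by_team[p.team].append(p)
    return {
        team: {d: mean(getattr(p, d) for p in members) for d in DIMENSIONS}
        for team, members in by_team.items()
    }

def development_priority(team_scores: dict[str, float]) -> str:
    # Illustrative rule only: flag the team's lowest-scoring dimension
    return min(team_scores, key=team_scores.get)

# Hypothetical data for two teams
profiles = [
    Profile("Sales", 72, 66, 40),
    Profile("Sales", 68, 70, 44),
    Profile("Support", 50, 66, 62),
    Profile("Support", 54, 62, 66),
]
cmap = capability_map(profiles)
for team, scores in cmap.items():
    overall = mean(scores.values())
    print(f"{team}: overall {overall:.1f}, priority {development_priority(scores)}, map {scores}")
```

Both hypothetical teams average 60 overall, so a single score would treat them as equivalent; the per-dimension map instead points Sales towards the physical dimension and Support towards the cognitive one.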
Development Line 03
Passive behavioural signals from AI interactions
An earlier-stage exploration of whether patterns in how people interact with AI systems can provide a passive behavioural signal related to empathy capability. The theoretical foundation is established: the Media Equation (Reeves & Nass, 1996) showed that people respond to computers and other media as if they were social actors, engaging in real social behaviours towards them. Whether variation in those responses can complement the Empathy Audit is being actively explored.
Status: Research direction · not yet a current product
What a Complete Assessment Would Look Like

Four criteria any valid empathy instrument must meet — and why none currently meets all four

The design principles for a complete empathy audit are clear from the literature, even if a fully validated instrument does not yet exist. A valid empathy assessment needs to meet four criteria that current tools repeatedly fail to satisfy.

1
Responsiveness to change
The instrument needs to be sensitive enough to detect genuine development over the timescale of a training programme — typically weeks or months. A 2021 systematic review of nursing empathy measures found that responsiveness was tested in only three of the instruments examined. For organisations evaluating training, this is the criterion that matters most. Most tools were not designed with it in mind.
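One common way to express responsiveness numerically is the standardised response mean (SRM): the mean of participants' pre/post change scores divided by the standard deviation of those change scores. The sketch below assumes paired pre/post scores on a single dimension; the function name and the five participants' scores are hypothetical, and the SRM is only one of several responsiveness indices, not necessarily the one applied in the 2021 review.

```python
from statistics import mean, stdev

def standardised_response_mean(pre: list[float], post: list[float]) -> float:
    """SRM = mean(change) / SD(change), using paired pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired observations")
    change = [after - before for before, after in zip(pre, post)]
    sd = stdev(change)  # sample standard deviation of individual change scores
    if sd == 0:
        raise ValueError("change scores have zero variance")
    return mean(change) / sd

# Hypothetical pre/post scores for five participants on one dimension
pre = [40, 45, 50, 42, 48]
post = [46, 49, 55, 45, 55]
print(round(standardised_response_mean(pre, post), 2))
```

For these invented scores the SRM is about 3.16. By analogy with Cohen's conventions, values around 0.8 or above are often read as large; an instrument that returns a low SRM on the dimension a programme actually targets will make that programme look weaker than it is.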
2
Ecological validity
What the instrument tests needs to correspond to what empathy looks like in the settings where it matters — management conversations, team dynamics, client relationships, high-pressure decision-making. Static photographs of eye regions, questionnaire items about hypothetical scenarios, and controlled laboratory tasks do not meet this criterion. The gap between what the RMET tests and what empathy requires in practice is wide, as Baron-Cohen himself acknowledged in the original 2001 paper.
3
Resistance to social desirability
The instrument needs to produce scores that reflect actual capability rather than self-presentation. Empathy is widely understood to be a positive quality, and people reliably present themselves as more empathic than they are. Performance-based tasks have a significant advantage over self-report here — they measure what someone can do, not what they say about themselves. The Perth Empathy Scale (2023) is the most recent attempt to address this within the self-report format, but remains constrained by the format's structural limits.
4
Relevance to business outcomes
The scores need to predict something that organisations care about — adoption rates, retention, engagement, conflict frequency — so that investment in development can be connected to measurable returns. An instrument that produces psychometrically defensible scores but cannot be linked to operational outcomes will not support the business case for empathy development. This is the criterion that the entire current measurement landscape fails to meet reliably.
"The question is not whether empathy can be measured. The question is whether we are measuring what empathy actually does."
Stuart Nolan · Stuart Nolan Consulting

Get ahead of the measurement gap before your competitors do.

Founding Partner organisations receive early access to the proprietary Empathy Audit methodology, genuine influence over its development, and permanently preferred pricing. Download the full white paper for the complete review of measurement tools, the systematic review evidence, and the development roadmap. Or book a 30-minute discovery call.
