Big Picture · 11 min read

Why Personality Assessments Don't Change Behavior (And What Actually Does)

CliftonStrengths, DiSC, MBTI. Every company has them. None of them move the needle. The research explains why.

By Asa Goldstein, QuestWorks

TL;DR

Every company runs personality and strengths assessments. The results land in a drawer and never get applied. The problem is structural: self-report tools measure how people see themselves, not how they behave under pressure, and there's no mechanism for using the results in real decisions. Wegner's transactive memory research and Cannon-Bowers' work on shared mental models point to a different approach: make the assessment data active by embedding it in live team practice. This is Part 2 of The Science Behind the Game.

Part 2 of 8 · The Science Behind the Game

Back to the series hub · Previous: Part 1 · Next: Part 3: Shared Fate

Every company I've ever worked with has run personality and strengths assessments. CliftonStrengths. DiSC. MBTI. Enneagram. The 16 Personalities quiz from that one offsite. Insights Discovery. The results land in a drawer or a Notion page and never get applied. Nobody references them. Nobody practices with them. The budget spent to administer them was real. The behavior change was near zero.

This is not a marketing attack on any specific tool. The problem is structural, and it explains a decade of frustration for HR leaders who spent budget on assessments and watched their teams function exactly the same way six months later.

In Part 1 I argued that games are the right delivery vehicle for team development because they create the conditions for behavior change. In this part I want to show you exactly where static assessments break down and what the research on transactive memory and shared mental models suggests instead.

What Assessments Actually Measure

Self-report assessments measure how people see themselves. That's it. When you take CliftonStrengths, you're answering questions about your preferences and tendencies from your own perspective. The instrument is reliable, meaning it produces similar results on repeat administration, and the categories are useful as a shared vocabulary. Nothing about this is fraudulent. But the scope of what the instrument can do is narrower than most companies assume.

Here's what self-report can't capture:

  • How you behave under real pressure. The self you describe on a quiz is the self you notice. Under time pressure, conflict, or fatigue, different patterns emerge. Those patterns are the ones that actually determine team outcomes.
  • How your behavior is perceived by teammates. A leader who sees themselves as collaborative might land as directive. A quiet analyst might be experienced as dismissive. The gap between intent and impact is where most team friction lives, and self-report can't see it.
  • How your patterns shift in different team compositions. You show up differently with a senior team vs. a junior one, with peers vs. direct reports, under a calm sprint vs. a crunch. A single assessment freezes you in one snapshot.
  • How your patterns evolve over time. People change. Teams change. Static profiles don't.

The result: a profile that describes you accurately within the narrow frame the instrument measures, and then becomes inert because there's no mechanism for turning that profile into a practiced behavior.

The Delivery Problem

Even if an assessment produced perfectly accurate behavioral data (which it doesn't), the delivery model would still fail. Here's the standard script.

HR runs the assessment. Everyone takes it. A workshop is scheduled. The facilitator walks through the framework, shows everyone their results, runs a couple of exercises where people guess each other's profiles, and sends everyone home with a one-pager. The facilitator is paid. The workshop is done. The team goes back to the same meetings they had before, with the same dynamics, and the profiles start gathering dust.

The delivery model assumes knowledge transfer is sufficient. It isn't. I covered this in Part 1: the gap between knowing and doing is enormous, and closing it requires repetition under pressure in a context where the target behavior is the optimal strategy. A one-day workshop provides zero repetitions. Zero.

This is the same reason reading a book about psychological safety doesn't build psychological safety. The knowledge is fine. The delivery model is the bottleneck.

Transactive Memory: What Teams Actually Need to Know

Here's where the research gets interesting. In 1987, Daniel Wegner published "Transactive Memory: A Contemporary Analysis of the Group Mind," a chapter in Theories of Group Behavior. His core insight was that high-performing groups don't store all information in every head. They store it in specific heads and maintain a shared map of where to find it.

Think about a long-married couple. One partner remembers birthdays. The other remembers directions. Neither one remembers both. Neither one needs to, because they've developed a shared understanding of who holds what information. The system is the couple, not either individual. That's a transactive memory system.

The theory was later extended to work teams in organizational settings. High-performing teams develop the same kind of distributed expertise awareness. Someone on the team knows the API quirks of a third-party service. Someone else knows who at the client is actually making the decision. Someone else knows the historical context of why a particular architectural choice was made. Nobody knows everything. Everyone knows where to find what they need.

This is a learned behavior. It takes repetition. It requires feedback. It requires the team to use its members' specific strengths in live decisions so that the map of who-knows-what gets built.

A static CliftonStrengths report can tell you what Jen's top five themes are. It cannot build the team's shared understanding of when to route a problem to Jen. Only practice does that. And practice requires a context where routing to Jen is actually rewarded.
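If it helps to picture the shared map as a structure, here is a minimal TypeScript sketch. Everything in it is illustrative: Jen comes from the example above, Marcus and Priya are invented, and a real team holds this map in its members' heads, not in code.

```typescript
// Illustrative sketch of a transactive memory system as a lookup structure.
// Names and topics are hypothetical; real teams build this map implicitly.

type Topic = "api-quirks" | "client-politics" | "architecture-history";

// The shared map: who holds what knowledge. Every member carries a copy.
const whoKnowsWhat: Record<Topic, string> = {
  "api-quirks": "Jen",
  "client-politics": "Marcus",
  "architecture-history": "Priya",
};

// Routing a problem means consulting the map, not your own memory.
function routeProblem(topic: Topic): string {
  return whoKnowsWhat[topic];
}

// The team doesn't need everyone to know the API quirks.
// It needs everyone to know that Jen does.
console.log(routeProblem("api-quirks")); // "Jen"
```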

Shared Mental Models: The Other Half of the Research

Transactive memory is about knowing who holds what. Shared mental models are about developing compatible understandings of roles, procedures, and what good looks like.

Cannon-Bowers, Salas, and Converse's 1993 chapter on shared mental models showed that high-performing teams share compatible mental models about how work should be done, who plays what role, and what the standards are. This shared understanding enables implicit coordination. Teams can adjust to each other without needing to explain what they're doing, because they already share a model of what's happening.

When a team has strong shared mental models, they can handle ambiguous situations quickly. When they don't, every decision requires explicit coordination, which is slow and exhausting. Most workplace meetings are a symptom of weak shared mental models: everyone has to articulate context every time because nobody has a compatible understanding of what's going on.

Assessments can contribute to shared mental models if they create a shared vocabulary. CliftonStrengths does this, in a shallow way. The problem is that the shared vocabulary dies almost immediately because nothing keeps it alive. After the workshop, nobody references their top five themes in a standup. The vocabulary exists but it's not in circulation.

Contrast this with the way long-playing game teams learn each other's roles. In a raid group, everyone knows who the tank is, who the healer is, who the DPS is, what each person's special abilities are, and how to call out who should handle what. That's a high-resolution shared mental model, built through hundreds of repetitions in pressure scenarios. The vocabulary is in constant circulation because the game requires it.

The Self-Perception vs. Behavior Gap

There's one more piece of the research that matters here. The gap between how people see themselves and how they actually behave is where most of the insight in team development lives.

Over sessions of QuestWorks, something interesting happens. The delta between how players see themselves (their quiz results) and how they actually behave under pressure (their in-game patterns) becomes visible. A player who self-identifies as analytical might consistently show up as the team's emotional anchor. A player who identifies as a leader might consistently defer to others' expertise. A quiet person who tests as introverted might consistently voice the strongest dissent in tough moments.

These gaps between self-perception and observed behavior are where the longitudinal value of behavioral data lives. Static assessments can't see them. Only repeated observation under varied conditions can.

Put differently: the person you describe on a quiz is the person you think you are. The person your teammates see is the person you actually are. The gap between those two is the most valuable piece of information in team development, and it's invisible to every self-report instrument I'm aware of.

How QuestWorks Solves This

I wanted to keep the value of assessments (they do produce a useful starting vocabulary) while fixing the delivery and observation problems. Here's how the system works.

When a player joins QuestWorks, they take a psychometric quiz. If they already have CliftonStrengths, DiSC, or MBTI results, those can be used directly. The system maps their real-world strengths profile into one of 9 character archetypes, each with 3 specific areas of expertise. An analytical, research-oriented person becomes a Magister (Theory, Analysis, Research). A natural leader with strategic instincts becomes a Vanguard (Leadership, Tactics, Strategy). A perceptive, adaptable person becomes a Rogue (Finesse, Insight, Adaptability).
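QuestWorks' actual mapping algorithm isn't public, so treat the TypeScript sketch below as a guess at its shape only. The three archetype names and their expertise areas come straight from the paragraph above; the Strength type, the signals field, and pickArchetype are hypothetical.

```typescript
// Hypothetical sketch of a strengths-to-archetype mapping. The archetypes
// and their expertise areas come from the article; the scoring logic and
// type names are illustrative, not QuestWorks' real algorithm.

type Strength = "analysis" | "research" | "leadership" | "strategy"
  | "insight" | "adaptability";

interface Archetype {
  name: string;
  expertise: [string, string, string]; // each archetype has 3 areas
  signals: Strength[];                 // strengths that point to it
}

const archetypes: Archetype[] = [
  { name: "Magister", expertise: ["Theory", "Analysis", "Research"],
    signals: ["analysis", "research"] },
  { name: "Vanguard", expertise: ["Leadership", "Tactics", "Strategy"],
    signals: ["leadership", "strategy"] },
  { name: "Rogue", expertise: ["Finesse", "Insight", "Adaptability"],
    signals: ["insight", "adaptability"] },
];

// Pick the archetype whose signal strengths best overlap the profile.
function pickArchetype(profile: Strength[]): Archetype {
  return archetypes.reduce((best, a) => {
    const score = a.signals.filter((s) => profile.includes(s)).length;
    const bestScore = best.signals.filter((s) => profile.includes(s)).length;
    return score > bestScore ? a : best;
  });
}

console.log(pickArchetype(["analysis", "research"]).name); // "Magister"
```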

Two things happen because of this mapping.

First, the player gets immediate identity investment. "I'm a Ranger. I handle navigation and environmental challenges." That identity is grounded in their real strengths, which means it feels accurate. Players recognize themselves in their character because the character was derived from who they actually are. This is the same design trick Blizzard uses when you roll a new WoW character. The difference is that here, the class is your actual strengths profile, and the team needs you to play the role.

Second, the system engineers genuine interdependence. Each character archetype has clear strengths and deliberate gaps. A Magister excels at research problems and struggles with physical challenges. A Vanguard excels at tactical coordination and struggles with subtlety. The gaps are intentional. They force teams to rely on each other, because no single player can handle everything. This is where transactive memory gets built. You need your teammates. That need is structural.
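A toy coverage check makes the structural point visible. The challenge types and coverage assignments below are invented for illustration; only the gaps named above (a Magister struggling with physical challenges, a Vanguard with subtlety) come from the text.

```typescript
// Toy illustration of engineered interdependence. Challenge types and
// coverage assignments are hypothetical; the point is structural.

type Challenge = "research" | "tactics" | "physical" | "subtlety";

interface Role {
  name: string;
  covers: Challenge[]; // what this archetype handles well
}

// Each role deliberately leaves some challenge types uncovered.
const magister: Role = { name: "Magister", covers: ["research"] };
const vanguard: Role = { name: "Vanguard", covers: ["tactics", "physical"] };
const rogue: Role = { name: "Rogue", covers: ["subtlety", "research"] };

// No single role covers every challenge; only the team does.
function teamCovers(team: Role[], c: Challenge): string[] {
  return team.filter((r) => r.covers.includes(c)).map((r) => r.name);
}

const team = [magister, vanguard, rogue];
console.log(teamCovers([magister], "physical")); // [] - a Magister alone is stuck
console.log(teamCovers(team, "physical"));       // ["Vanguard"] - the team isn't
```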

Over sessions, the team learns who handles what. Not by reading a profile. By doing. The assessment data comes alive because people are using their strengths in real time, under real pressure, with real teammates.

HeroTypes, the character profiles this mapping produces, are public. Teammates see each other's profiles and develop a shared vocabulary for how each person plays. This is the shared mental model piece: the team develops a compatible understanding of roles and strengths through repeated experience, and the vocabulary stays alive because the game keeps it in circulation.

What Managers See, and What They Don't

I want to be explicit about the privacy model here because the value of behavioral data depends on it.

Managers see aggregate team trends and individual strengths-based highlights through QuestDash. Patterns of contribution become visible at the team level, and the individual data is framed as positive recognition (stepped up as leader, voiced a critical dissent, rallied the team around a plan). Managers pay per player, so roster visibility is expected and normal.

HeroGPT coaching conversations are completely private. A player can ask HeroGPT for advice on how to work better with a specific teammate, and nothing from that conversation is shared with their manager or anyone else. That's a bright line. The coaching is on-demand, private, and grounded in observed behavioral data.

Participation is voluntary. The data is never tied to performance reviews. Nothing from QuestWorks writes back to Jira, Lattice, Slack, or any integrated tool. If you want the full privacy architecture, Part 8 covers it.

This matters for the current topic because the value of behavioral data evaporates if people are performing for the test. If players suspect their manager will see them make a mistake, they stop taking risks, and the behavioral signal becomes useless. Voluntary participation with strict privacy around individual coaching is the design feature that keeps the signal real.
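For the technically minded, here is a hedged sketch of what that bright line could look like in code. The record shapes, labels, and function names mirror the description above but are entirely hypothetical; QuestWorks' real data model is not public.

```typescript
// Hypothetical sketch of the privacy bright line described above.
// Record shapes and function names are illustrative, not QuestWorks' API.

interface SessionEvent {
  player: string;
  kind: "led" | "dissented" | "rallied";
}

interface CoachingMessage {
  player: string;
  text: string; // private HeroGPT conversation content
}

// A manager-facing view draws on session events only, framed as positive
// recognition. Coaching messages never enter this code path.
function managerView(events: SessionEvent[]): string[] {
  return events.map((e) => `${e.player} ${label(e.kind)}`);
}

function label(kind: SessionEvent["kind"]): string {
  return { led: "stepped up as leader",
           dissented: "voiced a critical dissent",
           rallied: "rallied the team around a plan" }[kind];
}

// The bright line is structural: there is deliberately no function that
// accepts CoachingMessage[] and returns anything manager-facing.
const events: SessionEvent[] = [{ player: "Jen", kind: "dissented" }];
console.log(managerView(events)); // ["Jen voiced a critical dissent"]
```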

What Actually Changes Behavior

Let me tie this back to the research. Behavior changes under three conditions: repetition, pressure, and a context where the target behavior is the optimal strategy. None of the three are present in a personality workshop.

  • Repetition: workshops happen once. Simulators run weekly.
  • Pressure: workshops are low-stakes and performative. Simulators generate real emotional stakes through narrative and shared consequences.
  • Target behavior as optimal strategy: workshops reward participation. Simulators reward the actual behaviors you want to develop (delegation, dissent, regrouping after failure) because the game mechanically depends on them.

The measurement tools exist: behavioral tracking that follows psychological safety over time, instruments for studying team dynamics longitudinally. What's been missing is a practice environment that produces the behavior worth measuring in the first place. That's the gap QuestWorks closes.

In Part 3 I cover the research on social interdependence and why structural shared fate is a stronger lever than any icebreaker. If you want to understand why team-building activities without structural interdependence don't stick, that's the piece to read next.

Frequently Asked Questions

Why don't personality assessments change behavior?

Because they measure how people see themselves, not how they behave. Self-report assessments produce a static profile and then disappear. There's no structure for using the results in real decisions. The HR function delivers a workshop, the profiles go in a drawer, and behavior doesn't change. Knowing your strengths isn't the same as practicing them.

What is transactive memory?

Wegner's 1987 theory of transactive memory systems describes how teams develop distributed memory: learning who knows what, so the team can specialize and defer efficiently. High-performing teams don't store all information in every head. They store it in specific heads and maintain a shared map of where to find it.

How big is the gap between self-perception and actual behavior?

It's enormous. A player who self-identifies as analytical might consistently show up as the team's emotional anchor. A player who identifies as a leader might consistently defer to others' expertise. These gaps between self-perception and observed behavior are where the real insight lives. Static assessments can't see them. Only longitudinal behavioral data can.

What does QuestWorks do with existing assessment results?

It turns them into playable characters. If you already have CliftonStrengths, DiSC, or MBTI results, those map to one of 9 character archetypes, each with 3 specific areas of expertise. An analytical person becomes a Magister. A natural leader becomes a Vanguard. The assessment data comes alive because people are using their strengths in real time, under real pressure, with real teammates.

Can teammates see each other's profiles?

Yes. HeroTypes are public character profiles that create a shared language for working styles, similar to CliftonStrengths but updated by behavior, not frozen by a single assessment. HeroGPT coaching conversations are completely private and never shared upstream.

Ready to Level Up Your Team?

14-day free trial. Install in under a minute.

Try it free

The flight simulator for team dynamics · Try QuestWorks Free