Does gamification work at all in corporate training?

On average, yes, with modest effects. Sailer and Homner's 2020 meta-analysis of 128 studies found g=0.49 for cognitive outcomes, g=0.36 for motivational outcomes, and g=0.25 for behavioral outcomes. The more important finding was which mechanics drove the effect. Game fiction and a cooperative-competitive blend were the strongest moderators. Pure leaderboard competition was not. The meta-analytic case for gamification is real but rests on mechanics most corporate programs do not actually ship.

Why are leaderboards singled out as the problem?

Leaderboards combine three failure modes supported by separate literatures. Deci, Koestner, and Ryan's 1999 meta-analysis of 128 studies shows performance-contingent rewards reduce intrinsic motivation on free-choice measures. Hanus and Fox's 2015 longitudinal classroom study showed lower motivation, satisfaction, and final-exam scores in the gamified section over 16 weeks. Li et al.'s 2024 systematic review found absolute leaderboards risk discouraging lower-ranked learners. The overjustification effect from Lepper, Greene, and Nisbett in 1973 is the foundational result behind the pattern.

Is narrative a free lunch for training?

No. Armstrong and Landers in 2017 ran a well-designed experiment with 273 participants and found that gamified-with-narrative training boosted learner satisfaction with a large effect (d=0.65) but produced lower procedural knowledge than a plain-text control (d=-0.40). Narrative can compete with the skill for cognitive resources. The story has to be engineered so that moments of narrative engagement coincide with moments of skill practice. The argument for narrative is strongest when the narrative is integrated into the procedural content at every step.

What is the right mix of narrative, collaboration, and competition?

Narrative-first as the frame. Collaboration as the core mechanic when the goal is team skills. Competition as texture inside the story (time pressure, resource scarcity, adversarial scenario elements) rather than as a public ranking of colleagues. Sailer and Homner's meta-analysis supports combined competition-and-collaboration over pure competition for behavioral outcomes. Pérez et al. (2025) offer empirical support for the same blend in educational gamification. The order matters. Narrative first, cooperation as the structure, competition as flavor.

What should L&D leaders actually change tomorrow?

Three practical shifts. First, audit whether the core training loop is rank, points, badges, streak. If so, the program is optimized for engagement dashboards rather than skill transfer. Second, move to formats where learners inhabit a situation and coordinate with colleagues to resolve it, which is the pattern with the strongest simulation and case-method evidence. Third, measure procedural transfer on a longer horizon than the first month, because the novelty effect documented by Koivisto and Hamari makes the first month the worst month on which to base an evaluation.

Narrative Beats Leaderboards in Training

The standard corporate recipe

Open any enterprise L&D catalog or sales-enablement platform and you find the same four ingredients: a leaderboard, points, badges, and a streak. Engagement metrics go up on a dashboard. A quarterly report cites the program. Next fiscal year, the vendor pitches a refresh.

The sales gamification software market alone is projected to grow from $623M in 2025 to $949M by 2030 at an 8.8% CAGR, per a Research and Markets 2026 report. The category is large, profitable, and almost entirely built on the assumption that ranking, rewarding, and showcasing individual performance is what makes a game a game.
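The three market figures quoted above can be checked against each other with ordinary compound-growth arithmetic. The sketch below is only a consistency check on the numbers as stated in the text; it assumes five annual compounding periods between 2025 and 2030, and the function names are illustrative rather than taken from the report.

```python
# Consistency check on the quoted market figures (values from the text).
start_musd = 623.0    # 2025 market size, millions of USD
end_musd = 949.0      # 2030 projection, millions of USD
quoted_cagr = 0.088   # 8.8% compound annual growth rate
years = 2030 - 2025   # assumption: five annual compounding periods


def compound(start: float, rate: float, periods: int) -> float:
    """Value after growing `start` at `rate` for `periods` periods."""
    return start * (1.0 + rate) ** periods


def implied_cagr(start: float, end: float, periods: int) -> float:
    """Annual growth rate that takes `start` to `end` in `periods` periods."""
    return (end / start) ** (1.0 / periods) - 1.0


projected = compound(start_musd, quoted_cagr, years)
rate = implied_cagr(start_musd, end_musd, years)

print(f"Projected 2030 size at the quoted CAGR: ${projected:.0f}M")
print(f"CAGR implied by the two endpoints: {rate:.1%}")
```

Under that assumption the figures agree to within rounding: compounding $623M at 8.8% for five years lands at roughly $950M, and the rate implied by the two endpoints rounds to 8.8%.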

The research base has disagreed for at least a decade. The claim here is narrow: for durable team skills, narrative plus collaboration is a better evidence-supported bet than leaderboard-first design. Narrative has its own failure modes, named below. The default corporate recipe optimizes for the weakest mechanism in the literature.

What leaderboards actually do

A leaderboard is a ranked display of participant performance. The design assumption is that visibility of rank motivates effort.

The longest-running empirical test of that assumption in an educational setting is Hanus and Fox (2015), a 16-week classroom longitudinal study of undergraduate communication students. One section was gamified with a leaderboard and badge system. The other received equivalent content without the game layer. The gamified students showed lower motivation, lower satisfaction, and lower empowerment over time, and scored lower on the final exam. The mediation path was through intrinsic motivation: the game mechanics crowded it out rather than building on top of it.

That finding maps onto a much older, more replicated result. Deci, Koestner, and Ryan (1999) meta-analyzed 128 studies on extrinsic rewards and intrinsic motivation. On free-choice measures of intrinsic motivation, engagement-contingent tangible rewards produced d=-0.40, completion-contingent rewards d=-0.36, and performance-contingent rewards d=-0.28. The more the reward is tied to doing the activity, the more it erodes the underlying interest. Positive verbal feedback was the exception, with d=+0.33. Praise helps. Points and ranks, on average, hurt.
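For readers who do not think in standardized effect sizes, one common way to read a d value is the probability of superiority: the chance that a randomly chosen person from the rewarded group scores higher than a randomly chosen person from the comparison group. The sketch below converts the effect sizes quoted above using the standard formula Φ(d/√2). It assumes normally distributed outcomes with equal variance, which a meta-analytic average only approximates, and the conversion is an illustration added here, not a result reported by Deci and colleagues.

```python
from statistics import NormalDist


def probability_of_superiority(d: float) -> float:
    """Chance that a random treated score exceeds a random control score.

    Assumes both groups are normally distributed with equal variance,
    so the result is Phi(d / sqrt(2)).
    """
    return NormalDist().cdf(d / 2 ** 0.5)


# Effect sizes as quoted in the text (free-choice intrinsic motivation).
effects = {
    "engagement-contingent reward": -0.40,
    "completion-contingent reward": -0.36,
    "performance-contingent reward": -0.28,
    "positive verbal feedback": 0.33,
}

for label, d in effects.items():
    p = probability_of_superiority(d)
    print(f"{label}: d={d:+.2f}, probability of superiority={p:.2f}")
```

Read this way, d=-0.40 means a rewarded participant would out-persist an unrewarded one on free-choice measures only about 39% of the time rather than the 50% expected under no effect: a real shift, but not an overwhelming one.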

The classic demonstration is Lepper, Greene, and Nisbett (1973), the overjustification study. Preschoolers who liked drawing were randomly assigned to an expected-reward, unexpected-reward, or no-reward condition. Two weeks later, the expected-reward children spent about 50% less time drawing during free play than controls. Drawing had become something done for a reward. Once the reward was gone, much of the free drawing went with it.

A leaderboard is a structurally similar intervention. It takes an activity that might have been intrinsically interesting and attaches a visible extrinsic contingency. For motivated learners, it is at best a neutral overlay. For unmotivated ones, it creates public evidence of where they rank, accelerating disengagement. Li et al. (2024), a systematic review of leaderboards in education, found that absolute leaderboards "risk discouraging lower-ranked students."

The narrative advantage in memory and learning

Swap the frame. Instead of ranking learners, give them a story to inhabit. The research on what that does to memory is older than the gamification literature, and more settled.

The cleanest early demonstration is Bower and Clark (1969). Participants were given lists of ten unrelated nouns and either memorized them by rote or chained them into a short narrative. On delayed recall, the narrative group recovered 93% of the words. The rote group recovered 13%. Roughly a sevenfold difference, produced by the act of imposing a story on otherwise arbitrary content.

The mechanism has been mapped at the neural level. LaBar and Cabeza (2006) summarized the evidence in Nature Reviews Neuroscience: emotional events have "privileged status in memory," mediated by amygdala-hippocampal interactions that prioritize consolidation. Tyng, Amin, Saad, and Malik (2017), in Frontiers in Psychology, concluded that "emotion facilitates encoding and retrieval." Bukhari and colleagues (2025), in Nature Human Behaviour, reported that emotional arousal during narrative perception predicts later recall fidelity via functional integration across multiple brain networks. When a learner is inside a story that makes them feel something, their brain recruits more of itself to encode the content, and the content persists.

This sits inside a larger frame. Bruner (1986) distinguished between the paradigmatic mode of thought (logical, propositional) and the narrative mode (particular, temporal, agentic). Training that lives entirely in the paradigmatic mode (bullet points, decision trees, competency matrices) leaves the narrative machinery of the brain unengaged. Green and Brock (2000) formalized the empirical side of this with narrative transportation theory: the more a reader is absorbed, the more their attitudes shift in line with the story's implicit claims, and the more the content is retained. Tulving (1972) drew the distinction between episodic memory (events, contexts) and semantic memory (facts, concepts). Stories live in the first register. Bullet points live in the second. The first register is stickier.

Mar and Oatley (2008) extend the argument into social cognition. Fiction functions as a simulation of social experience, giving readers low-cost practice at inferring intent, reading emotion, and predicting behavior. A learner inside a well-designed narrative is practicing the social cognition they need on Monday, not just absorbing a rule about it.

The gamification meta-analysis, read carefully

The most widely cited empirical synthesis on gamification is Sailer and Homner (2020), a meta-analysis of 128 studies in Educational Psychology Review. Headline effects are modest and positive: cognitive g=0.49, motivational g=0.36, behavioral g=0.25. The subtler finding is where the heat comes from.

Sailer and Homner tested which mechanics drove the behavioral results. The strongest moderators were game fiction and the combination of competition with collaboration. Pure competition did not drive the effect. The meta-analytic evidence says narrative context plus a cooperative-competitive blend outperforms individual ranking, and the industry has, on average, shipped individual ranking.

Hamari, Koivisto, and Sarsa (2014), in an earlier review of 24 gamification studies, had already concluded that "effects greatly depend on the context." The 2019 follow-up by Koivisto and Hamari, covering 819 papers, added the novelty-effect finding: perceived benefits of gamification decreased as time using the service increased. Leaderboard engagement is, on average, a wasting asset. Best in the first month, worst in the twelfth.

Gamification effects are real but moderate. The moderators that matter most are narrative framing and cooperative structure. The mechanic that dominates actual product design (the leaderboard) is the one with the weakest empirical support and the most documented side effects.

Narrative is not a free lunch

This is the part most pro-narrative arguments skip, and skipping it is where the case collapses under scrutiny.

Armstrong and Landers (2017) ran a well-designed experiment on exactly this question. Participants (N=273) received workplace training either as plain text, which served as the control, or as gamified training with narrative elements. The gamified-with-narrative version produced higher learner satisfaction with a large effect (d=0.65). It also produced lower procedural knowledge than the control text, with d=-0.40 on the outcome that mattered most.

Learners liked the narrative version. They did not necessarily learn the procedure from it. The simplest explanation is that the story competed for cognitive resources with the skill, and without careful design the story won.

Wrapping a procedural skill in a narrative, without considered integration, can shift attention from the skill to the story. Armstrong and Landers is a caution against the naive "just add a story" position. The story has to be engineered so that moments of narrative engagement coincide with moments of skill practice, with no switching cost between them.

This does not rehabilitate the leaderboard. Armstrong and Landers did not find that a ranked, points-driven version of the same training would have outperformed the narrative one. Narrative can fail the same procedural-transfer test that pure competition more often fails. The design bar for either mechanism to produce durable skill is high. The research on which bar is easier to clear, at scale, for complex team skills, still points toward narrative-with-collaboration as the stronger starting frame.

Desirable difficulties and the direction of optimization

Bjork and Bjork (2011) synthesize several decades of work on desirable difficulties. Their argument: "conditions of instruction that temporarily slow immediate performance can enhance long-term retention and transfer." Interleaved practice, spaced retrieval, contextual variability, and productive struggle all look worse in the moment and better in the month.

Competitive gamification pushes in the opposite direction. Points reward fast answers. Leaderboards reward streaks of correct responses. Badges reward completion. The entire reward schedule is tuned to optimize short-term performance, which is precisely the signal the Bjorks identify as misleading. A program that makes people look great while using it and forget the content three weeks later is the predictable output of a mechanic selected for engagement metrics rather than retention.

Well-designed narrative systems create desirable difficulties by default. The learner has to hold the story state in mind, reason about ambiguous motivations, and integrate new information into a running model. That cognitive load looks inefficient relative to a flashcard drill. In the literature, it is the thing that produces the transfer.

Case-based and simulation evidence

Two adjacent literatures reinforce the pattern. McGaghie et al. (2011), meta-analyzing simulation-based medical education against traditional clinical instruction, found an effect size of 0.71 in favor of simulation. Medical education is a reasonable benchmark because the transfer criterion is procedural and consequential, not a quiz score. A 2025 meta-analysis of seminar-and-case-based versus lecture instruction in BMC Medical Education (16 RCTs, N=956) found case-based formats significantly outperformed lecture on knowledge and skill outcomes.

What simulations and case methods share with narrative gamification is specificity: a concrete situation with characters, stakes, and a decision point. They do not share a leaderboard. When the literature tests the mechanisms that consistently produce transfer, the frame is almost always "put the learner inside a situation and make them act."

What the data supports: narrative plus collaboration, competition as texture

The synthesis most consistent with the evidence is narrower than "narrative good, competition bad." It is closer to:

Start with narrative as the frame. Learners need a situation to inhabit, not just content to absorb.
Structure collaboration as the core mechanic. Team skills are built in teams. This matches Sailer and Homner's finding that cooperative-competitive blends outperform pure competition, and the broader simulation literature.
Use competition as texture, not structure. Time pressure inside the scenario, resource constraints, a quest that can fail if the team does not coordinate. Competition against the environment or the problem, not a public ranking against colleagues.
Use points and badges lightly, and prefer feedback that targets progress against a learning goal rather than position against peers. This is the Deci et al. "positive verbal feedback" exception, not a return to the standard leaderboard.

Pérez et al. (2025) offer empirical support for exactly this blend in educational gamification. Narrative-first, competition as optional texture. Order matters.

Objection: "Competitive elements motivate top performers." Landers et al. do show that trait-competitive users engage more with leaderboards. But designing a training system around the 20% of learners who already carry their own motivation, at the cost of the 60% in the middle and the 20% at the bottom, is a regressive choice. The population the training is trying to reach is, by definition, the one not arriving pre-motivated.

Objection: "Gamification works short-term. Refresh the mechanics every quarter." Rodrigues et al. (2022) document a novelty effect followed by a familiarization increase after committed exposure. A coherent sustained system can recover and grow after the initial spike fades. The implication is to invest in a deep, narratively coherent system and give it time, rather than swap mechanics every quarter.

A nuance from Hanus and Fox: rewards can help bored or under-motivated students, particularly when participation is voluntary. Voluntary, opt-in training contexts have different motivation dynamics than mandatory ones. The narrative-first argument is strongest where participation is earned and light mechanics are allowed to do their work.

What this means for training design

Audit the mechanics. If a platform's core loop is rank, points, badges, streak, it is optimized for engagement-metric reporting, not skill transfer.

Ask what the learner is inside. A learner inside a story is practicing situated reasoning. A learner inside a quiz is practicing quiz-taking.

Make collaboration the core mechanic. If the training is meant to build team skills, the mechanic should require coordination. Solo quizzes with a shared dashboard are not team training.

Make competition part of the diegesis. Time pressure inside a mission, adversarial NPCs, resource scarcity, and fail states push effort without publicly ranking coworkers against each other.

Test for procedural transfer, not satisfaction. Armstrong and Landers is the cautionary tale. If the only measurement is "learners liked it," the design will converge on entertainment.

Give the system time. The first month is the worst month to judge from. Commit to a longer evaluation horizon and coherent content over quarterly mechanic resets.

The boundary between serious games and gamification matters here: when the scenario becomes the content rather than a wrapper around the content, the design pressure changes.
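The advice above to test procedural transfer rather than satisfaction, and to do so on a longer horizon, can be made concrete with a small effect-size comparison. The sketch below computes Cohen's d with a pooled standard deviation for two outcomes: an immediate satisfaction rating and a delayed procedural test. All scores, the week-12 horizon, and the variable names are hypothetical, invented purely to show the shape of the analysis; they are not data from any study cited here.

```python
from statistics import mean, variance


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled sample variance."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


# Hypothetical scores, invented for illustration only.
# Each entry maps an outcome to (narrative group, plain-text control group).
scores = {
    "satisfaction rating (end of course)": (
        [4.5, 4.2, 4.8, 4.0, 4.6],
        [3.9, 4.1, 3.6, 4.0, 3.8],
    ),
    "procedural test (week 12)": (
        [61, 58, 70, 55, 64],
        [66, 63, 72, 60, 69],
    ),
}

for outcome, (narrative, control) in scores.items():
    print(f"{outcome}: d = {cohens_d(narrative, control):+.2f}")
```

The point of reporting both numbers side by side is that they can disagree in sign, as they did in Armstrong and Landers. A program judged only on the first row would look like a success; the second row is the one that reflects skill transfer.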

A deliberate-design example

QuestWorks is built on the research above. It is the flight simulator for team dynamics. Teams enter a cinematic, voice-controlled quest on QuestWorks' own platform, inhabit roles, coordinate under time pressure, and practice the social cognition their real work requires. The core frame is narrative. Collaboration is the required mechanic. Competition exists inside the story, against the encounter, not as a public ranking of colleagues.

QuestWorks works with Slack for install, invites, onboarding, HeroGPT coaching, leaderboards, and admin commands. The game itself runs on its own platform. Leaders see aggregate team trends plus strengths-based XP highlights per player. HeroGPT coaching conversations are private and are not shared upstream. HeroTypes are public the way a character class is public. Participation is voluntary and opt-in, and quests are not tied to performance reviews.

These design choices are deliberate outputs of the literature above. Narrative-first rather than rank-first. Collaboration as the core mechanic. Competition inside the diegesis. Strengths-based feedback rather than public ranking. Voluntary participation rather than mandatory rollout. Whether or not the product is the right fit for a given team, the design is an existence proof that the research-supported version of this category is buildable.

For why this generation of workers is unusually ready for narrative cooperative formats at work, see the RPG generation in your workforce. For the adjacent argument about what D&D players already know about teamwork, see what D&D players know about teams. For the deeper look at why play itself is a learning technology, see the science of learning through play.

The short version

Leaderboards, points, and badges are the dominant corporate gamification mechanics. They are also the weakest evidence-supported mechanisms in the literature for building durable skills. Narrative inside a cooperative structure, with competition used as texture, is the pattern the research actually supports, as long as the story is engineered to integrate with the skill rather than compete with it. The training industry has been selling the opposite package, at scale, for most of a decade. The people buying it should read the studies before their next renewal.

