The standard corporate recipe
Open any enterprise L&D catalog or sales-enablement platform and you find the same four ingredients: a leaderboard, points, badges, and a streak. Engagement metrics go up on a dashboard. A quarterly report cites the program. Next fiscal year, the vendor pitches a refresh.
The sales gamification software market alone is projected to grow from $623M in 2025 to $949M by 2030 at an 8.8% CAGR, per a Research and Markets 2026 report. The category is large, profitable, and almost entirely built on the assumption that ranking, rewarding, and showcasing individual performance is what makes a game a game.
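The projection's three numbers can be checked against each other. The sketch below assumes the forecast spans five annual compounding periods (2025 to 2030); the function name `implied_cagr` is illustrative and does not come from the report.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that carries start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Market-size figures quoted above, in millions of USD.
# Assumption: five compounding periods between 2025 and 2030.
rate = implied_cagr(623, 949, 2030 - 2025)
print(f"{rate:.1%}")  # prints 8.8%
```

Under that assumption the endpoints and the quoted growth rate agree to one decimal place.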
The research base has disagreed for at least a decade. The claim here is narrow: for durable team skills, narrative plus collaboration is a better evidence-supported bet than leaderboard-first design. Narrative has its own failure modes, named below. The default corporate recipe optimizes for the weakest mechanism in the literature.
What leaderboards actually do
A leaderboard is a ranked display of participant performance. The design assumption is that visibility of rank motivates effort.
The longest-running empirical test of that assumption in an educational setting is Hanus and Fox (2015), a 16-week classroom longitudinal study of undergraduate communication students. One section was gamified with a leaderboard and badge system. The other received equivalent content without the game layer. The gamified students showed lower motivation, lower satisfaction, and lower empowerment over time, and scored lower on the final exam. The mediation path was through intrinsic motivation: the game mechanics crowded it out rather than building on top of it.
That finding maps onto a much older, more replicated result. Deci, Koestner, and Ryan (1999) meta-analyzed 128 studies on extrinsic rewards and intrinsic motivation. Engagement-contingent tangible rewards on free-choice intrinsic motivation produced d=-0.40. Completion-contingent was d=-0.36. Performance-contingent was d=-0.28. The more the reward is tied to doing the activity, the more it erodes the underlying interest. Positive verbal feedback was the exception, with d=+0.33. Praise helps. Points and ranks, on average, hurt.
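For readers unfamiliar with the d statistic: it is a standardized mean difference, the gap between two group means divided by their pooled standard deviation. The sketch below shows one common way to compute it. The sample numbers are invented for illustration and are not data from Deci, Koestner, and Ryan.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical free-choice minutes spent on a task (made-up values).
rewarded = [2, 4, 6]
unrewarded = [4, 6, 8]
print(cohens_d(rewarded, unrewarded))  # -1.0
```

A negative d means the rewarded group scored below the comparison group. By that reading, d=-0.40 says the rewarded group's average fell about four tenths of a standard deviation below the unrewarded group's.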
The classic demonstration is Lepper, Greene, and Nisbett (1973), the overjustification study. Preschoolers who liked drawing were randomly assigned to an expected-reward, unexpected-reward, or no-reward condition. Two weeks later, the expected-reward children spent about 50% less time drawing during free play than controls. Once drawing had been rewarded, it became something done for the reward. Without the reward, much of the interest went with it.
A leaderboard is a structurally similar intervention. It takes an activity that might have been intrinsically interesting and attaches a visible extrinsic contingency. For motivated learners, it is at best a neutral overlay. For unmotivated ones, it creates public evidence of where they rank, accelerating disengagement. Li et al. (2024), a systematic review of leaderboards in education, found that absolute leaderboards "risk discouraging lower-ranked students."
The narrative advantage in memory and learning
Swap the frame. Instead of ranking learners, give them a story to inhabit. The research on what that does to memory is older than the gamification literature, and more settled.
The cleanest early demonstration is Bower and Clark (1969). Participants were given lists of ten unrelated nouns and either memorized them by rote or chained them into a short narrative. On delayed recall, the narrative group recovered 93% of the words. The rote group recovered 13%. Roughly a sevenfold difference, produced by the act of imposing a story on otherwise arbitrary content.
The mechanism has been mapped at the neural level. LaBar and Cabeza (2006) summarized the evidence in Nature Reviews Neuroscience: emotional events have "privileged status in memory," mediated by amygdala-hippocampal interactions that prioritize consolidation. Tyng, Amin, Saad, and Malik (2017), in Frontiers in Psychology, concluded that "emotion facilitates encoding and retrieval." Bukhari and colleagues (2025), in Nature Human Behaviour, reported that emotional arousal during narrative perception predicts later recall fidelity via functional integration across multiple brain networks. When a learner is inside a story that makes them feel something, their brain recruits more of itself to encode the content, and the content persists.
This sits inside a larger frame. Bruner (1986) distinguished between the paradigmatic mode of thought (logical, propositional) and the narrative mode (particular, temporal, agentic). Training that lives entirely in the paradigmatic mode (bullet points, decision trees, competency matrices) leaves the narrative machinery of the brain unengaged. Green and Brock (2000) formalized the empirical side of this with narrative transportation theory: the more a reader is absorbed, the more their attitudes shift in line with the story's implicit claims, and the more the content is retained. Tulving (1972) drew the distinction between episodic memory (events, contexts) and semantic memory (facts, concepts). Stories live in the first register. Bullet points live in the second. The first register is stickier.
Mar and Oatley (2008) extend the argument into social cognition. Fiction functions as a simulation of social experience, giving readers low-cost practice at inferring intent, reading emotion, and predicting behavior. A learner inside a well-designed narrative is practicing the social cognition they need on Monday, not just absorbing a rule about it.
The gamification meta-analysis, read carefully
The most widely cited empirical synthesis on gamification is Sailer and Homner (2020), a meta-analysis of 128 studies in Educational Psychology Review. Headline effects are modest and positive: cognitive g=0.49, motivational g=0.36, behavioral g=0.25. The subtler finding is where the heat comes from.
Sailer and Homner tested which mechanics drove the behavioral results. The strongest moderators were game fiction and the combination of competition with collaboration. Pure competition did not drive the effect. The meta-analytic evidence says narrative context plus a cooperative-competitive blend outperforms individual ranking, and the industry has, on average, shipped individual ranking.
Hamari, Koivisto, and Sarsa (2014), in an earlier review of 24 gamification studies, had already concluded that "effects greatly depend on the context." The 2019 follow-up by Koivisto and Hamari, covering 819 papers, added the novelty-effect finding: perceived benefits of gamification decreased as time using the service increased. Leaderboard engagement is, on average, a wasting asset. Best in the first month, worst in the twelfth.
Gamification effects are real but moderate. The moderators that matter most are narrative framing and cooperative structure. The mechanic that dominates actual product design (the leaderboard) is the one with the weakest empirical support and the most documented side effects.
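The g values above are Hedges' g, which is Cohen's d scaled by a small-sample correction factor. The sketch below uses the common approximation for that factor. The function name and the inputs are illustrative and are not taken from Sailer and Homner.

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Apply the usual approximate small-sample correction to Cohen's d."""
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# Hypothetical inputs: the same raw d with small and large groups.
print(round(hedges_g(0.5, 10, 10), 3))    # 0.479
print(round(hedges_g(0.5, 200, 200), 3))  # 0.499
```

The correction matters mainly for small studies. With a few hundred participants, g and d are nearly identical, so the two statistics can be read on roughly the same scale.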
Narrative is not a free lunch
This is the part most pro-narrative arguments skip, and skipping it is where the case collapses under scrutiny.
Armstrong and Landers (2017) ran a well-designed experiment on exactly this question. Participants (N=273) received workplace training in one of two forms: a plain-text control or a gamified version with narrative elements. The gamified-with-narrative version produced higher learner satisfaction with a medium-to-large effect (d=0.65). It also produced lower procedural knowledge than the control text, with d=-0.40 on the outcome that mattered most.
Learners liked the narrative version. They did not necessarily learn the procedure from it. The simplest explanation is that the story competed for cognitive resources with the skill, and without careful design the story won.
Wrapping a procedural skill in a narrative, without considered integration, can shift attention from the skill to the story. Armstrong and Landers is a caution against the naive "just add a story" position. The story has to be engineered so that moments of narrative engagement coincide with moments of skill practice, with no switching cost between them.
This does not rehabilitate the leaderboard. Armstrong and Landers did not test a ranked, points-driven version of the same training, so the result is no evidence that one would have outperformed the narrative version. Narrative can fail the same procedural-transfer test that pure competition fails more often. The design bar for either mechanism to produce durable skill is high. The research on which bar is easier to clear, at scale, for complex team skills, still points toward narrative-with-collaboration as the stronger starting frame.
Desirable difficulties and the direction of optimization
Bjork and Bjork (2011) synthesize several decades of work on desirable difficulties. Their argument: "conditions of instruction that temporarily slow immediate performance can enhance long-term retention and transfer." Interleaved practice, spaced retrieval, contextual variability, and productive struggle all look worse in the moment and better in the month.
Competitive gamification pushes in the opposite direction. Points reward fast answers. Leaderboards reward streaks of correct responses. Badges reward completion. The entire reward schedule is tuned to optimize short-term performance, which is precisely the signal the Bjorks identify as misleading. A program that makes people look great while using it and forget the content three weeks later is the predictable output of a mechanic selected for engagement metrics rather than retention.
Well-designed narrative systems create desirable difficulties by default. The learner has to hold the story state in mind, reason about ambiguous motivations, and integrate new information into a running model. That cognitive load looks inefficient relative to a flashcard drill. In the literature, it is the thing that produces the transfer.
Case-based and simulation evidence
Two adjacent literatures reinforce the pattern. McGaghie et al. (2011), meta-analyzing simulation-based medical education against traditional clinical instruction, found an effect size of 0.71 in favor of simulation. Medical education is a reasonable benchmark because the transfer criterion is procedural and consequential, not a quiz score. A 2025 meta-analysis of seminar-and-case-based versus lecture instruction in BMC Medical Education (16 RCTs, N=956) found case-based formats significantly outperformed lecture on knowledge and skill outcomes.
What simulations and case methods share with narrative gamification is specificity: a concrete situation with characters, stakes, and a decision point. They do not share a leaderboard. When the literature tests the mechanisms that consistently produce transfer, the frame is almost always "put the learner inside a situation and make them act."
What the data supports: narrative plus collaboration, competition as texture
The synthesis most consistent with the evidence is narrower than "narrative good, competition bad." It is closer to:
- Start with narrative as the frame. Learners need a situation to inhabit, not just content to absorb.
- Structure collaboration as the core mechanic. Team skills are built in teams. This matches Sailer and Homner's finding that cooperative-competitive blends outperform pure competition, and the broader simulation literature.
- Use competition as texture, not structure. Time pressure inside the scenario, resource constraints, a quest that can fail if the team does not coordinate. Competition against the environment or the problem, not a public ranking against colleagues.
- Use points and badges lightly, and prefer feedback that targets progress against a learning goal rather than position against peers. This is the Deci et al. "positive verbal feedback" exception, not a return to the standard leaderboard.
Pérez et al. (2025) offer empirical support for exactly this blend in educational gamification. Narrative-first, competition as optional texture. Order matters.
Objection: "Competitive elements motivate top performers." Landers et al. do show that trait-competitive users engage more with leaderboards. Designing a training system around the 20% of learners already carrying their own motivation, at the cost of the 60% in the middle and the 20% at the bottom, is a regressive choice. The population the training is trying to reach is, by definition, the one not arriving pre-motivated.
Objection: "Gamification works short-term. Refresh the mechanics every quarter." Rodrigues et al. (2022) document a novelty effect followed by a familiarization increase after committed exposure. A coherent sustained system can recover and grow after the initial spike fades. The implication is to invest in a deep, narratively coherent system and give it time, rather than swap mechanics every quarter.
A nuance from Hanus and Fox: rewards can help bored or under-motivated students, particularly when participation is voluntary. Voluntary, opt-in training contexts have different motivation dynamics than mandatory ones. The narrative-first argument is strongest where participation is earned and light mechanics are allowed to do their work.
What this means for training design
Audit the mechanics. If a platform's core loop is rank, points, badges, streak, it is optimized for engagement-metric reporting, not skill transfer.
Ask what the learner is inside. A learner inside a story is practicing situated reasoning. A learner inside a quiz is practicing quiz-taking.
Make collaboration the core mechanic. If the training is meant to build team skills, the mechanic should require coordination. Solo quizzes with a shared dashboard are not team training.
Make competition part of the diegesis. Time pressure inside a mission, adversarial NPCs, resource scarcity, and fail states push effort without publicly ranking coworkers against each other.
Test for procedural transfer, not satisfaction. Armstrong and Landers is the cautionary tale. If the only measurement is "learners liked it," the design will converge on entertainment.
Give the system time. The first month is the worst month to judge from. Commit to a longer evaluation horizon and coherent content over quarterly mechanic resets.
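One way to operationalize the measurement advice above is to score a pilot on two outcomes at once: a satisfaction survey and a delayed procedural test. The sketch below is a minimal illustration, not a validated evaluation method. The arm names, the score lists, and the `entertainment_only` flag are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def evaluate_pilot(satisfaction, delayed_procedural):
    """Each argument maps an arm name ('new' or 'baseline') to a list of scores."""
    liked = cohens_d(satisfaction["new"], satisfaction["baseline"])
    learned = cohens_d(delayed_procedural["new"], delayed_procedural["baseline"])
    return {
        "satisfaction_d": liked,
        "procedural_d": learned,
        # Flags the pattern where learners enjoyed the training but did not
        # outperform the baseline on the delayed procedural test.
        "entertainment_only": liked > 0 and learned <= 0,
    }


# Made-up scores: survey ratings, then a procedural test given weeks later.
result = evaluate_pilot(
    satisfaction={"new": [4, 5, 5], "baseline": [3, 3, 4]},
    delayed_procedural={"new": [5, 6, 7], "baseline": [6, 7, 8]},
)
print(result["entertainment_only"])  # True
```

The point of the flag is to keep both numbers in the same report, so that a positive satisfaction effect cannot stand in for a missing or negative transfer effect.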
The boundary between serious games and gamification matters here: when the scenario becomes the content rather than a wrapper around the content, the design pressure changes.
A deliberate-design example
QuestWorks is built on the research above. It is the flight simulator for team dynamics. Teams enter a cinematic, voice-controlled quest on QuestWorks' own platform, inhabit roles, coordinate under time pressure, and practice the social cognition their real work requires. The core frame is narrative. Collaboration is the required mechanic. Competition exists inside the story, against the encounter, not as a public ranking of colleagues.
QuestWorks works with Slack for install, invites, onboarding, HeroGPT coaching, leaderboards, and admin commands. The game itself runs on its own platform. Leaders see aggregate team trends plus strengths-based XP highlights per player. HeroGPT coaching conversations are private and are not shared upstream. HeroTypes are public the way a character class is public. Participation is voluntary and opt-in, and quests are not tied to performance reviews.
These design choices are deliberate outputs of the literature above. Narrative-first rather than rank-first. Collaboration as the core mechanic. Competition inside the diegesis. Strengths-based feedback rather than public ranking. Voluntary participation rather than mandatory rollout. Whether or not the product is the right fit for a given team, the design is an existence proof that the research-supported version of this category is buildable.
For why this generation of workers is unusually ready for narrative cooperative formats at work, see "the RPG generation in your workforce". For the adjacent argument about what D&D players already know about teamwork, see "what D&D players know about teams". For the deeper look at why play itself is a learning technology, see "the science of learning through play".
The short version
Leaderboards, points, and badges are the dominant corporate gamification mechanics. They are also the weakest evidence-supported mechanisms in the literature for building durable skills. Narrative inside a cooperative structure, with competition used as texture, is the pattern the research actually supports, as long as the story is engineered to integrate with the skill rather than compete with it. The training industry has been selling the opposite package, at scale, for most of a decade. The people buying it should read the studies before their next renewal.