
How to Run a Retrospective That Changes Things

Well-run debriefs improve team performance by about 25% on average. Most retros produce sticky notes that nobody acts on. Here is how to fix that.

By Asa Goldstein, QuestWorks

TL;DR

A meta-analysis of 46 studies found that debriefs improve team performance by approximately 25% on average. But most sprint retrospectives fail the follow-through test: roughly 40% of action items are never completed. The problem is not the format. It is structural: too many items, no owners, no review cadence, and dominant voices crowding out dissent. Six formats work, and rotating among them matters more than which one you pick. Post-experience debriefs (like post-quest reviews in QuestWorks) outperform scheduled calendar retros because the experience is still fresh.

The retrospective is arguably the most widely practiced team dynamics intervention in software development. Nearly every agile framework includes one. Scrum prescribes a retrospective at the end of every sprint. SAFe runs team retrospectives every iteration and a larger Inspect and Adapt event at the end of every PI. Even teams that abandoned formal agile ceremonies years ago still run some version of "what went well, what didn't."

And yet, ask any team member whether their retros actually change anything, and you will get an uncomfortable pause. A 2022 survey by Parabol found that 35% of retrospective participants felt the meetings were a waste of time. Retrospectives do work. The research is clear on that. The problem is that most teams run them badly.

The Research: Debriefs Work (When Done Right)

Tannenbaum and Cerasoli conducted a meta-analysis across 46 studies involving 2,136 participants and found that debriefs improve effectiveness by approximately 25% on average (d = .67; Tannenbaum & Cerasoli, 2013). The effect was consistent across simulated and real settings, medical and non-medical contexts, and team-level and individual-level debriefs.

Three factors amplified the effect: alignment (the debrief focused on the same behaviors being measured), facilitation (a skilled facilitator guided the process), and structure (the debrief followed a defined format rather than open-ended discussion). The implication is that unstructured "how do you feel about the sprint" conversations likely leave much of that 25% improvement on the table.
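To see how a standardized effect size like d = .67 can be read as a percentage, one common translation is Cohen's U3: the share of the comparison group that the average debriefed team outperforms. The sketch below assumes normally distributed outcomes with equal variance in both groups and uses Python's standard-library `statistics.NormalDist`. It illustrates the conversion; it is not necessarily the method the authors used to reach their headline figure.

```python
from statistics import NormalDist

def cohens_u3(d: float) -> float:
    """Share of the comparison group scoring below the treated group's mean.

    Assumes both groups are normal with equal variance, so the treated mean
    sits d standard deviations above the comparison mean.
    """
    return NormalDist().cdf(d)

d = 0.67  # pooled effect size reported by Tannenbaum & Cerasoli (2013)
u3 = cohens_u3(d)
print(f"U3 = {u3:.1%}")  # U3 = 74.9%
print(f"About {(u3 - 0.5) * 100:.0f} percentile points above the comparison median")
# About 25 percentile points above the comparison median
```

Under these assumptions, the average debriefed team lands around the 75th percentile of teams that did not debrief.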

Separate research on team reflexivity found that teams who regularly reflect on their processes and adapt them outperform teams that do not, with the effect strongest in complex, uncertain environments (West, 2000). Software development qualifies as both complex and uncertain.

Six Retrospective Formats That Work

Format matters less than consistency, but rotating formats every 4 to 6 cycles helps prevent the autopilot effect, where teams generate the same observations repeatedly because the prompt never changes.

1. Start-Stop-Continue. Three columns: what should we start doing, stop doing, and continue doing? Simple, fast, works for teams new to retros. Risk: it can feel repetitive after a few cycles.

2. 4Ls (Liked, Learned, Lacked, Longed For). Adds nuance to Start-Stop-Continue by separating positive observations into what was enjoyed versus what was educational, and negative observations into what was missing versus what the team wishes existed. Good for teams that need to build a more reflective habit.

3. Sailboat. Visual metaphor: wind (what propels us forward), anchors (what holds us back), rocks (risks ahead), island (our goal). Works well for visual thinkers and distributed teams using Miro or a shared whiteboard. The "rocks" category is uniquely valuable because it surfaces forward-looking risks rather than just backward-looking problems.

4. Mad-Sad-Glad. Three emotional categories. This format works when the team has been through a difficult period and needs to acknowledge the emotional dimension of work. It gives permission to name frustration without it feeling like complaining. Use sparingly so it does not become therapy.

5. Starfish (More of, Less of, Keep doing, Start doing, Stop doing). An expanded version of Start-Stop-Continue that adds degrees of change. "Less of" and "More of" are useful when you do not want to fully stop or start something but want to shift the balance. Good for mature teams that have already addressed the obvious problems.

6. Timeline. The team plots the sprint on a timeline and marks high points, low points, and decision moments. Then they discuss what caused each. This format is best when a sprint had significant ups and downs and the team needs to understand the causal chain rather than just listing symptoms.
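If you want the rotation to happen without relying on anyone's memory, a fixed schedule is enough. The sketch below is a hypothetical helper, `format_for_sprint`, that keeps one format for `rotate_every` sprints and then moves to the next format in the list above. The default of 5 is an assumption taken from the middle of the 4-to-6 range.

```python
# The six formats above, in the order the team will cycle through them.
FORMATS = [
    "Start-Stop-Continue",
    "4Ls",
    "Sailboat",
    "Mad-Sad-Glad",
    "Starfish",
    "Timeline",
]

def format_for_sprint(sprint: int, rotate_every: int = 5) -> str:
    """Hypothetical helper: pick the retro format for a 1-based sprint number.

    The team stays on one format for `rotate_every` sprints, then moves to
    the next one, wrapping back to the start after the last format.
    """
    if sprint < 1:
        raise ValueError("sprint numbers start at 1")
    if not 4 <= rotate_every <= 6:
        raise ValueError("rotate_every should be between 4 and 6 cycles")
    block = (sprint - 1) // rotate_every
    return FORMATS[block % len(FORMATS)]

for sprint in (1, 5, 6, 11):
    print(sprint, format_for_sprint(sprint))
```

A fixed cycle ignores the situational advice above (Mad-Sad-Glad after a hard stretch, Timeline after a bumpy sprint), so treat it as a default the facilitator can override.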

The Follow-Through Problem

Format selection is not where most retros fail. Follow-through is. Approximately 40% of retrospective action items are never completed, according to practitioner surveys across the agile community. The reasons are structural:

  • Too many action items. A retro that produces 8 action items will be lucky to complete 2 of them. Cap it at 2 to 3 items per retro. If the team cannot narrow the list, vote. Items that do not make the cut go into a parking lot for next time.
  • No clear owner. "We should improve our code review process" is not an action item. "Jordan will propose a new code review checklist by Thursday and share it in #engineering" is an action item. Every item needs a name and a date.
  • No review mechanism. The single most effective structural change is starting every retrospective by reviewing the action items from the previous one. Did they get done? If not, why? This creates accountability without requiring a separate tracking system.
  • Vague wishes instead of experiments. Frame action items as experiments: "For the next sprint, we will try X and evaluate whether it improved Y." This lowers the commitment threshold (it is an experiment, not a permanent policy) and makes success measurable.
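The four failure modes above are mechanical enough to check before the retro ends. The sketch below is a hypothetical structure, not a feature of any retro tool: an `ActionItem` record framed as an experiment, a `problems` check for the cap, owner, due date, and success measure, and a `review_previous` helper that produces the opening agenda for the next retro. The cap of 3 is the upper end of the guidance above; the names and dates are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

MAX_ITEMS = 3  # upper end of the "2 to 3 items per retro" cap

@dataclass
class ActionItem:
    experiment: str               # "For the next sprint, we will try X..."
    success_measure: str          # "...and evaluate whether it improved Y."
    owner: Optional[str] = None   # one named person, not "we"
    due: Optional[date] = None
    done: bool = False            # updated when the next retro reviews it

def problems(items: List[ActionItem]) -> List[str]:
    """Flag the structural failure modes: too many items, no owner, no date, no measure."""
    found = []
    if len(items) > MAX_ITEMS:
        found.append(f"{len(items)} items: vote down to {MAX_ITEMS} and park the rest")
    for item in items:
        if not item.owner:
            found.append(f"no owner: {item.experiment}")
        if item.due is None:
            found.append(f"no due date: {item.experiment}")
        if not item.success_measure:
            found.append(f"no success measure: {item.experiment}")
    return found

def review_previous(items: List[ActionItem]) -> List[str]:
    """Opening agenda for the next retro: what got done, and what needs a 'why'."""
    return [
        f"{'DONE' if i.done else 'NOT DONE, why?'} | {i.owner or 'unowned'} | {i.experiment}"
        for i in items
    ]

items = [
    ActionItem("Propose a code review checklist in #engineering",
               "Median review wait time drops", owner="Jordan", due=date(2025, 3, 13)),
    ActionItem("Improve our code review process", ""),  # a vague wish
]
for p in problems(items):
    print(p)
# no owner: Improve our code review process
# no due date: Improve our code review process
# no success measure: Improve our code review process
```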

Handling the Dominant Voice

In any group discussion, a small number of people account for a disproportionate share of the talking time. Research on participation inequality in meetings shows that in a typical five-person meeting, two people do 70% of the talking (Harvard Business Review, 2016). Retrospectives are no exception.

Three structural fixes work:

  • Silent writing first. Everyone writes their observations independently for 5 minutes before any discussion begins. This ensures that every person contributes at least one observation, however comfortable they are speaking up. Research on the nominal group technique shows that groups generate more and higher-quality ideas when members ideate individually before discussing collectively (NIH, PMC).
  • Round-robin sharing. Each person reads one observation in turn, cycling until all observations are shared. This prevents the loudest person from setting the agenda.
  • Dot voting. After all observations are visible, each person gets 3 votes to place on the items they consider most important. Discussion focuses only on the top-voted items. This democratizes priority-setting.
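Round-robin sharing and dot voting are easy to run by hand, but writing the rules down makes them explicit. The sketch below assumes observations were collected per person during the silent writing phase. `round_robin` and `dot_vote` are hypothetical helper names, and returning the top 3 items is an assumption, since the text only says to discuss the top-voted ones. Participants may stack more than one dot on the same item, a common dot-voting convention.

```python
from collections import Counter
from itertools import zip_longest
from typing import Dict, List, Tuple

def round_robin(notes_by_person: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Share one note per person per pass, cycling until every note is read."""
    people = list(notes_by_person)
    order = []
    # zip_longest pads people who run out of notes with None.
    for one_pass in zip_longest(*(notes_by_person[p] for p in people)):
        for person, note in zip(people, one_pass):
            if note is not None:
                order.append((person, note))
    return order

def dot_vote(ballots: Dict[str, List[str]], votes_each: int = 3,
             top_n: int = 3) -> List[Tuple[str, int]]:
    """Tally dot votes (stacking allowed) and return the top-voted items."""
    tally = Counter()
    for voter, picks in ballots.items():
        if len(picks) > votes_each:
            raise ValueError(f"{voter} used {len(picks)} votes; the limit is {votes_each}")
        tally.update(picks)
    # Ties keep the order in which items first received a vote.
    return tally.most_common(top_n)

notes = {
    "Ana": ["Flaky CI", "Great pairing"],
    "Ben": ["Unclear specs"],
    "Cy": ["Flaky CI", "Too many meetings", "Late reviews"],
}
for person, note in round_robin(notes):
    print(f"{person}: {note}")
# Ana: Flaky CI
# Ben: Unclear specs
# Cy: Flaky CI
# Ana: Great pairing
# Cy: Too many meetings
# Cy: Late reviews

ballots = {
    "Ana": ["Flaky CI", "Flaky CI", "Late reviews"],
    "Ben": ["Flaky CI", "Unclear specs", "Late reviews"],
    "Cy": ["Unclear specs", "Too many meetings", "Late reviews"],
}
print(dot_vote(ballots))
# [('Flaky CI', 3), ('Late reviews', 3), ('Unclear specs', 2)]
```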

Async Retros for Distributed Teams

Retrospectives do not require synchronous meetings. For distributed teams across time zones, async retros can be more inclusive. The process: open a shared board (Miro, Parabol, EasyRetro) at the end of the sprint, give the team 24 hours to add observations, then run a 20-minute synchronous call to discuss only the top-voted items and assign action items.

The async phase tends to produce better observations because people can write thoughtfully rather than on the spot. The synchronous phase builds shared understanding, which is the part that async alone cannot replace. A 2024 report from Buffer found that 75% of remote workers say async collaboration makes them more productive (Buffer, 2024). Retros can benefit from the same principle.
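For a team spread across time zones, the fiddly part of this hybrid format is usually finding a slot for the synchronous call. The sketch below is a hypothetical planner: it opens the board at sprint end, closes it 24 hours later, then searches in 30-minute steps for the first 20-minute slot that falls on a weekday inside every teammate's local 9:00 to 17:00. The working hours, step size, and team are assumptions. The fixed UTC offsets ignore daylight saving (they happen to be correct for the example dates); a real scheduler would use `zoneinfo` time zones instead.

```python
from datetime import datetime, time, timedelta, timezone

def async_retro_plan(sprint_end, team, board_hours=24, call_minutes=20,
                     workday=(time(9), time(17)), step=timedelta(minutes=30),
                     horizon=timedelta(days=7)):
    """Hypothetical planner: return (board_closes, call_start) as aware datetimes."""
    if sprint_end.tzinfo is None:
        raise ValueError("sprint_end must be timezone-aware")
    board_closes = sprint_end + timedelta(hours=board_hours)
    length = timedelta(minutes=call_minutes)
    start = board_closes
    while start < board_closes + horizon:
        if all(_fits(start, start + length, tz, workday) for tz in team.values()):
            return board_closes, start
        start += step
    raise RuntimeError("no shared slot within the horizon")

def _fits(start, end, tz, workday):
    """True if the call sits on one local weekday, inside working hours."""
    local_start, local_end = start.astimezone(tz), end.astimezone(tz)
    return (local_start.weekday() < 5
            and local_start.date() == local_end.date()
            and workday[0] <= local_start.time()
            and local_end.time() <= workday[1])

# Hypothetical two-person team using fixed offsets (CET and EST in early March 2025).
TEAM = {
    "Ana (Berlin)": timezone(timedelta(hours=1)),
    "Ben (New York)": timezone(timedelta(hours=-5)),
}

closes, call = async_retro_plan(datetime(2025, 3, 6, 9, 0, tzinfo=timezone.utc), TEAM)
print(closes, call)
# 2025-03-07 09:00:00+00:00 2025-03-07 14:00:00+00:00
```

Here the board closes Friday morning UTC, but the call waits until 14:00 UTC, the first slot that is inside working hours in both Berlin (15:00) and New York (9:00).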

Post-Experience Debriefs vs. Scheduled Retros

There is a timing dimension most teams miss. A retrospective at the end of a two-week sprint asks people to recall experiences from 10+ days ago. Memory decay means the most vivid observations come from the last 2 to 3 days, and earlier events get compressed or forgotten.

The Tannenbaum and Cerasoli meta-analysis found that aligned debriefs, those focused on the same behaviors being measured, produced stronger effects. Timing is a related lever. In aviation, crew debriefs happen immediately after the flight. In military after-action reviews, they happen the same day as the operation. The principle: the closer the debrief is to the experience, the more accurate the reflection.

This is the design principle behind post-quest debriefs in QuestWorks, the flight simulator for team dynamics. After every 25-minute quest, the team debriefs while the experience is fresh. What worked? Where did communication break down? Who stepped up, and what would we do differently? QuestDash surfaces the behavioral data to ground the conversation in evidence rather than memory. Leaders get aggregate trends and strengths-based XP highlights through a weekly team health report.

The cadence matters: weekly quest debriefs build the retrospective muscle so that sprint retros become more productive too. Teams that practice structured reflection every week tend to do it better than teams that practice it only every two weeks. HeroGPT provides private AI coaching in Slack that helps individuals process their own patterns between sessions. Everything runs on its own cinematic, voice-controlled platform, with Slack as the integration layer.

$20/user/month, 14-day free trial.

The Retrospective Checklist

Before your next retro, run through this:

  • Did you review last retro's action items at the top? (If not, why would anyone believe this retro's items will be different?)
  • Is there a silent writing phase before discussion? (If not, 2 out of 5 people will dominate.)
  • Is there a voting mechanism to prioritize? (If not, you will discuss whatever the loudest person raised first.)
  • Are you capping action items at 2 to 3? (If not, expect the completion rate to drop below 50%.)
  • Does every action item have an owner and a date? (If not, it is a wish, not a commitment.)
  • Have you rotated the format in the last 6 cycles? (If not, the team is on autopilot.)

Retrospectives are among the highest-leverage team practices in agile development, and the research backs that up. The follow-through problem is solvable. And teams that build the debrief habit through frequent, structured practice outperform teams that treat it as a sprint ceremony to endure.

Start a 14-day free trial.

Frequently Asked Questions

What is the best retrospective format?

There is no single best format. Start-Stop-Continue works well for teams that need simple structure. The 4Ls (Liked, Learned, Lacked, Longed For) works for teams that want more nuance. The Sailboat works for visual thinkers. The best format is the one you rotate every 4 to 6 sprints so the team does not go on autopilot.

Why do retrospective action items go undone?

Roughly 40% of retrospective action items are never completed. The root causes are: too many items per retro (cap at 2 to 3), no clear owner assigned, items that are vague wishes rather than specific next actions, and no review mechanism at the start of the next retro.

Do debriefs actually improve team performance?

A meta-analysis by Tannenbaum and Cerasoli across 46 studies found that debriefs improve team effectiveness by approximately 25% on average. The effect was consistent across simulated and real settings, medical and non-medical contexts, and team-level and individual-level debriefs.

How do you stop one person from dominating a retrospective?

Three structural fixes: silent writing before discussion (everyone writes their points before anyone speaks), round-robin sharing (each person shares one point in turn), and dot voting (each person gets 3 votes on the items they consider most important). The goal is to separate idea generation from idea discussion.

Can retrospectives be run asynchronously?

Yes. Async retrospectives let team members across time zones contribute thoughtfully. Open a shared board, give the team 24 hours to add observations, then run a 20-minute synchronous call for the top-voted items. QuestWorks post-quest debriefs are built-in retrospectives after every 25-minute session, providing weekly practice at structured team reflection.

Ready to Level Up Your Team?

14-day free trial. Install in under a minute.
