Which OD framework has the strongest empirical evidence base?

Hackman's 5 Conditions has the strongest validation of any commonly cited team-effectiveness framework. Wageman, Hackman, and Lehman's 2005 paper in the Journal of Applied Behavioral Science validated the Team Diagnostic Survey on 2,474 respondents across 321 teams. Edmondson's psychological-safety construct (the durable finding underneath Google's Project Aristotle) has its own strong empirical base, starting with her 1999 Administrative Science Quarterly paper and extended by many replications since.

Is the 70% of change efforts fail statistic real?

No. Mark Hughes's 2011 paper in the Journal of Change Management traced the five most commonly cited sources of the 70% figure and found none of them were backed by valid empirical evidence. The most likely origin is an off-hand comment in Hammer and Champy's 1993 book Reengineering the Corporation. The stat has been repeated for three decades without a dataset behind it. Change is hard, and the specific number is folklore.

How should I choose between the ten OD frameworks for a specific team problem?

Match the framework to the symptom. Use Hackman's 5 Conditions for chronic underperformance in a team that looks structurally correct. Use GRPI to triage a stuck team along goals, roles, processes, and interpersonal relations. Use Edmondson's psychological-safety scale for climate issues. Use Tuckman to give a new team a shared vocabulary. Use 7S or Burke-Litwin for reorgs and acquisitions. Use ADKAR for individual-level resistance to a specific rollout. Use Kotter for large multi-year transformations. Use Lewin as a reminder that refreezing the new state is a distinct job from moving to it.

What newer OD approaches are worth adding to a modern practice?

Four stand out: behavioral-science-informed choice architecture, which designs the environment so the desired behavior is the default; Organizational Network Analysis, which moves past self-report by using communication metadata to map actual collaboration patterns; simulation-based practice, where teams rehearse coordination skills under realistic constraints with immediate feedback; and the Team Diagnostic Survey by Wageman, Hackman, and Lehman, which remains the most empirically grounded general-use instrument for team effectiveness.

OD Frameworks for Team Performance

Is Google's Project Aristotle peer-reviewed research?

No. Project Aristotle was a two-year internal Google people-analytics study covering 180 teams, published on the company's re:Work site in 2015. Google did not release the dataset, the instruments, or the analytic methods in enough detail for independent replication. The psychological-safety factor that drew the most attention is, however, peer-reviewed and validated independently by Amy Edmondson's 1999 work and subsequent research. When practitioners cite Aristotle, the defensible citation underneath is Edmondson's.

A manager calls a consultant. The consultant has a framework.

Two VPs at similar companies both notice their teams are underperforming. Both hire organizational development consultants. The first consultant runs a McKinsey 7S diagnostic and recommends restructuring reporting lines. The second runs a Hackman team-effectiveness interview and recommends fixing how the team is bounded and coached. Same symptom, radically different intervention, and neither VP can fully articulate why their consultant reached for the framework they did.

Organizational development is a toolbox with about ten durable frameworks inside it, each with a different pedigree and a different evidence base. Some have been validated in peer-reviewed studies with thousands of respondents. Some are commercial methodologies never independently tested. A couple were thought experiments canonized by textbooks. Each framework gets a walkthrough here, with what it claims and what the research actually supports.

Whole-organization diagnostic frameworks

1. McKinsey 7S (1980)

The 7S model proposes seven interdependent elements that determine organizational effectiveness: Strategy, Structure, Systems, Style, Staff, Skills, and Shared Values. The idea is that changing one element without aligning the others produces drift. It was introduced by Waterman, Peters, and Phillips in Business Horizons in 1980 and popularized through In Search of Excellence. McKinsey still publishes an Enduring Ideas summary of the model.

Where it helps: as a checklist to surface misalignment across strategy, operations, and culture during a reorg or acquisition. Honest critique: 7S has no underlying theory of how the seven elements causally interact. It is a McKinsey consulting heuristic that became ubiquitous. The CIPD 2022 evidence review on people performance does not identify whole-model empirical validation of 7S. Use it to frame a conversation. Do not cite it as proof of causal mechanism.
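Used as a checklist, 7S reduces to a coverage question: which of the seven elements does the change plan touch, and which does it leave alone? A minimal Python sketch, assuming the reviewer has already judged which elements a plan addresses (the `addressed` set and the example plan are hypothetical). Consistent with the critique above, it checks coverage only and encodes nothing about how the elements causally interact.

```python
# The seven elements named in the 7S model.
SEVEN_S = ("Strategy", "Structure", "Systems", "Style", "Staff", "Skills", "Shared Values")


def neglected_elements(addressed):
    """Return the 7S elements a change plan does not touch, in model order.

    `addressed` is a hypothetical input: the set of element names the plan
    explicitly changes, as judged by whoever reviews the plan.
    """
    unknown = set(addressed) - set(SEVEN_S)
    if unknown:
        raise ValueError(f"Not 7S elements: {sorted(unknown)}")
    return [element for element in SEVEN_S if element not in addressed]


# A reorg that redraws reporting lines and swaps tools but changes nothing else.
plan = {"Structure", "Systems"}
print(neglected_elements(plan))  # ['Strategy', 'Style', 'Staff', 'Skills', 'Shared Values']
```

The output is a list of questions for the sponsor, not a finding: a plan may leave an element alone on purpose.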

2. Burke-Litwin Causal Model (1992)

Burke and Litwin's 1992 Journal of Management paper lays out twelve interdependent organizational dimensions. Transformational factors (external environment, mission and strategy, leadership, organizational culture) drive transactional factors (structure, management practices, systems, work-unit climate, task requirements and individual skills, individual needs and values, motivation), which in turn drive individual and organizational performance. The distinguishing claim is that transformational change requires work on transformational variables first.

Where it helps: diagnosing why a technically correct change initiative failed to produce the intended performance lift. Honest critique: the full model has rarely been tested end-to-end in a single study. Most uses in the wild are partial, where a practitioner picks two or three dimensions and calls it Burke-Litwin.

Team-stage frameworks

3. Tuckman Stages (1965/1977)

Forming, storming, norming, performing, and (later) adjourning. Tuckman's 1965 paper in Psychological Bulletin synthesized 50 studies into a four-stage sequence, and Tuckman and Jensen in 1977 added the fifth stage. It is the most widely taught team model in corporate training.

Where it helps: giving managers a vocabulary for what a new team is going through. Honest critique: the 1965 paper was a literature review without an empirical test of its own. Bonebright's 2010 review in Human Resource Development International concluded that the model's durability rests mainly on its accessibility, with thin empirical rigor underneath. Real teams rarely move through five clean stages, and many skip or recycle stages under time pressure.

4. Drexler-Sibbet Team Performance Model

Developed by Allan Drexler, David Sibbet, and Russ Forrester, the Drexler-Sibbet Team Performance Model lays out seven stages: Orientation, Trust Building, Goal Clarification, Commitment, Implementation, High Performance, Renewal. It is a practitioner framework published and taught by The Grove Consultants International.

Where it helps: structured facilitation of team launches or restarts in consultant-led workshops. Honest critique: there is no peer-reviewed validation of the Drexler-Sibbet model. Treat the seven stages as a facilitation scaffold built on design choices, not as empirically established phases.

Team-effectiveness diagnostic frameworks

5. Hackman's 5 Conditions (2002)

Richard Hackman's Leading Teams (HBS Press, 2002) argues that team effectiveness is determined by five enabling conditions: a real team with clear boundaries and stable membership, a compelling direction, an enabling structure, a supportive organizational context, and available expert coaching. The model is explicit that leaders set the conditions under which performance becomes likely, without directly producing it themselves.

Where it helps: diagnosing chronic underperformance in a team that appears to have the right people. Often the missing ingredient is structural (unclear boundaries, split reporting, no real direction), and the interpersonal layer is downstream of that.

Honest critique: there is less to fault here than elsewhere. Hackman has the strongest empirical foundation of any team framework on this list. The Team Diagnostic Survey, developed by Wageman, Hackman, and Lehman and published in the Journal of Applied Behavioral Science in 2005, was validated on 2,474 respondents in 321 teams and measures the five conditions directly. If you are picking one framework for evidence-based team diagnosis, this is the defensible choice.
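Because Hackman's conditions are preconditions rather than a sequence, a diagnosis checks all five and reports every weak one. The sketch below assumes hypothetical 1-5 member ratings per condition and a cut-off of 3.0; it illustrates the shape of the check and is not the Team Diagnostic Survey's actual items or scoring.

```python
# Hackman's five enabling conditions (Leading Teams, 2002).
CONDITIONS = (
    "real team",
    "compelling direction",
    "enabling structure",
    "supportive context",
    "expert coaching",
)


def weak_conditions(ratings, threshold=3.0):
    """Return conditions whose mean rating is below `threshold`, weakest first.

    `ratings` maps each condition name to a list of member ratings on a
    hypothetical 1-5 scale. Scale and threshold are illustrative assumptions,
    not Team Diagnostic Survey scoring.
    """
    means = {c: sum(ratings[c]) / len(ratings[c]) for c in CONDITIONS}
    weak = [c for c in CONDITIONS if means[c] < threshold]
    return sorted(weak, key=lambda c: means[c])


# Made-up ratings from three members of one team.
ratings = {
    "real team": [2, 2, 3],
    "compelling direction": [3, 4, 3],
    "enabling structure": [4, 4, 5],
    "supportive context": [1, 2, 2],
    "expert coaching": [4, 3, 4],
}
print(weak_conditions(ratings))  # ['supportive context', 'real team']
```

In this made-up example the weak spots are organizational context and team boundaries, not the people, which is the kind of structural finding described above.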

6. GRPI (Rubin, Plovnick & Fry, 1977)

Goals, Roles, Processes, Interpersonal. Introduced in Rubin, Plovnick, and Fry's 1977 McGraw-Hill book Task-Oriented Team Development, GRPI proposes a diagnostic hierarchy: most team dysfunction is about unclear goals first, unclear roles second, unclear processes third, and only finally about interpersonal friction. The implication is that teams in conflict usually have a structural problem upstream.

Where it helps: fast diagnostic when a team is visibly stuck. Honest critique: the CIPD 2023 evidence review on high-performing teams does not identify a whole-model test of GRPI. The individual components (clear goals, role clarity) have strong support in group-dynamics research. The four-level causal ordering is asserted rather than tested.
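GRPI's triage rule is easy to state as code: walk the levels in order and stop at the first one that is unclear. A minimal sketch, assuming hypothetical 1-5 clarity ratings per level and a cut-off of 3. The early exit simply encodes the model's ordering; per the critique above, that ordering is an assertion, so the output is a starting hypothesis, not a verdict.

```python
# GRPI levels in the order the model says to check them.
GRPI_ORDER = ("goals", "roles", "processes", "interpersonal")


def grpi_triage(clarity, threshold=3):
    """Return the first GRPI level rated below `threshold`, or None if all pass.

    `clarity` maps each level to a rating on a hypothetical 1-5 scale.
    Stopping at the first failing level encodes GRPI's ordering: look for
    upstream problems before treating interpersonal friction as the cause.
    """
    for level in GRPI_ORDER:
        if clarity[level] < threshold:
            return level
    return None


# Interpersonal friction is the visible symptom, but roles are the first unclear level.
print(grpi_triage({"goals": 4, "roles": 2, "processes": 2, "interpersonal": 1}))  # roles
```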

7. Google's Project Aristotle (2012-2015)

Google's internal people-analytics team studied 180 teams over two years and concluded that five factors predicted team effectiveness: psychological safety, dependability, structure and clarity, meaning, and impact. Psychological safety was the most important. Findings were published on re:Work.

Where it helps: as a conversation starter inside technical organizations that trust Google's methodology. Honest critique: Project Aristotle was an internal company study and has never been published as peer-reviewed research. Google did not release the dataset, instruments, or analytic methods in enough detail for independent replication. The psychological-safety construct itself IS validated: Amy Edmondson's 1999 Administrative Science Quarterly paper developed and tested the construct in 51 work teams, with many replications since. When practitioners cite Project Aristotle, the durable finding underneath is Edmondson's. See how psychological safety is built through play for the practice side.
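Scoring Edmondson's seven-item scale is simple, with one trap: some items are negatively worded and must be reverse-scored before averaging, so that a higher number always means more safety. A minimal sketch, assuming a 1-7 agreement scale; which item positions are reversed is passed in as a parameter (`{0, 2, 4}` below is a hypothetical placement), so check both against the published instrument before use. Averaging member scores into a team score is the simplest aggregation; a fuller analysis would first check that members agree closely enough for a team mean to be meaningful.

```python
def psych_safety_score(responses, reverse_items, scale_max=7):
    """Mean psychological-safety score for one respondent.

    responses: ratings on a 1..scale_max agreement scale, in survey order.
    reverse_items: 0-based positions of negatively worded items. These are
    flipped (1 <-> scale_max) so a higher score always means more safety.
    """
    if any(not 1 <= r <= scale_max for r in responses):
        raise ValueError("rating outside the response scale")
    adjusted = [
        scale_max + 1 - r if i in reverse_items else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)


def team_score(all_responses, reverse_items, scale_max=7):
    """Simplest team-level aggregate: the mean of members' scores."""
    scores = [psych_safety_score(r, reverse_items, scale_max) for r in all_responses]
    return sum(scores) / len(scores)


# Hypothetical placement of the negatively worded items; verify against the instrument.
REVERSED = {0, 2, 4}
member = [2, 6, 3, 5, 2, 6, 6]  # low agreement on a reversed item is a good sign
print(round(psych_safety_score(member, REVERSED), 2))  # 5.71
```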

Change-management frameworks

8. Lewin 3-Step Change (1947)

Unfreeze, move, refreeze. Kurt Lewin's 1947 paper in Human Relations is the ur-text of planned organizational change: stable behavior patterns must first be destabilized, then shifted, then restabilized at a new equilibrium.

Where it helps: as a reminder that behavior change requires three distinct jobs (making the old state uncomfortable, providing a new target, and locking the new state in through structure and ritual). Honest critique: Lewin is often dismissed as outdated linear thinking. Burnes's 2004 paper in the Journal of Management Studies defends Lewin, showing his work was explicitly about field dynamics and continuous iteration. If you cite Lewin, cite Burnes with him.

9. Kotter 8-Step (1995/1996)

John Kotter's 1995 HBR article "Leading Change: Why Transformation Efforts Fail" laid out the eight steps that became his best-selling 1996 book Leading Change: creating urgency, forming a guiding coalition, developing a vision, communicating it, empowering action, generating short-term wins, consolidating gains, and anchoring new approaches in the culture.

Where it helps: as a checklist for large transformation programs where sequencing matters. Honest critique: Kotter has no randomized controlled trials behind it. The Appelbaum et al. 2012 systematic review in the Journal of Management Development concluded the model is well-structured and practitioner-friendly while lacking direct empirical support. Kotter's evidence base was observation of "over 100 companies," never documented as a formal sample. Reasonable to use as long as you do not dress it up as science.

10. ADKAR (Hiatt, 2006)

Prosci's ADKAR model breaks individual change into five outcomes: Awareness, Desire, Knowledge, Ability, Reinforcement. For a change to land, each individual affected has to pass through all five states.

Where it helps: rolling out a specific system change (new CRM, new performance process) where resistance clusters around one or two outcomes. Honest critique: ADKAR is proprietary to Prosci with no independent psychometric validation. Prosci's own Best Practices in Change Management study (25 years, 10,800+ practitioners, 2,600+ in the latest wave) is the largest public practitioner dataset and a useful reference for what change practitioners actually do, while carrying the obvious conflict that Prosci sells its own methodology.
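ADKAR's sequencing implies a simple diagnostic: for each person, find the earliest element that is not yet in place, since later elements are hard to build on top of it. A minimal sketch, assuming hypothetical 1-5 ratings per element and treating a score of 3 or lower as a blocker; those numbers are illustrative choices, not Prosci's official assessment. Tallying blockers across a team shows where resistance clusters.

```python
from collections import Counter

ADKAR = ("Awareness", "Desire", "Knowledge", "Ability", "Reinforcement")


def barrier_point(scores, threshold=3):
    """Return the first ADKAR element scored at or below `threshold`, or None.

    scores: one rating per element, in ADKAR order, on a hypothetical 1-5 scale.
    """
    if len(scores) != len(ADKAR):
        raise ValueError("expected one score per ADKAR element")
    for element, score in zip(ADKAR, scores):
        if score <= threshold:
            return element
    return None


# Made-up ratings for four people affected by a CRM rollout.
team = {
    "ana": [5, 4, 2, 3, 4],
    "ben": [4, 2, 3, 3, 2],
    "cy": [5, 5, 4, 2, 3],
    "dee": [4, 4, 3, 4, 4],
}
blockers = Counter(barrier_point(s) for s in team.values())
print(blockers.most_common(1))  # [('Knowledge', 2)]
```

Here the cluster is at Knowledge, which points toward training rather than more persuasion about why the change matters.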

Mapping framework to problem

A framework is a lens. Pick the lens that matches the symptom.

Symptom: the team looks structurally correct and remains chronically underperforming. Reach for Hackman 5 Conditions. The question you want to answer is whether the team is a real team (bounded, stable, interdependent), whether it has a compelling direction, and whether the surrounding context supports it. See also what makes a high-performing team for the evidence behind the condition-setting view.

Symptom: the team is stuck and you cannot tell why. Use GRPI as a triage tool. Walk goals first, then roles, then processes, then interpersonal. Most of the time the problem is upstream of the interpersonal surface.

Symptom: psychological climate is off. Measure psychological safety directly using Edmondson's 7-item scale. The Project Aristotle frame is a reasonable conversation starter, and the underlying validated instrument is Edmondson's.

Symptom: a new team is forming and the manager is anxious. Tuckman gives a shared vocabulary. Do not treat the stages as a fixed timeline. Use the model to normalize that the middle weeks are often ugly, and that this is not, by itself, evidence of a failing team.

Symptom: a reorg, acquisition, or new operating model is being rolled out. McKinsey 7S surfaces which of the seven elements the change addresses and which it neglects. Burke-Litwin adds that transformational change needs transformational inputs (direction, leadership, culture) alongside transactional ones.

Symptom: individual-level resistance to a specific change. ADKAR tells you whether a given employee is stuck at Awareness, Desire, Knowledge, Ability, or Reinforcement, and each blocker has a different intervention.

Symptom: a large multi-year transformation program needs a sequencing map. Kotter 8-Step is a serviceable checklist. Do not cite it as proven. Use it to pressure-test whether the program has paid attention to urgency, coalition, vision, communication, empowerment, short-term wins, consolidation, and anchoring. For the operating-system view of keeping a team on track day to day, see the team management operating system piece.

Symptom: planned change is failing to stick. Lewin's unfreeze/move/refreeze reminds you that "move" without "refreeze" produces regression. The refreeze job is the unglamorous work of changing structures, reporting lines, metrics, rituals, and incentives so the new behavior becomes the path of least resistance.

The honest evidence review

The evidence behind the frameworks is uneven. A summary of what peer-reviewed and independent synthesis sources say:

Strongest empirical foundation: Hackman 5 Conditions, via the Wageman, Hackman, and Lehman 2005 Team Diagnostic Survey validation (n=2,474, 321 teams). Edmondson's psychological-safety construct has its own deep empirical base and is often the operationally usable piece of Project Aristotle.
Defensible with caveats: Lewin (Burnes 2004 defense of the model against modern critics). Components of GRPI (goals, roles, processes individually have strong support, though the whole-model hierarchy does not).
Widely used, empirically thin: Tuckman (1965 was a literature review, Bonebright 2010 concluded popularity outpaces rigor). McKinsey 7S (CIPD 2022 did not find whole-model validation). Kotter 8-Step (Appelbaum et al. 2012 found no direct empirical support).
Proprietary or practitioner-only: ADKAR (Prosci proprietary, no independent psychometric validation). Drexler-Sibbet (Grove practitioner model, no peer review).
Internal company study with no peer review: Project Aristotle (Google did not share the data or methods in reproducible form).

None of this means the thinner frameworks are useless. It means that when a consultant cites one as proven, ask for the study.

The "70% of change efforts fail" myth

Almost every change-management pitch deck opens with the claim that 70% of organizational change efforts fail. The figure is cited so often it has achieved the status of common knowledge. It is unsupported by data.

Mark Hughes's 2011 paper in the Journal of Change Management traced the five most commonly cited sources of the 70% figure and found that none of them are backed by valid empirical evidence. The most likely origin is an off-hand observation in Hammer and Champy's 1993 Reengineering the Corporation, which was never itself a study of change failure rates. The statistic has been passed around for three decades without a dataset behind it.

You can still believe that organizational change is hard. You cannot quantify it at 70% and claim the number is earned. When you see the stat in a proposal, note that the consultant either does not know its provenance or is hoping you do not.

What modern OD practice is adding

The ten frameworks above were developed between 1947 and 2015. Several newer approaches are pulling practice forward.

Behavioral-science-informed nudge design. Drawing on the behavioral economics that earned Richard Thaler the 2017 Nobel Memorial Prize in Economic Sciences, modern change programs treat choice architecture as a first-class tool: designing the environment so the desired behavior is the default.

Organizational Network Analysis (ONA). ONA moves past self-report by using communication metadata (email, calendar, chat patterns) to map actual collaboration networks. It surfaces brokers, bottlenecks, and siloed clusters. The method has the standard privacy trade-offs and is only as honest as the rollout.
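At its core, ONA treats each (sender, recipient) pair in the metadata as an edge and asks structural questions of the resulting graph. The toy sketch below uses only the Python standard library and hypothetical, pseudonymous names. Siloed clusters show up as separate connected components, and a person is flagged as a broker if removing them splits the network further (a cut vertex). Production ONA typically uses weighted edges and richer measures such as betweenness centrality, usually via a graph library; the brute-force check here is only practical for small networks.

```python
from collections import defaultdict


def build_graph(messages):
    """Undirected collaboration graph from (sender, recipient) pairs."""
    graph = defaultdict(set)
    for sender, recipient in messages:
        if sender != recipient:  # ignore notes-to-self
            graph[sender].add(recipient)
            graph[recipient].add(sender)
    return graph


def components(graph, removed=None):
    """Connected components as sets of people, optionally ignoring one person."""
    seen, result = set(), []
    for start in graph:
        if start == removed or start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nbr in graph[node]:
                if nbr != removed and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        result.append(comp)
    return result


def brokers(graph):
    """People whose removal splits the network into more components (cut vertices)."""
    baseline = len(components(graph))
    return sorted(p for p in graph if len(components(graph, removed=p)) > baseline)


# Hypothetical, pseudonymized metadata: who messaged whom.
messages = [
    ("ana", "ben"), ("ben", "cy"), ("ana", "cy"),    # cluster one
    ("cy", "dev"),                                   # the only bridge
    ("dev", "eli"), ("eli", "fay"), ("dev", "fay"),  # cluster two
    ("gus", "hal"),                                  # disconnected pair
]
graph = build_graph(messages)
print(len(components(graph)))  # 2
print(brokers(graph))          # ['cy', 'dev']
```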

Simulation-based practice. The older training model, built on lectures and case studies, has always struggled with transfer: knowing what a framework says does not produce behavior change. Simulation formats put teams under realistic constraints with immediate feedback, mirroring how pilots build competence in a flight simulator. QuestWorks sits in this category and calls itself the flight simulator for team dynamics: a cinematic, voice-controlled platform where teams run cooperative scenarios with real stakes and time pressure. It integrates with Slack for installation, invites, onboarding, HeroGPT coaching, leaderboards, and admin commands; participation is voluntary and opt-in, and it is not tied to performance reviews. Leaders see aggregate team trends plus strengths-based XP highlights per player. The design maps to Hackman's 5 Conditions (a real team with compelling direction, enabling structure, supportive context, and expert coaching delivered in-game) and to the Aristotle factors that matter most in practice (psychological safety, structure and clarity, impact). It is one modern option among several.

Team Diagnostic Survey. Wageman, Hackman, and Lehman's validated instrument remains the most empirically grounded team diagnostic in general use. If you are standing up an internal OD practice and want one instrument you can defend, this is it. See how to measure team performance for a longer treatment of measurement approaches.

A pragmatic stance

Treat the ten frameworks as a library. For a routine team diagnosis, default to Hackman plus Edmondson's psychological-safety scale. For a specific change rollout, ADKAR or Kotter give you sequencing. For a reorg, 7S and Burke-Litwin surface what you might be ignoring. For a new team, Tuckman gives the manager a vocabulary.

The discipline is in knowing which framework you are using and what its evidence base actually is. When a consultant reaches for one with confidence, ask the question most OD presentations do not invite: what does the peer-reviewed literature say about this specific model? The answer is often "less than you think," and that is useful information before you write the check.
