
5 Metrics Every Team Intelligence Engine Tracks

Conflict Recovery Time, Trust Velocity, Decision Latency, Cross-Functional Throughput, Psychological Safety Stability. The research behind each, the vendor gaps, and the DORA precedent for getting team metrics right.

By Asa Goldstein, QuestWorks

TL;DR

Most team metrics are individual metrics rolled up. That is a measurement error Hackman and Wageman flagged decades ago. Five candidate metrics belong inside any Team Intelligence Engine: Conflict Recovery Time, Trust Velocity, Decision Latency, Cross-Functional Throughput, and Psychological Safety Stability. None are settled academic constructs. Each is grounded in adjacent peer-reviewed research and the lessons of DORA: behavioral signals where surveys used to live, measurable from existing telemetry, few in number, and team-level by default. Each definition has a research foundation, with the science marked solid where it is solid and provisional where the construct still needs validation.

Why Existing Team Metrics Fail

Walk into any people analytics review and the dashboard tells the same story: engagement scores, manager effectiveness ratings, eNPS, pulse responses. Each number averages individual self-reports rolled up to a team boundary. Richard Hackman and Ruth Wageman spent careers arguing this is a measurement error. Teams are interaction systems; averaging individuals erases the interaction. Wageman's Xerox study found team design conditions accounted for roughly 37 percent of variation in team performance, dwarfing individual talent or coaching style. The structural reality matters more than how any member feels on a given Tuesday. Yet the dashboards still report Tuesdays.

Engineering already has a cleaner playbook. DORA, codified in Accelerate by Forsgren, Humble, and Kim (2018), distilled four metrics from 23,000-plus respondents across 2,000-plus organizations: deployment frequency, lead time for changes, mean time to restore, and change failure rate. By 2019 it was the lingua franca of engineering performance. Why? Few metrics. Team-level by default. Behavioral. Measurable from telemetry already in the commit log.

Team Intelligence needs the same kind of foundation. Five candidate metrics make an opening proposal: Conflict Recovery Time, Trust Velocity, Decision Latency, Cross-Functional Throughput, and Psychological Safety Stability. Each is built on peer-reviewed adjacent research, each names a real performance signal that current vendors miss, and each carries an explicit note about where the construct still needs validation.

Metric 1: Conflict Recovery Time

Definition: the elapsed time from a conflict event until the team returns to baseline collaboration. Measured in hours and days against behavioral baselines. The signal is whether a team metabolizes friction or whether friction lingers in the work itself: missed standups, dropped handoffs, slower review cycles, rerouting around a teammate.

Research foundation. The canonical estimate, from the 2008 CPP Global Human Capital Report, put US workplace conflict at 2.8 hours per employee per week, roughly $359 billion in lost paid hours annually. The more recent SHRM Civility Index for Q4 2024 reports about 208 million acts of incivility per day in US workplaces, at a cost near $2.1 billion per day.

The operative concept is recovery. Amy Edmondson's 2011 HBR essay “Strategies for Learning from Failure” separates preventable, complex, and intelligent failure: “Failure is not always bad. In organizational life it is sometimes bad, sometimes inevitable, and sometimes even good.” Lencioni's Five Dysfunctions pyramid places fear of conflict directly above absence of trust. The Gottman Institute's couples research points to a healthy 5:1 positive-to-negative ratio during conflict; the Losada 3:1 figure that extended the idea to teams was partially retracted in 2013 (Brown, Sokal, Friedman), so cite it with care. Amazon's “disagree and commit,” from the Bezos 2016 letter, treats unresolved disagreement as a feature.

Why it matters and where vendors fall short. Teams that recover quickly compound learning; teams that recover slowly convert one bad meeting into three weeks of avoidance. Behavioral proxies carry more signal than self-report (response latency to a teammate after a tense meeting, time from the event until the next collaborative artifact ships), and recovery has to be defined relative to that team's own baseline. Lattice, Culture Amp, and 15Five capture engagement and 1:1 quality. None directly measures how long a team stays off-rhythm after friction. White space.
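A minimal sketch of that baseline-relative framing, assuming a daily count of collaborative events as the signal; the data shape, the 14-day window, and the 80 percent threshold are all illustrative assumptions, not the QuestWorks implementation:

```python
from datetime import date, timedelta

# Hypothetical input: daily counts of collaborative events between teammates
# (reviews exchanged, threads answered, handoffs completed). Real telemetry
# would come from calendars, repos, and chat logs.
def conflict_recovery_days(daily_events: dict[date, int],
                           conflict_day: date,
                           baseline_window: int = 14,
                           tolerance: float = 0.8) -> int | None:
    """Days until collaboration returns to >= tolerance * pre-conflict baseline."""
    # Pre-conflict baseline: mean daily events over the window before the event.
    window = [daily_events.get(conflict_day - timedelta(days=d), 0)
              for d in range(1, baseline_window + 1)]
    baseline = sum(window) / baseline_window
    if baseline == 0:
        return None  # no baseline to recover to
    # Walk forward until a day clears the recovery threshold.
    for offset in range(1, 91):  # give up after roughly one quarter
        day = conflict_day + timedelta(days=offset)
        if daily_events.get(day, 0) >= tolerance * baseline:
            return offset
    return None  # never recovered within the horizon
```

The shape of the computation is the point: recovery is a return to that team's own rhythm, not to an absolute score.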

Metric 2: Trust Velocity

Definition: the rate at which trust forms and decays inside a team. The metric watches the slope. A team can score high on a trust survey and still be brittle, because trust decays faster than it forms when stress arrives.

Research foundation. The closest formal construct is “swift trust” from Meyerson, Weick, and Kramer (1996), revisited in Barrett (2025). The foundational organizational trust model from Mayer, Davis, and Schoorman (1995, AMR 20(3): 709-734) decomposes trust into Ability, Benevolence, and Integrity. Frances Frei's 2018 TED talk reframes that into Authenticity, Logic, and Empathy: “People tend to trust you when they think they are interacting with the real you, when they have faith in your judgment and competence, and when they believe that you care about them.”

Brené Brown's BRAVING framework is a useful practitioner heuristic, though it is not peer-reviewed. The Great Place to Work Trust Index distills 12 million-plus employee responses across 90-plus countries into five dimensions, but its annual cadence captures level and misses velocity. Two empirical anchors give the velocity framing teeth: BambooHR research finds 86 percent of new hires decide how long they will stay within the first six months, which is a velocity question, and Jarvenpaa, Knoll, and Leidner (1998) documented that trust forms more slowly and is more fragile in remote and virtual teams.

Why it matters and where vendors fall short. Most teams diagnose a trust level when the underlying issue is trust velocity. They form trust in onboarding, lose it the first time someone gets blamed in the room, and never recover the rate. Pulse-survey velocity is the obvious behavioral proxy; vulnerability indicators in transcripts (asks for help, admissions of not knowing, willingness to volunteer mistakes in retros) sharpen it. Great Place to Work measures the level once a year. Culture Amp, Lattice, and 15Five run pulse surveys but rarely model rate of change as the primary signal. Worklytics and similar communication-pattern tools treat trust implicitly through reciprocity, without naming velocity as a metric.
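If weekly pulse scores are the proxy, trust velocity is simply the slope of that series. A minimal sketch, assuming a plain list of weekly scores (the data shape is hypothetical):

```python
def trust_velocity(pulse_scores: list[float]) -> float:
    """Least-squares slope of weekly trust pulse scores, in points per week.
    Positive = trust forming; negative = trust decaying."""
    n = len(pulse_scores)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2                      # week indices 0..n-1
    mean_y = sum(pulse_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(pulse_scores))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# Same average level, opposite trajectories: the survey mean hides the slope.
forming  = [3.0, 3.2, 3.4, 3.6, 3.8]   # velocity +0.2 per week
decaying = [3.8, 3.6, 3.4, 3.2, 3.0]   # velocity -0.2 per week
```

Two teams with identical averages can sit on opposite slopes, which is exactly the signal a point-in-time survey misses.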

Metric 3: Decision Latency

Definition: the elapsed time from problem identification to decision execution. The gap between “we see this” and “we are doing something about it.” Engineering teams already track an analogue under DORA's lead time for changes. The cross-functional, non-engineering version is largely unmeasured.

Research foundation. McKinsey's April 2019 piece “Decision Making in the Age of Urgency” reports that speed and quality of decision-making are both strongly associated with overall company performance, and that they correlate. Fast decisions are not lower quality. Top-performing organizations are about twice as likely to report 20-percent-plus financial returns. Managers spend roughly 37 percent of their time deciding; 58 percent of that time is wasted. Bain's RAPID framework structures decision rights. The Bezos 2015 Amazon letter introduces the two-way door distinction: irreversible, one-way decisions must be made methodically; reversible, two-way decisions can be made quickly by small groups. The 2016 letter adds the 70 percent rule: “Most decisions should probably be made with somewhere around 70 percent of the information you wish you had. If you wait for 90 percent, in most cases, you're probably being slow.”

Brooks's Law from The Mythical Man-Month (1975) is the structural reminder: communication channels grow as n(n-1)/2, so a team of ten has 45 channels. Decision latency is partly a function of channel count. The DORA precedent matters here for a different reason: lead time for changes was measured from existing commit telemetry rather than surveys, and decision latency deserves the same behavioral treatment.

Why it matters and where vendors fall short. Most companies lose to faster execution on a similar strategy. Useful proxies for non-engineering teams: time from issue creation to status change, time from a question raised in a meeting to a recorded answer, time from RAPID role assignment to decision logged. The hardest part is distinguishing one-way doors (where slowness is correct) from two-way doors (where slowness is waste). Engineering platforms like LinearB, Jellyfish, and GetDX track DORA metrics inside the IDE-to-deploy pipeline. Asana, Monday, and Linear track ticket cycle time as a task-level proxy. No mainstream vendor markets a generic decision latency metric for non-engineering teams.
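For a concrete version of the ticket-log proxy, here is a minimal sketch; the Decision record and its fields are hypothetical stand-ins for whatever an issue tracker or decision log actually stores:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median

# Hypothetical record shape; real data would come from a ticket or decision log.
@dataclass
class Decision:
    identified_at: datetime   # when the problem was first logged
    executed_at: datetime     # when the decision took effect
    one_way_door: bool        # irreversible decisions are allowed to be slow

def latency_report(decisions: list[Decision]) -> dict[str, float]:
    """Median decision latency in days, split by door type."""
    def days(ds: list[Decision]) -> list[float]:
        return [(d.executed_at - d.identified_at).total_seconds() / 86400
                for d in ds]
    two_way = [d for d in decisions if not d.one_way_door]
    one_way = [d for d in decisions if d.one_way_door]
    return {
        "two_way_median_days": median(days(two_way)) if two_way else 0.0,
        "one_way_median_days": median(days(one_way)) if one_way else 0.0,
    }
```

Splitting by door type keeps the metric honest: a slow one-way door is diligence, a slow two-way door is waste.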

Metric 4: Cross-Functional Throughput

Definition: the volume and quality of work that successfully ships across team boundaries. Single-squad velocity misses the load-bearing question: how much real output crosses the seams between Product and Engineering, Sales and Customer Success, Design and Marketing. The academic literature labels the surrounding territory as “team interdependence” (Hackman, Wageman) and “multiteaming” (Mortensen, Gardner).

Research foundation. Mortensen and Gardner's 2017 HBR piece “The Overcommitted Organization” reports that “at least 81% of more than 500 managers in global companies” said multiteaming is a way of life, with people involved in 6 to 15 projects in a typical week. Gallup finds 84 percent of employees are at least slightly matrixed.

The structural rules are old. Conway's Law (1968) argues that organizational communication structure determines system architecture. Spotify's 2012 squad-and-tribe paper made cross-functional ownership a default pattern. ING's agile transformation, documented by McKinsey in 2017, reorganized into roughly 350 nine-person squads inside 13 tribes; ING Business Platform NPS moved from -30 to +30 in a single year. Deloitte's ONA case on a sales redesign reports a 12 percent revenue gain after moving to a team-centric model. Rob Cross's Connected Commons work finds “the most beleaguered people carry a much larger share of the collaboration burden,” meaning throughput concentrates in a small number of overloaded boundary spanners.

Why it matters and where vendors fall short. Most strategic work is cross-functional; most measurement is single-team. The unit of analysis has to be the boundary itself. Useful approaches: ONA to identify boundary spanners and their load, joint cycle time on multi-function artifacts (a launch, an enterprise deal, a hiring loop), and quality measures like rework rate or post-launch incidents tied to handoff failure. Worklytics, Microsoft Viva Insights, Polinode, and Innovisor offer ONA snapshots. Atlassian's Jira tracks task-level cycle time inside one project. No mainstream vendor reports cross-functional throughput as a continuous, named metric across the company.
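To make the boundary the unit of analysis concrete, a minimal sketch of the edge-counting approach; the person-to-team map and the event list are hypothetical shapes for whatever ONA or handoff telemetry already exists:

```python
from collections import Counter

# Hypothetical inputs: a person->team map and a list of collaboration events
# (pairs of people), e.g. from ONA edges, reviews, or handoff logs.
def boundary_load(team_of: dict[str, str],
                  events: list[tuple[str, str]]):
    """Share of events that cross team boundaries, plus per-person boundary load."""
    cross = [(a, b) for a, b in events if team_of[a] != team_of[b]]
    load: Counter[str] = Counter()
    for a, b in cross:
        load[a] += 1
        load[b] += 1
    share = len(cross) / len(events) if events else 0.0
    return share, load.most_common(5)  # top boundary spanners carry the risk
```

The top of that counter is where Cross's overloaded boundary spanners show up.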

Metric 5: Psychological Safety Stability

Definition: whether psychological safety holds steady when the team is under pressure. Calm-conditions presence is table stakes; the 2025 emphasis from Edmondson and her co-authors is on what survives stress. Stability is the load-bearing variable.

Research foundation. The construct was formalized by Edmondson (1999) in ASQ. Frazier and colleagues (2017) meta-analyzed 136 samples and more than 22,000 individuals, confirming the construct's relationship to learning and performance. The 2025 literature shifts the question from presence to stability. Edmondson and Kerrissey's May 2025 HBR piece argues structurally: “Telling people in a company or on a team that they must have psychological safety, or else, will not produce it.” Edmondson, Bahadurzada, and Kerrissey's November 2025 follow-up tracked 27,000-plus workers in a large hospital system, surveyed in May 2019 and again in May 2021. Pre-pandemic psychological safety predicted post-pandemic willingness to stay. The mechanism is stability under stress.

Google's Project Aristotle placed psychological safety first among five team dynamics; the often-cited 27 percent performance gain is hard to trace to a primary Google source, so treat it as suggestive. IHHP's October 2025 critique sharpens the point: “A supportive situation does not automatically guarantee a courageous individual.”

Why it matters and where vendors fall short. A safe-on-Monday, brittle-on-Wednesday team is a liability. Stability is the difference between a team that surfaces a bad number in time to fix it and one that surfaces it after the board meeting. Single-point surveys cannot detect stability; the frame requires repeated measurement across calm and pressured weeks, combined with behavioral signals like who speaks up in tough meetings and whether bad news travels up. The Fearless Organization Scan is a useful starting instrument. Culture Amp, Lattice, and 15Five include psychological safety items in their pulses but do not model variance across pressure conditions.
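One simple way to operationalize stability from repeated measurement, assuming weekly scores tagged with a pressure flag; both the tagging and the ratio are illustrative choices, not a validated instrument:

```python
from statistics import mean

# Hypothetical input: weekly psychological safety scores paired with a
# pressure flag (deadline week, incident week, board-prep week).
def safety_stability(scores: list[float], pressured: list[bool]) -> float:
    """Ratio of mean score in pressured weeks to mean score in calm weeks.
    Near 1.0 = safety holds under stress; well below 1.0 = brittle."""
    calm = [s for s, p in zip(scores, pressured) if not p]
    hot  = [s for s, p in zip(scores, pressured) if p]
    if not calm or not hot:
        return float("nan")  # need both conditions to judge stability
    return mean(hot) / mean(calm)
```

A ratio near 1.0 says safety holds when the deadline hits; a ratio well below 1.0 is the safe-on-Monday, brittle-on-Wednesday pattern.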

Why DORA Worked, And What That Teaches Us

Four lessons translate from DORA to Team Intelligence.

Behavioral signals first. Lead time for changes is read from the commit log. The team is never asked how fast it deploys; the data answers for them. Team Intelligence metrics need behavioral analogues wherever possible; self-report belongs as a secondary signal.

Measurable from existing telemetry. DORA required no new tool, just a query against tools already in place. The strongest version of each metric here uses calendars, transcripts, ticket logs, repository history, and ONA edges that already exist.

Empirical legitimacy from scale. 23,000-plus respondents across 2,000-plus organizations gave DORA the right to claim a category. Team Intelligence will need its own scaled benchmark to graduate from coining to standard.

Few in number, team-level by default. Four metrics covering the whole field. The unit of analysis is the team. That discipline is what made DORA usable inside an executive review. Five metrics for Team Intelligence is a deliberate cap.

The QuestWorks Approach

QuestWorks is building toward exactly that standard. The Team Intelligence Engine runs on its own cinematic, voice-controlled platform; Slack is the integration layer for install, invites, leaderboards, and HeroGPT, the private AI coach that never shares conversations upstream. Teams of two to five drop into 25-minute quests on QuestWorks' own web platform and play through scenarios that surface real interaction patterns.

The instrumentation maps to the five metrics. Conflict moments inside a quest carry timestamps; recovery is observable on the same arc. Decision sequences produce latency curves. Cross-functional behavior shows up in joint quest performance and QuestDash leaderboard signals visible to everyone. The Weekly Team Health Report, separate from QuestDash and visible to leaders only, surfaces stability patterns over time. Participation is voluntary and is not tied to performance reviews. Leaders see aggregate trends and strengths-based highlights; HeroGPT coaching and individual gameplay detail stay private.

You cannot measure what you cannot observe in motion, and you cannot observe team dynamics in motion through a survey. Team Intelligence, Powered by Play is the operating thesis. The Team Intelligence Engine is the instrument. $20 per user per month, 14-day free trial.

Further reading: how to measure team dynamics, how to measure team performance, the leadership skills that predict performance, and the framing of Team Intelligence as a new category.

Frequently Asked Questions

Are these five metrics established, peer-reviewed constructs?

No. None of the five names (Conflict Recovery Time, Trust Velocity, Decision Latency, Cross-Functional Throughput, Psychological Safety Stability) are settled peer-reviewed constructs. Each is grounded in adjacent research: Edmondson on psychological safety, Mayer/Davis/Schoorman and Frei on trust, Mortensen and Gardner on multiteaming, Lencioni and CPP/SHRM data on conflict, McKinsey and Bezos on decision speed. These are working definitions that point toward what a Team Intelligence Engine should track, with explicit notes on where the science is solid and where the construct still needs validation.

How do these metrics differ from DORA, OKRs, and engagement surveys?

DORA is engineering-only and reads from the commit log. OKRs measure outcomes while leaving the dynamics that produce them unmeasured. Engagement surveys roll up individual self-report to a team boundary, which Hackman and Wageman flagged as a measurement error decades ago. The five Team Intelligence metrics here are team-level by default, lean on behavioral telemetry where possible, and target dynamics that current tools miss: how fast a team recovers from conflict, whether trust is forming or decaying, how long decisions take, how much work crosses team boundaries, and whether psychological safety is stable under pressure.

What is DORA, and why is it the precedent?

DORA, codified in Accelerate (Forsgren, Humble, Kim, 2018) from 23,000-plus respondents across 2,000-plus organizations, distilled engineering performance into four metrics: deployment frequency, lead time for changes, mean time to restore, and change failure rate. Four lessons transfer: behavioral signals where surveys used to live, measurable from existing telemetry, empirical legitimacy from scale, and few in number with team-level as the default unit of analysis. Team Intelligence metrics need to follow the same pattern to graduate from coined idea to industry standard.

How does QuestWorks measure these metrics?

QuestWorks runs on its own cinematic, voice-controlled platform, with Slack as the integration layer. Teams of two to five play 25-minute quests; the platform timestamps conflict moments, decision sequences, cross-functional handoffs, and speak-up behavior. The Team Intelligence Engine surfaces aggregate team trends and strengths-based highlights to leaders through QuestDash and a separate Weekly Team Health Report. HeroGPT coaching is private and never shared upstream. Participation is voluntary and is not tied to performance reviews.

How solid is the research behind each metric?

Peer-reviewed sources include Edmondson (1999, ASQ) and the Frazier et al. (2017) meta-analysis on psychological safety, Mayer/Davis/Schoorman (1995, AMR) on organizational trust, Meyerson/Weick/Kramer (1996) on swift trust, and Jarvenpaa/Knoll/Leidner (1998) on virtual team trust. Practitioner research from McKinsey (decision speed, 2019), Mortensen and Gardner (HBR 2017), Gallup (matrix research), and Rob Cross (ONA) is well-sourced but applied work. The 2008 CPP conflict figure is dated and labeled as such; the Losada 3:1 ratio is partially retracted (Brown, Sokal, Friedman 2013) and is not used as evidence; Project Aristotle's 27 percent performance gain figure is hedged because the primary source is hard to verify.

Ready to Level Up Your Team?

14-day free trial. Install in under a minute.
