11 min read

From Engagement Surveys to Team Intelligence

A five-level maturity model for HR leaders who suspect the survey is no longer enough.

By QuestWorks Editorial

TL;DR

Team health measurement matures across five levels: annual surveys, continuous pulses, behavioral telemetry, behavioral simulation, and Team Intelligence as a funded function. Most organizations are stuck at Level 1 or 2 while spending real money on the assumption they have moved further. Gallup put global engagement at 20% in 2025, the lowest since 2020, with disengagement costing roughly $10 trillion in lost productivity. AI economics are now changing what each level costs to operate, which is why the migration is accelerating in 2026.

The state of engagement surveys in 2026

Engagement measurement has never been more saturated. Over 80% of Fortune 500 companies now use at least one digital engagement tool, according to compiled industry statistics from Cerkl. SHRM data summarized by SelectSoftwareReviews shows 71% of companies still run their engagement program on an annual cadence even as continuous-listening platforms sit unused next to them.

The output of that spend is not encouraging. Gallup's 2026 State of the Global Workplace report found that global employee engagement fell to 20% in 2025, the lowest level since 2020 and the first time global engagement has dropped for two consecutive years. Manager engagement has dropped nine points since 2022, from 31% to 22%. Gallup estimates the cost of low engagement at roughly $10 trillion in lost productivity, or about 9% of global GDP.

The mismatch between tooling and outcome is the defining HR signal of the year. The instinct of many HR leaders is to push for more frequent surveys; the data does not support it. Perceptyx, working with a benchmark dataset of more than 12 million responses, frames the issue cleanly: there is no such thing as survey fatigue, only inaction fatigue. Over a three-year period, only 8% of organizations that surveyed without taking visible action saw engagement increase, versus 74% among those that acted on results. The lever is the response, not the cadence.

Compounding that, Mercer's recent performance management research notes only 2% of employers describe their performance management approach as delivering "outstanding value." Comprehensive annual surveys often run 50 to 100 questions and require 20 to 30 minutes to complete, with quality dropping as employees rush through. The pattern across the data is consistent: increased spending on the same instrument has not produced measurable gains in engagement, and a different class of instrument is needed.

Why surveys are lagging indicators

Surveys capture what people remember about how they felt, while behavioral data captures what they actually did, and the two often diverge. The Cambridge Handbook of Research Methods notes that self-reports are only weakly associated with actual behavior. That gap is the structural problem with using survey data as an operating signal.

Then there is the temporal problem. The classic distinction between leading and lagging indicators applies cleanly here. As Amplitude's product analytics primer describes, lagging indicators confirm what already happened, while leading indicators predict where things are going. Annual surveys are deeply lagging. Even quarterly pulses lag the events that produced them by weeks. Behavioral data, by contrast, can surface coordination breakdowns within days.
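
To make the distinction concrete, here is a toy sketch of how a weekly behavioral series can flag a drift long before an annual instrument would. The metric, values, and threshold are invented for illustration:

```python
# Toy leading-indicator check: flag a team when its recent behavioral
# trend breaks away from its own baseline. The metric (median response
# latency in hours) and the 1.25x threshold are illustrative assumptions.
avg_response_hours = [3.1, 3.0, 3.4, 4.2, 5.1, 6.0]  # weekly team median

def drifting(series: list[float], window: int = 3, jump: float = 1.25) -> bool:
    """Flag when the recent-window mean exceeds the baseline mean by `jump`x."""
    baseline = sum(series[:-window]) / (len(series) - window)
    recent = sum(series[-window:]) / window
    return recent > baseline * jump

print(drifting(avg_response_hours))  # True: weeks 4-6 run ~1.6x baseline
```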

For a measurement system to support decisions, it has to capture signal close to when behavior occurs. The implication is structural. The mature stack treats sentiment surveys as the legal and benchmarking layer while routing operating decisions through faster instruments. Measuring employee engagement well in 2026 means combining sources across layers.

Adobe killed annual reviews. Then what?

The clearest analogue dates to 2012, when Adobe scrapped its annual performance review under Donna Morris, then Senior Vice President of People Resources. The replacement was the Check-in, a lightweight, frequent conversation between manager and report. WhatMatters reports the change saved roughly 80,000 manager hours per year, the equivalent of 40 full-time roles redirected to other work. A reduction of about 30% in voluntary attrition has been widely reported following the 2012 transition, though that figure shows up in secondary commentary rather than the Stanford GSB primary case.

Deloitte, GE, and Microsoft followed within a few years. Deloitte rebuilt its system around "performance snapshots": four questions answered by the team leader at the end of every project or quarter. SelectSoftwareReviews aggregates the outcome data for continuous performance management programs across these adopters: 24% outperformance versus peers, 39% better at attracting talent, 44% better at retention, 40% higher employee engagement, and 26% improvement in performance.

Adobe was the canary. The lesson extended beyond annual reviews to the entire instrument family of low-frequency, high-recall measurement, which was misaligned with how work actually unfolds. The same lesson is now arriving for engagement surveys.

The five-level Team Intelligence maturity model

The migration unfolds across five levels. Each level has a defining instrument, a representative vendor, and a class of decisions it can support. Skipping levels rarely works.

Level 1: Annual engagement survey

The defining instrument is a long-form questionnaire administered once a year, often anchored to Gallup's Q12 or a comparable instrument. Output is a benchmark score, typically reported up to the board. SHRM data still puts 71% of companies at this level. The decisions it can support are strategic and slow: compensation philosophy, exit interview themes, board reporting. It cannot support operational decisions about a specific team this quarter.

Level 2: Continuous engagement pulse

The defining instrument is a short, recurring survey, usually weekly, biweekly, or monthly. Lattice's published benchmark data on its own customer base offers the cleanest snapshot: 41% of customers run pulse surveys weekly, 35% biweekly, and 24% monthly, with an average participation rate of 74%. Culture Amp, 15Five, and OfficeVibe occupy adjacent positions. Per-module pricing for this layer typically lands between $48 and $108 per employee per year, with full suites running higher: Culture Amp at roughly $37 per employee annually for the base, plus $5 to $9 per employee per month for the engagement module; Lattice from $8 per month at entry, $11 PEPM for performance and goals, up to $28 PEPM for the full stack; 15Five between $4 and $16 per employee per month.
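
For budgeting, the arithmetic is simple enough to sanity-check in a few lines. The sketch below annualizes the per-seat figures quoted above; these are the cited list prices, and real quotes vary with seat count and module mix:

```python
# Annualizing the per-seat figures quoted above. Cited list prices only;
# actual quotes vary by seat count and module mix.
pepm = {  # (low, high) USD per employee per month
    "Lattice (entry to full stack)": (8, 28),
    "15Five": (4, 16),
    "Culture Amp engagement module": (5, 9),  # plus ~$37/employee/year base
}

for vendor, (low, high) in pepm.items():
    print(f"{vendor}: ${low * 12}-${high * 12} per employee per year")

base = 37  # Culture Amp's quoted annual base fee
low, high = pepm["Culture Amp engagement module"]
print(f"Culture Amp effective: ${base + low * 12}-${base + high * 12} per employee per year")
```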

Level 2 supports faster reaction to sentiment shifts. It is also where most organizations spend most of their measurement budget. Anonymous feedback tools sit in this layer and remain useful, but they share Level 1's underlying weakness. The signal is still self-reported.

Level 3: Behavioral telemetry and passive ONA

The defining instrument is metadata captured from existing collaboration tools, usually email, calendar, chat, and meetings. Microsoft Viva Insights sits inside the M365 footprint. Worklytics markets a privacy-by-design approach with 25-plus connectors and over 400 metrics. Built-in analytics have improved dramatically: Microsoft Workplace Analytics has been associated with freeing approximately five hours per week per employee at Fortune 500 deployments.

Level 3 is also where privacy controversies live. In December 2020, Microsoft's Productivity Score was described by researcher Wolfie Christl as "a full-fledged workplace surveillance tool" tracking 73 metrics. Microsoft removed individual-level data within a week. The October 2025 announcement of Copilot productivity benchmarks and Teams location tracking reopened the debate. The methodological gain at Level 3 is real, and so is the governance work required to capture it without crossing surveillance lines.

Level 4: Behavioral simulation and dynamics data

The defining instrument is a structured simulation in which a team performs a high-fidelity task and the system observes how they coordinate. Surveys capture stated belief. Telemetry captures observable workflow. Simulation captures decision-making under realistic load.

The reference category is aviation Crew Resource Management. The term was coined in 1979 by NASA's John Lauber, in response to the December 1978 crash of United Flight 173, where the crew lost track of fuel state during a landing-gear malfunction. FAA history records that NASA convened a workshop on June 26 to 28, 1979, and United Airlines launched the first comprehensive CRM program in 1981. Studies cited in that FAA history attribute 70% to 80% of aviation accidents to human error rather than equipment failure. Surgical simulation evolved on a similar arc; the AHRQ TeamSTEPPS evidence base documents how teamwork simulation became standard in operating rooms.

Capital has flowed into the individual-simulator category. CodeSignal raised $90.1 million; Strivr $86 million; Yoodli reached a $300 million valuation in December 2025 with $61.5 million raised; Attensi $32 to $57 million; Mursion $40.6 million. Combined funding raised by individual-skill simulators sits above $300 million. Almost none of that capital has flowed into team simulators. The category gap is the opportunity.

Level 5: Team Intelligence as a funded function

The defining instrument at Level 5 is organizational. Level 5 organizations have a named owner, a budget, and an operating cadence for Team Intelligence as a category. The closest precedent is People Analytics. LinkedIn's 2023 Jobs on the Rise placed Head of People Analytics among the fastest-growing roles in the United States.

No "Director of Team Intelligence" role exists publicly today. The forecast is that one will. Josh Bersin's November 2024 analysis observed that fewer than 10% of companies can correlate or directly link HR and people data to business metrics in a systemic way, despite billions spent on HCM platforms. Closing that gap is what a Team Intelligence function exists to do. Team Intelligence as a category and what Team Intelligence actually means are the working definitions.

What analogous categories show us

Three migrations from adjacent disciplines map neatly onto this one.

Customer Success grew from a reactive support function into a strategic discipline with its own software category. Gainsight's Customer Success Maturity Model describes the four-stage path from reactive intervention to mature, automated, weighted health scoring. The instrument that defines the discipline today is the customer health score, a unified red/yellow/green or 0 to 100 number assembled from NPS, support, usage, and engagement data. CRM never went away; the health score sat above it.
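
As an illustration of the pattern rather than Gainsight's actual model, a health score of this kind reduces to a weighted blend with banding. The weights, thresholds, and component names below are assumptions:

```python
# Illustrative weighted customer health score in the style described
# above: component scores (0-100) blended into one number, then mapped
# to red/yellow/green. Weights and thresholds are assumptions, not
# Gainsight's actual model.
WEIGHTS = {"nps": 0.25, "support": 0.20, "usage": 0.35, "engagement": 0.20}

def health_score(components: dict[str, float]) -> tuple[float, str]:
    score = sum(WEIGHTS[k] * components[k] for k in WEIGHTS)
    band = "green" if score >= 70 else "yellow" if score >= 40 else "red"
    return round(score, 1), band

print(health_score({"nps": 80, "support": 55, "usage": 62, "engagement": 71}))
# -> (66.9, 'yellow')
```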

DevOps followed a parallel arc. Forsgren, Humble, and Kim's Accelerate, drawing on 23,000 data points, codified the four DORA metrics: deploy frequency, lead time, MTTR, and change failure rate. The metrics did not replace tickets or sprint reviews. They created a measurement layer above them that made DevOps a funded discipline.
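
The four metrics are mechanically simple, which is part of why they spread. A minimal sketch, assuming an invented deployment-record shape rather than any real CI/CD export:

```python
# The four DORA metrics computed from deployment records. The record
# shape and field names are illustrative assumptions; real pipelines
# pull these from CI/CD and incident tooling.
from datetime import datetime, timedelta

deploys = [
    {"committed": datetime(2026, 1, 5, 9), "deployed": datetime(2026, 1, 5, 15),
     "failed": False, "restored": None},
    {"committed": datetime(2026, 1, 7, 10), "deployed": datetime(2026, 1, 8, 11),
     "failed": True, "restored": datetime(2026, 1, 8, 13)},
]
window_days = 7

deploy_frequency = len(deploys) / window_days  # deploys per day
lead_time = sum((d["deployed"] - d["committed"] for d in deploys), timedelta()) / len(deploys)
failures = [d for d in deploys if d["failed"]]
change_failure_rate = len(failures) / len(deploys)
mttr = sum((d["restored"] - d["deployed"] for d in failures), timedelta()) / len(failures)

print(f"deploys/day={deploy_frequency:.2f}, lead_time={lead_time}, "
      f"cfr={change_failure_rate:.0%}, mttr={mttr}")
```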

Customer Experience went the same direction. Bruce Temkin spent 12 years at Forrester, founded the CXPA in 2011, and joined Qualtrics when it acquired his firm in 2018. CXNetwork's profile describes the institutional buildout that turned CX from a gut-feel function into a board-level discipline.

Each of these took five to fifteen years. Each produced a new role, a new instrument, and a new operating rhythm. Each preserved the prior tooling and added a layer above it. There is no reason to believe Team Intelligence will follow a different pattern.

The five-year migration path

The migration works best when it is phased across five years, with each level building the artifacts the next level depends on.

Year one is foundation. Pick a Level 2 platform and run it well. Establish a pulse cadence. Build the muscle of acting on results, since Perceptyx's data shows that is where the engagement gain comes from. Most importantly, define the team unit. People analytics measures individuals; Team Intelligence requires that the team be a first-class object in the data model.
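
What "team as a first-class object" means in practice is that measurements key on a team identifier rather than only on individuals. A minimal sketch, with illustrative names rather than any vendor's schema:

```python
# Sketch of a data model where the team is first-class: pulse results
# attach to a team_id, not only to individual respondents. All names
# here are illustrative, not a vendor schema.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Team:
    team_id: str
    name: str
    member_ids: list[str] = field(default_factory=list)

@dataclass
class PulseResult:
    team_id: str          # the team, not the individual, is the unit of analysis
    week: date
    participation: float  # 0.0-1.0
    engagement: float     # mean team score on the pulse instrument

platform_pod = Team("t-042", "Platform Pod", ["u1", "u2", "u3", "u4"])
week_result = PulseResult(platform_pod.team_id, date(2026, 1, 12), 0.75, 3.9)
```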

Year two is telemetry. Layer in a Level 3 capability, such as Worklytics, Viva Insights, or an equivalent, under a clear governance posture. Limit the data to team-level aggregates, as sketched below. Publish the data model and the privacy policy openly inside the organization. The Microsoft Productivity Score history is a free education in how to do this badly; the surveillance reaction it triggered is reproducible.
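
One common guardrail is to suppress any team too small for its aggregate to be meaningfully anonymous. The minimum group size of five below is a common anonymity floor, used here as an assumption rather than any vendor's default:

```python
# "Team-level aggregates only" guardrail: individual metrics roll up
# per team, and teams below a minimum group size are suppressed rather
# than reported. The threshold is an illustrative assumption.
from statistics import mean

MIN_GROUP_SIZE = 5

def team_aggregate(metric_by_person: dict[str, float]) -> float | None:
    """Return the team mean, or None if the team is too small to report."""
    if len(metric_by_person) < MIN_GROUP_SIZE:
        return None  # suppress: a 3-person "aggregate" is barely anonymous
    return round(mean(metric_by_person.values()), 2)

print(team_aggregate({"u1": 4.2, "u2": 3.1, "u3": 5.0}))           # None
print(team_aggregate({f"u{i}": 3.0 + i * 0.2 for i in range(6)}))  # 3.5
```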

Year three is simulation. Add a Level 4 capability, scoped initially to a high-stakes, high-coordination team population: incident response, sales pods, leadership groups. Aviation took two years from the 1979 NASA workshop to United's 1981 program. Adobe took roughly a year to design the 2012 Check-in. Build the artifacts, train the facilitators, and connect the output to existing performance rhythms.

Year four is integration. The dashboards from Levels 2, 3, and 4 begin to share a common data layer. The team becomes a measurable unit across the stack. Deloitte's 2025 Global Human Capital Trends report frames it directly: "In an era of human-centered work, new sources of data and artificial intelligence can help organizations shift from measuring employee productivity to measuring human performance."
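
The common data layer can be as simple as one record shape that all three levels write into, keyed by team and period. The field names below are illustrative:

```python
# Sketch of the year-four common data layer: Level 2 (pulse), Level 3
# (telemetry), and Level 4 (simulation) signals keyed by the same
# team_id and reporting period. Field names are illustrative.
def team_health_record(team_id: str, period: str,
                       pulse: dict, telemetry: dict, simulation: dict) -> dict:
    return {
        "team_id": team_id,
        "period": period,
        "sentiment": pulse,          # Level 2: stated sentiment
        "behavior": telemetry,       # Level 3: observed workflow
        "coordination": simulation,  # Level 4: dynamics under load
    }

record = team_health_record(
    "t-042", "2026-W03",
    {"engagement": 3.9, "participation": 0.74},
    {"meeting_hours": 11.5, "after_hours_pct": 0.08},
    {"role_clarity": 0.7, "recovery_time_min": 6},
)
```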

Year five is the function. Hire or promote the role. Fund the budget. Set the operating cadence. Gartner's October 2025 CHRO survey found that 47% of 222 CHROs identified culture as a primary driver of performance. The financial argument is available now. Visier's published ROI work reports 137% higher return on assets, $125,000 higher revenue per employee, and a 7.5-month payback for organizations with mature people analytics. The Adobe precedent stays useful: 80,000 manager hours per year saved is the kind of operating gain that funds a function on its own.

Counter-arguments worth taking seriously

The first objection is compliance. Surveys are still required for legal cover and external benchmarking in many jurisdictions. The response is structural. Behavioral telemetry and simulation complement the survey layer; they do not replace it. The annual instrument remains the compliance moat. Operational signal moves to faster instruments above it.

The second objection is privacy. Behavioral telemetry has earned its bad reputation. The mitigation is privacy-by-design, which Worklytics has built its positioning around: GDPR-compliant ONA in 30 days, team-level aggregation by default, no individual-level views without explicit opt-in. The fact that the failure modes are public makes it easier to design around them.

The third objection is readiness. Many organizations argue the culture is not ready for Level 4 or 5. Aviation's CRM transition took two years from workshop to first program. Adobe's Check-in took roughly a year of design before launch. The phased path is the answer, and skipping levels reliably produces the failures that argument predicts. Edmondson and Kerrissey's May 2025 HBR piece on psychological safety misconceptions, and their November 2025 longitudinal study with Bahadurzada of 27,240 healthcare workers tracked before and during COVID, both argue that the readiness question itself reflects a misreading of where the work happens. Readiness is built through the migration itself.

The fourth objection is sunk cost. Many HR teams have just bought Lattice, Culture Amp, or Visier and feel set. The response is that those platforms are Level 2 or, in Visier's case, Level 3-adjacent. Those platforms function as foundation. They remain the substrate the higher levels run on.

A practical note on where simulation fits

QuestWorks operates at Level 4. It is the Team Intelligence Engine, designed to capture behavioral dynamics that surveys and telemetry cannot reach. Teams of two to five run 25-minute sessions on QuestWorks' own cinematic, voice-controlled platform. The system observes how the team coordinates, where roles clarify, and how the group recovers after conflict. Outputs include a weekly team health report and team-level coordination signal: the layer above the survey, designed to feed the same dashboards the rest of the stack reports into. Slack handles install, invites, leaderboards, and HeroGPT coaching. The simulation runs on its own platform.

Nine HeroTypes are public, visible to teammates and managers. Coaching conversations with HeroGPT remain private. Participation is voluntary and opt-in. Quests are not tied to performance reviews. Pricing is $20 per user per month with a 14-day trial. The category positioning is straightforward: Team Intelligence, Powered by Play.

Frequently Asked Questions

What replaces the annual engagement survey?

There is no single replacement. The mature path is a layered stack: keep an annual or biannual survey for compliance and benchmarking, add behavioral telemetry from existing collaboration tools for leading indicators, and add behavioral simulation to observe how teams actually coordinate under pressure. Surveys measure stated sentiment. Telemetry measures observable behavior. Simulation measures decision-making in context. Each layer answers a different question.

Why are engagement surveys lagging indicators?

Surveys capture self-reported sentiment after events have already shaped opinion. The Cambridge Handbook of Research Methods notes self-reports are only weakly associated with actual behavior. Annual cycles compound the lag: by the time results arrive, the conditions that produced them have shifted. Leading indicators come from observed behavior such as response latency, meeting load, decision speed, and recovery patterns after conflict.

How does Team Intelligence differ from people analytics?

People analytics studies individuals and aggregates them into population-level trends. Team Intelligence treats the team as the unit of analysis. The metrics are different: coordination quality, role clarity under load, recovery time after conflict, and decision velocity. The Team Intelligence Engine produces signal at the team level, which changes both the privacy posture and the operating leverage compared with individual-level analytics.

Can an organization skip levels in the maturity model?

Skipping levels usually fails. Aviation needed two years from the 1979 NASA workshop to United Airlines' 1981 program. Adobe took a year to design the 2012 Check-in. Each level produces an artifact (a baseline, a dashboard, a ritual) that the next level depends on. A team trying to deploy simulation without any pulse rhythm or telemetry tends to lack the language to interpret what simulation reveals.

Does this replace existing platforms like Lattice or Culture Amp?

No. Level 2 platforms remain the system of record for sentiment, performance ratings, and goals. Level 4 and Level 5 capabilities sit alongside them and feed dashboards with behavioral signal that surveys cannot capture. The architectural pattern parallels Customer Success: CRM stayed in place while health scores, product telemetry, and CSM workflows added layers above it.

Ready to Level Up Your Team?

14-day free trial. Install in under a minute.

Try it free
Team Intelligence, Powered by Play