Your people completed the training. Do you know if it worked?
Most enterprises measure AI readiness through completion rates, course hours, and certificates. None of these reliably predict whether employees can actually use AI in their work. The distinction between tracking training activity and measuring professional capability is the most consequential gap in enterprise AI adoption today.
Completion rates tell you who finished a course. They tell you nothing about who can use AI.
Enterprises have invested heavily in AI training over the past three years. The programmes are well-intentioned. The completion rates look healthy. The certificates are prominently displayed. And the underlying assumption — that completing training produces competence — goes largely unexamined.
The research does not support that assumption.
The BCG-Harvard study of 758 consultants, published in Organization Science in March 2026, found that AI training alone showed no statistically significant performance advantage over simple tool access. The consultants who received training and those who simply received the tool performed indistinguishably. The variance in outcomes was driven by individual proficiency — a characteristic that training completion cannot detect.
Section AI's 2026 Proficiency Report, which combined surveys with hands-on skill testing across 5,000 knowledge workers, confirmed the pattern at scale: 97% of the workforce are using AI poorly or not at all. Even employees who had completed AI training programmes scored only 40 out of 100 on proficiency assessments, leaving them firmly in the “experimenter” category: capable of basic prompting but unable to reliably evaluate AI output, identify when AI use is inappropriate, or manage the risks of AI-generated content in professional settings.
What completion tracking measures — and what it misses
Training completion metrics record that an employee watched the videos, clicked through the slides, and passed a recall-based quiz. They can tell you who engaged with the content and who did not. This information is not worthless — it reveals who showed up.
What completion metrics cannot tell you is whether the employee can now formulate a prompt that produces usable output on the first attempt, identify a selectively accurate statement in an AI-generated analysis, determine when a task falls outside AI's capability boundary, or navigate the disclosure and privacy obligations of using AI with client data. These are the capabilities that determine whether AI use creates value or creates risk in professional work. And they require a different kind of measurement entirely.
The distinction is not academic. Classic transfer-of-training research estimates that only 10–15% of training effectively transfers to workplace application (Georgenson 1982; Ford et al. 2018). The Association for Talent Development found that only 12% of employees effectively apply new skills on the job. These transfer rates were established long before AI, a domain in which the distance between watching a tutorial and applying judgment under uncertainty is particularly wide.
The gap between completion and proficiency has a price — and someone is paying it
Consider the visibility gap: in 96% of organisations, the executive team cannot see what L&D delivers, which leaves only 4% that can. When the board asks “how AI-ready is our workforce?” and the only available answer is a completion percentage, the L&D function is evaluated on the wrong metric, and the investment is protected by faith rather than evidence.
Meanwhile, the cost of unmeasured AI use accumulates. BetterUp Labs and Stanford's Social Media Lab found that 40% of employees receive AI-generated “workslop” each month — low-quality content that requires an average of 1 hour and 56 minutes to resolve per incident. For a 1,000-person organisation, that translates to over $2.2 million per year in invisible rework. The completion rate of the training programme that was supposed to prevent this looks fine. The spreadsheet doesn't show what completion failed to produce.
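The arithmetic behind an estimate like that is easy to sketch. In the sketch below, the 40% incidence and the 1 hour 56 minutes per incident are the cited study figures; the headcount, the incident frequency, and the fully loaded hourly cost are illustrative assumptions, so the total moves with whatever numbers you substitute.

```python
# Back-of-envelope estimate of annual rework cost from "workslop".
# The 40% incidence and 1h56m resolution time are the cited study figures;
# headcount, incidents per month, and hourly cost are illustrative assumptions.

HEADCOUNT = 1_000                 # organisation size (assumption)
AFFECTED_SHARE = 0.40             # share receiving workslop each month (cited)
HOURS_PER_INCIDENT = 1 + 56 / 60  # 1 hour 56 minutes to resolve (cited)
INCIDENTS_PER_MONTH = 1           # per affected employee (assumption)
LOADED_HOURLY_COST = 100          # fully loaded cost per hour of work, USD (assumption)

rework_hours_per_year = (
    HEADCOUNT * AFFECTED_SHARE * INCIDENTS_PER_MONTH * HOURS_PER_INCIDENT * 12
)
annual_cost = rework_hours_per_year * LOADED_HOURLY_COST

print(f"Rework hours per year: {rework_hours_per_year:,.0f}")
print(f"Estimated annual cost: ${annual_cost:,.0f}")
```

Whatever assumptions you plug in, the total scales linearly with them; the point is that none of this rework ever appears on a completion dashboard.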
What changes when you measure proficiency instead of completion
| Dimension | Completion Tracking | Proficiency Measurement |
|---|---|---|
| What it measures | Who finished the course | Who can apply AI effectively in their role |
| Underlying science | Classical Test Theory at most — percentage-correct scoring | Item Response Theory — the methodology behind major standardised assessments worldwide |
| Question difficulty | All items treated equally | Each item calibrated for difficulty and discrimination — harder questions tell you more |
| Score precision | Percentage correct — no error estimate | Confidence interval on every score — you know how precise the estimate is |
| Comparability | Scores depend on which quiz version was taken | Scores comparable across different forms — because ability is estimated independently of specific items |
| What it detects | Who engaged with content | Who can evaluate AI output, identify errors, exercise judgment, and manage risk |
| Gaming resistance | Low — fixed questions, no adaptation | High — unique forms, adaptive difficulty, response pattern analysis, timing monitoring |
| Growth measurement | Repeated completion measures re-engagement, not growth | Pre/post designed on the same psychometric scale — growth reported only when it exceeds measurement error |
| Board presentation | “87% completed the programme” | “Advisory is at Competent level. Tax is Developing. Here's where to invest next quarter.” |
| The analogy | Counting gym memberships | Measuring fitness levels |
The people with the largest gaps are the least accurate at identifying them
The instinctive response to the completion gap is often “we'll survey employees on their AI confidence.” The evidence shows why this produces worse data, not better.
Aalto University researchers published findings in Computers in Human Behavior (February 2026) that upend the assumption behind self-assessment. In two studies with approximately 500 participants, they found a reverse Dunning-Kruger effect: higher AI literacy correlated with greater overconfidence, not better self-calibration. Participants using ChatGPT overestimated their correct answers by 4 points out of 20 — a gap larger than the actual performance improvement from using AI. Financial incentives for accurate self-assessment did not correct the bias.
The industry data confirms it at scale. 79% of tech workers admit to pretending they know more about AI than they do (Pluralsight 2025). 81% profess confidence in their AI skills, but only 12% have significant hands-on experience. And 64% of workers pass off AI-generated content as their own (Salesforce 2024) — a behaviour that self-assessment by definition will not reveal.
Kruger and Dunning's foundational 1999 research found that bottom-quartile performers rate themselves at the 58th–62nd percentile on average. The gap is structural, not motivational — people lack the very skills needed to recognise their own deficiency. In the context of AI, where outputs appear fluent and authoritative regardless of their accuracy, this metacognitive blind spot is particularly dangerous.
Completion tracking misses the problem. Self-assessment misrepresents it. Standard LMS quizzes — built on Classical Test Theory where all items count equally — lack the precision to detect it. The difference between an LMS quiz and psychometric proficiency measurement is the difference between a pop quiz and a medical board exam: one checks recall, the other measures whether you can practise. Performance-based psychometric assessment measures the capability that matters — with known precision, calibrated difficulty, and scores that mean the same thing regardless of which questions were asked.
Three findings that close the argument
The BCG-Harvard study (Dell'Acqua et al., Organization Science, March 2026) enrolled 758 consultants in a pre-registered randomised controlled trial. AI-proficient workers produced 40% higher-quality output on suitable tasks. Workers who misjudged AI's capability boundary performed 19 percentage points worse than colleagues using no AI at all. And approximately 10% — the “Sleeping Drivers” — passively delegated to AI without exercising judgment, producing the worst outcomes of any group. Training completion could not distinguish proficient users from Sleeping Drivers. Proficiency measurement can.
Gartner's 2025 Strategic Predictions forecast that 75% of hiring processes will include AI proficiency certifications and testing by 2027 — while simultaneously predicting that 50% of organisations will require “AI-free” skills assessments to counter critical-thinking atrophy from generative AI use. Both predictions point in the same direction: the era of treating AI readiness as a training checkbox is ending.
A 2024 systematic review in npj Science of Learning (a Nature Portfolio journal) evaluated 16 AI literacy measurement scales across 22 studies and concluded that no psychometrically validated gold standard for measuring AI literacy exists. Most scales demonstrated adequate structural validity, but very few had been tested for cross-cultural validity, measurement error, or criterion validity. The gap between AI adoption and validated proficiency measurement is documented at the highest levels of academic research.
From tracking activity to measuring capability
AI proficiency measurement applies the same psychometric science that has been trusted for 60 years in the highest-stakes assessments — from graduate admissions to medical licensing to military selection — to the specific question of how effectively professionals use AI in their work.
The approach differs from completion tracking in three fundamental ways. First, it accounts for question difficulty. Answering a hard question correctly reveals more about proficiency than answering an easy one — a principle that percentage-correct scoring ignores entirely. Second, it produces comparable scores across different test forms. Because ability is estimated independently of the specific items asked, two employees who take different versions of the assessment receive scores on the same scale. Third, every score includes a confidence interval — an explicit estimate of how precise the measurement is, preventing managers from over-interpreting small differences that may be noise.
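To make those three properties concrete, here is a minimal sketch of the two-parameter logistic (2PL) model that underpins Item Response Theory. The item parameters and response patterns are invented for illustration, not calibrated values from any real assessment, and the grid-search estimator stands in for the Newton-Raphson or Bayesian estimation used in operational scoring.

```python
import math

# Minimal 2PL IRT sketch. Item parameters (a = discrimination, b = difficulty)
# and response patterns are invented for illustration only.
ITEMS = [
    (1.2, -1.0),  # easy item
    (0.9,  0.0),  # medium item
    (1.5,  1.0),  # hard item
    (1.1,  1.5),  # very hard item
]

def p_correct(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, items=ITEMS):
    """Maximum-likelihood ability estimate with its standard error.
    A simple grid search stands in for operational estimators."""
    def log_lik(theta):
        return sum(
            math.log(p_correct(theta, a, b)) if r
            else math.log(1.0 - p_correct(theta, a, b))
            for r, (a, b) in zip(responses, items)
        )
    theta_hat = max((t / 100 for t in range(-400, 401)), key=log_lik)
    # Fisher information at the estimate gives the score's precision.
    info = sum(a * a * p_correct(theta_hat, a, b) * (1.0 - p_correct(theta_hat, a, b))
               for a, b in items)
    return theta_hat, 1.0 / math.sqrt(info)

# Two response patterns with the same raw score (2 of 4 correct) yield
# different ability estimates, because which items were answered correctly
# matters, not just how many.
for pattern in ([1, 1, 0, 0], [0, 0, 1, 1]):
    theta, se = estimate_ability(pattern)
    print(f"{pattern}: theta = {theta:+.2f}, 95% CI ± {1.96 * se:.2f}")
```

Note also how wide the interval is with only four items: precision comes from asking enough well-calibrated questions, and the confidence interval is what makes that precision (or lack of it) visible.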
The result is organisational visibility that completion metrics cannot provide: which teams can deploy AI effectively today, which need targeted development, where the risks of unsupervised AI use are highest, and whether training investments are producing measurable change over time. The board presentation shifts from “87% completed the programme” to “Advisory is at Competent level, Tax is Developing, and here is where the next quarter's investment should go.”
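The last of those questions, whether training investments are producing measurable change, is where the confidence interval earns its keep. A minimal sketch, using invented scores and standard errors on a common scale: a pre/post gain is reported as growth only when it exceeds the combined measurement error of the two scores.

```python
from math import sqrt

def growth_is_reliable(pre_score, pre_se, post_score, post_se, z=1.96):
    """Report growth only when the pre/post difference exceeds the combined
    measurement error of the two scores (independent errors, 95% level)."""
    gain = post_score - pre_score
    se_of_gain = sqrt(pre_se ** 2 + post_se ** 2)
    return gain > z * se_of_gain

# With ~3-point standard errors on each score, a 6-point gain is still
# indistinguishable from measurement noise; a 12-point gain is not.
print(growth_is_reliable(48, 3.0, 54, 3.0))  # False
print(growth_is_reliable(48, 3.0, 60, 3.0))  # True
```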
References
BetterUp Labs & Stanford Social Media Lab. (2025). The hidden toll of AI-generated work. Harvard Business Review, September 2025.
Dell'Acqua, F., McFowland, E., Mollick, E.R., et al. (2026). Navigating the jagged technological frontier. Organization Science. DOI: 10.1287/orsc.2025.21838.
Fernandes, D., Welsch, R., et al. (2026). The effects of AI on metacognitive accuracy. Computers in Human Behavior.
Ford, J.K., Baldwin, T.T., & Prasad, J. (2018). Transfer of training: The known and the unknown. Annual Review of Organizational Psychology and Organizational Behavior, 5, 201–225.
Gartner. (2025). Top Strategic Predictions for 2026 and Beyond. Gartner Research.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it. Journal of Personality and Social Psychology, 77(6), 1121–1134.
Nature. (2024). Systematic review of AI literacy measurement instruments. npj Science of Learning.
Pluralsight. (2025). 2025 AI Skills Report: Mind the Confidence Gap.
Salesforce. (2024). Trends in AI for CRM. Salesforce Research.
Section AI. (2026). 2026 AI Proficiency Report. Section.
See what proficiency measurement looks like in practice
The methodology page explains how psychometric proficiency measurement works in practice — and how it differs from every other approach to assessing AI readiness. The research page presents the full evidence base.