The Research

The evidence for measuring AI proficiency — not just training completion

Enterprises are spending trillions on AI, but independent research from Harvard, Stanford, MIT, and the World Economic Forum converges on the same finding: most professionals cannot use these tools effectively — and training alone does not close the gap. Measurement is the missing layer.

All sources peer-reviewed or institutional · 24 independent sources cited · Updated March 2026
The Headline Numbers

Three findings that reframe every AI investment conversation

97%
of knowledge workers are at novice-level AI proficiency
Section AI Proficiency Report, Jan 2026 — survey of 5,000 workers plus hands-on testing
$2,232
annual cost per employee of AI-generated rework
BetterUp Labs & Stanford Social Media Lab, HBR Sep 2025
40/100
average proficiency score of employees who completed AI training programmes
Section AI Proficiency Report, Jan 2026 — hands-on testing, not self-reports

AI proficiency is defined as the ability to use generative AI tools effectively, responsibly, and with appropriate judgment in professional knowledge work. It encompasses the capacity to formulate effective instructions, evaluate and refine AI output, exercise critical judgment about AI-generated content, determine when AI use is and is not appropriate, and manage ethical, privacy, and disclosure obligations. Research across multiple domains suggests these five dimensions are distinct and independently measurable.

The Spending Paradox

Enterprises spend 2–6 times more on AI tools than on AI skills

The imbalance is structural, not accidental — and it produces measurable consequences.

Microsoft 365 Copilot costs $30 per user per month. ChatGPT Enterprise runs approximately $60. GitHub Copilot Business adds $19. For a typical knowledge worker, the annual AI tool cost falls between $360 and $888, scaling to $3,000 or more for multi-tool deployments. These are published vendor list prices confirmed through official pricing pages as of early 2026.

The training side tells a different story. Training Magazine's 2025 Training Industry Report found average total training spend of $874 per employee across all training categories. The AI-specific portion is estimated at $150–$500 per employee, based on typical allocation patterns. This creates a 2x to 6x imbalance between tool spending and skill-building spending.
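The arithmetic behind the 2x–6x figure can be sketched directly from the numbers above (a minimal illustration using the cited list prices and training estimates, not a cost model):

```python
# Annual per-employee spend, using the published figures cited above.
tool_spend_low, tool_spend_high = 360, 888    # single-tool annual range ($)
multi_tool_spend = 3000                       # multi-tool deployments, $3,000+
ai_training_low, ai_training_high = 150, 500  # estimated AI-specific training spend ($)

# Tool-to-training ratio at both ends of the estimates.
ratio_low = tool_spend_high / ai_training_high   # 888 / 500 ≈ 1.8x
ratio_high = tool_spend_high / ai_training_low   # 888 / 150 ≈ 5.9x

print(f"{ratio_low:.1f}x to {ratio_high:.1f}x")  # roughly the 2x–6x imbalance
```

Multi-tool deployments at $3,000 per employee push the upper end of the ratio higher still against the same training spend.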

McKinsey's analysis of World Economic Forum Lighthouse Network sites confirms the pattern from the other direction: top-performing organisations spend $5 on adoption and capability building for every $2 on technology — a ratio that most enterprises invert.

Worldwide AI spending is projected to reach $2.52 trillion in 2026. The question facing every enterprise is not whether to invest in AI — but whether that investment is producing measurable human capability.

The total AI market tells the story at macro scale. Gartner projects $2.52 trillion in worldwide AI spending in 2026, up 44% from 2025. Big Tech capital expenditure alone — Amazon, Alphabet, Meta, and Microsoft combined — is projected at $562–650 billion in 2026, roughly 2.5 times the 2024 total. Amazon CEO Andy Jassy confirmed the $200 billion 2026 target during February 2026 earnings commentary.

The spending is not in question. The capability is.

The Skills Crisis

Independent research programmes converge on the same finding

Ten independent research programmes — spanning more than 70,000 respondents across dozens of countries — tell the same story.

Finding | Detail | Source
97% are AI novices | 28% Novices, 69% Experimenters (basic prompting only), 2.7% Practitioners, 0.08% Experts. Employees who completed AI training still scored only 40 out of 100 on proficiency tests. | Section AI, Jan 2026 — 5,000 workers, hands-on testing
63% cite skills gaps as #1 barrier | Employers identify skills gaps as the single biggest barrier to business transformation. 40% of job-required skills expected to change by 2030. | WEF Future of Jobs 2025 — 1,000+ cos, 14M workers
46% of CxOs cite talent gaps | Cited as primary reason AI development is too slow — ahead of technology, data quality, and budget constraints. | McKinsey State of AI, Oct–Nov 2024 — 3,851 respondents
89% say upskilling needed, only 6% started | Leaders acknowledge the need. Fewer than one in fifteen have begun addressing it at scale. | BCG AI at Work 2025 — 10,600+ respondents, 11 countries
75% use AI, only 39% received training | Three-quarters of knowledge workers use generative AI at work. Only two-fifths received any company-provided training. | Microsoft/LinkedIn Work Trend Index, 2024 — 31,000 workers, 31 countries
88% use AI, only 5% advanced | Just 12% received sufficient training. The vast majority operate at surface level. | EY Work Reimagined 2025 — 15,000 employees, 29 countries
Only 26% offer formal AI upskilling | The proportion of organisations offering formal AI upskilling programmes has declined from 35% the prior year — even as AI adoption accelerates. | LinkedIn 2026 Workplace Learning Report
Only 1% consider themselves AI-mature | Nearly 80% of companies use generative AI, but just 1% of leaders consider their organisations “mature” in AI deployment. Over 60% report no significant bottom-line impact. | McKinsey “Superagency” Report, Jan 2025 — 3,613 employees, 238 C-suite
79% pretend to know more | Tech workers admit to faking their AI knowledge. 97% of leaders have exaggerated at least once. | Pluralsight AI Skills Report, Apr 2025 — 1,200 decision-makers
75% of hiring will test proficiency by 2027 | Gartner predicts three-quarters of hiring processes will include AI proficiency testing — while simultaneously predicting that 50% of organisations will require “AI-free” skills assessments to counter critical-thinking atrophy from generative AI use. | Gartner Strategic Predictions, Oct 2025

The most striking finding across all surveys: organisations that invest in training cannot currently verify whether it worked. Course completion does not predict proficiency. Self-reported confidence inversely correlates with accuracy for many users. Only 4% of learning leaders can communicate tangible business outcomes of their programmes, and 92% of business leaders fail to see the impact of learning initiatives. The enterprise AI skills gap — the distance between tool adoption and effective tool use — represents one of the largest unpriced risks in enterprise technology today. IDC projects that 90% or more of global enterprises will face critical AI skills shortages by 2026, risking $5.5 trillion in losses from degraded global market performance.

The Cost of Unmeasured AI Use

$2,232 per employee per year in AI-generated rework

Published research has now quantified the cost — and it compounds every month it goes unaddressed.

$186
per employee per month in “workslop” rework costs
BetterUp Labs and Stanford's Social Media Lab surveyed 1,150 full-time U.S. desk workers and found that 40% had received AI-generated “workslop” — low-quality, AI-produced content passed along without adequate human oversight — in the preceding month. Each incident required an average of 1 hour and 56 minutes to resolve. For a 10,000-person organisation, the annual cost exceeds $9 million. Published in Harvard Business Review, September 2025.

The reputational damage compounds the financial cost. The same study found that 42% of respondents viewed workslop senders as less trustworthy, 54% as less creative, and 33% were less likely to want to work with that person again.
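The headline numbers above reduce to simple arithmetic (a minimal sketch using only the per-employee figures from the study; the study's organisation-level estimate applies its own incidence assumptions):

```python
# Per-employee workslop cost, from the BetterUp Labs / Stanford figures above.
monthly_rework_cost = 186                  # $ per employee per month
annual_per_employee = monthly_rework_cost * 12
print(annual_per_employee)                 # 2232 — the headline annual figure

# Time cost per incident: 1 hour 56 minutes of resolution work.
minutes_per_incident = 1 * 60 + 56
print(minutes_per_incident)                # 116 minutes per incident
```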

Additional cost evidence converges from multiple directions. Forrester's Enterprise AI Cost Analysis attributes $14,200 per employee per year to hallucination mitigation — 4.3 hours per week spent verifying AI outputs. Enterprise software utilisation data indicates 30% of AI tool licenses sit completely idle, with automated harvesting reclaiming $1.2 million annually per enterprise. And MIT's 2025 “GenAI Divide” report placed the AI pilot failure rate at 95% — with a Fortune 500 financial services case study documenting $50 million spent on AI tools while 80% of employees reverted to spreadsheets within three months.

The Landmark Study

The BCG-Harvard “jagged frontier” study — now peer-reviewed

Navigating the Jagged Technological Frontier
Peer-reviewed 2026
Dell'Acqua, McFowland, Mollick et al. | Organization Science, March 2026 | DOI: 10.1287/orsc.2025.21838 | 758 BCG consultants | Pre-registered RCT (IRB23-0392) | 237+ academic citations

This is the single most important empirical study of AI in the workplace. It enrolled 758 BCG consultants — approximately 7% of BCG's individual-contributor consultant workforce — and randomised them across three conditions: no AI access, GPT-4 access, and GPT-4 with prompt engineering guidance.

40%+
higher-quality work when AI was applied to tasks inside the frontier. Below-average performers improved by 43%; above-average by 17%.
−19pp
drop in correct solutions when AI was applied to tasks outside the frontier. Accuracy fell from 84.5% to approximately 65.5%.
No gain
from training alone. AI training without proficiency measurement showed no significant performance advantage over simple tool access.
~10%
“Sleeping Drivers” — passive over-reliers who produced worse results than colleagues using no AI at all.

The jagged technological frontier describes the uneven boundary of AI capability, where tasks that appear similar in difficulty may fall on opposite sides of what AI can competently perform. The researchers defined it as “systematic differences in how well AI capabilities align with different tasks compared with expectations based on human capabilities.” Workers cannot reliably self-assess which side of the frontier a given task falls on — and when they guess wrong, outcomes are measurably worse than not using AI at all.

Three integration archetypes emerged. Centaurs (~30%) maintained clear human-AI task separation, strategically dividing work. Cyborgs (~60%) deeply fused AI across their workflow. Sleeping Drivers (~10%) passively delegated to AI without exercising judgment — and produced the worst outcomes of any group, including the no-AI control.

A follow-up study, “GenAI as a Power Persuader” (HBS Working Paper 26-021), analysed GPT-4 activity logs and discovered that when professionals pushed back on AI errors, the AI escalated its persuasion using all three Aristotelian modes — ethos, logos, and pathos. Rather than disclosing limitations, the AI apologised, appeared to correct, then restated its original wrong position with more structured reasoning. The researchers term this “persuasion bombing.”

Corroborating evidence is building. Noy and Zhang published in Science (July 2023, 956 citations) showing ChatGPT decreased task completion time by 40% and raised quality by 18%, while compressing the productivity distribution — meaning AI is a skill leveller that benefits lower performers most. A further field experiment with 776 Procter & Gamble professionals (HBS Working Paper 25-043) has extended the BCG research to team dynamics. The consistent finding across all studies: the variance in outcomes is driven by individual proficiency, and that proficiency is measurable.

The implication is direct: training without measurement creates demonstrable organisational risk. The Sleeping Driver archetype — passive AI over-reliance — is invisible without proficiency assessment.

The Overconfidence Problem

Self-assessments fail because AI reverses the Dunning-Kruger effect

The people with the largest skill gaps are the least accurate at identifying them.

Aalto University researchers published findings in Computers in Human Behavior (February 2026) based on two studies with approximately 500 participants using LSAT logical reasoning tasks. Participants using ChatGPT estimated they answered roughly 17 out of 20 questions correctly; their actual score was approximately 13 — a 4-point overestimation gap that exceeded the 3-point performance improvement from using AI.

The classic Dunning-Kruger pattern vanished entirely with AI use, replaced by a reverse effect: higher AI literacy correlated with greater overconfidence, not less. Financial incentives for accurate self-assessment did not correct the bias. This finding directly validates the need for performance-based assessment — such as IRT-based psychometric measurement — over self-report instruments.

79%
of tech workers pretend to know more about AI than they do
Pluralsight, Apr 2025 — 1,200 decision-makers
64%
of workers pass off AI-generated content as their own
Salesforce, 2024 — 6,000 workers, 9 countries

Automation bias — the tendency to defer to automated systems even when they are wrong — is well-established in the research literature. Parasuraman and Manzey's foundational 2010 review in Human Factors established that it occurs in both naive and expert users and cannot be fully prevented by training or instructions. A 2025 systematic review in AI & Society found that explainability mechanisms designed to mitigate automation bias may paradoxically reinforce misplaced trust, particularly among less experienced professionals.

The practical implication: asking employees how well they use AI produces unreliable data. The people most in need of development are the least able to identify that need. Measurement — objective, performance-based, psychometrically validated — is the only defensible alternative.
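Item Response Theory, the measurement approach referenced above, models the probability that a respondent of a given ability answers a given item correctly. Below is a minimal sketch of the generic two-parameter logistic (2PL) model — the standard textbook form, not any particular assessment's implementation:

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability that a respondent with
    ability theta answers an item correctly, given the item's
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An average-ability respondent (theta = 0) on an average-difficulty
# item (b = 0) has exactly a 50% chance of answering correctly.
print(p_correct(0.0, 1.0, 0.0))            # 0.5

# The same respondent on a harder item (b = 1) succeeds far less often.
print(round(p_correct(0.0, 1.0, 1.0), 3))  # 0.269
```

Fitting a and b from response data, and estimating each respondent's theta with a standard error, is what lets a psychometric score carry a confidence interval rather than a raw percent-correct.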

Real-World Consequences

From fabricated legal citations to government reports with invented references

The cost of AI errors is escalating — and the regulatory response is building.

In Mata v. Avianca (S.D.N.Y. 2023), attorney Steven Schwartz used ChatGPT to generate legal research containing six entirely fabricated judicial opinions. When he asked ChatGPT to verify the cases, it assured him they were real. A database now tracks 486 cases involving AI hallucinations worldwide, including 324 in U.S. courts — rising from 10 cases in 2023 to 73 in the first five months of 2025 alone.

Deloitte Australia delivered a 237-page government report containing fabricated academic references, non-existent books attributed to real professors, and misquoted court cases — all generated by GPT-4o. The report was produced under an AU$440,000 government contract. Stanford RegLab's peer-reviewed research established the underlying risk: general-purpose LLMs hallucinate between 58% and 88% of the time on legal queries.

Regulatory enforcement is responding. The EU AI Act, effective August 2024, imposes penalties of up to €35 million or 7% of global annual turnover for prohibited AI practices. The Act's AI literacy requirements create direct liability exposure for organisations deploying AI without adequate workforce training. In financial services, Deloitte's Global AI Survey found 47% of enterprise AI users made at least one major business decision based on hallucinated content in 2024.

These errors share a common root: professionals who cannot critically evaluate what AI produces. The fabricated citations, the invented references, the confidently wrong analyses — each reflects a measurable gap in the ability to detect errors, assess reliability, and exercise judgment about AI-generated content. That capability can be measured. And what can be measured can be developed.

APAC Leads — But Faces the Same Gap

The highest AI adoption rates in the world, paired with the same measurement vacuum

60.9%
of Singapore's working-age population uses AI — second-highest globally
Microsoft AI Diffusion Report, Jan 2026
53%
of APAC leaders use AI agents for full process automation — highest worldwide
Microsoft Work Trend Index, 2025
14%
of Singaporean firms have scaled AI across operations — despite leading in adoption
Forrester State of AI Survey, 2025

Singapore stands out as a global AI frontrunner. The National AI Strategy 2.0 commits over S$1 billion over five years, IMDA's TechSkills Accelerator has upskilled over 340,000 individuals, and 73.8% of workers regularly use AI tools. Hong Kong is investing HK$3 billion in AI subsidies and has opened a 1,300-petaFLOPS AI Supercomputing Centre. Over 80% of APAC accounting firms regularly use AI in their professional work.

The firms investing most heavily in AI are also the ones with the largest measurement gap. PwC committed $12 billion to technology and met the target a year early. EY invests $1.4 billion annually in AI. Deloitte has invested $2 billion and equipped 470,000 employees with AI tools. KPMG allocated $2 billion over five years for AI embedding. On the banking side, JPMorgan spends $18 billion annually on technology with 450 AI use cases, and DBS Bank in Singapore is targeting S$1 billion in AI economic value in 2025. The money is flowing. The measurement is not.

The adoption is not in question. Across all of APAC, structured assessment of whether AI use is effective — whether the tools being adopted are producing better outcomes, not just faster outputs — remains virtually nonexistent. The distance between high adoption and measured proficiency is the gap this research makes visible.

The Conclusion

Measurement is the missing layer in enterprise AI adoption

The research compiled here converges on a structural problem. The spending imbalance is clear — tool investment outpaces training investment by multiples. But the deeper issue is that even organisations that invest in training cannot currently verify whether it works. Course completion does not predict proficiency. Self-reported confidence inversely correlates with accuracy for many users. And the jagged frontier means that identical AI workflows produce radically different outcomes depending on whether the task falls inside or outside AI's capability boundary — a distinction that untrained users cannot reliably make.

Three findings make the case for psychometric AI proficiency measurement. Gartner predicts that 75% of hiring processes will require AI proficiency testing by 2027 — signalling imminent demand for validated measurement instruments. Section AI's finding that 97% of workers are novices or experimenters — even after training — demonstrates that current approaches fail to close the gap. And the Aalto University reverse Dunning-Kruger finding — that AI literacy increases overconfidence rather than calibration — directly validates the need for performance-based assessment over self-report instruments.

The cost of inaction is quantifiable. Between workslop rework ($2,232 per employee per year), hallucination verification overhead ($14,200 per employee per year), wasted tool licences (30% idle rate), and the 19-percentage-point accuracy drop documented by the BCG-Harvard study, the aggregate cost of unmeasured AI incompetence likely runs $5,000–$15,000 or more per AI-using employee annually. The enterprises that will extract value from their AI investments are not those that spend the most on tools or on training — but those that can measure, with psychometric validity, whether their people can actually use AI well.

A 2024 systematic review in npj Science of Learning evaluated 16 AI literacy scales across 22 studies and concluded that no psychometrically validated gold standard for measuring AI literacy exists.

Common Questions

Understanding the research

How much does AI incompetence cost enterprises per employee?
Research from BetterUp Labs and Stanford's Social Media Lab, published in Harvard Business Review in September 2025, found that AI-generated “workslop” costs $186 per employee per month — approximately $2,232 per employee per year — in rework time. Additional costs include hallucination verification overhead estimated at $14,200 per employee per year by Forrester.
What percentage of workers are proficient in AI?
Only 3% demonstrate meaningful proficiency. The Section AI Proficiency Report (January 2026) found 97% are at novice or experimenter level, even after completing training. Hands-on testing — not self-reports — was used to establish these figures.
Does AI training improve actual proficiency?
The BCG-Harvard study (published in Organization Science, March 2026) found that AI training alone showed no meaningful performance advantage over simple tool access. The variance was driven by individual proficiency levels, not by training completion. Separately, employees who completed AI training programmes still scored only 40 out of 100 on hands-on proficiency tests (Section AI 2026). Classic estimates suggest only 10–15% of training effectively transfers to workplace application.
What is the jagged technological frontier?
A concept from the BCG-Harvard study describing AI's uneven capability boundary. Tasks that appear similar may fall on opposite sides of what AI can handle. Inside the frontier, AI-proficient workers produced 40% higher-quality work. Outside it, they performed 19 points worse than colleagues using no AI. Workers cannot reliably self-assess which side a task falls on.
Why do self-assessments fail for measuring AI skills?
Aalto University research (February 2026) found a reverse Dunning-Kruger effect: higher AI literacy correlated with greater overconfidence, not better calibration. Separately, 79% of tech workers admit to pretending they know more about AI than they do (Pluralsight 2025). Performance-based psychometric assessment is the evidence-based alternative.
How does proficiency measurement differ from tracking training completion?
Training completion measures activity — who finished a course. Proficiency measurement uses psychometric methods such as Item Response Theory to measure capability — who can apply AI effectively. Section AI's testing found employees who completed AI training still scored only 40 out of 100 on actual skill tests.
What is the AI skills gap in APAC?
APAC leads globally in AI adoption — Singapore has 60.9% working-age adoption — but faces the same proficiency gap. Only 14% of Singaporean firms have scaled AI across operations. Over 80% of APAC accounting firms use AI, but structured measurement remains virtually nonexistent.

References

BCG. (2025). AI at Work: Friend and Foe — The 3rd Annual BCG AI Survey. Boston Consulting Group.

BetterUp Labs & Stanford Social Media Lab. (2025). The hidden toll of AI-generated work. Harvard Business Review, September 2025.

Brynjolfsson, E., Li, D., & Raymond, L.R. (2025). Generative AI at work. Quarterly Journal of Economics, 140(2).

DataCamp & YouGov. (2026). The State of Data and AI Literacy. DataCamp.

Dell'Acqua, F., McFowland, E., Mollick, E.R., et al. (2026). Navigating the jagged technological frontier: Field experimental evidence of the effects of artificial intelligence on knowledge worker productivity and quality. Organization Science. DOI: 10.1287/orsc.2025.21838.

Deloitte. (2026). State of AI in the Enterprise (7th ed.). Deloitte AI Institute.

EY. (2025). 2025 Work Reimagined Survey. Ernst & Young.

Fernandes, D., Welsch, R., et al. (2026). The effects of AI on metacognitive accuracy. Computers in Human Behavior.

Ford, J.K., Baldwin, T.T., & Prasad, J. (2018). Transfer of training: The known and the unknown. Annual Review of Organizational Psychology and Organizational Behavior, 5, 201–225.

Forrester. (2025). State of AI Survey: APAC Enterprise Adoption and Impact. Forrester Research.

Gartner. (2025). Top Strategic Predictions for 2026 and Beyond. Gartner Research.

IDC. (2025). Analyst Brief: AI Skills Shortage and Enterprise Impact.

LinkedIn. (2026). 2026 Workplace Learning Report. LinkedIn Learning.

McKinsey & Company. (2025). The State of AI in 2025: How Organizations Are Rewiring to Capture Value.

McKinsey & Company. (2025). Superagency in the Workplace: Empowering People to Unlock AI's Full Potential.

Microsoft & LinkedIn. (2024). 2024 Work Trend Index: AI at Work Is Here. Now Comes the Hard Part.

Microsoft AI Economy Institute. (2026). AI Diffusion Report: Mapping AI Use Across 147 Countries.

Nature. (2024). Systematic review of AI literacy measurement instruments. npj Science of Learning.

Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187–192.

Parasuraman, R., & Manzey, D.H. (2010). Complacency and bias in human use of automation. Human Factors, 52(3), 381–410.

Pluralsight. (2025). 2025 AI Skills Report: Mind the Confidence Gap.

Salesforce. (2024). Trends in AI for CRM. Salesforce Research.

Section AI. (2026). 2026 AI Proficiency Report. Section.

World Economic Forum. (2025). The Future of Jobs Report 2025. Geneva: WEF.

See how Genplify measures AI proficiency

The research shows the problem is real and quantifiable. If your organisation deploys AI tools, these findings apply to your workforce today. The methodology page explains how IRT-based psychometric assessment addresses it — with confidence intervals on every score.