Measuring the impact of AI on software engineering – with Laura Tacho — The Pragmatic Engineer

Laura Tacho, CTO at DX — a company that measures developer productivity with data — discusses what the actual evidence shows about AI’s impact on software engineering, why most media headlines get it wrong, and what engineering leaders should measure and do to get real value from AI tools.
- The conversation draws on data from hundreds of companies, including detailed case studies from Booking.com, Workhuman, and Indeed, as well as findings from the DORA report on AI in engineering.
- A central theme: AI is real and useful, but the hype cycle creates distorted expectations that hurt both developers and business decision-makers, and the burden falls on engineering leaders to set the record straight with data.

Why media headlines about AI and coding are misleading

Headlines like “AI coding assistants wave goodbye to junior developers” or “OpenAI just released a coding tool to help programmers replace their jobs” are designed for clicks, not accuracy.
- They often originate from vendor PR pitches or oversimplified interpretations of internal metrics.
- Example: The widely cited claim that “Microsoft writes 30% of its code with AI” likely counts accepted autocomplete suggestions — a metric that says nothing about whether that code runs in production or delivers business value.
  - Acceptance rate was already a questionable metric before AI (traditional autocomplete has always been good at predicting code), and conflating “AI-assisted” with “AI-authored” inflates the perceived impact.
This oversimplification creates a two-sided problem:
- Developers see the hype, try AI once, get poor results, and dismiss it as gimmicky — lowering adoption.
- Executives hear the same headlines and pressure teams to show similar results, creating unrealistic expectations that engineering leaders must then manage.

What DX’s AI measurement framework recommends

Abi Noda (DX co-founder) and Laura developed a framework organized around three pillars: utilization, impact, and cost.
- Utilization: Track daily/weekly active users as a percentage of total developers, and check whether licenses are actually available to those who want them. In many organizations, non-adoption is simply a licensing problem, not a skepticism problem.
- Impact: Look at outcomes that matter — developer experience, time savings on real tasks, velocity, cognitive load — not vanity metrics like lines of code or acceptance rate.
- Cost: Especially relevant as consumption-based pricing (tokens) replaces flat licenses, making it harder to allocate budgets across teams and seniority levels.
The framework explicitly excludes acceptance rate and lines of code as primary measures because they don’t correlate with business impact.
- Acceptance rate can signal whether a tool is fit for purpose (low acceptance = poor suggestions), but high acceptance alone tells you nothing about velocity, innovation, or revenue.
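The utilization pillar can be made concrete with a small calculation. The sketch below assumes a hypothetical log of usage events (one record per developer per day of AI tool use) plus hypothetical lists of who wants a license and who holds one. None of these names or data shapes come from DX's framework, and the sample data is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical record: a developer used an AI tool on a given day."""
    developer: str
    day: date


def active_share(events, all_developers, as_of, window_days):
    """Share of ALL developers with at least one event in the trailing window."""
    roster = set(all_developers)
    if not roster:
        return 0.0
    start = as_of - timedelta(days=window_days - 1)
    active = {e.developer for e in events if start <= e.day <= as_of}
    # Divide by the full roster, not by licensed seats, so non-adoption stays visible.
    return len(active & roster) / len(roster)


def unmet_license_demand(wants_license, has_license):
    """Developers who want a license but do not have one."""
    return sorted(set(wants_license) - set(has_license))


# Invented sample data.
developers = ["ana", "ben", "chi", "dev"]
events = [
    UsageEvent("ana", date(2025, 6, 9)),
    UsageEvent("ana", date(2025, 6, 13)),
    UsageEvent("ben", date(2025, 6, 13)),
    UsageEvent("dev", date(2025, 6, 10)),
    UsageEvent("chi", date(2025, 5, 30)),  # falls outside the 7-day window
]
as_of = date(2025, 6, 13)

daily = active_share(events, developers, as_of, window_days=1)
weekly = active_share(events, developers, as_of, window_days=7)
waiting = unmet_license_demand(["ana", "chi", "dev"], ["ana", "ben"])
print(f"daily active: {daily:.0%}, weekly active: {weekly:.0%}, waiting: {waiting}")
```

Reporting the list of developers waiting for a license next to the active-user share helps separate a licensing problem from a skepticism problem.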

What developers are actually saving time on

A DX study of 180+ companies found that mid-loop code generation is not the top time-saving use case for AI; it ranks third:
1. Stack trace analysis — pasting a 100-line stack trace and asking the AI to diagnose it or suggest fixes, eliminating 30–45 minutes of manual debugging.
2. Refactoring existing code — restructuring or modernizing code with AI assistance.
3. Mid-loop code generation — completing a function from a scaffold or spec.
- Other high-value use cases: code documentation, brainstorming and planning, and unit test generation (anything with well-defined structure).
The key distinction: stack trace analysis and refactoring produce net positive time savings because they eliminate toil entirely, whereas code generation just reallocates time from typing to reviewing and iterating.
- Typing speed was never the bottleneck in software development; reviewing, understanding, and verifying code still takes cognitive time regardless of how fast it was produced.

The paradox: AI can make developers less satisfied

The DORA report on AI in engineering found that many developers feel less satisfied with their jobs after AI adoption.
- AI accelerates the parts developers enjoy (coding, problem-solving, being in flow), leaving a higher proportion of time for the parts they don’t enjoy (meetings, administrative work, toil).
- This is consistent with the reality that developers spend only about 20–25% of their time actually coding (an AWS study found 20% for their engineers), so AI’s time savings apply to a small slice of the work week.
Laura’s hypothesis: AI may help developers manage interruptions and recover from context-switching faster (acting as a “pair programmer” that holds context), but this is difficult to measure with workflow data alone — it requires combining system metrics with self-reported developer experience surveys.
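The "small slice" point above is simple arithmetic: if AI only shortens coding time, the saving across the whole week is the coding share multiplied by the saving on coding. The sketch below uses the 20–25% coding share quoted above; the 30% saving on coding time and the 40-hour week are purely illustrative assumptions, not figures from the episode.

```python
def overall_time_saved(coding_share, coding_time_saved):
    """Fraction of the whole work week saved when only coding time shrinks."""
    return coding_share * coding_time_saved


HOURS_PER_WEEK = 40    # assumption for illustration
ASSUMED_SAVING = 0.30  # hypothetical saving on coding time, not from the source

for coding_share in (0.20, 0.25):
    saved = overall_time_saved(coding_share, ASSUMED_SAVING)
    print(
        f"coding share {coding_share:.0%}: "
        f"{saved:.1%} of the week, about {saved * HOURS_PER_WEEK:.1f} hours"
    )
```

Under these assumptions, even a large speed-up on coding translates into only a few hours per week, which is why measuring the rest of the work week matters.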

Booking.com case study: Adoption is the key variable

Booking.com made a concerted organizational effort to increase AI tool adoption: office hours, workshops, training, and leadership encouragement.
- They reached 65% of developers using AI tools on a weekly or daily basis, which is above the industry median of 50% and the 75th percentile of 60%.
- The biggest gains came from moving non-users into periodic but consistent usage.
Even with all this effort, 35% of developers still didn’t use the tools weekly. Reasons include:
- Licenses not being available to all who want them (a common pattern across many organizations).
- Certain codebases or tasks (novel, greenfield, highly performance-sensitive, or requiring deep architectural understanding) where AI tools are simply not effective yet.
  - Example: Linux kernel maintainers make tiny, highly deliberate commits where performance and correctness are paramount — AI assistance is limited to brainstorming or tooling, not authoring.

Workhuman case study: Measuring developer experience impact

Workhuman used DX’s developer experience index (a composite of 14 research-backed drivers including incident management, local dev iteration speed, and more) to measure AI’s impact before and after rollout.
- They found an 11% boost in overall developer experience across the organization.
- Daily/weekly AI users had 15% higher velocity than non-users, and this gap has been compounding over time.
The lesson: measuring developer experience broadly — not just coding output — is how you build a credible business case for continued AI investment.
- Start measuring now to establish a baseline; you can pull historical workflow data from GitHub and Jira, but self-reported experience surveys need to begin as early as possible.

What’s working architecturally: Clean interfaces and AI-first documentation

Engineering leaders are recommitting to clean interfaces between services as a direct response to AI adoption.
- Clear, well-defined boundaries (the “everything is an API” model) make it easier for AI agents to navigate and use a codebase — and this also benefits human developers.
Documentation is shifting to serve both humans and AI:
- Human documentation relies on narrative flow and visual dependencies (screenshots, diagrams).
- AI-friendly documentation needs code examples, no visual dependencies, and structured formats that an AI assistant can ingest and act on within the IDE.
- Companies like Vercel and Clerk are leading examples of “AI-first” documentation that creates a flywheel: better documentation → better AI suggestions → more successful implementations → more data → better suggestions.
For internal platform teams, the goal is to deliver documentation at the moment of need (inside the editor) rather than expecting developers to leave their workflow to read a manual.

Indeed’s structured, experimental rollout

Indeed ran a controlled, multi-tool trial — segmenting cohorts, testing different AI tools, and comparing results to determine which tools delivered the most gain for which use cases.
- This approach gave them confidence in spending decisions during a period of tool sprawl and uncertainty.
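A cohort comparison of this kind can be sketched in a few lines. The example below is not Indeed's actual methodology: the metric (PR cycle time in hours), the cohort names, and every number are invented, and comparing medians is just one simple way to look at the result.

```python
from statistics import median


def relative_change(control, treatment):
    """Relative change in the median of `treatment` versus `control`."""
    base = median(control)
    return (median(treatment) - base) / base


# Invented PR cycle times in hours, one list per cohort.
cohorts = {
    "control": [30.0, 26.0, 41.0, 35.0, 28.0],
    "tool_a": [24.0, 27.0, 22.0, 31.0, 25.0],
    "tool_b": [29.0, 33.0, 27.0, 36.0, 30.0],
}

results = {
    name: relative_change(cohorts["control"], values)
    for name, values in cohorts.items()
    if name != "control"
}

for name, change in results.items():
    print(f"{name}: {change:+.1%} median cycle time vs control")
```

With samples this small the differences could easily be noise; a real trial would need larger cohorts and a proper statistical test before driving spending decisions.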
They tested specific hypotheses, such as: “Can AI code review reduce feedback loop latency for globally distributed teams?”
- A developer in Seattle submits code at end of day; their colleague in Vienna is offline. An AI review can provide preliminary feedback immediately, unblocking the author earlier.
They also identified migrations as a high-value AI use case — undifferentiated heavy lifting that developers hate and that the business doesn’t care about.
- AI can generate the migration PR with tests and an ephemeral environment, so the human reviewer only needs to verify, not research outdated documentation.

A practical tip for AI-assisted migrations

Instead of asking AI to “migrate this file from version X to Y” (which gives mixed results), do one migration by hand. Then give the AI both the original and the migrated file, and ask it to generate a prompt that reproduces the transformation for subsequent files with a similar structure.
- This produces a much more targeted and accurate prompt than a generic migration request.
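The tip above can be sketched as a small helper that assembles the "meta-prompt" from one hand-migrated example. The function name, the prompt wording, and the sample file contents (`oldlib`, `newlib`) are all hypothetical; the sketch deliberately stops at building the text and does not call any particular AI tool.

```python
def build_meta_prompt(original: str, migrated: str, from_version: str, to_version: str) -> str:
    """Ask the model to write a reusable migration prompt from one worked example."""
    return (
        f"I migrated the file below from version {from_version} "
        f"to version {to_version} by hand.\n\n"
        f"--- ORIGINAL ---\n{original}\n\n"
        f"--- MIGRATED ---\n{migrated}\n\n"
        "Write a prompt that I can reuse to apply the same transformation "
        "to other files with a similar structure. List each change you "
        "observed as an explicit rule."
    )


# Placeholder file contents; in practice these would be read from disk.
original_file = "import oldlib\noldlib.connect(host)\n"
migrated_file = "import newlib\nnewlib.open_connection(host=host)\n"

meta_prompt = build_meta_prompt(original_file, migrated_file, "X", "Y")
print(meta_prompt)
```

The prompt the AI returns can then be reviewed once by a human and reused for each remaining file, rather than re-explaining the migration every time.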

The risk AI poses to delivery stability

DORA’s data shows that delivery throughput is already slowing slightly as AI adoption increases.
- Hypothesis: AI makes it trivially easy to produce large changes (bigger batch sizes), and bigger batch sizes are inherently riskier.
- DORA predicts that a 25% increase in AI adoption could lead to a 7.2% reduction in delivery stability.
This is why measurement frameworks must include quality, stability, and maintainability — not just speed. Short-term velocity gains at the expense of long-term stability are not a viable trade-off.

Consumption-based pricing and the challenge of allocating AI budgets

The shift from flat licenses to token-based consumption pricing creates a new allocation problem:
- Should senior engineers get larger token budgets (they work on more complex problems) or junior engineers (AI helps them more, so the productivity multiplier is higher)?
- Which use cases deserve more investment? (Stack trace analysis and migrations may offer more ROI than code generation.)
Laura predicts that within 18 months, companies may be spending $1,200–$2,000 per month per developer on AI agents that can complete tasks autonomously (with human verification).
- This is comparable to the $3,000–$8,000 per year that companies spent on Visual Studio licenses and developer tooling suites in the 2000–2010 era — a precedent for high per-developer tooling spend when the productivity case is clear.

Best-case outcomes for end users and companies

For end users of products built with AI-assisted development:
- Faster time to market for validated features, more rapid experimentation, and better products — but not feature bloat or thrashing from building everything that’s now easy.
For companies:
- The winning approach is to treat development as an experiment portfolio rather than a sequential roadmap — using AI to accelerate validated experiments rather than to build out entire backlogs.
- Roadmaps may give way to rapid experimentation cycles where AI reduces the cost of trying new things.

Highly regulated industries are getting the best results

Financial services, insurance, and pharma companies are seeing the strongest outcomes from AI rollouts — not despite regulation, but because of it.
- They are forced to be deliberate and structured: acceptable use policies, data leakage prevention, intentional licensing, and clear budget allocation.
- The lesson: structured rollouts get the best results — “slow is smooth and smooth is fast.”

Advice for engineering leads: Data beats hype

Get a baseline measurement of developer experience and productivity as quickly as possible, then run experiments.
Treat AI as an organizational problem, not just a tooling problem — invest in training, enablement, and change management.
Combine workflow data (from GitHub, Jira, etc.) with self-reported developer experience surveys to capture signals that systems alone can’t observe (flow state, cognitive load, satisfaction).
Use data to tell the story of AI’s impact to executives and to protect the team from inflated expectations driven by media hype.

Rapid fire

Favorite tool: Granola (AI meeting notetaker) — it fills in context from meetings after Laura takes rough notes, dramatically improving her quality of life as a self-described bad notetaker.
Book recommendation 1: Write Useful Books by Rob Fitzpatrick — a short, practical guide to cutting fluff and writing clearly; can be read in an hour.
Book recommendation 2: Unsavory Truth by Marion Nestle — about food marketing, lobbying, and how industries construct narratives; highly transferable to understanding AI hype, media literacy, and how funding shapes public perception.
