The Verification Premium: What Classical Training Reveals About AI Coding Costs

New York | Published in AI and Board | 13 minute read
My desktop setup: reMarkable Paper Pro for ideas, MacBook Air M2, a headless NVIDIA DGX Spark handling the heavy lifting, and the tools behind the experiment — Amazon Kiro, Claude Code, and a terminal window. Plus the late-night lighting that makes it feel like coding in the 1980s again.

I like to refer to myself as having read the classics — Smalltalk, Modula-2, Haskell, Perl, Turbo Pascal, C++, among others — at a time when the web had just been invented and the first bubbles of the dot-com era had started to fizz. Fast forward over 30 years, and it’s been a long time since anybody paid me any money to write code. But over the years, I’ve kept my hand in — reading the RFCs and the research papers, writing the odd ‘helper script’ here and there, spinning up infrastructure in the cloud, building prototypes, and generally making sure that I didn’t become detached from my technical foundations.

When GPTs emerged, they kick-started the AI frenzy we’ve been living through these past three years, and presented me with an opportunity to test a hypothesis: does having decades of software engineering experience still matter when AI is writing the code for you? This wasn’t merely an experiment in whether I could still code — it was a test of AI capability and our responsibilities when using these tools. At Davos last week, Anthropic CEO Dario Amodei predicted that AI models might be doing “most, maybe all” of what software engineers do end-to-end within six to twelve months. “I have engineers within Anthropic who say, ‘I don’t write any code anymore. I just let the model write the code, I edit it,’” he explained. If that prediction holds, the question isn’t whether AI can write code — it’s who has the expertise to know if it’s writing the right code.

So I set up an experiment: build two applications — a markdown editor to help me write content for this website, and a skills tool for comparing organisational capabilities before and after AI adoption. My toolkit was Amazon Kiro, Claude Code, the n8n workflow tool, a terminal, the cloud, and of course me.

The applications themselves were secondary to what I was actually observing: where my experience mattered. The popular claim is that AI democratises software development — that someone without my background could achieve identical results, rendering my decades of experience sunk cost rather than competitive advantage. As I built these tools, I paid close attention to the moments where my training kicked in: the instinct to refactor, the recognition of anti-patterns, the judgement calls about architecture. What I discovered aligns with emerging research — and reveals something Boards need to understand before approving their next AI investment.

What classical training actually provides

Classical software engineering training — the kind that involved understanding data structures, memory management, architectural patterns, hardware and computer design, and why systems fail — creates a form of pattern recognition that AI tools cannot currently replicate: the judgement to know when the technically correct answer is the wrong choice. When Claude Code or Amazon Kiro suggested an approach, decades of experience meant I could immediately assess: Is this architecturally sound? Will it scale? What are the failure modes? Where are the security implications? Does this code look ‘solid’?

This isn’t about writing code faster. It’s about knowing which code to write, which suggestions to reject, and which architectural decisions will create or prevent problems downstream. AI generates options; expertise selects and refines them.

Early in my career, I refactored an application codebase and reduced it by 85% — achieving identical functionality with a fraction of the code and complexity. The original had been written by a junior engineer without formal software engineering training. The patterns I’m seeing in AI-generated code remind me of that original codebase: functional, certainly, but accumulated rather than architected.

Each AI-suggested wrong path drains API tokens, developer hours, and mental bandwidth — costs that experts spot and avoid upfront. Abandoned paths add bloat; architectural misjudgements cost rework. A developer who recognises immediately that the AI has proposed an anti-pattern spends far less time and money than someone who implements it, tests it, discovers it fails, and then tries to understand why.

My experience building across multiple tools revealed consistent patterns. The AI accelerated implementation; expertise ensured the right things got implemented. The tools differed in style and capability, but the dynamic remained constant: course correction and verification were the norm.

Research from McKinsey points to the same pattern. McKinsey’s analysis of generative AI in software engineering found developers achieved up to 55% faster task completion in greenfield projects — but only 10–20% gains in mature codebases due to verification overhead. For novices, the net gain dropped to near zero after debugging. Senior developers saved twice as much time as juniors.

A randomised controlled trial by METR provides perhaps the most striking counterpoint to productivity claims. Studying 16 experienced open-source developers completing 246 tasks on repositories they’d contributed to for years, researchers found a 19% net slowdown when developers used AI tools. Before starting, developers predicted AI would reduce completion time by 24%; after finishing, they believed it had saved them 20%. Objective measurement showed the opposite. The “hidden taxes” of verification, context-switching, and subtle defect correction offset initial speed gains — and these were experienced developers working on codebases they knew intimately.

The counterintuitive insight: experienced developers produce cheaper AI-assisted code, not despite their higher salaries but because of what their expertise prevents — fewer wrong paths, shorter debugging cycles, sounder architecture, and lower technical debt accumulation. Boards evaluating AI coding investments often frame them as ways to reduce dependence on expensive senior talent. The evidence suggests this framing inverts the actual economics.

The technical debt time bomb

GitClear’s analysis of 211 million changed lines of code shows an eightfold increase in duplicated code blocks, with 70% of this debt contributed by inexperienced users. In contrast, classically trained developers leverage AI primarily for ideation, reserving direct code insertion for verified scenarios — reducing debt at the source.

This puts me in mind of the old IBM pay structure — paid per K-LOC, per thousand lines of code written. We learned decades ago that measuring productivity by volume incentivises exactly the wrong behaviours. AI coding tools are recreating this problem at scale, generating volume that looks like productivity whilst accumulating debt that someone else will eventually have to service.

Simultaneously, “moved” code — indicating healthy refactoring — has declined whilst copy-pasted code has surged. This aligns with OX Security’s “Army of Juniors” research, which found inexperienced users introducing “insecure by dumbness” patterns at unprecedented velocity. The study identified 10 critical anti-patterns appearing in the vast majority of AI-generated code — systematic behaviours that contradict decades of software engineering best practices. “Comments everywhere” was found in 90–100% of AI-generated code — comments that serve the AI’s context management rather than human comprehension, cluttering repositories and making code review more difficult.
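
To make the pattern concrete, here is a hypothetical sketch (the function names, fields, and data are invented, not drawn from the OX Security or GitClear studies) of the duplicated, comment-heavy style the research describes, followed by the refactored form an experienced reviewer would normally insist on:

    # Illustrative data only
    users = [
        {"name": "amina", "active": True, "admin": False},
        {"name": "bela", "active": False, "admin": True},
    ]

    # The accumulated style: two near-identical functions, every line narrated.
    def get_active_users(users):
        # Create an empty list to hold the results
        results = []
        # Loop over every user in the list of users
        for user in users:
            # Check whether the user is active
            if user["active"]:
                # Add the user to the results list
                results.append(user)
        # Return the results list
        return results

    def get_admin_users(users):
        # Create an empty list to hold the results
        results = []
        # Loop over every user in the list of users
        for user in users:
            # Check whether the user is an admin
            if user["admin"]:
                # Add the user to the results list
                results.append(user)
        # Return the results list
        return results

    # The architected equivalent: one reusable filter, no narration of the obvious.
    def filter_users(users, predicate):
        return [user for user in users if predicate(user)]

    active_users = filter_users(users, lambda u: u["active"])
    admin_users = filter_users(users, lambda u: u["admin"])

Both versions pass the same tests; only one of them is something a team can read, review, and maintain.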

Google’s 2024 DORA report found that a 25% increase in AI adoption correlates with a 7.2% decrease in delivery stability, despite faster code reviews. Their 2025 follow-up confirmed AI adoption remains near-universal — 90% of developers now use it — but the stability concerns persist. MIT professor Armando Solar-Lezama captured this dynamic precisely: AI is “a brand new credit card that is going to allow us to accumulate technical debt in ways we were never able to do before.”

HFS Research estimates that the Global 2000 are carrying $1.5–2 trillion in accumulated technical debt — and a $1.5 trillion IT services industry has grown up around servicing rather than eliminating it. AI-generated code, without adequate verification, compounds this burden rather than resolving it.

The hidden carbon cost of vibe coding

The economics extend beyond rework costs. Research published in Nature found that AI models emit up to 19 times more CO₂ equivalent than human programmers when generating functionally equivalent code — driven primarily by the iterative corrections required when AI produces incorrect outputs. For inexperienced developers who require more prompts, more corrections, and more regenerations to reach working code, the inference emissions multiply correspondingly.

This creates a double environmental burden: higher Scope 3 emissions during development from excessive inference, and higher ongoing operational emissions from running inefficient code at scale. The GitClear findings on code bloat translate directly into compute costs — and those compute costs translate into carbon.

Organisations tracking Scope 2 and Scope 3 emissions may discover their AI coding initiatives are creating sustainability liabilities alongside technical debt. The ‘democratisation’ of coding through AI tools has an environmental footprint that grows in inverse proportion to developer expertise.

Vibe coding and the first-mover disadvantage

The technical debt time bomb has already detonated for thousands of startups. Analysis suggests roughly 10,000 startups attempted to build production applications with AI coding assistants. More than 8,000 now face rebuild or rescue engineering costs ranging from $50,000 to $500,000 each. The total cost of what’s being called “the vibe coding cleanup” ranges from $400 million to $4 billion — a substantial cost for initiatives meant to democratise software development.

Consider the pattern: a non-technical founder builds something that works, users validate the concept, and investors show interest. But ‘works’ and ‘scales’ are different engineering problems. The codebase isn’t architected — it’s accumulated, with every prompt that said ‘make it work’ creating debt that compounds silently.

Success becomes the trigger for crisis. When traffic spikes, the database queries that functioned for a hundred users collapse at a thousand. While the founder is buried in refactoring, the demand they have already demonstrated is visible to everyone — and a well-capitalised competitor with experienced engineers can now build the correct version from first principles, moving faster precisely because they’re building correctly rather than fixing what was bodged together. The founder did the market research, validated demand, and educated customers. Someone else captures the value.
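
The collapse is rarely exotic. More often it is something as mundane as an N+1 query: one database round trip per row, invisible at a hundred users and fatal at a few thousand. The sketch below is illustrative only, using an in-memory SQLite database with invented customers and orders tables, and shows the accumulated version alongside the architected one:

    import sqlite3

    # Invented schema for illustration: customers and their orders.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 1, 30.0), (2, 1, 12.5), (3, 2, 99.0)])

    # 'Works': one query per customer (N+1). Fine for a hundred customers,
    # a round-trip storm at a few thousand.
    def totals_accumulated(conn):
        totals = {}
        customers = conn.execute("SELECT id, name FROM customers").fetchall()
        for customer_id, name in customers:
            row = conn.execute(
                "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
                (customer_id,),
            ).fetchone()
            totals[name] = row[0]
        return totals

    # 'Scales': one aggregated query, and an obvious place to add an index.
    def totals_architected(conn):
        rows = conn.execute("""
            SELECT c.name, COALESCE(SUM(o.total), 0)
            FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
            GROUP BY c.id
        """)
        return dict(rows)

    print(totals_accumulated(conn))   # {'Ada': 42.5, 'Grace': 99.0}
    print(totals_architected(conn))   # same result, one query instead of N+1

Both functions return the same answer; the difference only appears under load, which is exactly when the founder can least afford to discover it.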

JP Morgan’s guidance for investors now includes explicit questions about “vibe coding” exposure: What percentage of code is AI-generated? Can the team explain architectural decisions? What’s the technical debt profile? What’s your verification process? For Boards conducting due diligence on AI-built products, these questions distinguish prototypes masquerading as products from genuinely scalable investments.

The “insecure by dumbness” problem

The OX Security research introduced a phrase that captures a distinct failure mode: “insecure by dumbness.” This describes non-technical users deploying applications built with AI tools at unprecedented velocity, without corresponding security expertise.

The research analysed over 300 repositories and found that AI-generated code doesn’t contain more vulnerabilities per line than human-written code. The crisis stems not from code quality but from velocity. The bottlenecks that previously controlled what reached production — code review, debugging, team-based oversight — have been removed. Functional applications can now be built faster than humans can properly evaluate them. Vulnerable systems reach production at unprecedented speed.

Gartner’s predictions quantify where this leads: by 2028, “prompt-to-app” development by citizen developers will increase application defects by 2500%.

These aren’t bugs. They’re architectural decisions made by systems without architectural judgement or contextual awareness, accepted by users without the expertise to question them. For organisations deploying AI coding tools, the security question extends beyond traditional application security. Who is generating code? What do they understand about what they’re generating? What verification processes exist between generation and production?
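
As an illustration of what being accepted without the expertise to question it looks like in code, consider this invented sketch of a user lookup: the string-interpolated version works in every demo and is trivially injectable, while the parameterised version is what a verification step exists to catch.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password_hash TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'hashed-secret')")

    # Looks fine and passes the demo, but is injectable:
    # username = "' OR '1'='1" returns a row without knowing any credentials.
    def find_user_unsafe(conn, username):
        query = f"SELECT * FROM users WHERE username = '{username}'"
        return conn.execute(query).fetchone()

    # The verified version: values are passed to the driver, never interpolated.
    def find_user_safe(conn, username):
        return conn.execute(
            "SELECT * FROM users WHERE username = ?", (username,)
        ).fetchone()

    print(find_user_unsafe(conn, "' OR '1'='1"))  # returns alice's row
    print(find_user_safe(conn, "' OR '1'='1"))    # returns None

Nothing about the unsafe version looks broken to someone who has never been taught why it is.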

The expertise pipeline problem

This connects to the expertise pipeline destruction I’ve explored in previous work. If organisations use AI coding tools to bypass the need for experienced developers, they create immediate quality risks whilst also eroding their ability to even assess what they’ve built.

The traditional model had junior developers writing code whilst senior developers reviewed and corrected it. Through repetition, juniors developed the pattern recognition that eventually made them seniors. This wasn’t inefficiency — it was capability development. The junior-to-senior pipeline created the verification expertise that organisations now need more than ever.

If AI replaces that junior work whilst organisations simultaneously reduce senior headcount, who develops the expertise to verify AI outputs? Who refactors the AI-generated codebase when no one in the organisation understands it? Who trains the next generation of architects? The AI optimist’s answer is that better models will eventually verify themselves. Perhaps — but organisations are depleting their verification capacity today, betting on capability that doesn’t yet exist. And even self-verifying AI would still need someone to define what ‘correct’ means for this business, these users, this context. That judgement is precisely what senior expertise provides.

Stack Overflow’s 2025 developer survey revealed that 45% of respondents found debugging AI-generated code more time-consuming, with 66% frustrated by solutions that are ‘almost right, but not quite.’ Trust in AI accuracy has collapsed — only 33% trust it, while 46% actively distrust it. Notably, early-career developers show the highest AI adoption rates, while developers with 10+ years’ experience show the lowest daily usage and highest resistance. Classical training isn’t nostalgia — experienced developers recognise limitations that juniors haven’t yet learned to see.

The long-term calculation exposes compounding risk behind illusory gains: organisations are simultaneously generating more code that requires expert review whilst reducing the supply of experts capable of providing it, a risk that accelerates rather than diminishes over time.

Board governance implications

When Boards approve AI coding tool investments, they’re making implicit assumptions about where value will be created. The common framing — using AI to reduce dependence on expensive developers — inverts the actual economics the evidence reveals.

McKinsey research indicates that heavy technical debt already consumes 20–40% of IT budgets. MIT Sloan research found that while AI delivers short-term productivity gains, technical debt quickly eclipses the improvement — particularly when inexperienced developers deploy AI-generated code in legacy environments. Their economic modelling suggests the pattern reverses with experienced oversight and deliberate debt management.

The strategic question isn’t “can we use AI to write code cheaper?” It’s “do we have the verification capability to ensure AI-generated code creates value rather than debt?”

For Boards, this suggests AI coding investments should pair tool deployment with expertise investment. The productivity gains are real, but they require human capability to capture.

The verification premium

The evidence from my experimentation across AI coding assistants aligns with what research now demonstrates at scale: expertise doesn’t become less relevant when AI assists — it becomes the determining factor in outcomes.

Classical training in software engineering — understanding why systems work, not just how to make them appear to work — creates verification capability that AI coding tools require but cannot provide. That capability determines whether AI assistance compounds productivity or compounds debt.

Boards face a defining choice in how they frame AI coding investments. The “reduce dependence on expensive expertise” framing leads to the patterns now visible in startup failures and technical debt accumulation. The “augment experienced developers” framing captures productivity gains whilst maintaining the verification capability that ensures quality.

The verification premium is real. As my own experience confirmed, it pays dividends when invested upfront. The question is whether Boards will invest in it before or after discovering its absence.

Let's Continue the Conversation

Thank you for reading about the verification premium and what classical training reveals about AI coding costs. I'd welcome hearing about your Board's experience with AI coding investments: whether you're discovering that expertise determines outcomes more than tool selection, wrestling with how to measure technical debt accumulation from AI-generated code, or finding ways to pair AI deployment with the verification capability needed to capture real productivity gains.