I was staring at two functions — slightly different names, different files, identical purpose. I had not written either of them. I had asked an AI assistant to write one. At some point, across a session boundary I no longer remembered, I had forgotten to mention the first one existed, and the AI had built a second from scratch. This was not a malfunction. The AI had no memory of what we'd built before, I hadn't told it, and it had solved the problem in front of it as competently as it could. The failure was mine. The codebase was simply a receipt.
Over the following months, as I built more substantial projects with AI assistants, I came to a conclusion that reframed everything. Every failure mode I encountered, from the duplicate code to the circular bug fixes to the architectural drift, I had seen before. In human engineers, over two decades of managing development teams across Asia and the Pacific. These were not AI pathologies. They were process pathologies. And understanding that distinction is the key to unlocking what may be the most significant shift in how software gets built since the advent of open source.
The Productivity Paradox
The term "vibe coding" entered the lexicon on February 2, 2025, when Andrej Karpathy — a founding member of OpenAI and former Director of AI at Tesla — described a new way of working: telling AI what you want in plain English and letting it handle the code.1 Collins English Dictionary named it Word of the Year for 2025.2 The enthusiasm was immediate. The backlash was not far behind. Both sides had data.
A December 2025 CodeRabbit study of 470 open-source GitHub pull requests found that AI-coauthored submissions averaged 10.83 issues each, compared to 6.45 for human-only code. Security vulnerabilities appeared nearly three times more often; performance inefficiencies were eight times more common.3 A GitClear analysis of 211 million changed lines of code from 2020 to 2024, spanning repositories owned by Google, Microsoft, and Meta, found duplicate code blocks approximately ten times more prevalent by 2024, while refactoring declined from 25 percent of changed lines to under 10 percent.4 And Google's 2024 DORA Report, the industry's most authoritative benchmark for software delivery, estimated a 7.2 percent decrease in delivery stability for every 25 percent increase in AI adoption, even as individual developers reported feeling more productive.5
Here is the paradox that should concern every technology leader: the tools are making individuals feel faster while making their organizations' delivery less stable. If that pattern sounds familiar, it should. It is the signature of a capability that has outpaced the processes around it.
We Have Seen This Before
The instinct is to treat this data as an indictment of the tool. Before defaulting to that conclusion, leaders should ask a more uncomfortable question: How much of this looks like what your teams were already doing — just faster?
Anyone who has managed a development team has met the engineer who copy-pastes code rather than abstracting it, because the ticket has a deadline and no one is watching. That same manager has found tests that pass because they check nothing meaningful, and has discovered that the API documentation diverged from the implementation sometime last quarter, and that the person who knew why has moved on.
This is not a critique of individual engineers. It is a description of what happens when capable people operate without adequate systems around them. The insight that produced code reviews, continuous integration, and architectural decision records is not that humans are reliable. It is that humans operating within well-designed systems are reliable enough.
The system is the unit of trust, not the individual.
AI coding assistants do not disturb this principle. They simply introduce a new kind of individual — one whose failure modes differ from a human contributor's, but whose dependence on organizational process is exactly the same.
Understanding the Collaborator
The mental model that made AI-assisted development productive for me came from neuroscience. Anterograde amnesia is the inability to form new long-term memories. A person with the condition may be brilliant, but every conversation begins from a clean slate.
This describes the AI coding assistant with precision. Within a session, it reasons across files and dependencies with a fluency that would exhaust most engineers. Across sessions, it remembers nothing. The architectural decision you reached last Thursday does not exist for it. The utility function you built to avoid precisely this kind of duplication is invisible unless you surface it explicitly.
The management implication is direct: you would not hand a brilliant contractor with documented memory impairment a complex task and walk away. You would brief them fully each time, maintain rigorous shared documentation, and verify outputs against defined criteria. The instinct to do less than this — because the tool is fast, because the output looks right — is where the duplicate functions and security vulnerabilities come from.
Three Practices That Changed Everything
The practices that resolved most of my early problems were not prompt engineering techniques. They were management disciplines.
Externalize institutional knowledge. Every session with an AI assistant begins with a question the tool cannot answer: What have we already built? I maintain a project context document — a living record of decisions, conventions, patterns to follow, anti-patterns to avoid, and the reasoning behind each. It is provided at the start of every session and updated when decisions change. The unexpected benefit: this documentation, forced by the AI's amnesia, turned out to be documentation I should have been maintaining all along. Future collaborators — human or otherwise — benefit from it. Future me, returning to a project after an absence, benefits from it. The tool's limitation became an organizational asset that outlasted the limitation.
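The piece does not prescribe a format for the context document, so what follows is only a sketch under that caveat. It assumes a Python codebase and shows one way to answer "What have we already built?" mechanically: a hypothetical helper, `inventory_functions`, that uses the standard-library `ast` module to list every function with its parameters and the first line of its docstring, ready to paste into the context document before a session.

```python
import ast


def inventory_functions(source: str) -> list[str]:
    """Return 'name(params): summary' lines for every function defined in source."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Regular (positional-or-keyword) parameter names only; enough to
            # recognize an existing helper at a glance.
            params = ", ".join(a.arg for a in node.args.args)
            doc = ast.get_docstring(node) or "(no docstring)"
            entries.append(f"{node.name}({params}): {doc.splitlines()[0]}")
    return sorted(entries)  # a stable order keeps diffs of the context document small


sample = '''
def format_price(amount, currency):
    """Format a numeric amount as a currency string."""
    return f"{currency} {amount:,.2f}"


def slug(text):
    return text.lower().replace(" ", "-")
'''

for line in inventory_functions(sample):
    print(line)
# format_price(amount, currency): Format a numeric amount as a currency string.
# slug(text): (no docstring)
```

Flagging undocumented functions explicitly as `(no docstring)` is a deliberate choice: those are the entries most likely to be reinvented, because nothing in the context document explains what they do.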
Define acceptance criteria before requesting work. Describing a bug in plain English invites interpretation. Writing a test that defines the expected behavior before requesting an implementation gives both parties a concrete target. The circular fix loops I experienced, three rounds of adjustments with each creating new problems, stopped almost entirely under this discipline. This is test-driven development, an established practice. The AI's tendency to approximate a specification rather than satisfy it merely makes the practice's value inescapable.
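A minimal sketch of this discipline, assuming Python and a hypothetical `parse_duration` function chosen only for illustration: the acceptance test is written first and handed over with the request, and any implementation, whether typed by a person or generated by an assistant, is accepted only when that test passes. Plain `assert` statements keep it runnable without a test framework; under pytest, the `test_` prefix would let the function be collected automatically.

```python
import re


# Step 1: the acceptance criteria, written before any implementation is requested.
def test_parse_duration():
    assert parse_duration("45s") == 45
    assert parse_duration("1h30m") == 5400
    assert parse_duration("2m5s") == 125
    # Malformed input must be rejected, not approximated.
    for bad in ("", "10", "1x", "h1"):
        try:
            parse_duration(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")


# Step 2: an implementation (human- or AI-written) that must satisfy the test above.
_UNITS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    """Convert strings like '1h30m' or '45s' to a number of seconds."""
    # Reject empty input, or anything left over once valid tokens are removed.
    if not text or _TOKEN.sub("", text):
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in _TOKEN.findall(text))


test_parse_duration()
print("acceptance test passed")
```

The rejection cases matter as much as the happy path: they are what stops an approximate implementation, such as one that silently ignores unknown units, from passing.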
Verify before you trust. A structured review pass after each generation, treating AI output the way you would treat a pull request from a new team member, closes the quality gap. The DORA data showing declining delivery stability alongside rising AI adoption likely reflects, at least in part, organizations reviewing AI output with less rigor than they would apply to human contributions. The code looks right. It often passes a cursory scan. But looking right and being right are different things, and the distinction is where process earns its keep.
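The review pass itself is human judgment, but some checks inside it can be automated. The sketch below targets the duplicate-function failure from the opening of this piece. It is a hypothetical helper, not a tool the piece describes, and it assumes Python: `find_structural_duplicates` parses source with the standard-library `ast` module, strips docstrings, renames parameters and locally assigned variables to positional placeholders, and reports functions whose remaining structure is identical.

```python
import ast
import copy


def _fingerprint(func: ast.FunctionDef) -> str:
    """Dump a function body with its docstring removed and local names canonicalized."""
    body = func.body
    if (body and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)):
        body = body[1:]  # docstring wording is not structural

    # Parameters plus every name the function assigns count as "local".
    local = {a.arg for a in func.args.args}
    local |= {n.id for n in ast.walk(func)
              if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    mapping: dict[str, str] = {}

    class Rename(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.Name:
            if node.id in local:
                # First local encountered becomes v0, the next v1, and so on.
                node.id = mapping.setdefault(node.id, f"v{len(mapping)}")
            return node

    renamed = [Rename().visit(copy.deepcopy(stmt)) for stmt in body]
    return "\n".join(ast.dump(stmt) for stmt in renamed)


def find_structural_duplicates(source: str) -> list[tuple[str, str]]:
    """Return (first_seen, duplicate) name pairs for structurally identical functions."""
    seen: dict[str, str] = {}
    dupes: list[tuple[str, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            fp = _fingerprint(node)
            if fp in seen:
                dupes.append((seen[fp], node.name))
            else:
                seen[fp] = node.name
    return dupes


existing = '''
def calc_total(items):
    """Sum line-item prices."""
    total = 0
    for item in items:
        total += item.price
    return total
'''

generated = '''
def compute_order_total(line_items):
    running = 0
    for li in line_items:
        running += li.price
    return running
'''

print(find_structural_duplicates(existing + generated))
# → [('calc_total', 'compute_order_total')]
```

Because only local names are canonicalized, calls to different functions still count as different code. The check will miss duplicates that reorder logic or use a different algorithm, so a match is a prompt for the reviewer, not a verdict.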
Together, these practices produced a different working relationship with the tool. Not frictionless — nothing in software development is frictionless — but governed. The failure modes became predictable, catchable, and increasingly rare.
The Checklist Is Waiting to Be Written
Leaders looking for a precedent need not search far. The early decades of aviation were marked by accidents attributable not to inadequate aircraft but to inadequate procedures. The aircraft became safe not because the technology improved, though it did, but because the practices around the technology matured. The preflight checklist was a hard-won cultural artifact.6
The developers and engineering leaders building rigorous habits around AI tools today are writing the equivalent of those checklists. They are not compensating for a temporary deficiency in the tools. They are building the norms that the profession will eventually codify, in the same way that code review and continuous integration — once the practices of unusually disciplined teams — became industry standard.
Karpathy himself, a year after coining "vibe coding," proposed the term "agentic engineering" to describe this maturation: the same AI capability, exercised with structured human oversight.7 The code is increasingly written by AI. The architecture, the verification, and the institutional memory remain human responsibilities.
The strategic question is not whether AI coding tools work. They do. The question is whether your organization has the process infrastructure to make them work reliably. The data is clear about what happens without it: more code, more defects, declining stability. The same DORA research also records gains in documentation quality, code quality, and review speed, and in my experience disciplined adoption delivers more: compressed development cycles, systems maintained across languages that would otherwise require specialist hiring, and senior engineers freed to focus on the architectural decisions that actually matter.
The duplicate functions are still accumulating in codebases where the context document doesn't exist, where tests come after the code, and where review is cursory. That is not the AI's failure. It is a management problem with a management solution.
SK102 builds software using AI-assisted development with the process discipline this piece describes: spec-first, test-driven, and locally accountable.
Footnotes
1. Andrej Karpathy, X (formerly Twitter), February 2, 2025. Karpathy is a founding member of OpenAI and served as Director of AI at Tesla from 2017 to 2022. https://x.com/karpathy/status/1886192184808149383
2. Collins English Dictionary, Word of the Year 2025. https://www.collinsdictionary.com/woty. Also reported by CNN, November 6, 2025: https://www.cnn.com/2025/11/06/tech/vibe-coding-collins-word-year-scli-intl
3. CodeRabbit, "State of AI vs. Human Code Generation," December 17, 2025. The study analyzed 320 AI-coauthored and 150 human-only open-source GitHub pull requests, normalizing findings per 100 PRs with Poisson rate ratios and 95% confidence intervals. Full report: https://www.coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report
4. GitClear, "AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones," analyzing 211 million changed lines of code from 2020–2024 across repositories owned by Google, Microsoft, Meta, and enterprise C-Corps. https://www.gitclear.com/ai_assistant_code_quality_2025_research
5. Google Cloud, "Accelerate State of DevOps Report 2024" (DORA). Key findings on AI: a 7.2% estimated decrease in delivery stability and a 1.5% decrease in delivery throughput per 25% increase in AI adoption, alongside gains in documentation quality (7.5%), code quality (3.4%), and code review speed (3.1%). https://dora.dev/research/2024/dora-report/
6. The aviation checklist is most commonly traced to the aftermath of the 1935 Boeing Model 299 crash at Wright Field. See Atul Gawande, The Checklist Manifesto: How to Get Things Right (New York: Metropolitan Books, 2009).
7. Andrej Karpathy, X (formerly Twitter), February 4, 2026, as reported by The New Stack, February 10, 2026. https://thenewstack.io/vibe-coding-is-passe/