
I’ve spent the last year watching smart engineering teams make the same mistake. They adopt AI to speed up coding without changing the core of how they build software.
With Claude, Copilot or Cursor, they see quick improvements in delivery speed and test coverage. For leadership, those early gains seem to justify the investment. But six months in, when we ask the VP of Engineering what has changed about how they build software, we usually get a version of the same answer: “Not much, honestly.”
That is the ceiling often overlooked in software delivery, and it’s becoming the most important problem right now.
Most companies assume the constraint lies in the limits of AI technology. But according to Forrester’s State of AI Survey, only 35% of AI decision-makers train staff to make decisions with AI models within their companies, and 23% offer prompt-engineering training. The real constraint is the workflows around AI: the models can do more than most engineering processes allow, and those processes are getting in the way of progress.
McKinsey surveyed nearly 300 firms and found a 15% performance gap between organizations that rebuilt their operating model for AI and those that just deployed tools. Top performers were using AI inside modified ways of working. Nearly two-thirds of these companies had restructured teams and processes across at least three key operating model dimensions, while only 10% of the bottom performers had done the same.
We’ve watched this play out enough times to say that the bottleneck is the operating system around coding. Dropping AI tools into an existing operating model built around legacy processes briefly compresses delivery cycles, and then engineering teams hit the ceiling.
While AI-assisted development is a powerful accelerator, engineers still work through the same processes and with the same team structure. Requirements flow through the same people, and validation comes late. Workflow coordination overhead persists in the form of endless clarifying, constant chasing and rework caused by undocumented decisions.
Most organizations don’t realize they’re hitting this ceiling until the initial AI excitement fades, and productivity gains plateau. They conclude that AI is overhyped.
What they need is a no-hype rebuild that lets them use AI well.
“AI-native” is easy to say and harder to explain day to day. Here’s my version of it. Being AI-native means shifting humans from the role of the primary producers of software artifacts to the supervisors of systems that produce them. Engineers are no longer the only ones writing code, tests and documentation. Their more important role is to define the context in which AI systems operate, set guardrails, and decide when the machine has earned a broader scope.
I saw this on a recent greenfield project. My team of four was working on a lost-item tracking system for public transport from scratch. With the same scope, delivery moved 40% to 60% faster. Features that would normally take one to two weeks were landing in two to three days. In a traditional setup, we would likely have needed roughly twice as many people to deliver the same work. The delivery model mattered as much as the tech, especially given the small team. We worked alongside AI agents handling backend, frontend, database, and testing tasks.
Our work didn’t follow long release cycles. It moved in a tight continuous delivery loop where feedback shaped the next step: define the task, generate outputs, test immediately, validate with QA agents and refine in the next pass.
The team used test-driven development with separate QA, Developer, and Reviewer agents. Independent API and Playwright UI test suites checked the work in a continuous feedback loop. By handoff, all acceptance criteria and quality gates had been met. The only remaining work was minor front-end UI refinement.
The numbers catch attention, but I wouldn’t anchor on them. What matters is that the process was designed as AI-native from the start, not patched onto an existing workflow.
Brownfield, where a legacy product or platform must be modernized, is harder. We’ve learned first-hand that retrofitting AI into an existing enterprise system without breaking it is more complicated because of legacy code, production risk, and existing team dynamics.
We are modernizing a large platform with multiple backend microservices built in .NET across different repositories, a complex Angular frontend and many third-party integrations. What we would never do there is drop a multi-agent system into a live codebase and expect it to behave.
AI was introduced carefully into a live system. We started with a feature that had real complexity, measured the results closely and expanded only after the process had structure around it. When we were past the structured phase, feature delivery improved by 25% to 40%. As the system matured, teams accepted 70% to 85% of agent output. The lesson I want to share is that AI effectiveness depends heavily on the quality of context and the discipline of the workflow. Early on, AI-generated outputs required careful review due to gaps and inconsistencies, but as we improved the context, introduced guardrails and refined the processes and workflows, the results became much more stable.
I call this approach progressive trust. Most AI adoption programs mistakenly skip it, thinking they are not ready for more AI autonomy than they already permit.
AI autonomy does not begin at full scope; it is earned over time. Early on, agents handle narrow tasks, such as drafting a specification, generating unit tests, or proposing a data model, while human review remains constant and corrections feed back into the system. Acceptance rates often start around 35% to 40%, with scope expanding only when AI accuracy justifies it. By the midpoint of the most mature engagements, acceptance rates exceed 60%, and most outputs need only refinement rather than rework.
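One way to picture progressive trust is as a gate that widens an agent's scope only after enough reviewed output has been accepted. What follows is a minimal sketch in Python, not a description of the tooling used on these engagements. The tier names, the 60% promotion threshold and the minimum of 20 reviews are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical scope tiers, ordered from narrowest to broadest.
SCOPES = ["draft_spec", "unit_tests", "data_model", "feature_slice"]


@dataclass
class TrustGate:
    """Expands an agent's scope only when reviewed output earns it."""
    promote_at: float = 0.60  # assumed threshold for widening scope
    min_reviews: int = 20     # assumed minimum sample before deciding
    level: int = 0            # index into SCOPES
    accepted: int = 0
    reviewed: int = 0

    def record(self, was_accepted: bool) -> None:
        """Log one human review decision."""
        self.reviewed += 1
        if was_accepted:
            self.accepted += 1

    @property
    def acceptance_rate(self) -> float:
        return self.accepted / self.reviewed if self.reviewed else 0.0

    def maybe_promote(self) -> str:
        """Widen scope by one tier if the evidence supports it, then reset counts."""
        enough = self.reviewed >= self.min_reviews
        can_grow = self.level < len(SCOPES) - 1
        if enough and can_grow and self.acceptance_rate >= self.promote_at:
            self.level += 1
            self.accepted = 0
            self.reviewed = 0
        return SCOPES[self.level]


gate = TrustGate()
for i in range(20):
    gate.record(i % 4 != 0)   # 15 of 20 reviews accepted
print(gate.maybe_promote())   # unit_tests
```

Resetting the counters after each promotion means the agent has to earn the next tier on the harder work, rather than carrying credit forward from easier tasks.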
Engineering leaders often object: “I’m not ready to hand this over to AI.” Progressive trust doesn’t assume that readiness is already there; it builds it over time.
The first thing I tell leaders who are serious about a no-hype AI rebuild is, “Don’t start with a new tool selection; start with a map.”
The idea of “starting with a map” comes from real cases: it helps uncover where effort is lost before AI is introduced and where new bottlenecks will appear after. Based on this, the next step I would take is to build a clear transformation roadmap that shows how processes evolve, how adoption happens safely, and how the project transitions step by step to an AI-enhanced way of working.
I make the process easier by asking the right questions. Where do handoffs slow down? Where does context get lost between roles? Where are people spending time coordinating work instead of creating it?
The map usually reveals two or three bottlenecks where teams are already losing speed because of how their work is organized, even before AI enters the picture.
The next step is to structure inputs. Skipping this step is one of the primary reasons AI pilots fail.
AI-native delivery runs on machine-readable context. That includes requirements, design decisions and technical constraints, captured in structured form before any AI agent touches them. When AI agents have authoritative inputs, they don’t waste cycles clarifying. They execute. Getting that layer right before adding AI agent execution is the difference between a system that improves delivery time and one that produces inconsistent outputs and gets quietly abandoned.
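To make “structured form” concrete, here is a minimal Python sketch of what a machine-readable task context could look like. The `TaskContext` class and its field names are hypothetical, chosen to mirror the three kinds of input named above plus acceptance criteria. The only real point is that incomplete context is caught before an agent sees it.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TaskContext:
    """Hypothetical structured input for an agent; field names are illustrative."""
    requirement: str
    acceptance_criteria: list[str] = field(default_factory=list)
    design_decisions: list[str] = field(default_factory=list)
    technical_constraints: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize for the agent, refusing to hand over incomplete context."""
        missing = self.gaps()
        if missing:
            raise ValueError(f"context incomplete: {', '.join(missing)}")
        return json.dumps(asdict(self), indent=2)


# Illustrative content only, not taken from a real project.
ctx = TaskContext(
    requirement="Passengers can report a lost item",
    acceptance_criteria=["Report returns a tracking ID"],
)
print(ctx.gaps())  # ['design_decisions', 'technical_constraints']
```

In this sketch, calling `ctx.to_json()` on the example above raises a `ValueError`, so the missing design decisions and constraints surface as a question for a person rather than as a guess by the agent.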
I often say, “Expect no moonshot.” Build a real feature with real stakes, scoped to run without production risk. Measure acceptance rates, intervention frequency, and delivery time against your baseline. Let the data drive the next expansion of scope.
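The three measurements are simple enough to track from day one. The sketch below, in Python, shows one possible way to summarize a pilot against a baseline. The `PilotTask` record, the `summarize` helper and every number in the example are illustrative assumptions, not data from the projects described here.

```python
from dataclasses import dataclass


@dataclass
class PilotTask:
    """One agent-produced task from the pilot; fields are illustrative."""
    accepted: bool          # did reviewers accept the output?
    interventions: int      # times a human had to step in
    days_to_deliver: float


def summarize(tasks: list[PilotTask], baseline_days: float) -> dict[str, float]:
    """Compare pilot results with the team's pre-AI baseline delivery time."""
    if not tasks:
        raise ValueError("no pilot tasks recorded")
    n = len(tasks)
    avg_days = sum(t.days_to_deliver for t in tasks) / n
    return {
        "acceptance_rate": sum(t.accepted for t in tasks) / n,
        "interventions_per_task": sum(t.interventions for t in tasks) / n,
        # Negative means faster than baseline.
        "delivery_time_change": (avg_days - baseline_days) / baseline_days,
    }


# Made-up numbers for illustration.
pilot = [
    PilotTask(accepted=True, interventions=1, days_to_deliver=3.0),
    PilotTask(accepted=False, interventions=4, days_to_deliver=6.0),
    PilotTask(accepted=True, interventions=0, days_to_deliver=3.0),
]
report = summarize(pilot, baseline_days=8.0)
print(report["delivery_time_change"])  # -0.5, half the baseline time
```

Keeping all three figures side by side matters: a faster delivery time with a falling acceptance rate or rising interventions is a signal to hold scope where it is, not to expand it.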
The rebuild takes longer than most organizations want and usually requires multiple engagements before teams arrive at a reliably repeatable model. Parts of the workflow break down in unexpected ways, and role and behavior changes are real. Developers move into AI engineering, while QA shifts toward validation strategy, and both require real reskilling and a willingness to adapt to new ways of working. Team dynamics also shift, and those changes need active management.
But the compounding effect is real. Every iteration improves the system’s context, narrows the gap between AI output and human-ready quality, and expands what the team trusts the machine with.
Once AI-native delivery is seen at full stride, AI-assisted delivery, enabling faster execution on the same old foundation, starts to feel like a misspent investment.
While the AI-native rebuild isn’t easy, AI-assisted patching has a natural built-in ceiling, and most teams are already hitting it. The question is what they decide to do next.
This article is published as part of the Foundry Expert Contributor Network.
Vittesh Sahni is the senior director of AI engineering at Coherent Solutions, where he oversees how AI capability translates into measurable digital value for enterprise firms. He leads two closely connected practice areas focused on building AI capabilities for enterprises and bringing AI into software delivery through AI tools and AI-native workflows across the SDLC. His work focuses on responsible AI adoption and practical, measurable outcomes.
