Developers
Pragmatic AI

Quality you can’t generate: AI is only as good as your constraints

AI changed the cost structure of software. It didn't change the value structure. The value is no longer the code you write. It's the taste, judgment, and constraints you encode into the system that shapes what AI produces.
Dave Mosher

The software industry has always treated quality and speed as a trade-off.

If you believed the prevailing wisdom, you had two options:

  1. Move slowly to ensure excellence.
  2. Move fast and accept the mess.

AI has sharpened this tension, mostly because its most obvious value proposition is raw speed. You can generate more code faster than ever before. You can produce UI, docs, tests, and scaffolding at a pace that would have sounded absurd a few years ago, and has increased dramatically in just the past few months.

The software industry is still missing the point.

Speed was never the goal.

Speed is a byproduct of how effectively we achieve an outcome under constraints. It is a signal, not the objective.

But quality is also not the goal.

Quality is the foundation that makes speed possible. It determines how long you can move quickly before the system pushes back. AI amplifies this in both directions—compressing feedback loops when used with rigor, and launching you on a rocketship to entropy when used without it.

Think about what a quality focus actually gives you: fewer unintended side effects and the ability to change the system without fear. AI, used with rigor, extends that capacity to a much broader set of work.

This is the heart of why AI should matter to you.

Three days to make the other mistake

Early in my career, I was a speed addict. I shipped code and built prototypes in the browser instead of thinking critically or writing tests. I was vibe-coding before it was even a thing! I was pretty opposed to anything that could be interpreted as rote or process-heavy, and for good reason. A lot of my experiences at the time were a reaction to dealing with the downstream effects of being handed an artifact somebody else had produced and told to implement it.

Working with Justin Searls, who had the same motivation but a different approach, was illuminating for me. He gained his speed through a quality focus—TDD, discipline, and constraints I found deeply uncomfortable. I chafed at it. Often.

His response, almost every time, was the same: “Why don’t we just try it for three days?”

I don’t know how he landed on three days, but in retrospect, it was the right amount of time. Long enough that I couldn’t bail at the first sign of frustration. Short enough that it didn’t feel like a permanent commitment. It was an invitation to make the other mistake—to actually inhabit the opposite extreme, the space that I didn’t want to be in, instead of just nudging toward it.

I vividly recall the moment things clicked. We were working on a drag-and-drop table component for a rich-client app. It needed to work in IE6. I was intimately acquainted with the quirks of that browser. I knew exactly where the code needed to go. I wanted to just start coding.

Yet Justin firmly and gently pushed me to lean into TDD, and helped me stick to it. He was confident we could express the logic necessary to get to a working solution without looking at the browser. I was dubious, but I begrudgingly agreed. We built that feature without opening the browser. Tests expressed the contract, discovered the right API, simulated drag-and-drop, and worked through how the feature should behave. And the first time we opened the browser, it worked perfectly. I was hooked.

I didn't abandon my old instincts—my bias for speed, my intuition about where code wants to go. But I learned to channel them into a completely different frame. More importantly, I learned something about what happens when you force yourself into a space you're uncomfortable with.

Mark Rabkin calls this “making the other mistake.” The calibration needed to arrive at some ideal state looks a lot more like ping-ponging between two unideal extremes than most of us are comfortable with, but it is far more effective than trying to course correct with minor increments.

Line graph illustration showing speed effort on the left and quality on the right with arrows depicting over indexing then settling in the middle

Engineers know this instinctively. When debugging, nobody steps through commits one at a time—at least not once you know about git bisect. We cut the problem space in half with each iteration. Small nudges from your comfort zone are like using a linear search algorithm to find answers in a massive corpus of data. (I’ll spare you the big O notation, but trust me—it’s not the algorithm you want.)

Swinging between extremes is basically a binary search. My experience tells me I find my ideal zone by being willing to live with the discomfort of overshooting for about three days: long enough to get over the hump and give my brain a chance to adapt.
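The same bisection idea, sketched in code. This is a toy sketch with illustrative names, not git's actual interface:

```python
def bisect_first_bad(commits, is_bad):
    """Find the first "bad" commit in O(log n) checks, like `git bisect`.

    `commits` is ordered oldest to newest, and `is_bad` flips from False
    to True exactly once along that history.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # regression landed at mid or earlier
        else:
            lo = mid + 1    # regression landed after mid
    return commits[lo]

# 16 "commits"; the regression landed at commit "j"
history = list("abcdefghijklmnop")
print(bisect_first_bad(history, lambda c: c >= "j"))  # → j
```

Four checks instead of sixteen. Halving the search space beats inching through it, which is the same reason big swings calibrate faster than small nudges.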

I've since watched this same dynamic play out at the team level, over and over. Most teams are stuck on one side of the quality-speed spectrum. They either ship fast and break things, or end up in analysis paralysis trying to craft the perfect plan and ship never.

When things go wrong, the instinct is to make small adjustments. But small adjustments don't threaten the team's identity, and that identity is what keeps them stuck. Speed addicts see themselves as scrappy and pragmatic. Quality purists see themselves as rigorous and responsible. Each identity has a gravitational pull, and any deviation too far from that core gets viewed with skepticism. A small correction doesn't challenge the identity—it manifests as a patronizing gesture that acknowledges change so everyone can feel good about it before promptly reverting back to normal.

AI makes this dynamic more consequential. Imagine a speed-biased team generating code it doesn't understand, shipping features that don't solve the actual problem, and accumulating tech debt at machine speed. The cost of staying stuck on one side of the spectrum is now much higher than it used to be.

Not speed or quality. Not speed then quality. Speed and quality.

Hammers, screwdrivers, and meta-skills for AI development

My ‘three-days’ experience taught me that the goal of software engineering isn't about the one golden approach. That may work for a season, but seasons change, along with tools and techniques. We need to be in the habit of refining our judgment and taste to know which approach a given moment calls for.

Justin often described his tools and techniques as either a “blunt instrument” or a “precision instrument”—and the same tool could be either one depending on how you wielded it.

TDD is a perfect example. Used as a blunt instrument, TDD incentivizes the wrong behavior. Teams chase coverage metrics, causing developers to write tests to satisfy the metric, and focus drifts away from what actually matters: does our testing strategy give us confidence to change the system?

But TDD wielded as a precision instrument looks different. Justin shared his approach—discovery testing—which is methodical and deliberate:

  1. Start at the top of a tree that represents the thing you're building.
  2. Work your way down one level at a time.
  3. Use isolated unit tests with test doubles to discover what collaborators need to exist, categorizing each as needing further decomposition, a pure-function leaf node, or a third-party wrapper.
  4. Recurse until each branch terminates—you're done when you've reached leaf nodes that need no further collaborators.

You end up with tests that express a contract, code that fulfills it, and a safety net that gives you the confidence to refactor aggressively.
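The loop above can be sketched with a single isolated unit test. This is a minimal illustration using Python's unittest.mock as the test-double library; the names (`InvoiceTotaler`, `price_of`) are invented for the example, not taken from any real discovery-testing codebase:

```python
from unittest.mock import Mock

class InvoiceTotaler:
    """Top of the tree: owns only the summing logic, delegates pricing."""
    def __init__(self, price_lookup):
        self.price_lookup = price_lookup

    def total(self, item_ids):
        return sum(self.price_lookup.price_of(i) for i in item_ids)

def test_totals_prices_from_collaborator():
    # The double stands in for a collaborator that doesn't exist yet;
    # writing this test is how we discover that it needs to exist,
    # and what its contract should be.
    price_lookup = Mock()
    price_lookup.price_of.side_effect = {"a": 3, "b": 4}.get

    assert InvoiceTotaler(price_lookup).total(["a", "b"]) == 7

test_totals_prices_from_collaborator()
```

From here the recursion continues: `price_of` might decompose further, terminate as a pure-function leaf, or end up wrapping a third-party pricing API.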

I love how Annie Duke captures the meta-skill at the heart of this in How to Decide:

“If you're putting together a dresser that comes with a set of screws, you could be tempted to use a hammer to save time if you don't have a screwdriver handy. Sometimes, a hammer will do an okay job, and it will be worth the time you save. But other times, you could break the dresser or build a shoddy health hazard. The problem is that we're just not good at recognizing when sacrificing that quality isn't that big a deal. Knowing when the hammer is good enough is a metaskill worth developing.”

The obvious move is to split the work: AI for the blunt tasks, humans for the precision tasks. But I think that framing is incomplete.

AI isn't just the hammer. It can be the screwdriver too—and increasingly, as models become more capable, the most sophisticated multi-tool we can conceive of. The thing that determines which mode it operates in isn't the task. It's the quality of judgment you bring to bear. Wield AI with vague intent, and you get blunt results regardless of the task. Wield it with precise inputs, clear constraints, and exacting taste, and it can operate as a precision instrument even on complex work.

Line illustration of a multitool

So I'm less worried about mismatching which tool I use for which task, and much more focused on dialing in the right amount of taste and judgment for every task I begin with AI. If AI can increasingly do all things with a sufficiently skilled operator at the helm, then it behooves us to focus less on which tool the moment calls for and more on the quality of the inputs we're feeding into that tool.

Which raises an obvious question: if taste and judgment are what determine whether AI produces something useful or something dangerous, then what exactly goes into that equation?

Inputs, outputs, and outcomes: Why constraints matter

I’m a software engineer, so I tend to think of things in terms of inputs, outputs, and functions.

Pure functions are simple and easy to understand, but incredibly powerful when composed together into something greater than the sum of their parts. The central lesson of React is a good example. A function that expresses the insight of React is:

fn(state) -> ui

Put simply, UI is a function of state.
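A toy sketch of that function in Python. React itself expresses the idea with components and JSX; the markup-building here is purely illustrative:

```python
# Same state in, same markup out: rendering as a pure function.
def render(state):
    items = "".join(f"<li>{todo}</li>" for todo in state["todos"])
    return f"<h1>{state['title']}</h1><ul>{items}</ul>"

state = {"title": "Today", "todos": ["write", "review"]}
print(render(state))
# → <h1>Today</h1><ul><li>write</li><li>review</li></ul>
```

Because the function is pure, there is nothing to mock and nothing hidden: every change to the UI is a change to state, which is what makes the composition of many such functions tractable.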

A function that expresses the wisdom of the software craftsmanship movement might be something like this:

fn(tdd,xp,pair-programming,ci/cd) -> software

Inputs and outputs. This is historically what software engineers have focused on, and for a long time, it was easy to make the case that the focus on these specific inputs led to higher-quality software.

Line illustration of a building component, a building, and a city block of buildings

While it may be true that quality inputs lead to quality software, this framing misses a fundamental truth: a software solution can satisfy every software purist's definition of quality and still fail because it doesn't lead to the outcome of finding product-market fit. Conversely, we've all seen software products that are brittle, awkward, hard to change, and yet wildly successful because they solved the right problem at the right time for their users.

Outputs are cheap. Outcomes never were.

Input or output quality does not determine outcome quality. I don’t even think there’s a case to be made that they have a strong correlation. If the rise of AI is doing anything, it is shining a big spotlight on this distinction by dramatically lowering the cost of generating outputs.

Line illustration of a computer chip, a motherboard, and an end product

The unit economics of code and software are changing. What hasn't changed is the importance of our existing inputs—not as guarantees of outcomes, but as constraints we encode into the system so that AI can produce quality output. High-value outcomes still don't happen automatically, no matter how fast you can generate code.

What’s changed (and what hasn’t)

Test Double has a long history of bringing a quality focus to a world wrapped up in outputs and speed—from testing methods and tooling in the JavaScript ecosystem, through rich-client application development during the rise of SPAs, to developer tools that helped the Ruby community write, analyze, test, and ship with confidence.

Practices like TDD, pair programming, and continuous delivery still matter. When we talk about quality, we still care about all of these things because they are even more important in a world where AI can amplify both good and bad ideas.

What has changed is where the bottleneck is.

While the cost of generating outputs has decreased, the cost of a bad idea has increased astronomically.

A vague requirement used to produce a bad implementation in one corner of the system. Now, a vague idea can be propagated across architecture, interfaces, tests, docs, and integration code before anyone has even thought about whether the framing was correct in the first place.

A badly phrased idea, given to an LLM, now has the potential to do the kind of damage we used to associate with a bad line of code that took down production. We just don’t have static analysis for ideas in the same way we (hopefully) do for bad lines of code.

The apex of quality: outcomes

When we used to say “quality,” we often meant code quality. That’s not going to cut it anymore when we aren’t writing all the code ourselves, and increasingly aren’t reading all of it either.

Our definition of quality needs to expand. We need at least three operational lenses for evaluating quality during software delivery:

  1. Code quality: Is the code maintainable, testable, operable, and shippable?
  2. Decision quality: Are we working from a well-framed problem, with clear line of sight to trade-offs and assumptions?
  3. Communication quality: Do stakeholders understand and trust what we are doing and why?

Any software project could score well on all three of these dimensions and still fail. There is a fourth dimension that sits above the others, the apex dimension: outcome quality.

Did we actually move the needle on what mattered?

  • Did we increase revenue?
  • Reduce operational overhead?
  • Improve conversion?
  • Unblock a launch?
  • Create a safer or more compliant system?
  • Help a nonprofit serve more people?
Line drawing of a rocky mountain peak with a base camp and people building structures

We must reorient ourselves to filter all of our previous quality emphases through this lens of outcomes. We might hold to previous patterns—combinations of inputs that reliably produced a kind of output we considered high quality—and call those useful patterns. We talk a lot about anti-patterns as well, but usually through the lens of the outputs they produce, not the outcomes.

There are useful patterns and useful defaults—but there is no universal checklist you can follow that guarantees outcome quality across every software project.

So, no, AI doesn’t make our quality practices obsolete.

It forces us to apply the same discipline higher up the stack.

The harness: Encoding taste and judgment into AI workflows

The cost of outputs is plummeting, and the quality of outcomes is the apex consideration for software engineering. Our work has to change.

We are no longer on the hook only for producing code that fits our preferred architecture. We are increasingly on the hook for something more abstract and much more valuable: designing the harness that transforms cheap outputs into durable outcomes.

That harness wraps around the entire loop. Inputs, outputs, outcomes—it governs the cycle and prevents drift.

harness(fn(taste, judgment) -> outputs -> outcomes)

The harness is where your taste, judgment, and hard-earned experience get encoded into something durable—something that doesn’t evaporate between sessions or get lost when the context window rolls over.

Where to focus encoding taste and judgment in agentic coding:

  • Architecture boundaries and naming conventions
  • Lint rules, types, and interfaces
  • Tests and testing strategies
  • Feedback loops and review workflows
  • Decision logs
  • Observability (in the traditional infrastructure sense, and how visible your team’s decision-making process is to people who depend on it)
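Constraints like these only bite when a machine can check them. Here is a minimal sketch of one such check: a test that fails whenever an architecture boundary is crossed. The module names are invented for illustration:

```python
import ast

# The constraint: domain code must never import infrastructure code.
FORBIDDEN = [("app.domain", "app.infra")]

def boundary_violations(modules):
    """modules: {module_name: source_code}. Returns forbidden imports."""
    violations = []
    for name, source in modules.items():
        for node in ast.walk(ast.parse(source)):
            imported = []
            if isinstance(node, ast.Import):
                imported = [a.name for a in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                imported = [node.module]
            for target in imported:
                for src, dst in FORBIDDEN:
                    if name.startswith(src) and target.startswith(dst):
                        violations.append((name, target))
    return violations

modules = {
    "app.domain.billing": "from app.infra.db import connection",
    "app.infra.db": "import sqlite3",
}
print(boundary_violations(modules))
# → [('app.domain.billing', 'app.infra.db')]
```

Run in CI (`assert boundary_violations(...) == []`), a check like this outlives any single session: the boundary holds whether the next import statement is typed by a person or generated by a model.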
Line illustration of a sphere made up of multiple, intersecting rings


“Harness” sits amidst a pile of terms emerging since 2022 describing the same shift: prompt engineering, context engineering, harness engineering, intent engineering. Call it what you want, but they are all attempts to name the same problem we now face as software engineers.

Specific tools don't matter. The model you use to generate code, the IDE, the framework—the point of leverage has shifted away from these choices. Not because they’re irrelevant, but because they are no longer the primary differentiator. The harness, and your ability to create it, is.

Our task is to distill the inputs of our function into verifiable constraints.

This is the bridge between the old world that many of us are grieving the loss of—writing the code by hand—and this new world that increasingly requires us to encode our taste and judgment not in code, but in constraints machines can follow.

A pragmatic AI practice does not reject speed or rigor—it rejects defaults. The whole point of three days, swinging between extremes, and learning to wield the same tool as both hammer and screwdriver, is developing the ability to adapt your quality focus to the situation in front of you instead of reaching for the same playbook every time.

If you’re still picturing AI-assisted development as “the model writes code and a human reviews it line by line,” that mental model is already aging out. Charity Majors has been making this point for a long time:

“You have NEVER been able to know what the code will do by reading the code. You have ALWAYS needed to ask your instrumentation to understand your code in production.”

AI did not create this problem. It made it impossible to ignore, because the volume of code now exceeds any team's ability to read their way to confidence. The answer was always observability, feedback loops, and verification mechanisms that don't depend on a human reading every line. The harness, in other words.

At Test Double, we’ve been in the business of optimizing for feedback loops, verification, and confidence for years. The problems and the optimizations are the same. The only thing that's changed is the medium.

Leading and lagging indicators of outcome quality

As I leaned more into product engineering, I was forced to get clearer about my own thinking on outcomes. This shift accelerated for us as a company when we acquired Pathfinder Product, which brought effective product thinking into our practice—people whose instinct was to ask “what outcome are we actually driving toward?” before ever writing a line of code.

In my journey to embrace product thinking, the lesson that stuck with me is that outcome quality is not something we should wait until after launch to evaluate.

Line illustration of a person planting seeds and the vegetable plants that grow from those seeds

Yes, the highest-confidence outcomes are lagging indicators: adoption, revenue, demand, reliability, customer satisfaction, operational cost, and compliance.

But on an in-flight software project, especially a complex one, there are real and meaningful leading indicators of outcome quality:

  • Trust in the team
  • Confidence in and visibility of the architecture’s boundaries
  • Ability to pivot when the strategy changes
  • Clarity about what work is protected versus at risk
  • Reduced decision churn
  • Faster alignment in stakeholder conversations
  • Less thrash when constraints change
  • More honest planning because assumptions are big and visible
Line illustration of a building under construction with a dotted line to an imagined city

These are not “soft” outcomes. They are the conditions that make high-quality product outcomes more likely. And they are outcomes a team should be able to identify long before a launch date.

What’s next for us?

Despite being named Test Double, our heritage is not “we like tests.” We care about feedback loops, clarity, ease of change, and minimizing unintended consequences. We care about working in a way that makes change safer and systems more understandable.

In the pre-AI era, that often showed up as strong opinions and tooling to support testing, pair-programming, refactoring, and code-centric rigor because the production of code dominated the cost structure of consulting.

In the AI era, code is cheap. Ambiguity and a bad plan are a lot more expensive.

AI commoditizes output generation. It does not commoditize taste, judgment, or accountability. Our leverage is shifting from code to the quality of the harness we build around the systems we shape. An emphasis on quality of outcomes must be at the apex of our value hierarchy.

Cross-section illustration of a telescope


If we align our priorities with this way of thinking, speed will show up as a byproduct, not because we demanded it. Not because we pushed harder on execution, but because we built a system—technical and organizational—designed for validated learning, adaptation, and catching drift before it compounds.

All of that is a much higher bar than code generation. Good.

We've always been at our best when the bar is high.
