Mistakes are inevitable: The road to fewer mistakes
There's a pervasive assumption in the software industry that truly expert developers are machines that make no mistakes and only write bug-free code.
It'd be so convenient if that were true. We'd be able to deliver value at a steady pace, avoid high-intensity production incidents and focus on the fun parts of the work: building things.
At the end of the day, however, developers are human, and humans make mistakes. There's no magical threshold of expertise that lets developers transcend human nature. That's why any strategy based on the assumption of not making mistakes is flawed.
Even with the rise of agentic coding, a recurring theme is that the agents hallucinate and require human review. It turns out even machines make mistakes, so why did we put that expectation on developers in the first place?
Still, I've seen too many decisions made with this underlying assumption to be comfortable with it. It's not limited to leadership either. Developers do it to themselves all the time by saying that they'll have to be better in the future, increasing their burden without any strategy to alleviate the pressure.
I know I did, and it's something I still struggle with.
If we can't stop humans from making mistakes, we can do the next best thing: manage those mistakes. We have to zoom out from individual contributors to the larger software-production system with its tools, processes and practices.
How this assumption manifests
The presumption that expert developers don't make mistakes is rarely stated outright. It tends to show up in decision-making if we look for it:
- After an incident occurs in production, developers might say that they've learned from it and they won't make the mistake again.
- We might receive a memo that the defect rate is getting higher and we have to be more careful about quality.
- The upcoming roadmap might be packed to the brim with feature work needed for the deadline without any slack time to handle rework.
These decisions shift the burden to the developers and assume that it'll work out somehow. Of course, it does... until the inevitable mistakes happen again.
Why it doesn't work
There are only so many options available when our premise is essentially that we won't make mistakes. Either no mistakes are made and we can breathe a sigh of relief, or mistakes happen and we have to adjust course quickly.
We also won’t have many safety nets in place for those sudden course corrections, which can force painful choices. Here are some examples I’ve personally seen in my career:
- Rolling back the database or cleaning up corrupted data
- Taking the checkout process offline or accepting invalid orders
- Falling weeks behind in the roadmap or living with broken features
High-intensity events also come with their own load of pressure and stress, making it likely that new mistakes will be made while trying to fix the previous ones.
Changing our approach
Putting the burden of making fewer mistakes on developers is easy, but we need something different if we want more consistent outcomes.
Switching to managing mistakes opens up a world of possibilities for catching them earlier, minimizing their impact and learning from them. Rather than trying them all at once, take the time to find the few solutions that best fit your context.
Finding the right solutions
Lean on those closest to the work. They're the ones with the most context on what's happening, and they can probably surface pain points that we're not aware of. They're also the ones with the most to gain from any measure we put in place, and we'll want their support down the line.
A great place to start could be scripting and automation because it moves the burden away from humans towards computers, which are much less prone to having bad days.
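As a rough illustration of what that can look like, here is a minimal sketch of a manual pre-release checklist turned into a script. Everything in it is hypothetical: the `Check` type, the `run_checks` helper and the two example checks are placeholders I made up for this post, and real checks would inspect whatever tends to go wrong in your own context.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    name: str                # human-readable label used in the report
    run: Callable[[], bool]  # returns True when the check passes


def run_checks(checks: List[Check]) -> List[str]:
    """Run every check and return the names of the ones that failed."""
    failures = []
    for check in checks:
        try:
            passed = check.run()
        except Exception:
            # A crashing check counts as a failure instead of aborting the run.
            passed = False
        if not passed:
            failures.append(check.name)
    return failures


# Hypothetical checks: stand-ins for whatever the team verifies by hand today.
def migrations_are_reversible() -> bool:
    return True


def no_debug_flags_enabled() -> bool:
    return False


def release_checklist() -> List[str]:
    """Return the names of the failed checks for the example checklist."""
    return run_checks([
        Check("migrations are reversible", migrations_are_reversible),
        Check("no debug flags enabled", no_debug_flags_enabled),
    ])
```

Calling `release_checklist()` returns the names of the checks that failed, so a CI job or a deploy script could refuse to continue whenever that list isn't empty. The point isn't this particular script. It's that the checklist no longer depends on someone remembering every step on a bad day.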
Solutions could also be found on the human side. Zoom out some more and look at the whole system of how software is built. How do the different teams and tools interact? Are there too many involved at times? Is there often a disconnect between them?
Provide the time and space
Part of the appeal of assuming developers won't make any more mistakes is that everyone goes back to delivering value quicker. If that sounds familiar, it's possible that the team doesn't feel comfortable walking off the delivery treadmill long enough to work on meaningful safety nets.
Even if it's an investment that will pay dividends down the road, meaningful measures take time and effort. Find ways to highlight these benefits for the business, such as building a business case or writing a quick prototype that makes them easier to grasp.
Celebrate prevention
We can be deliberate in nudging the team's culture in the right direction as well. Celebrate the small steps that the team makes in managing mistakes. Highlight the benefits that this will give everyone down the road. This can require proactivity on our part because preventing mistakes will often be a non-event.
It's important to stress that introducing static code analysis, using feature flags or having smooth collaboration with stakeholders is great work just as much as anything else. Recognizing it as such will be pivotal in sustaining the collective effort.
There's no road to zero mistakes
Of course, there's no silver bullet here and we'll always end up with mistakes. The only thing we can do is navigate the tradeoffs of different solutions to get the best value for our efforts.
Acknowledging that our developers are only human gives us the best way of moving forward and maturing as an industry. There's no road to zero mistakes, but there's definitely a road to fewer mistakes.