How I Broke Production (and What I Learned)
How to stop relying on luck and start building with clarity
I’m delighted to share this guest post from my friend Alex. We went live together on LinkedIn earlier this year to chat about avoiding common engineering pitfalls, and we also talked about his brilliant book, Keep Calm And Code On. I’m so glad he wrote this guest post about a popular pitfall: development by hope.
I’ve developed by hope countless times in my career. We’ve all been there. You write some code, ship it, and pray it works. While this might work sometimes, it’s not a good habit to form, nor does it make you a good teammate. There will be times when you need to dive deep into something to learn it or fix it. And as painful as that might be, it’s worth it. Read on to learn more about development by hope and how to avoid it.
Over to you, Alex.
A few years back, I was in charge of fixing a bug that was causing one of the main pages in our application to crash. My approach could only be described as frantic. The code was in a part of the codebase I wasn’t too familiar with, and I ran my proposed change past a colleague.
Since this was impacting customers, I was in a rush to hit the deploy button. When my coworker, who knew the code better, suggested that I write a test to verify my change, I exclaimed, “It’s a hotfix, I don’t have time for tests!”
He paused for a second, then solemnly replied, “Those are the times you need tests the most.” His response took me by surprise, and his words still echo in my head years later whenever I’m addressing a hotfix.
I’ve since come to refer to this style of approaching a problem as “Development by Hope.”
Development by Hope occurs when you ship a solution you’re not particularly confident in and rely on luck to carry it over the finish line. It’s taking the phrase “I’d rather be lucky than good” and applying it quite literally to software.
I’ve seen it time after time, and I myself have done it more times than I’d care to admit.
Hard-to-Reproduce Bugs
The underlying cause of Development by Hope is hard-to-reproduce issues. In my experience, these issues come in two flavors:
A race condition makes the issue appear intermittently.
For example, a webpage’s visual bug that only appears intermittently upon browser refreshes.
An interaction occurs in a hard-to-access area of the codebase.
For example, a bug that happens on the last step of a 10-step modal wizard. Seeding the data that flow needs and clicking through every step can both be painful.
If you’re unlucky, a problem will fall into both categories at once. Those cases are the most trying, even for seasoned developers.
The problem with relying solely on hope is the same as the problem with tech debt in general: you can usually get away with it for a while, but once the debt reaches a tipping point, problems start surfacing all over the system. Not to mention that with hotfixes in particular, I’ve found that haste often begets the need for another hotfix right after.
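To make the first flavor concrete, here is a minimal Python sketch of a race, assuming a hypothetical page that loads a default theme and the user’s saved theme concurrently. The names (`load_default_theme`, `load_user_theme`, `render`, `refresh`) are made up for illustration. Both loaders write the same value, so whichever finishes last wins, and with random latency the result can flip from one refresh to the next:

```python
import asyncio
import random


async def load_default_theme(page: dict, delay: float) -> None:
    await asyncio.sleep(delay)  # simulated network latency
    page["theme"] = "light"     # hypothetical site default


async def load_user_theme(page: dict, delay: float) -> None:
    await asyncio.sleep(delay)  # simulated network latency
    page["theme"] = "dark"      # hypothetical saved user preference


async def render(default_delay: float, user_delay: float) -> str:
    page: dict = {}
    # Both loaders run concurrently and write the same key,
    # so whichever one finishes LAST decides the theme.
    await asyncio.gather(
        load_default_theme(page, default_delay),
        load_user_theme(page, user_delay),
    )
    return page["theme"]


def refresh() -> str:
    # Each simulated "browser refresh" gets fresh random latencies.
    return asyncio.run(render(random.uniform(0, 0.01), random.uniform(0, 0.01)))


if __name__ == "__main__":
    # The intended result is always "dark", but the output varies between runs.
    print([refresh() for _ in range(10)])
```

Passing the delays in explicitly is what turns a once-in-a-while bug into one you can reproduce on demand: `render(0.0, 0.02)` and `render(0.02, 0.0)` force each ordering in turn.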
Annoyance as a Code Smell
Before succumbing to how hard a problem is, I’ve found that flipping the script and asking myself “What can I do to be confident in a fix?” is an incredibly effective technique. In my opinion, it’s one of the fastest ways to get to the underlying pain point.
Let’s re-examine the examples of hard-to-reproduce issues from before and look at how to uncover the underlying pain in each:
A webpage’s visual bug that only appears intermittently upon browser refreshes.
When seemingly random behavior arises, the only real way to be confident in a fix is to understand the lifecycle of the action involved. Debugging effectively and making good use of observability tools are indispensable skills for this sort of task. If you can’t explain it, you can’t be sure you’ve solved it.
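One low-tech way to start explaining an action’s lifecycle is to record each step with a timestamp and look at the order in which things actually happened. The sketch below uses a hypothetical in-memory `Trace` class as a stand-in for real logging or tracing tools; the event names are made up:

```python
import time
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical in-memory trace of one action's lifecycle."""
    action: str
    events: list[tuple[float, str]] = field(default_factory=list)  # (timestamp, event name)

    def record(self, event: str) -> None:
        # perf_counter() is monotonic, so ordering is not affected by wall-clock changes.
        self.events.append((time.perf_counter(), event))

    def order(self) -> list[str]:
        return [name for _, name in self.events]

    def happened_before(self, first: str, second: str) -> bool:
        # Compares first occurrences; raises ValueError if either event is missing.
        names = self.order()
        return names.index(first) < names.index(second)

    def dump(self) -> str:
        # One line per event, offset in milliseconds from the first event.
        if not self.events:
            return f"{self.action}: no events"
        start = self.events[0][0]
        return "\n".join(
            f"{self.action} +{(ts - start) * 1000:.2f}ms {name}"
            for ts, name in self.events
        )


if __name__ == "__main__":
    trace = Trace("page_load")
    trace.record("request_sent")
    trace.record("default_theme_applied")
    trace.record("user_theme_applied")
    print(trace.dump())
    # Encode the expectation, so a bad ordering fails loudly instead of flickering.
    assert trace.happened_before("default_theme_applied", "user_theme_applied")
```

In a real system the same idea usually lives in structured logs or a tracing tool. The point is that a check like `happened_before` turns a hunch about ordering into something you can verify.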
A bug that happens on the last step of a 10-step modal wizard.
This sort of issue is a real test of a feature’s architecture. Confidence here means being able to run that 10th step with a variety of inputs along the way. You’ll need short feedback loops, which may in turn require mocking out data and other development environment niceties.
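Here is a minimal sketch of that kind of shortcut, assuming a hypothetical wizard whose first nine steps produce a `WizardState` and whose tenth step computes an order total. A small factory (`make_state`, also hypothetical) builds the state those steps would have produced, so the final step can be exercised directly with many inputs:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WizardState:
    """Hypothetical data that steps 1-9 of the wizard would have collected."""
    account_id: str = "acct-test"
    items: tuple[tuple[str, int], ...] = (("widget", 1000),)  # (name, price in cents)
    discount_percent: int = 0
    shipping_cents: int = 500


def make_state(**overrides) -> WizardState:
    # Fixture factory: build the result of steps 1-9 directly instead of clicking through them.
    return replace(WizardState(), **overrides)


def step_10_total(state: WizardState) -> int:
    """Hypothetical final step: compute the order total in cents."""
    subtotal = sum(price for _, price in state.items)
    discount = subtotal * state.discount_percent // 100
    return subtotal - discount + state.shipping_cents


if __name__ == "__main__":
    # A variety of inputs for step 10, each built in one line.
    cases = [
        (make_state(), 1500),
        (make_state(discount_percent=10), 1400),
        (make_state(items=()), 500),
        (make_state(items=(("a", 999), ("b", 1)), discount_percent=50, shipping_cents=0), 500),
    ]
    for state, expected in cases:
        assert step_10_total(state) == expected, (state, expected)
    print(f"{len(cases)} step-10 cases passed")
```

Each case runs in a fraction of a second instead of ten clicks, which is the kind of short feedback loop that makes a fix in the last step something you can actually be confident in.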
In either case, the initial feeling I’ve had when encountering both of these types of problems is annoyance. Annoyance that, for one reason or another, I’m not able to have my cake and eat it too when it comes to shipping high-quality code quickly.
But now I recognize that annoyance is a code smell that points to a chance to make your system better. Maybe that means adding tools to increase observability, or maybe it means expanding a test suite. Sometimes it’s as simple as taking an extra 5 minutes to write a test in the heat of a hotfix.
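As an illustration of those 5 minutes, here is a sketch of a regression test written alongside a hotfix. The bug is hypothetical (a page crashing when a user record has no name), as are `display_name` and the test names; the test itself uses Python’s built-in `unittest`:

```python
import unittest


def display_name(user: dict) -> str:
    """Hypothetical hotfix: this used to crash when "name" was missing or None."""
    name = (user.get("name") or "").strip()
    return name or "Anonymous"


class DisplayNameRegressionTest(unittest.TestCase):
    # The first two cases pin down the inputs that (hypothetically) crashed the page.
    def test_missing_name_does_not_crash(self):
        self.assertEqual(display_name({}), "Anonymous")

    def test_none_name_does_not_crash(self):
        self.assertEqual(display_name({"name": None}), "Anonymous")

    def test_blank_name_falls_back(self):
        self.assertEqual(display_name({"name": "   "}), "Anonymous")

    def test_normal_name_is_trimmed(self):
        self.assertEqual(display_name({"name": "  Ada "}), "Ada")
```

Run it with `python -m unittest` pointed at the file. Because the failing inputs are now encoded as tests, the same bug can’t quietly come back with the next change.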
Development by Hope doesn’t build robust systems. Confidence does. Chase the pain, fix the system, and leave the luck to the lottery.
Thanks to Alex for writing about this common engineering pitfall. Please read his book and follow him on LinkedIn.