What fixing a bug looks like
Fixing bugs is an integral part of programming. It’s impossible to write code without bugs: you always start with something, test it, and repair whatever is wrong. Working on bugs in projects you do on your own is vastly different from working on them in commercial projects. In this article, I show how I approach new bugs at my job.
Where the bugs come from
Most of the bugs that reach me come from colleagues who work closely with customers, in a service desk or customer success role. They find issues while helping customers fix a problem or achieve their goals. Some issues can be solved by tweaking settings in the application or updating corrupted data in the database; service desk staff resolve those themselves. When they see no way of fixing the problem on their own, they write a ticket for development. And those tickets eventually arrive at my desk.
Is it even a bug?
Bugs in business applications can be pretty subtle, to the point that it’s not clear whether a given behavior is correct or not. Sometimes it depends heavily on context: one client’s workflow may call for a different behavior, while the current one is optimal for other clients.
In most cases, a ticket is triggered by the needs of one particular customer. Because of that, we need to make sure the fix won’t make all the other customers unhappy. To avoid such a regression, we often discuss the desired behavior in a larger group. If different customers genuinely need different behaviors, we may have to make this part configurable per client. In these cases, the bug report becomes a feature request.
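To illustrate what “configurable per client” can mean in code, here is a minimal sketch in Python. Everything in it is hypothetical: the `ClientSettings` class, the `merge_duplicate_lines` flag, and the `order_lines` function are made-up names, not part of the application I maintain. The key idea is that the new behavior is opt-in, so the default keeps today’s behavior for every other client.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical per-client settings; the names are illustrative only.
@dataclass
class ClientSettings:
    client_id: str
    # Opt-in flag: only clients who asked for the alternative behavior enable it.
    # The default (False) keeps the current behavior for everyone else.
    merge_duplicate_lines: bool = False


def order_lines(settings: ClientSettings, lines: List[str]) -> List[str]:
    """Return order lines, applying the client-specific behavior only when enabled."""
    if settings.merge_duplicate_lines:
        # Alternative behavior requested by one customer: drop repeated entries
        # while preserving the original order.
        return list(dict.fromkeys(lines))
    # Current behavior, still optimal for the other clients.
    return list(lines)


default_client = ClientSettings("client-a")
opted_in_client = ClientSettings("client-b", merge_duplicate_lines=True)
print(order_lines(default_client, ["x", "x", "y"]))
print(order_lines(opted_in_client, ["x", "x", "y"]))
```

Keeping the default at the current behavior is what protects the other customers from a regression; only the client who requested the change turns the flag on.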
Reproducing the bug
When we agree that some behavior is unexpected, I need a way to see how it occurs. Otherwise, I would be changing the code without seeing the impact of my changes, and tweaking code blindly would likely introduce even more bugs. The setup we have for the application is a bit complicated: I can reproduce most issues locally, but for some, I need to use an external server.
Reproducing the issue on my own machine is the best situation. In the application I maintain, most of the system can be recreated in isolation as a local environment. When I can reproduce the problematic behavior locally, I get a short feedback loop for my changes, so I can iterate quickly on the code and fix the bug without wasting a lot of time.
Unfortunately, our project has some complexity that makes it impossible to reproduce some bugs locally. A few parts of the system differ in the development environment:
- a legacy DB that is only partially available in nonproduction environments, and
- third-party integrations that we turned off for local & test environments, such as ordering package transport for orders.
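One common way to turn an integration off outside production is to swap the real client for a stand-in chosen by environment. The sketch below assumes a hypothetical `APP_ENV` environment variable and made-up `ExternalTransportService` and `DisabledTransportService` classes; it is not how our project is actually wired, but it shows why bugs in such code paths can’t appear locally: the real integration code never runs there.

```python
import os
from typing import List, Optional, Protocol


class TransportService(Protocol):
    def order_transport(self, order_id: str) -> str: ...


class ExternalTransportService:
    """Hypothetical real integration; would call the carrier's API in production."""

    def order_transport(self, order_id: str) -> str:
        # Placeholder: the real HTTP call to the third party would go here.
        raise NotImplementedError("real carrier API call goes here")


class DisabledTransportService:
    """Stand-in for local & test environments: records requests, calls nothing."""

    def __init__(self) -> None:
        self.requested: List[str] = []

    def order_transport(self, order_id: str) -> str:
        self.requested.append(order_id)
        return f"skipped-{order_id}"


def make_transport_service(env: Optional[str] = None) -> TransportService:
    # Fall back to the (hypothetical) APP_ENV variable, defaulting to "local".
    env = env or os.environ.get("APP_ENV", "local")
    if env == "production":
        return ExternalTransportService()
    return DisabledTransportService()


service = make_transport_service("local")
print(service.order_transport("42"))
```

Because the stand-in never talks to the carrier, any bug in the real integration only shows up where `ExternalTransportService` is active, which is exactly why some tickets can’t be reproduced on a developer machine.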
On the test server
At work, some customer data is duplicated in the test environment, which makes it closer to the actual application than the local setup. If a bug doesn’t happen locally, the test server is the next place to check. This is far from perfect: any fix has a long way to go from my machine to the test server before I can see it in action.
Testing on a dedicated test server is way better than doing so directly on production because if things go very wrong, we can still fix them before the customers are affected.
On production
The last resort is to see the error on production with my own eyes. If the bug is specific to one customer, it may depend on the exact combination of settings in their account or on their data. Reproducing bugs on production is a useful sanity check: it confirms that the bug report was not mistaken. But using production as a test area for even the most minor code changes is a bad idea: it’s slow, and if I mess something up, the customers will be affected.
Returning the ticket to its author
If I cannot reproduce the bug myself, there isn’t much I can do about it. In this situation, I ask the ticket reporter for more information, and sometimes I even get them on a call with screen sharing so we don’t spend too much time getting in sync.
There are many possible reasons why I can’t reproduce the bug:
- maybe the ticket is missing some crucial details about the scenario in which the bug appears,
- maybe I’m missing some detail, or
- maybe it was just a glitch.
Another possibility is that someone has fixed the issue since it was reported.
Finding the source in code
After discussing the expected behavior and seeing the error in action, I’m ready to go through the code and investigate how it works. Depending on code quality, this can be pretty easy in a well-organized project or quite complicated in a ‘spaghetti code’ situation.
Fixing + adding tests
The final step is to change the behavior and test the change. As an absolute minimum, I need to see that the bug is gone from the application, so I repeat the reproduction steps and check that the problem no longer occurs.
If the application behaves as it should, the next step is to add some automated tests. The most straightforward approach is to add a few unit tests for the code you have just changed. For complete coverage, also check the change in user experience with an end-to-end (E2E) test. You can read more about various aspects of testing in my other posts.
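Here is a minimal sketch of such a regression unit test, using Python’s built-in `unittest`. The `total_price` function and the bug it guards against (a discount applied only to the first item) are hypothetical; the point is that the test encodes the reproduction scenario from the ticket, so the bug can’t quietly come back.

```python
import unittest


def total_price(unit_price_cents: int, quantity: int, discount_percent: int = 0) -> int:
    """Hypothetical function after the fix: the discount now applies to every
    item, not only the first one. Prices are in cents to avoid float errors."""
    gross = unit_price_cents * quantity
    return gross * (100 - discount_percent) // 100


class TotalPriceRegressionTest(unittest.TestCase):
    def test_discount_applies_to_every_item(self):
        # Same scenario as the ticket's reproduction steps:
        # 3 items at 10.00 with a 10% discount. The buggy version returned 2900.
        self.assertEqual(total_price(1000, 3, 10), 2700)

    def test_no_discount_keeps_full_price(self):
        # Guard the unchanged path so the fix doesn't break it.
        self.assertEqual(total_price(1000, 3), 3000)


if __name__ == "__main__":
    unittest.main()
```

The first test fails against the buggy code and passes after the fix; the second makes sure the common case keeps working.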