Introduction: Finding your path forward at a legacy crossroads
In the world of software development, there are certain challenges that every team discusses at some point. One of those is how to maintain legacy codebases. Traditionally, two approaches come up: refactor or rewrite.
If you’re new to weighing the rewrite vs. refactor decision for a legacy application, this post is for you.
As is so often the case, there is no one-size-fits-all solution. It’s important to consider what makes the codebase legacy:
- Does it run on outdated software?
- Is the code messy and difficult to maintain?
- Does it have dependencies for which updates and security patches are no longer available?
- Are business rules and/or architectural knowledge locked in the heads of only a few people?
Defining legacy in the context of your application is a critical first step in deciding how to handle it going forward.
Before we continue, let's quickly define what legacy is not: a codebase is not a legacy codebase simply because it existed before a given individual joined the team. It’s all too common for new developers or managers who do not fully understand an existing codebase to fall into the mindset that it's time for something newer and better.
Once your team agrees that you are addressing genuine legacy software problems, you may soon realize that many of the pros and cons are situational. The following are three examples of different paths taken for different reasons.
Field Report #1: Incremental refactor
Incremental refactoring may not be nearly as exciting as starting fresh, but it can be remarkably effective.
One of the most challenging codebases I ever worked on was written entirely by two contractors who were no longer part of the company. The software worked and the company had very successfully landed dozens of paying clients. It didn’t take long for feature requests and bug fixes to start rolling in. My team was brought on to maintain the product.
Almost immediately we knew this was going to be a challenge. Documentation was virtually nonexistent, there were no tests, and sensitive credentials were hard-coded into the source code. It seemed like the prime opportunity for a full rewrite. Spoiler alert: it wasn’t.
Gaining perspective
Rewrites are expensive in terms of both time and money. Even when the team doing the rewrite is the same team who wrote the original code, features are bound to be missed and new bugs introduced. While fresh perspectives are not inherently bad, the lack of historic context creates a unique challenge.
After some discussion, we came up with the following questions to help guide our approach.
- What are the core issues with the software?
- Do any of the issues pose a security risk?
- What are the risks associated with a full rewrite?
- Do we understand the features of the software?
- Do we understand how the software is used?
Let's break these down a bit more.
What are the core issues with the software?
The first step is to uncover what’s really going on within the legacy application. Get real visibility into the characteristics of the system that can help you identify the best path forward:
- Is the codebase fairly uniform or is there clear evidence of different eras in its development life cycle?
- Is security treated as a top priority or an afterthought?
- Are there clear architectural boundaries or seams that could allow for a strangler fig pattern?
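For readers unfamiliar with the strangler fig pattern: it relies on a seam where traffic can be redirected to new code one piece at a time while everything else still runs on the legacy system. The Python sketch below is illustrative only; the request paths and handler functions are hypothetical stand-ins, not code from any project described here.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_handler(path: str) -> str:
    # Hypothetical stand-in for the old, monolithic page script.
    return f"legacy:{path}"


def new_orders_handler(path: str) -> str:
    # Hypothetical stand-in for a rewritten module that now owns /orders.
    return f"new:{path}"


class StranglerFacade:
    """Routes a request to a migrated handler if one claims it, else to legacy code."""

    def __init__(self, fallback: Handler) -> None:
        self._fallback = fallback
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        # Claim a path prefix for the new implementation.
        self._migrated[prefix] = handler

    def handle(self, path: str) -> str:
        # Naive string-prefix match; the longest matching prefix wins.
        matches = [p for p in self._migrated if path.startswith(p)]
        if matches:
            return self._migrated[max(matches, key=len)](path)
        return self._fallback(path)


facade = StranglerFacade(fallback=legacy_handler)
facade.migrate("/orders", new_orders_handler)
print(facade.handle("/orders/42"))  # new:/orders/42
print(facade.handle("/reports/7"))  # legacy:/reports/7
```

As more prefixes are migrated, the legacy fallback handles less and less traffic until it can be retired. Without a seam like this, there is nowhere to put the facade.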
In our case, the security and architectural designs were both significantly lacking. We’ll talk about security more in the next section, so let's examine the architectural boundaries issue.
Solid architectural boundaries are important because they help us write readable, sustainable code. Something we noticed immediately while reviewing this codebase was that there was no separation of concerns, just heaps of PHP files that did everything. Database handling, business logic, and presentation logic were all crammed into massive files, with each file serving as a separate page. This made for an easy first refactor: separating out concerns into their proper places.
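To illustrate the kind of split we mean, here is a minimal sketch in Python rather than the project's PHP. The `clients` table, the `Client` type, and the function names are hypothetical. Data access, business logic, and presentation each get their own function instead of living together in one page script.

```python
import html
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Client:
    name: str
    active: bool


# Data access layer: the only code that knows about SQL.
def fetch_clients(conn: sqlite3.Connection) -> List[Client]:
    rows = conn.execute("SELECT name, active FROM clients ORDER BY name").fetchall()
    return [Client(name=row[0], active=bool(row[1])) for row in rows]


# Business logic layer: pure functions with no database or HTML concerns.
def active_clients(clients: List[Client]) -> List[Client]:
    return [c for c in clients if c.active]


# Presentation layer: turns domain objects into output and escapes it.
def render_client_list(clients: List[Client]) -> str:
    items = "".join(f"<li>{html.escape(c.name)}</li>" for c in clients)
    return f"<ul>{items}</ul>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (name TEXT, active INTEGER)")
conn.executemany("INSERT INTO clients VALUES (?, ?)", [("Acme", 1), ("Globex", 0)])
print(render_client_list(active_clients(fetch_clients(conn))))  # <ul><li>Acme</li></ul>
```

A side benefit of this split is that the business logic can now be tested without a database or a browser, which matters a lot in a codebase that started with no tests.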
Do any of the issues pose a security risk?
Identifying any security risks in your codebase, including those that come from long-outdated frameworks, is vital for determining your path forward. Sometimes old libraries can be replaced with new ones, and sometimes your entire project is so tied to a framework that you can’t easily migrate within the same codebase.
The biggest security risk we identified was the lack of a modern authentication layer. Instead, the original developers had cobbled together their own. Passwords were base64-encoded (a reversible encoding, not a hash) before being stored in the database, user inputs were not sanitized, and database queries used these unsanitized inputs directly. In fact, the first thing we did was test an SQL injection attack on a dev copy of the software, and unsurprisingly, we were able to dump the entire database.
Once we had separated the main concerns out of the page files, the second major refactor was replacing the newly extracted authentication layer with something more secure.
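The standard defense against this class of SQL injection is to never splice user input into SQL text and to pass it as bound parameters instead. Here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table and the payload are hypothetical illustrations, not the project's actual schema or the attack we ran.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str):
    # Safe: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: the WHERE clause matches every row
print(len(find_user_safe(conn, payload)))    # 0: treated as a literal username
```

PHP's PDO supports the same approach through prepared statements with placeholders.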
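Part of making authentication "more secure" is replacing reversible storage with a salted, deliberately slow hash. The sketch below uses PBKDF2 from Python's standard library to show the idea. It is not a claim about what the team shipped. In PHP, the built-in `password_hash()` and `password_verify()` functions cover this. The iteration count and the `iterations$salt$digest` storage format are assumptions chosen for illustration.

```python
import base64
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # illustrative work factor; tune for your hardware


def legacy_store(password: str) -> str:
    # What the legacy code effectively did: reversible encoding, not hashing.
    return base64.b64encode(password.encode()).decode()


def hash_password(password: str) -> str:
    # A random per-user salt means identical passwords produce different hashes.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


print(base64.b64decode(legacy_store("hunter2")).decode())  # hunter2: trivially reversed
stored = hash_password("hunter2")
print(verify_password("hunter2", stored))  # True
print(verify_password("wrong", stored))    # False
```

Storing the iteration count next to the salt lets you raise the work factor later without invalidating existing hashes.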
What are the risks associated with a full rewrite?
Risk assessment is a key factor when determining if a codebase should be rewritten or refactored. Both methods have their own associated risks and you should determine early what your appetite for risk is. What happens if you choose to rewrite and go over your estimates? Can you solve the core issues you’ve identified without a full rewrite?
In our case, the security concerns we had identified meant that leaving the application as it was during a rewrite carried real risk, and that was a risk we were not willing to accept. Many of the core issues we had identified could also be solved more quickly by refactoring.
Do we understand the features of the software?
Rewriting software is tedious and difficult even when you fully understand what it is you’re recreating. Attempting a rewrite while also learning what the original software does is a massive challenge.
We had a high-level understanding of the feature set, but the lack of documentation and tests meant there were a lot of unknowns. It can be difficult to piece together a complete feature set from word of mouth alone. We decided it would be safer to start refactoring the features we did understand than to attempt to scope a rewrite with many unknowns.
Do we understand how the software is used?
Understanding the features of a given software and understanding how users interact with it are two very different things. It’s not unusual for developers to be far removed from the user experience and see things as “bugs” that users might see as “features”.
Being new developers on this project, we had no idea how users interacted with it or how they expected things to work. We didn't know their workflows, which features they relied on most, or what quirks they had come to depend on. This was knowledge we'd have to gain over time.
Since we didn’t know what we didn’t know, we opted to do a few lunch-and-learn sessions with experienced users to help develop our own understanding and get closer to the product in general. If possible, this is a great method for helping developers understand their user base.
Timeboxed spike
At this point we were leaning toward refactoring, but wanted to validate our thinking. To help finalize our decision, we spiked out some refactoring work to get an understanding of what it would look like. This ends up being a win-win scenario: even if it turns out that you’re better off with a rewrite, refactoring can be done in smaller chunks, so you’re not “wasting” a ton of time and effort.
In our case, the first refactor we timeboxed was the authentication layer. This was a core issue and a major security concern, so it made sense to tackle first. It also helped build trust with our client as we could clearly show our impact on the product early on.
The result
In the end, the team did decide to handle this codebase with a series of refactors going forward. This doesn’t mean the project was perfect, but it improved drastically and gave everyone a chance to understand it more completely.
Field Report #2: A fresh start
Software development, and web development in particular, is an ever-changing ecosystem. Frameworks come and go at lightning speed. Sometimes software engineers will refer to a framework or library as “dead” when what they really mean is “fallen out of popularity.”
But sometimes dead means dead. This could mean the maintainers announced discontinued support or the software has been quietly abandoned. In either case, software that is not regularly maintained can often become a security vulnerability and should be avoided.
Some years back, the development team I was on found ourselves facing this harsh reality. The open source web framework we were using announced it would be ceasing all development: no security updates, no bug fixes. We could, of course, have forked it and maintained our own version, but realistically the amount of work that would require ruled it out.
So the question became: Do we rewrite our application from scratch or do we try to refactor it by slowly migrating components into a new framework?
What are the core issues with the software?
The framework the application was built around was discontinued.
Do any of the issues pose a security risk?
Yes, the now-discontinued framework would not be receiving security updates and would only work with older browsers.
Do we understand the features of the software?
Yes, the software was well understood and documented.
Do we understand how the software is used?
Yes, in this particular case, we were the main consumers of the software and knew exactly how the software was being used.
What are the risks associated with a full rewrite?
Existing documentation, unit tests, and platform tools would also need to be rewritten. Even though the team doing the rewrite had built the original software, there’s always the risk of missed features. It’s also extremely difficult to estimate a software rewrite.
Spike it out
Before settling on a rewrite, we spiked a few timeboxed refactoring efforts to get a sense of what that direction would involve. This reinforced our initial thoughts: any refactor would amount to a partial rewrite and would likely leave us with a half-baked product. Even though the spike work was tossed out, it was important work that helped us find our path forward.
In retrospect
So, was the rewrite worth it?
In our case, the answer was yes, but it certainly came with its own set of issues. As mentioned before, estimating software is difficult, and estimating a full rewrite is many times more complicated. Our initial estimates fell short as we discovered forgotten quirks and use cases within the legacy code, learned a new framework, and updated our knowledge base and test suites.
Field Report #3: Full rewrite and team change
The next example involved three other “double agents” (Test Double consultants), who helped make both a rewrite and a team shift possible. It also raised additional questions that emerge when a full rewrite is what makes sense for the situation.
What are the core issues with the software?
Cars.com opted to retool their entire platform and make related team changes as part of a massive legacy modernization effort to enable a next generation of products. Test Double embedded a team of consultants to build a new future for the platform with Elixir, Phoenix, and teamwork.
When circumstances do point to the need for a full rewrite, it can be an opportunity to look at everything fresh. That means a different set of questions.
What languages, frameworks, and ways of working are best suited to the complex needs of the business?
Java was a smart choice at the time the legacy codebase was developed, but it wasn’t suited to the modern product development Cars needed to keep its competitive edge. Elixir and Phoenix were chosen to handle complex data needs centered on helping consumers find the right car options quickly.
Transitioning from Java to Elixir meant a radical tech stack and skills shift. This legacy systems overhaul also encompassed improving development processes, transitioning to cloud infrastructure, and adopting new methodologies like Shape Up.
How do you invest in engineering team capabilities to support a complex codebase in an entirely new language?
With new technology comes the need to align team experience and knowledge. Focused work to train and level up existing engineers, while also onboarding new ones, helped the team transition to new skill sets and new ways of working.
How do you manage changing both software systems and development processes?
Cars opted to introduce Shape Up to improve focused collaboration and software feature delivery. It models a kind of TDD for project management and is based on 6-week cycles.
In retrospect
As members of a combined team, we arrived at a simple architecture, crafted a robust test suite, and designed code to be maintainable over the long term. Success requires more than settling on the right syntax; it requires people to work well together. This is especially true when there’s transformational change going on with the team and the tech at the same time. When considering a rewrite, don’t forget about team structure, workflows, and processes.
Conclusion: Legacy app lessons learned
After looking at a few legacy modernization examples, what can you learn from them and how can you apply that to your own legacy system situation?
There is no magic answer
Both refactoring and rewriting are valid solutions under different circumstances. Sometimes a path in between also has merit. It’s worth mentioning again: software is complicated. Your mileage may vary because so many of the tradeoffs are situational.
While incremental refactoring is generally a safer route, it can be an arduous undertaking. Systems with well-defined, well-maintained boundaries and separation of concerns lend themselves more to refactoring. On the other hand, systems with few or no visible seams may suffer from “where to start” syndrome and may be better candidates for a rewrite.
Explore both options
Even in scenarios where you’re pretty sure a rewrite is the correct path, timebox a few refactoring tickets and see where you get. You might find that a rewrite is unnecessary or you might develop an even stronger case for doing the rewrite. It’s much easier to try a few smaller refactors than it is to rebuild an entire application and scrap it later.
Plan for significant time investment
The reality is that both options require a significant investment of time. Whichever option you choose, clearly communicate the expected timelines and deliverables.
Clearly define your goals
A common pitfall when rewriting an application is scope creep. For a rewrite to be successful, you need to clearly define your feature set and success criteria. It's OK if things change during the course of the project as long as everyone involved agrees to accept the tradeoffs and risks associated with those changes. Incremental refactors are less likely to be plagued by this issue, but it's still important to clearly define your end goals.
Don't let perfect be the enemy of good
Dealing with a legacy codebase can be daunting. Whether you're refactoring or rewriting, it's easy to get caught up trying to create the perfect solution. Some developers worry that if they don't "get it right" this time, they'll just end up back in the same position down the road.
But here's the reality: software is an ever-evolving landscape.
The code you write today will likely one day be someone else's legacy codebase. Planning is certainly an important part of the process, but so is making progress. Focus on improving the current situation rather than trying to make it perfect.
More legacy modernization food for thought
Test Double has helped clients solve legacy modernization challenges from refactoring to full rewrites, and just about every path in between. You might even recognize something similar to your own situation in one of these:
- Migrating a legacy Rails codebase while clearing technical and operational blockers, with a 99% reduction in test time
- Rebuilding React design library for consistency across apps and establishing common patterns for new React front ends—while also refactoring legacy code to improve maintainability and data handling
- Migrating legacy tools to modern platforms alongside improving unit testing for internal interfaces, and streamlining onboarding processes so new engineers can contribute more quickly
- Scaling a legacy platform into a well-tested, fault-tolerant umbrella app while refactoring to make performance more observable, highly available and multi-node distributed with smarter data structures
- Building a new control center and integrating unified login while decomposing a Rails monolith into microservices for improved scalability and performance
- Upgrading a dated tech stack by transitioning from Ruby 2.3 to 2.5, updating Rails 2.3 to 5.2+, and rewriting the UI from Angular 1 to React
- Transitioning legacy .NET apps to a scalable Node.js framework and enhancing maintainability with leading practices
Shawn Rinehart is a Senior Software Consultant at Test Double, and has experience in back end development, frontend development, and API integration.