How To Refactor Legacy Code
The art of making your code testable, testing it effectively, and refactoring it like a pro
Motivation
“Legacy code is simply code without tests.” - Michael C. Feathers
Most legacy code isn’t bad code. It’s just untested. Untested code is scary. You can’t change it with confidence. You fear breaking things. It slows down development.
But there’s a way out of this trap. A proven strategy I’ve used in both startups and corporate environments to turn messy legacy code into clean, testable, and maintainable systems.
Here’s my 6-step approach to refactoring legacy code:
The Ultimate Refactoring Strategy
Break Dependencies
Characterization testing
Approval testing
Property-based testing
Functional testing to reach 100% coverage
Refactor the code
Step 1: Break Dependencies
Legacy code is hard to test. Your first goal isn’t to add tests everywhere.
Your first goal is to make your code testable. Dependencies are the #1 reason code is hard to test. External calls like APIs, databases, or message queues make testing hard.
The first step is to look for places in the code where you can change behavior without changing structure. Consider code that has a FileReader dependency tightly coupled to it. Our goal is to break this dependency with dependency injection, so that tests can substitute test doubles (mocks, stubs, fakes) for the real thing.
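Here is a minimal Python sketch of the idea. The original snippet is not reproduced here, so everything except the FileReader name is an assumption made for illustration: the Report class, the TextSource protocol, the StubReader double, and the word-count logic are all hypothetical.

```python
from typing import Protocol


class FileReader:
    """Real dependency: reads text from disk."""

    def read(self, path: str) -> str:
        with open(path, encoding="utf-8") as handle:
            return handle.read()


class TightlyCoupledReport:
    """Before: the dependency is created inside the method, so a test cannot replace it."""

    def word_count(self, path: str) -> int:
        reader = FileReader()  # hard-wired dependency
        return len(reader.read(path).split())


class TextSource(Protocol):
    """Anything with a matching read() method can be injected."""

    def read(self, path: str) -> str: ...


class Report:
    """After: the dependency is passed in through the constructor."""

    def __init__(self, reader: TextSource) -> None:
        self._reader = reader

    def word_count(self, path: str) -> int:
        return len(self._reader.read(path).split())


class StubReader:
    """Test double: returns canned text without touching the file system."""

    def __init__(self, text: str) -> None:
        self._text = text

    def read(self, path: str) -> str:
        return self._text


# Production code would build Report(FileReader()); a test injects the stub.
report = Report(StubReader("legacy code is code without tests"))
print(report.word_count("ignored.txt"))  # 6
```

The business logic stays the same in both versions. Only the way the dependency arrives has changed, and that is what makes the second version testable.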
Your goal should be to break all the dependencies in your legacy code, with two important exceptions:
If the dependency is fast → use the real one; it leads to more realistic tests
If the dependency is essential for testing business logic → use the real one; it leads to more meaningful tests
Step 2: Characterization Testing
Before refactoring, you need to understand the legacy code. The goal of characterization testing is not to find bugs. The goal is to understand the existing behavior.
Use characterization tests when:
The code is too complex to reason about
There is no documentation
There are no existing tests
Let’s say you have a code snippet you know nothing about.
The best way to figure out what it does is to write characterization tests around it.
Steps:
Write a test that calls the legacy code
Add an assertion you think should fail.
Run the test to observe the behavior
It will likely result in an error like:
Expected formattedText to be <null>, but found "plain text".
Update the test to capture the behavior
Repeat this cycle with new tests until you fully understand the code.
By doing so, you both learn about the business logic and document its behaviors with automated tests.
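The cycle above can be sketched in Python. Note that format_text is a hypothetical stand-in for the unknown legacy code, not the original snippet. The tests make no claim about what the code should do; they only record what it does today.

```python
def format_text(text):
    """Hypothetical stand-in for an undocumented legacy function."""
    if text is None:
        return ""
    result = " ".join(text.split())
    if result.isupper():
        result = result.capitalize()
    return result[:20]


def test_plain_text_is_returned_unchanged():
    # The first draft asserted `format_text("plain text") is None` on purpose.
    # The failure message revealed the real output, which is now pinned here.
    assert format_text("plain text") == "plain text"


def test_extra_whitespace_is_collapsed():
    assert format_text("  plain   text ") == "plain text"


def test_all_caps_input_is_capitalized():
    assert format_text("SHOUTING") == "Shouting"


def test_none_becomes_empty_string():
    assert format_text(None) == ""


def test_output_is_cut_at_twenty_characters():
    assert format_text("a" * 30) == "a" * 20


# Each test was written the same way: guess, run, read the failure, pin the result.
for test in (
    test_plain_text_is_returned_unchanged,
    test_extra_whitespace_is_collapsed,
    test_all_caps_input_is_capitalized,
    test_none_becomes_empty_string,
    test_output_is_cut_at_twenty_characters,
):
    test()
print("all characterization tests pass")
```

The test names double as documentation: reading them top to bottom tells you what the legacy function does.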
Step 3: Approval Testing
Writing assertions for complex objects is painful. Approval testing makes it easier. Instead of checking every field manually, you capture the full output once, then compare future runs against it.
How it works:
Generate output from your code
Approve it as correct (store it in a file)
On future runs → compare new output vs approved
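A normal unit test checks the output field by field, with one assertion per field.
Problems with this test:
Hard to maintain
If the data structure changes → one place to update per assertion (5 places in a five-field test)
Tiresome to write assertions for a large data structure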
The same test written as an approval test is just one line. It asserts the whole data structure in a text format. On the first run, it generates a file like this:
ExportToXml_Should_Work.received.txt
When you approve it manually, it becomes:
ExportToXml_Should_Work.approved.txt
Then, on any future run, the new received output is compared against the approved file. If they differ, you probably broke some functionality.
This practice works well for outputs like JSON, HTML, or text reports. Approval testing libraries exist for almost every programming language; check them out!
⚠️ A word of caution
Approval tests are temporary tools. They often lead to fragile tests. They slow you down in the long run. Once your code is clean, refactored, and covered by solid functional tests, delete most of them without regret.
Step 4: Property-Based Testing
This is my favorite type of testing. Why? Because bugs don’t hide in the happy paths. They hide in the edge and corner cases.
Property-based testing helps you generate a large number of test cases with random inputs, then verifies that certain behaviors always hold true no matter what.
It’s super handy because it captures the key behaviors of your app - the things that should never break when you refactor.
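Here is a small Python sketch of the idea, hand-rolled with the standard random module so it runs without extra packages. The normalize_whitespace function and the three properties are assumptions chosen for illustration. A dedicated library such as Hypothesis would also shrink failing inputs for you; this sketch does not.

```python
import random
import string


def normalize_whitespace(text: str) -> str:
    """Hypothetical function under test: collapse whitespace runs to one space."""
    return " ".join(text.split())


def random_text(rng: random.Random, max_length: int = 40) -> str:
    """Build a random string of letters mixed with spaces, tabs, and newlines."""
    alphabet = string.ascii_letters + " \t\n"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_length)))


def check_properties(runs: int = 1000, seed: int = 2024) -> None:
    rng = random.Random(seed)  # a fixed seed keeps failures reproducible
    for _ in range(runs):
        text = random_text(rng)
        result = normalize_whitespace(text)
        # Property 1: normalizing twice changes nothing (idempotence).
        assert normalize_whitespace(result) == result, repr(text)
        # Property 2: no leading, trailing, or repeated spaces remain.
        assert result == result.strip() and "  " not in result, repr(text)
        # Property 3: the non-whitespace characters are preserved in order.
        assert "".join(result.split()) == "".join(text.split()), repr(text)


check_properties()
print("all properties held")
```

None of the properties mention a specific input or output. That is why they survive a refactoring: you can rewrite the implementation completely and the same checks still apply.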
To learn more about property-based testing, see the full article I wrote on this topic.
Step 5: Go for 100% Coverage
Aim for ~100% code and behavior coverage. Why so strict? Because anything less leaves room for bugs when you refactor your code. You want maximum confidence. You want a test suite you fully trust. Sure, 100% coverage is almost impossible. But your goal should be to get as close to that as possible.
Use tools like:
Code Coverage → shows which lines of code are uncovered
Mutation Testing → shows which behaviors are untested (even if your code is covered)
Run these tools iteratively and keep adding functional tests until you maximize the coverage results. To learn more about Mutation Testing, check out my recent article about it.
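The difference between the two tools is easy to see by hand. In this Python sketch, is_adult and both test suites are hypothetical, and the mutant is written manually; a mutation testing tool generates changes like this automatically.

```python
def is_adult(age: int) -> bool:
    """Hypothetical production code."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """The kind of change a mutation tool makes: >= becomes >."""
    return age > 18


def weak_suite(func) -> bool:
    """Executes every line of the function but never probes the boundary."""
    return func(30) is True and func(5) is False


def strong_suite(func) -> bool:
    """Adds the boundary case that distinguishes >= from >."""
    return weak_suite(func) and func(18) is True


print(weak_suite(is_adult), weak_suite(is_adult_mutant))      # True True  -> mutant survives
print(strong_suite(is_adult), strong_suite(is_adult_mutant))  # True False -> mutant killed
```

The weak suite reaches 100% line coverage and still cannot tell the original from the mutant. The surviving mutant points at the missing test: the boundary at 18.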
Step 6: Refactor with Confidence
Now the fun part: With a solid test suite in place, you can confidently refactor your code.
Here are 6 tips you can use to refactor your code:
Do aggressive refactoring at small scales
Master the refactoring hotkeys of your IDE
Turn comments into well-named components
Follow the Rule of Three to remove duplications
Don't mix refactoring with changing behavior
Use TDD to make refactoring a core part of development
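Two of these tips fit in one short Python sketch: turning comments into well-named components, and keeping the behavior unchanged while you do it. The pricing function and its discount rule are hypothetical, invented only for illustration.

```python
from typing import List, Tuple

Items = List[Tuple[float, int]]  # (unit price, quantity) pairs


def total_price_before(items: Items) -> float:
    # calculate subtotal
    subtotal = 0.0
    for price, quantity in items:
        subtotal += price * quantity
    # apply 10% discount for orders over 100
    if subtotal > 100:
        subtotal = subtotal * 0.9
    return round(subtotal, 2)


def subtotal_of(items: Items) -> float:
    """Replaces the 'calculate subtotal' comment with a name."""
    return sum(price * quantity for price, quantity in items)


def apply_bulk_discount(subtotal: float) -> float:
    """Replaces the 'apply 10% discount' comment with a name."""
    return subtotal * 0.9 if subtotal > 100 else subtotal


def total_price_after(items: Items) -> float:
    return round(apply_bulk_discount(subtotal_of(items)), 2)


# The refactoring must not change behavior: same inputs, same outputs.
cases: List[Items] = [[], [(10.0, 2)], [(60.0, 2)], [(100.0, 1)], [(33.33, 4)]]
for items in cases:
    assert total_price_before(items) == total_price_after(items), items
print("behavior unchanged")
```

The comments disappear because the function names now say the same thing. The comparison loop is a throwaway safety net; in a real codebase the test suite from the earlier steps plays that role.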
Conclusion
Refactoring legacy code isn’t hard. It just takes the right techniques and a bit of risk management to do it safely.