Forget code coverage! Use Mutation Testing
What mutation testing is and why you should use it daily
Why do you need it?
It’s 2023 and I still meet a lot of teams that heavily rely on code coverage. They use it in their pipeline and aim for a coverage goal of 90-100%. When they reach it, they're happy to deploy to PROD.
The problem is that code coverage is not a quality metric. You can have 100% coverage without any test assertions. To demonstrate:
Imagine you have a codebase with 100% code coverage. Nice!
Now imagine deleting all the assertion statements in your tests.
Run your code coverage tool again.
The result? 100% code coverage.
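The steps above can be sketched in a few lines of Python. This is only an illustration: `apply_discount` and both test functions are hypothetical names invented for this example, and the coverage claim in the comments assumes a line-coverage tool such as coverage.py.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return the price after subtracting a percentage discount."""
    if percent < 0 or percent > 100:
        raise ValueError("percent must be between 0 and 100")
    return price - price * percent / 100


def test_apply_discount_without_assertions() -> None:
    # Both branches of apply_discount are executed, so a line-coverage
    # tool would report the function as fully covered. Nothing is checked,
    # though: this test would still pass if the formula were wrong.
    apply_discount(200.0, 10.0)
    try:
        apply_discount(200.0, 150.0)
    except ValueError:
        pass


def test_apply_discount_with_assertions() -> None:
    # Same coverage as above, but now the behavior is actually verified.
    assert apply_discount(200.0, 10.0) == 180.0
    try:
        apply_discount(200.0, 150.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")
```

Both tests exercise exactly the same lines, so the coverage number cannot tell them apart.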
Do you still trust your tests? Code coverage can help you detect untested areas of the code, but it tells you nothing about the tested areas. It creates a false sense of security because it carries no information about the quality of your tests. How can we measure the quality of our tests then?
The answer is: Mutation Testing
What is Mutation Testing?
Mutation Testing is about creating modified versions of the source code, called mutants, and then seeing if the tests are able to detect them. You make a small change, you run all your tests, then you check:
If you have at least one failing test, your tests cover the behavior.
If all your tests pass, then you have a missing test case or assertion.
We generate a large number of mutations for the entire codebase, and for each one, we check how strong our tests are. The goal is to improve the quality of our test suite by finding missing parts in our tests.
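To make the kill-or-survive check concrete, here is a minimal hand-rolled sketch. All names (`is_adult`, `is_adult_mutant`, `weak_tests`, `strong_tests`, `mutant_is_killed`) are hypothetical and exist only for this example; a real mutation testing library would generate the mutant and run your test suite for you.

```python
from typing import Callable

AgeCheck = Callable[[int], bool]


def is_adult(age: int) -> bool:
    """Original production code."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Mutant: the operator '>=' has been replaced with '>'."""
    return age > 18


def weak_tests(fn: AgeCheck) -> bool:
    """Return True when every check passes. The boundary value is missing."""
    return fn(30) is True and fn(5) is False


def strong_tests(fn: AgeCheck) -> bool:
    """Same checks as weak_tests plus the boundary case age == 18."""
    return weak_tests(fn) and fn(18) is True


def mutant_is_killed(tests: Callable[[AgeCheck], bool], mutant: AgeCheck) -> bool:
    """A mutant is killed when at least one test fails against it."""
    return not tests(mutant)


if __name__ == "__main__":
    print("weak tests kill the mutant:", mutant_is_killed(weak_tests, is_adult_mutant))
    print("strong tests kill the mutant:", mutant_is_killed(strong_tests, is_adult_mutant))
```

The weak suite passes against both the original and the mutant, which points directly at the missing test case: the boundary value 18.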
Types of mutations
Mutations can take various forms, such as:
Operator Mutations (e.g., replacing "+" with "-")
Conditional Mutations (e.g., negating booleans)
Statement Mutations (e.g., deleting a line of code)
Value Mutations (e.g., altering a function's return value)
Mutation testing can be done manually, but the most efficient way is to do it in an automated fashion using a library. The good news is that you can find a mutation testing library for most tech stacks.
Here are the most popular ones per language:
How to measure quality
After each mutation, we run all our tests. If at least one test fails, the mutant is killed. If no test fails, the mutant has survived. To measure the success of mutation testing, we can use the mutation score. It's a percentage calculated by dividing the number of killed mutants by the total number of mutants and then multiplying the result by 100.
If all mutants are killed, we have a 100% mutation score, which tells us much more about the quality of our test suite than 100% code coverage.
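The calculation described above fits in a single function. The name `mutation_score` and the input validation are choices made for this sketch; mutation testing tools normally report this number for you.

```python
def mutation_score(killed: int, total: int) -> float:
    """Return the mutation score as a percentage.

    killed: number of mutants detected by at least one failing test
    total:  number of mutants generated
    """
    if total <= 0:
        raise ValueError("total must be a positive number of mutants")
    if not 0 <= killed <= total:
        raise ValueError("killed must be between 0 and total")
    return 100 * killed / total


if __name__ == "__main__":
    # 45 of 50 mutants killed, so 5 mutants survived.
    print(mutation_score(45, 50))  # prints 90.0
```

The five surviving mutants in this example are the interesting part: each one marks a behavior that no test currently checks.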
The ultimate solution
If you use test-last development, mutation testing should be mandatory as a quality assurance tool. However, because there are so many mutations to create, it will require extra effort to analyze the test results.
The solution? Test-Driven Development (TDD). When using TDD, with its test-first approach and the 3rd law, we achieve a codebase with 100% code coverage and a 100% mutation score automatically. This is a killer feature of TDD that not many people talk about.
3rd law of TDD:
You are not allowed to write any more production code than is sufficient to pass the one failing unit test.
Still, we are human and we make mistakes, so even with TDD, we should use mutation testing to double-check the quality of our work.
Imagine you have a slice of bread with your favorite topping. Code coverage will tell you that X% of it is covered with some topping. The mutation score will tell you whether your bread is covered with jam, Nutella, cream cheese, or something else.
When you offer a child a slice of bread, what information are they interested in? The percentage of their bread covered or what it's covered with? Of course, they want to know if it has Nutella on it.
The same happens in software: we should care more about the behaviors our tests cover rather than the number of lines of code they execute.