How do you ensure the quality and reliability of code in software development? The answer sounds simple enough: you test it! Without proper testing, bugs and errors can go unnoticed, leading to crashes, data loss, or even security breaches.
But here’s the catch: measuring the effectiveness of testing can be challenging, and relying on a single metric, such as code coverage, can lead to a false sense of security. Achieving 100% code coverage may seem like the ultimate goal, but it is not enough: 100% code coverage does NOT mean high test suite quality. Curious? Then come along, and let’s find out what that means and why.
Many people, especially in management positions, are still convinced that code coverage is the most important measure of well-tested code. In practice, they use it mostly to track whether developers are writing tests at all. Instead, developers should combine multiple metrics to evaluate the quality of their tests.
Say hello to the holy trinity
Three important metrics to consider are coverage testing, mutation testing, and Test-Driven Development (TDD), which I like to call “the holy trinity of quality tests”.
Why Code Coverage has its limitations
Code coverage is a metric in software development that measures how much of a program's source code has been executed during automated testing. It is usually reported as the percentage of the code that is exercised by at least one test case. There are different types of coverage, such as line coverage, branch coverage, statement coverage, or function coverage. Each type measures the extent to which your tests cover a different aspect of your code.
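To make the difference between two of these types concrete, here is a minimal Python sketch. The function `apply_discount` and its 10% member rule are made up for illustration and do not come from any real codebase.

```python
def apply_discount(price_cents: int, is_member: bool) -> int:
    """Return the price in cents, reduced by 10% for members (hypothetical rule)."""
    if is_member:
        price_cents = price_cents - price_cents // 10
    return price_cents


def test_member_gets_discount() -> None:
    # This single test executes every line of apply_discount, so line
    # coverage is complete. The path where is_member is False (skipping
    # the body of the "if") is never taken, so branch coverage is not.
    assert apply_discount(1000, True) == 900
```

A tool that only counts lines would see nothing missing here, while a tool that tracks branches would flag the untested path where `is_member` is false.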
“The ability to unit test a piece of code is a nice litmus test, but it only works in one direction. It’s a good negative indicator—it points out poor-quality code with relatively high accuracy.”1
Although code coverage measures how much of your code has been executed by the tests, it does not reveal anything about the quality of those tests. So your test suite achieved 100% code coverage? Cool, but it may still have weak spots that leave the code vulnerable to bugs and errors. For example, it's possible to write tests that execute every branch in the code but never check the edge cases or the boundary conditions.
Or consider this: the tests may not catch subtle interactions between different parts of the code, leading to unexpected behavior. Or, as the simplest example, your tests can be completely assertion-free, and you would still have 100% line coverage.
Long story short: code coverage is necessary but not sufficient for well-tested code.
Figure 1: Code with 100% line and 100% branch coverage - but no assertions at all
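A sketch of what such code could look like, in Python. The function `absolute` and its deliberate bug are invented for this illustration and are not taken from the figure.

```python
def absolute(x: int) -> int:
    """Intended to return the absolute value of x."""
    if x < 0:
        return x  # deliberate bug for illustration: should be -x
    return x


def test_absolute_runs() -> None:
    # Both branches and every line of absolute() are executed,
    # but nothing is asserted, so this test cannot fail on a wrong result.
    absolute(5)
    absolute(-5)
```

The test passes and a coverage tool would have nothing to complain about, yet the bug in the negative branch goes completely unnoticed.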
In the study "The Effect of Code Coverage on Fault Detection Under Different Testing Profiles" by Xia Cai and Michael R. Lyu, the authors conducted an empirical investigation to explore the relationship between code coverage and fault detection effectiveness under different testing profiles. The study found that while higher code coverage is generally associated with better fault detection, there are significant variations in the relationship between code coverage and fault detection effectiveness. These variations could be due to the specific testing profile or due to other factors like the quality of the test suite and the characteristics of the software being tested. This further highlights the importance of using multiple metrics, including code coverage, mutation testing, and TDD, to evaluate the effectiveness of software testing.
The study also found that the relationship between code coverage and fault detection effectiveness varies, depending on the type of faults present in the code. For example, code coverage is more effective in detecting simple faults, while more complex faults may require additional testing techniques.
The Benefits of Coverage Testing
Despite all the pitfalls of relying on code coverage, it still has many benefits that we can make use of.
By analyzing the coverage of your tests, you can identify areas of the code that are not well-tested and improve your test suite accordingly. Coverage testing also helps you avoid writing redundant tests or tests that don't contribute to the quality of your code.
With these points in mind, here’s why code coverage shines as a metric:
- Identifying untested code: Code coverage can help identify areas of code that are yet to be tested. This can help developers prioritize their testing efforts and ensure that all code paths are covered by automated tests.
- Tracking progress: Code coverage can be used as a progress metric, allowing developers to track how much of the code has been tested over time. This can help identify areas where testing efforts need to be increased and provide insight into how the software quality is improving over time.
- Improving maintainability: Code coverage can also improve the maintainability of the software. With automated tests in place for all code paths, it is easier to modify or refactor the code without introducing new bugs, because a failing test gives you immediate feedback.
And that brings us to the overall benefits of Testable Code. If our code is structured in a way that makes it easy to test and cover with a multitude of test cases, its design has usually improved along the way.
The Importance of TDD
The topic of Testable Code immediately leads us to the topic of Test-Driven Development. TDD is a technique that emphasizes writing tests before writing the code. With TDD, you start by writing a failing test that specifies the behavior you want to implement. Then, you write the code that makes the test pass, and finally, you refactor the code to improve its design and maintainability.
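The cycle described above could look roughly like this in Python. The leap-year example and the name `is_leap_year` are assumptions chosen for illustration. In a real TDD session you would add one test at a time and run the suite after every step.

```python
# Step 1 (red): the tests come first and describe the wanted behavior.
# At this point is_leap_year does not exist yet, so running them fails.
def test_year_divisible_by_4_is_leap() -> None:
    assert is_leap_year(2024)


def test_century_year_is_not_leap() -> None:
    assert not is_leap_year(1900)


def test_year_divisible_by_400_is_leap() -> None:
    assert is_leap_year(2000)


# Step 2 (green): write just enough code to make the tests pass.
# Step 3 (refactor): clean up the implementation while the tests stay green.
def is_leap_year(year: int) -> bool:
    """Return True if the given year is a leap year in the Gregorian calendar."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```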
TDD ensures that your code is testable and that your tests are meaningful. By writing tests before writing the code, you focus on the requirements and the behavior of the code, not just on its implementation. This helps you catch bugs and errors early in the development process, when they're cheaper and easier to fix.
The Power of Mutation Testing
But even after we have implemented our features using TDD and run a coverage tool to find out what we are still missing, one question remains open: what else?
How can I be sure that my tests are actually good?
And this is where mutation testing comes into the picture. Mutation testing is a powerful technique that complements TDD and coverage testing by identifying weaknesses in your test suite. It works by making small modifications, or "mutations", to the code, such as changing an operator or a variable, and then running your test suite against each mutated version (a "mutant"). If your tests still pass, they didn't catch the mutation and the mutant has survived. This suggests that they may not be robust enough to catch other similar bugs in the code. By contrast, if at least one test fails, the mutant has been killed, which indicates that your tests are effective at detecting this kind of bug.
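Real mutation testing tools generate and run the mutated versions automatically. The following hand-written Python sketch only imitates a single mutation to show the idea. All names in it (`is_adult`, `is_adult_mutant`, `weak_suite`, `strong_suite`) are hypothetical.

```python
from typing import Callable


def is_adult(age: int) -> bool:
    """Original code under test."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Mutated copy: the operator '>=' has been changed to '>'."""
    return age > 18


def weak_suite(impl: Callable[[int], bool]) -> bool:
    """Return True if all checks pass. Only values far from the boundary."""
    return impl(30) is True and impl(5) is False


def strong_suite(impl: Callable[[int], bool]) -> bool:
    """Return True if all checks pass. Adds the boundary values 17 and 18."""
    return weak_suite(impl) and impl(18) is True and impl(17) is False


for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
    passes_original = suite(is_adult)
    # If the suite still passes on the mutant, the mutation went unnoticed.
    mutant_detected = not suite(is_adult_mutant)
    print(f"{name} suite: passes on original={passes_original}, "
          f"detects mutant={mutant_detected}")
```

Both suites pass on the original code and execute every line of it, so coverage cannot tell them apart. Only the suite that checks the boundary value 18 fails on the mutated version and thereby detects the change.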
Mutation testing is particularly useful for identifying areas of your code that are not well-tested, such as error-prone or complex code. It also helps you prioritize your testing efforts by identifying the mutations that are most critical or most likely to occur in real-world scenarios. However, mutation testing can be computationally expensive and time-consuming, especially for large codebases or complex applications. Therefore, it's important to use mutation testing judiciously, and in combination with other testing techniques.
Order of the metrics
As stated above, the overall quality of your unit tests is best improved by combining these three metrics and applying them in the following order:
1. Implement your code and your tests with the help of TDD.
2. Measure your current coverage and check if you accidentally left out any parts of your code.
3. Perform mutation testing last, since it requires a green test suite to start from. It points out the edge cases and boundary conditions your tests still miss, so that you can add tests until all mutants are killed.
In conclusion, achieving 100% code coverage is not enough to ensure well-tested code. While code coverage is an important metric, it has its limitations and doesn't guarantee the quality of your tests. To truly ensure that your code is well-tested, you must combine different techniques, such as TDD, coverage, and mutation testing. TDD helps you write testable code and meaningful tests, coverage testing helps you measure how much of your code the tests exercise and identify untested spots, and mutation testing helps you evaluate the quality of your test suite and prioritize your testing efforts.
Go on then, combine all these techniques, and ensure that your code is reliable and maintainable. And hopefully you will also remember why 100% coverage alone is not necessarily enough!
1. Vladimir Khorikov, Unit Testing Principles, Practices, and Patterns, page 5