Yet another misunderstanding of TDD, testing, and code coverage

I was vaguely annoyed to see this blog article featured in JavaLobby’s recent mailout: not because Kevin Pang doesn’t make some good points about the limits of code coverage, but because his title is needlessly controversial, and because JavaLobby is engaging in some agile-baiting by publishing it without editorial restraint.

In asking the question, “Is code coverage all that useful,” he asserts at the beginning of his article that Test Driven Development (TDD) proponents “often tend to push code coverage as a useful metric for gauging how well tested an application is.” This statement is true, but the remainder of the blog post takes apart code coverage as a valid “one true metric,” a claim that TDD proponents don’t make, except in Kevin’s interpretation.

He further asserts that “100% code coverage has long been the ultimate goal of testing fanatics.” This isn’t true. High code coverage is a desired attribute of a well-tested system, but the goal is to have a fully and sufficiently tested system. Code coverage is indicative, but not proof, of a well-tested system. What do I mean by that? Any system whose authors have taken the time to sufficiently test it such that it gets > 95% code coverage is likely (in my experience) thinking through how to test their system in order to fully express its happy paths, edge cases, etc. However, the code coverage here is a symptom, not a cause, of a well-tested system. And the metric can be gamed. Actually, when imposed as a management quality criterion, it usually is gamed. Good metrics should confirm a result obtained by other means, or provide leading indicators. Few numeric measurements are subtle enough to really drive system development.

Having said that, I have used code-coverage in this way, but in context, as I’ll mention later in this post.

Kevin provides example code similar to the following:

String foo(boolean condition) {
    if (condition)
        return "true";
    else
        return "false";
}

… and talks about how, if the unit tests only exercise the true path, they achieve only 50% coverage. Good so far. But then he goes on to say that “code coverage only tells us what was executed by our unit tests, not what executed correctly.” He is carefully telling us that a unit test executing a line doesn’t guarantee that the line is working as intended. Um… that’s obvious. And if the tests didn’t pass, then the line should not be considered covered. It seems there are some unclear assumptions about how testing needs to work, so let me get some assertions out of the way…

  1. Code coverage is only meaningful in the context of well-written tests. It doesn’t save you from crappy tests.
  2. Code coverage should only be measured on a line/branch if the covering tests are passing.
  3. Low code coverage suggests insufficiency, but high code coverage doesn’t guarantee sufficiency.
  4. Test-driven code will likely have the symptom of nearly perfect coverage.
  5. Test-driven code will be sufficiently tested, because the author wrote all the tests that form, in full, the requirements/spec of that code.
  6. Perfectly covered code will not necessarily be sufficiently tested.
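
Points 1 and 6 are easy to see in a minimal sketch. The names `CoverageSketch`, `gamedTest`, `meaningfulTest`, and `check` are hypothetical, and plain Java stands in for a real test framework; `foo` is the example method from above. Both "tests" execute every line and branch of `foo`, so a coverage tool would presumably report the same number for each, but only the second one would fail if `foo` returned the wrong thing.

```java
// Hypothetical sketch: plain Java stands in for a real test framework.
final class CoverageSketch {

    // The example method from above.
    static String foo(boolean condition) {
        if (condition)
            return "true";
        else
            return "false";
    }

    // Executes both branches, so they would count as covered, but never
    // checks a result. It passes no matter what foo returns.
    static void gamedTest() {
        foo(true);
        foo(false);
    }

    // Executes the same branches and also states what each must return.
    static void meaningfulTest() {
        check("true".equals(foo(true)), "foo(true) should return \"true\"");
        check("false".equals(foo(false)), "foo(false) should return \"false\"");
    }

    // Minimal assertion helper, used instead of the assert keyword so the
    // checks run even when JVM assertions are disabled.
    static void check(boolean ok, String message) {
        if (!ok) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        gamedTest();
        meaningfulTest();
        System.out.println("both tests passed");
    }
}
```

The coverage figure cannot tell these two tests apart, which is why it only means something when the tests themselves are well written.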

What I’m driving at is that Kevin is arguing against something entirely different from what TDD proponents argue. He’s arguing against a common misunderstanding of how TDD works. On #1 he and I are in agreement. Many of his commenters mention #3 (and he states it in various ways himself). His description of what code coverage doesn’t give you is absurd when you take #2 into account (we assume that a line of covered code is only covered if the covering test is passing). But most importantly, “TDD proponents” would, in my experience, find this whole line of explanation rather irrelevant, as it is an argument against code coverage as a single metric for code quality, and they would attempt to achieve code quality through thoroughness of testing by driving the development through tests. TDD is a design methodology, not a testing methodology. You just get tests as side-effect artifacts of the approach. Useful in their own right? Sure, but it’s only sort of the point. It isn’t just writing the tests first.

In other words: TDD implies high or perfect coverage. But the converse is not necessarily true.

How do you achieve thoroughness by driving your development with tests? You imagine the functionality you need next (your next increment of useful change), and you write or modify your tests to “require” the new piece of functionality. Then you write it, then you go green. Code coverage doesn’t enter into it, because you should have near-perfect coverage at all times by implication, because every new piece of functionality you develop is preceded by tests which test its main paths and error states, upper and lower bounds, etc. Code coverage in this model is a great way to notice that you screwed up and missed something, but nothing else.
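
One such increment might look like the sketch below. Everything in it is invented for illustration: the class `TddSketch`, the method `clampPercent`, and the rule that values are clamped to 0..100. Plain Java checks again stand in for a test framework. The tests come first and act as the requirement; the implementation is then written with just enough code to make them pass.

```java
// Hypothetical example: TddSketch, clampPercent and the 0..100 rule are
// invented for illustration.
final class TddSketch {

    // Step 1: written first. These checks "require" the next increment:
    // main path, lower bound, upper bound.
    static void clampPercentTests() {
        expectEquals(42, clampPercent(42));    // main path
        expectEquals(0, clampPercent(-5));     // below the lower bound
        expectEquals(0, clampPercent(0));      // at the lower bound
        expectEquals(100, clampPercent(100));  // at the upper bound
        expectEquals(100, clampPercent(250));  // above the upper bound
    }

    // Step 2: written second, with just enough code to go green.
    static int clampPercent(int value) {
        if (value < 0) return 0;
        if (value > 100) return 100;
        return value;
    }

    // Minimal assertion helper; throws if the values differ.
    static void expectEquals(int expected, int actual) {
        if (expected != actual)
            throw new AssertionError("expected " + expected + " but got " + actual);
    }

    public static void main(String[] args) {
        clampPercentTests();
        System.out.println("green");
    }
}
```

Every branch in `clampPercent` exists because a test demanded it, so the coverage falls out as a by-product rather than a target.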

So, is code-coverage useful? Heck yeah! I’ve used coverage to discover lots of waste in my system. I’ve removed whole sets of APIs that were “just in case I need them” APIs, because they become rote (lots of accessors/mutators that are not called in normal operations). Is code coverage the only way I would find them? No. If I’m dealing with a system that wasn’t driven with tests, or was poorly tested in general, I may use coverage as a quick health meter, but probably not. Going from zero to 90% on legacy code is likely to be less valuable than just re-writing whole subsystems using TDD… and often more expensive.

Regardless, while Kevin is formally asking “is code coverage useful?” he’s really asking (rhetorically) whether it is reasonable to worship code coverage as the primary metric. But if no one’s asserting the positive, why is he questioning it? He may be dealing with a lot of people with misunderstandings of how TDD works. He could be dealing with metrics bigots. He could be dealing with management-imposed-metrics initiatives, which often fail. It might be a pet peeve, or he’s annoyed with TDD and this is a great way to do some agile-baiting of his own. I don’t know him, so I can’t say. His comments seem reasonable, so I assume no ill intent. But the answer to his rhetorical question is “yes, but in context.” Not surprising, since most rhetorically asked questions are answerable in this fashion. Hopefully it’s a bit clearer where it’s useful, and where and how it’s not.

(This article is a cross-post from “Geek in a Suit”)


5 thoughts on “Yet another misunderstanding of TDD, testing, and code coverage”

  1. I think he’s unnecessarily stirring the pot. I don’t think any sane person would assume that 100% code coverage is the silver bullet.

    Code coverage is useful and a great metric, but it must be taken in the context of your entire development process. CI, functional, system tests and code reviewing your code AND tests make sure that the coverage reports are more meaningful.

  2. I think test coverage metrics, and techniques like operator mutation (change a random operator in the code, retest — do any tests break? rinse, repeat), are a good way to check whether your user stories and test cases are broad enough.

    But one shouldn’t simply compose a test explicitly to cause a branch of code to be executed and assume that branch as coded is correct; rather you should consider what conditions would lead to that branch, and go back to your user and requirements process, and discuss what should happen under those conditions; then build test cases for those conditions, and then change your code as needed to pass that new test.

    Also, branch coverage isn’t as strong as it might be because branches may be left out entirely. If the code looks like if(a)..elsif(b)…elsif(c) and there’s no (!a&!b&!c) branch at the end, that may be an un-covered case that needs to be handled, but a branch coverage analysis won’t see it ’cause the branch isn’t there at all.

    So branch coverage isn’t perfect, but if properly used can be used to discover missing user stories and corresponding test cases. If it is used as an end in itself, causing people to write random looking tests that happen to pass just to make the coverage numbers look better, then we have a problem.
    It seems that Kevin is expecting this latter case; which would obviously be bad.

  3. “Any system whose authors have taken the time to sufficiently test it such that it gets > 95% code coverage is likely (in my experience) thinking through how to test their system in order to fully express its happy paths, edge cases, etc.”

    Well said. We generally have code coverage in excess of 95% on our apps, not because people are focusing on the number but because our culture (and hard lessons learned from deploying apps with lots of bad bugs and a poor bug process) has encouraged testing as a path to building a better system, saving us grief in the long run, helping push the developer to think through the problem first by writing the tests and thus validate the story (or not) and expose challenges with the task in front of them before diving into the implementation.
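
The operator-mutation check described in the second comment can be sketched by hand. This is only an illustration under stated assumptions: `MutationSketch`, `isAdult`, and both suites are hypothetical, and the mutant is written out manually, whereas a real mutation-testing tool would generate it. The weak suite exercises both outcomes of the comparison yet cannot tell the original from the mutant; adding boundary cases lets the suite catch it.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of operator mutation, done by hand.
final class MutationSketch {

    // Original rule: 18 or older.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Mutant: the operator >= has been changed to >.
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // Exercises both outcomes of the comparison, but skips the boundary.
    static boolean weakSuitePasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    // Adds the boundary cases on either side of 18.
    static boolean strongSuitePasses(IntPredicate impl) {
        return weakSuitePasses(impl) && impl.test(18) && !impl.test(17);
    }

    public static void main(String[] args) {
        System.out.println("weak suite catches mutant: "
                + !weakSuitePasses(MutationSketch::isAdultMutant));
        System.out.println("strong suite catches mutant: "
                + !strongSuitePasses(MutationSketch::isAdultMutant));
    }
}
```

A mutant that survives, as it does against the weak suite, points at a condition nobody asked about yet, which is the cue to go back to the requirements rather than to bolt on a test that merely passes.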
