What is a reasonable code coverage % for unit tests (and why)?

This prose by Alberto Savoia answers precisely that question (in a nicely entertaining manner at that!):

http://www.artima.com/forums/flat.jsp?forum=106&thread=204677

Testivus On Test Coverage

Early one morning, a programmer asked the great master:

“I am ready to write some unit tests. What code coverage should I aim for?”

The great master replied:

“Don’t worry about coverage, just write some good tests.”

The programmer smiled, bowed, and left.

...

Later that day, a second programmer asked the same question.

The great master pointed at a pot of boiling water and said:

“How many grains of rice should I put in that pot?”

The programmer, looking puzzled, replied:

“How can I possibly tell you? It depends on how many people you need to feed, how hungry they are, what other food you are serving, how much rice you have available, and so on.”

“Exactly,” said the great master.

The second programmer smiled, bowed, and left.

...

Toward the end of the day, a third programmer came and asked the same question about code coverage.

“Eighty percent and no less!” replied the master in a stern voice, pounding his fist on the table.

The third programmer smiled, bowed, and left.

...

After this last reply, a young apprentice approached the great master:

“Great master, today I overheard you answer the same question about code coverage with three different answers. Why?”

The great master stood up from his chair:

“Come get some fresh tea with me and let’s talk about it.”

After they filled their cups with smoking hot green tea, the great master began to answer:

“The first programmer is new and just getting started with testing. Right now he has a lot of code and no tests. He has a long way to go; focusing on code coverage at this time would be depressing and quite useless. He’s better off just getting used to writing and running some tests. He can worry about coverage later.”

“The second programmer, on the other hand, is quite experienced both at programming and testing. When I replied by asking her how many grains of rice I should put in a pot, I helped her realize that the amount of testing necessary depends on a number of factors, and she knows those factors better than I do – it’s her code after all. There is no single, simple, answer, and she’s smart enough to handle the truth and work with that.”

“I see,” said the young apprentice, “but if there is no single simple answer, then why did you answer the third programmer ‘Eighty percent and no less’?”

The great master laughed so hard and loud that his belly, evidence that he drank more than just green tea, flopped up and down.

“The third programmer wants only simple answers – even when there are no simple answers … and then does not follow them anyway.”

The young apprentice and the grizzled great master finished drinking their tea in contemplative silence.

answered Sep 18, 2008 at 4:30

Jon Limjap

Code Coverage is a misleading metric if 100% coverage is your goal (instead of 100% testing of all features).

So trust yourself or your developers to be thorough and cover every path through their code. Be pragmatic and don't chase the magical 100% coverage. If you TDD your code, you should get 90%+ coverage as a bonus. Use code coverage to highlight chunks of code you have missed (which shouldn't happen if you TDD, though, since you write code only to make a test pass; no code can exist without its partner test).
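To make the "misleading metric" point concrete, here is a minimal sketch. The `average` function and its test are hypothetical, made up for illustration: one test executes every line, so a line-coverage tool would report 100%, yet an untested feature (empty input) still fails.

```python
def average(values):
    # Every line of this function runs as soon as one test calls it.
    total = sum(values)
    return total / len(values)


def test_average_typical_input():
    # This single test is enough for 100% line coverage of average().
    assert average([1, 2, 3]) == 2


def empty_input_is_handled():
    # The behaviour for an empty list was never tested, and full line
    # coverage did nothing to reveal that it raises an exception.
    try:
        average([])
    except ZeroDivisionError:
        return False
    return True


test_average_typical_input()
print("empty input handled:", empty_input_is_handled())
```

The coverage number says nothing about which inputs were exercised, only which lines were.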

answered Sep 18, 2008 at 4:33

Gishu

Jon Limjap makes a good point - there is not a single number that is going to make sense as a standard for every project. There are projects that just don't need such a standard. Where the accepted answer falls short, in my opinion, is in describing how one might make that decision for a given project.

I will take a shot at doing so. I am not an expert in test engineering and would be happy to see a more informed answer.

When to set code coverage requirements

First, why would you want to impose such a standard in the first place? In general, when you want to introduce empirical confidence in your process. What do I mean by "empirical confidence"? Well, the real goal is correctness. For most software, we can't possibly know this across all inputs, so we settle for saying that code is well-tested. This is more knowable, but is still a subjective standard: It will always be open to debate whether or not you have met it. Those debates are useful and should occur, but they also expose uncertainty.

Code coverage is an objective measurement: Once you see your coverage report, there is no ambiguity about whether standards have been met. Does it prove correctness? Not at all, but it has a clear relationship to how well-tested the code is, which in turn is our best way to increase confidence in its correctness. Code coverage is a measurable approximation of immeasurable qualities we care about.

Some specific cases where having an empirical standard could add value:

Which metrics to use

Code coverage is not a single metric; there are several different ways of measuring coverage. Which one you might set a standard upon depends on what you're using that standard to satisfy.

I'll use two common metrics as examples of when you might use them to set standards:

There are many other metrics (line coverage is similar to statement coverage, but yields different numeric results for multi-line statements, for instance; conditional coverage and path coverage are similar to branch coverage, but reflect a more detailed view of the possible permutations of program execution you might encounter).
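As a rough illustration of why statement coverage and branch coverage give different numbers, the sketch below records which line-to-line jumps a call actually takes. Both `apply_discount` and the `executed_jumps` helper are hypothetical names written for this example; real coverage tools do this far more thoroughly.

```python
import sys


def apply_discount(price, is_member):
    if is_member:
        price = price * 0.9
    return price


def executed_jumps(func, *args):
    """Run func(*args) and return the set of (from_line, to_line) jumps
    taken inside it, with line numbers relative to the `def` line."""
    code = func.__code__
    jumps = set()
    previous = None

    def tracer(frame, event, arg):
        nonlocal previous
        if frame.f_code is not code:
            return None  # ignore every other function
        if event == "line":
            line = frame.f_lineno - code.co_firstlineno
            if previous is not None:
                jumps.add((previous, line))
            previous = line
        return tracer

    old_tracer = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(old_tracer)
    return jumps


member_only = executed_jumps(apply_discount, 100, True)
lines_hit = {line for jump in member_only for line in jump}

# Statement view: the `if`, the assignment and the `return` all ran.
print({1, 2, 3} <= lines_hit)
# Branch view: the jump from the `if` straight to the `return` never happened.
print((1, 3) in member_only)
```

A single test with `is_member=True` therefore satisfies a statement-coverage standard while leaving one of the two branches out of the `if` unexercised.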

What percentage to require

Finally, back to the original question: If you set code coverage standards, what should that number be?

Hopefully it's clear at this point that we're talking about an approximation to begin with, so any number we pick is going to be inherently approximate.

Some numbers that one might choose:

I haven't seen numbers below 80% in practice, and have a hard time imagining a case where one would set them. The role of these standards is to increase confidence in correctness, and numbers below 80% aren't particularly confidence-inspiring. (Yes, this is subjective, but again, the idea is to make the subjective choice once when you set the standard, and then use an objective measurement going forward.)

Other notes

The above assumes that correctness is the goal. Code coverage is just information; it may be relevant to other goals. For instance, if you're concerned about maintainability, you probably care about loose coupling, which can be demonstrated by testability, which in turn can be measured (in certain fashions) by code coverage. So your code coverage standard provides an empirical basis for approximating the quality of "maintainability" as well.

answered Jan 9, 2016 at 20:44

killscreen

Code coverage is great, but functionality coverage is even better. I don't believe in covering every single line I write. But I do believe in writing 100% test coverage of all the functionality I want to provide (even for the extra cool features I came up with myself and which were not discussed during the meetings).

I don't mind having code that is not covered by tests, but I would mind refactoring my code and ending up with different behaviour. Therefore, 100% functionality coverage is my only target.

answered Apr 27, 2009 at 22:56

tofi9

My favorite code coverage is 100% with an asterisk. The asterisk comes because I prefer to use tools that allow me to mark certain lines as lines that "don't count". If I have covered 100% of the lines which "count", I am done.

The underlying process is:

  1. I write my tests to exercise all the functionality and edge cases I can think of (usually working from the documentation).
  2. I run the code coverage tools.
  3. I examine any lines or paths not covered, and mark as not counting any that I consider unimportant or unreachable (due to defensive programming).
  4. I write new tests to cover the missing lines and improve the documentation if those edge cases are not mentioned.

This way if I and my collaborators add new code or change the tests in the future, there is a bright line to tell us if we missed something important - the coverage dropped below 100%. However, it also provides the flexibility to deal with different testing priorities.
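As one concrete way to mark lines as "not counting", assuming the tool is coverage.py for Python: its default configuration excludes any line carrying a `# pragma: no cover` comment, and when that line opens a block, the whole block. The `parse_port` function below is a made-up illustration, not code from a real project.

```python
def parse_port(text):
    """Convert a string such as "8080" to a valid TCP port number."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    if not isinstance(port, int):  # pragma: no cover
        # Defensive check that cannot fail after int(); left out of the report
        # so that it does not keep the total below 100%.
        raise TypeError("port must be an integer")
    return port


print(parse_port("8080"))
```

With the defensive block excluded, tests for the valid case and the out-of-range case are enough to reach 100% of the lines that count.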

answered Oct 7, 2014 at 15:58

Eponymous

Many shops don't value tests, so if you are above zero, at least there is some appreciation of their worth. Arguably, non-zero isn't bad when many are still at zero.

In the .NET world people often quote 80% as reasonable. But they say this at solution level. I prefer to measure at project level: 30% might be fine for a UI project if you've got Selenium or manual tests, 20% might be fine for the data layer project, but 95%+ might be quite achievable for the business rules layer, if not wholly necessary. So the overall coverage may be, say, 60%, but the critical business logic may be much higher.
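The arithmetic behind per-project targets can be sketched as follows. The project names and line counts are invented so that the percentages match the ones mentioned above; only the idea of checking each project against its own minimum comes from the answer.

```python
# Hypothetical figures per project: (covered_lines, total_lines, minimum_percent).
projects = {
    "ui": (1050, 3500, 30),
    "data_layer": (350, 1750, 20),
    "business_rules": (4750, 5000, 95),
}


def percent(covered, total):
    return 100 * covered / total


def failing_projects(projects):
    """Return the names of projects whose coverage is below their own minimum."""
    return [
        name
        for name, (covered, total, minimum) in projects.items()
        if 100 * covered < minimum * total  # integer comparison, no rounding issues
    ]


covered_all = sum(covered for covered, _, _ in projects.values())
total_all = sum(total for _, total, _ in projects.values())

for name, (covered, total, minimum) in projects.items():
    print(f"{name}: {percent(covered, total):.0f}% (minimum {minimum}%)")
print(f"solution: {percent(covered_all, total_all):.0f}%")
print("below minimum:", failing_projects(projects))
```

Here every project meets its own bar while the solution-wide figure is only 60%, which a single solution-level 80% rule would reject.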

I've also heard this: aspire to 100% and you'll hit 80%; but aspire to 80% and you'll hit 40%.

Bottom line: Apply the 80:20 rule, and let your app's bug count guide you.

answered Jul 29, 2016 at 23:50

Greg Trevellick

For a well-designed system, where unit tests have driven the development from the start, I would say 85% is quite a low number. Small classes designed to be testable should not be hard to cover better than that.

It's easy to dismiss this question with something like:

True, but there are some important points to be made about code coverage. In my experience this metric is actually quite useful, when used correctly. Having said that, I have not seen all systems and I'm sure there are tons of them where it's hard to see code coverage analysis adding any real value. Code can look so different and the scope of the available test framework can vary.

Also, my reasoning mainly concerns quite short test feedback loops. For the product that I'm developing the shortest feedback loop is quite flexible, covering everything from class tests to inter process signalling. Testing a deliverable sub-product typically takes 5 minutes and for such a short feedback loop it is indeed possible to use the test results (and specifically the code coverage metric that we are looking at here) to reject or accept commits in the repository.

When using the code coverage metric you should not just have a fixed (arbitrary) percentage which must be fulfilled. Doing this does not give you the real benefits of code coverage analysis in my opinion. Instead, define the following metrics:

New code can only be added if we don't go above the LWM and we don't go below the HWM. In other words, code coverage is not allowed to decrease, and new code should be covered. Notice how I say should and not must (explained below).
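A commit gate along these lines could be sketched as below. The definitions of the two marks are not reproduced in the text above, so this sketch assumes that LWM is the lowest number of uncovered lines seen so far and HWM is the highest coverage percentage seen so far. The `Marks` and `check_commit` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Marks:
    lwm_uncovered: int   # assumed meaning: lowest uncovered-line count seen so far
    hwm_percent: float   # assumed meaning: highest coverage percentage seen so far


def check_commit(marks, covered, total):
    """Return (accepted, marks): accept only if neither mark gets worse."""
    uncovered = total - covered
    percent = 100 * covered / total
    accepted = uncovered <= marks.lwm_uncovered and percent >= marks.hwm_percent
    if accepted:
        marks = Marks(min(marks.lwm_uncovered, uncovered),
                      max(marks.hwm_percent, percent))
    return accepted, marks


start = Marks(lwm_uncovered=100, hwm_percent=90.0)   # 900 of 1000 lines covered

print(check_commit(start, 950, 1050)[0])  # 50 new lines, all covered
print(check_commit(start, 900, 1010)[0])  # 10 new lines, none covered
print(check_commit(start, 800, 900)[0])   # 100 well-tested lines deleted
```

Under these assumed definitions the third case is rejected even though nothing untested was added, which is why the rule has to be applied with some pragmatism.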

But doesn't this mean that it will be impossible to clean away old well-tested rubbish that you have no use for anymore? Yes, and that's why you have to be pragmatic about these things. There are situations when the rules have to be broken, but for your typical day-to-day integration my experience is that these metrics are quite useful. They give the following two implications.

And again, if the feedback loop is too long, it might be completely impractical to set up something like this in the integration process.

I would also like to mention two more general benefits of the code coverage metric.

And a negative, for completeness.

answered Jul 25, 2014 at 7:45

Martin G

If this were a perfect world, 100% of code would be covered by unit tests. However, since this is NOT a perfect world, it's a matter of what you have time for. As a result, I recommend focusing less on a specific percentage, and focusing more on the critical areas. If your code is well-written (or at least a reasonable facsimile thereof) there should be several key points where APIs are exposed to other code.

Focus your testing efforts on these APIs. Make sure that the APIs are 1) well documented and 2) have test cases written that match the documentation. If the expected results don't match up with the docs, then you have a bug in your code, your documentation, or your test cases, and any of those is worth finding.

Good luck!

answered Sep 18, 2008 at 4:30

64BitBob

I prefer to do BDD, which uses a combination of automated acceptance tests, possibly other integration tests, and unit tests. The question for me is what the target coverage of the automated test suite as a whole should be.

That aside, the answer depends on your methodology, language, and testing and coverage tools. When doing TDD in Ruby or Python it's not hard to maintain 100% coverage, and it's well worth doing so. It's much easier to manage 100% coverage than 90-something percent coverage. That is, it's much easier to fill coverage gaps as they appear (and when doing TDD well, coverage gaps are rare and usually worth your time) than it is to manage a list of coverage gaps that you haven't gotten around to and miss coverage regressions due to your constant background of uncovered code.

The answer also depends on the history of your project. I've only found the above to be practical in projects managed that way from the start. I've greatly improved the coverage of large legacy projects, and it's been worth doing so, but I've never found it practical to go back and fill every coverage gap, because old untested code is not well understood enough to do so correctly and quickly.

answered May 12, 2016 at 14:05

Dave Schweisguth

85% would be a good starting place for check-in criteria.

I'd probably choose a variety of higher bars for shipping criteria, depending on the criticality of the subsystems/components being tested.

answered Sep 18, 2008 at 4:27

stephbu

Code coverage is great but only as long as the benefits that you get from it outweigh the cost/effort of achieving it.

We have been working to a standard of 80% for some time; however, we have just made the decision to abandon this and instead be more focused in our testing, concentrating on the complex business logic, etc.

This decision was taken due to the increasing amount of time we spent chasing code coverage and maintaining existing unit tests. We felt we had got to the point where the benefit we were getting from our code coverage was deemed to be less than the effort that we had to put in to achieve it.

answered Sep 19, 2008 at 15:23

Simon Keep

I use cobertura, and whatever the percentage, I would recommend keeping the values in the cobertura-check task up-to-date. At the minimum, keep raising totallinerate and totalbranchrate to just below your current coverage, but never lower those values. Also tie in the Ant build failure property to this task. If the build fails because of lack of coverage, you know someone's added code but hasn't tested it. Example:

<cobertura-check linerate="0"
                 branchrate="0"
                 totallinerate="70"
                 totalbranchrate="90"
                 failureproperty="build.failed" />
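The "keep raising, never lower" rule for values such as totallinerate can be written down as a small ratchet. This is only a sketch: the `ratchet` function and its one-point `margin` are assumptions made for illustration and are not part of Cobertura.

```python
import math


def ratchet(threshold, measured, margin=1):
    """Return the new minimum: just below the measured rate, never lower than before."""
    candidate = math.floor(measured) - margin
    return max(threshold, candidate)


# Coverage improved: the required line rate moves up to just below it.
print(ratchet(70, 76.4))
# Coverage dipped below the requirement: the requirement stays where it was.
print(ratchet(90, 88.0))
```

The returned numbers would then be copied into the check task by hand or by a build script.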

answered Apr 27, 2009 at 23:29

Gary Kephart

When I think my code isn't unit tested enough, and I'm not sure what to test next, I use coverage to help me decide what to test next.

If I increase coverage with a unit test, I know this unit test is worth something.

This goes for code that is not covered, 50% covered or 97% covered.

answered May 19, 2010 at 15:34

brickner

Short answer: 60-80%

Long answer: I think it totally depends on the nature of your project. I typically start a project by unit testing every practical piece. By the first "release" of the project you should have a pretty good base percentage based on the type of programming you are doing. At that point you can start "enforcing" a minimum code coverage.

answered Sep 18, 2008 at 4:31

user11087

If you've been doing unit testing for a decent amount of time, I see no reason for it not to be approaching 95%+. However, at a minimum, I've always worked with 80%, even when new to testing.

This number should only include code written in the project (excluding frameworks, plugins, etc.) and maybe even exclude certain classes composed entirely of calls to outside code. This sort of call should be mocked/stubbed.

answered Sep 18, 2008 at 4:35

Tony Pitale

Generally speaking, from the several engineering excellence best practices papers that I have read, 80% for new code in unit tests is the point that yields the best return. Going above that coverage percentage finds fewer additional defects for the amount of effort exerted. This is a best practice that is used by many major corporations.

Unfortunately, most of these results are internal to companies, so there is no public literature that I can point you to.

answered Sep 18, 2008 at 4:53

user17222

My answer to this conundrum is to have 100% line coverage of the code you can test and 0% line coverage of the code you can't test.

My current practice in Python is to divide my .py modules into two folders: app1/ and app2/ and when running unit tests calculate the coverage of those two folders and visually check (I must automate this someday) that app1 has 100% coverage and app2 has 0% coverage.

When/if I find that these numbers differ from the standard, I investigate and alter the design of the code so that coverage conforms to the standard.
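The visual check described above could be automated roughly as follows. This is a sketch under assumptions: the per-file numbers are typed in by hand here, whereas in practice they would come from whatever report your coverage tool produces, and the file names are invented.

```python
def check_split(file_coverage):
    """Return a list of complaints; an empty list means the convention holds.

    file_coverage maps a path to (covered_lines, total_lines).
    """
    problems = []
    for path, (covered, total) in sorted(file_coverage.items()):
        if path.startswith("app1/") and covered != total:
            problems.append(f"{path}: expected 100% coverage, got {covered}/{total} lines")
        elif path.startswith("app2/") and covered != 0:
            problems.append(f"{path}: expected 0% coverage, got {covered}/{total} lines")
    return problems


report = {
    "app1/pricing.py": (40, 40),
    "app1/parser.py": (57, 60),        # three lines slipped through
    "app2/main_window.py": (0, 120),
}
for problem in check_split(report):
    print(problem)
```

A non-empty result is the signal to investigate: either add tests, or move the untestable part into app2/.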

This does mean that I can recommend achieving 100% line coverage of library code.

I also occasionally review app2/ to see if I could possibly test any code there, and if I can, I move it into app1/.

Now I'm not too worried about the aggregate coverage because that can vary wildly depending on the size of the project, but generally I've seen 70% to over 90%.

With Python, I should be able to devise a smoke test which could automatically run my app while measuring coverage and hopefully gain an aggregate of 100% when combining the smoke test with unittest figures.

answered Sep 19, 2008 at 10:11

quamrana

Check out Crap4j. It's a slightly more sophisticated approach than straight code coverage. It combines code coverage measurements with complexity measurements, and then shows you what complex code isn't currently tested.
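For reference, the score behind Crap4j is the CRAP metric, which for a method m is defined as comp(m)^2 * (1 - cov(m)/100)^3 + comp(m), where comp is the cyclomatic complexity and cov is the coverage percentage from automated tests. A minimal sketch of that formula (the function name is my own, not part of the tool):

```python
def crap_score(complexity, coverage_percent):
    """CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m)."""
    uncovered = 1 - coverage_percent / 100
    return complexity ** 2 * uncovered ** 3 + complexity


# The same method of complexity 10 at three coverage levels.
for coverage in (0, 50, 100):
    print(coverage, crap_score(10, coverage))
```

Fully covered code scores just its complexity, while untested complex code is penalised quadratically, so the metric points at the code where missing tests hurt most.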

answered Sep 18, 2008 at 20:00

Don Kirkby

Viewing coverage from another perspective: Well-written code with a clear flow of control is the easiest to cover, the easiest to read, and usually the least buggy code. By writing code with clearness and coverability in mind, and by writing the unit tests in parallel with the code, you get the best results IMHO.

answered Jan 31, 2009 at 11:16

In my opinion, the answer is "It depends on how much time you have". I try to achieve 100% but I don't make a fuss if I don't get it with the time I have.

When I write unit tests, I wear a different hat compared to the hat I wear when developing production code. I think about what the tested code claims to do and what situations could possibly break it.

I usually follow the following criteria or rules:

  1. The unit test should be a form of documentation of the expected behavior of my code, i.e. the expected output given a certain input and the exceptions it may throw that clients may want to catch. (What should the users of my code know?)
  2. The unit test should help me discover the what-if conditions that I may not yet have thought of. (How do I make my code stable and robust?)

If these two rules don't produce 100% coverage, then so be it. But once I have the time, I analyze the uncovered blocks and lines and determine whether there are still test cases without unit tests or whether the code needs to be refactored to eliminate the unnecessary code.

answered Aug 14, 2011 at 15:13

Mark Menchavez

It depends greatly on your application. For example, some applications consist mostly of GUI code that cannot be unit tested.

answered Sep 18, 2008 at 4:29

Thomas

I don't think there can be such a B/W rule.
Code should be reviewed, with particular attention to the critical details.
However, if it hasn't been tested, it has a bug!

answered Sep 18, 2008 at 4:30

Nescio

Depending on the criticality of the code, anywhere from 75% to 85% is a good rule of thumb. Shipping code should definitely be tested more thoroughly than in-house utilities, etc.

answered Sep 18, 2008 at 4:31

William Keller

This has to be dependent on what phase of your application development lifecycle you are in.

If you've been at development for a while and have a lot of implemented code already and are just now realizing that you need to think about code coverage then you have to check your current coverage (if it exists) and then use that baseline to set milestones each sprint (or an average rise over a period of sprints), which means taking on code debt while continuing to deliver end user value (at least in my experience the end user doesn't care one bit if you've increased test coverage if they don't see new features).

Depending on your domain it's not unreasonable to shoot for 95%, but I'd have to say that on average you're going to be looking at 85% to 90%.

answered Sep 18, 2008 at 4:33

codeLes

I think the best sign of adequate code coverage is that the number of concrete problems your unit tests help to fix reasonably corresponds to the amount of unit test code you wrote.

answered Sep 18, 2008 at 4:34

dimarzionist

I think that what may matter most is knowing what the coverage trend is over time and understanding the reasons for changes in the trend. Whether you view the changes in the trend as good or bad will depend upon your analysis of the reason.

answered Apr 27, 2009 at 22:49

Rob Scott

We were targeting >80% until a few days ago, but now that we use a lot of generated code, we no longer care about the percentage and instead let the reviewer make the call on the coverage required.

answered Sep 18, 2008 at 4:32

reva

From the Testivus posting I think the answer context should be the second programmer.

Having said this, from a practical point of view we need parameters / goals to strive for.

I consider that this can be "tested" in an Agile process by analyzing the code we have, the architecture, and the functionality (user stories), and then coming up with a number. Based on my experience in the Telecom area, I would say that 60% is a good value to check.

answered Mar 13, 2013 at 17:28

D Lovece