Unit Tests Correlate With Desirable Codebase Properties - NDepend Blog

January 23, 2018

Today, I give you the third post in a series about how unit tests affect codebases.

The first one wound up getting a lot of attention, which was fun. In it, I presented some analysis I’d done of about 100 codebases. I had formed hypotheses about how I thought unit tests would affect codebases, and then I tested those hypotheses.

In the second post, I incorporated a lot of the feedback that I had requested in the first post. Specifically, I partnered with someone to do more rigorous statistical analysis on the raw data that I’d found. The result was much more clarity about not only the correlations among code properties but also how much confidence we could have in those relationships. Some had strong relationships while others were likely spurious.
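To make "correlation" and "how much confidence we could have" concrete, here is a minimal sketch of one common way to check a relationship between two codebase metrics. It computes Pearson's r and a permutation-test p-value. This is not the statistical model used in the study (the post doesn't specify it); the data and the `test_heaviness` measure are hypothetical.

```python
import math
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided permutation test: how often does shuffling ys (which destroys
    any real relationship) produce a correlation at least as strong?"""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return (hits + 1) / (trials + 1)


# Hypothetical data: how test-heavy each codebase is vs. its average cyclomatic complexity.
test_heaviness = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
avg_complexity = [3.1, 2.9, 2.6, 2.4, 2.2, 2.0]

r = pearson_r(test_heaviness, avg_complexity)
p = permutation_p_value(test_heaviness, avg_complexity)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A strongly negative r with a small p would mean "more tests, lower complexity" is unlikely to be a fluke of this sample; a p near 1 would suggest the kind of spurious relationship the second post weeded out.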

In this post, though, I’m incorporating the single biggest piece of feedback. I’m analyzing more codebases.

Analysis of 500 (ish) C# Codebases

Performing static analysis on 500 codebases and recording information about them isn’t especially easy. To make it feasible, I’ve done significant work automating the ingestion of codebases.

That’s been a big help, but there’s still the matter of finding these codebases. To do that, I mined a handful of “awesome codebase” lists. I pointed the analysis tool at roughly 750 codebases, and it automatically filters out any that don’t compile or otherwise run into trouble during the automated process.

That left me with 503 valid codebases. Removing the ones that, for whatever reason, had no (non-third-party) methods or types, or were otherwise trivial, brought the count down to 495.
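The two-stage narrowing described above (drop runs that fail, then drop trivial codebases) can be sketched as a simple filter. The `AnalysisResult` record, its fields, and the `min_methods`/`min_types` thresholds are all hypothetical; the post doesn’t describe the pipeline’s internals or say exactly what counted as “otherwise trivial.”

```python
from dataclasses import dataclass


@dataclass
class AnalysisResult:
    """Hypothetical per-codebase record produced by an automated ingestion run."""
    name: str
    compiled: bool     # did the automated build and analysis succeed?
    own_methods: int   # methods declared in the codebase itself (third-party excluded)
    own_types: int     # types declared in the codebase itself (third-party excluded)


def filter_corpus(results, min_methods=1, min_types=1):
    """Drop failed runs, then drop codebases below the (assumed) size thresholds."""
    valid = [r for r in results if r.compiled]
    usable = [r for r in valid
              if r.own_methods >= min_methods and r.own_types >= min_types]
    return valid, usable


runs = [
    AnalysisResult("alpha", compiled=True, own_methods=1200, own_types=150),
    AnalysisResult("beta", compiled=False, own_methods=0, own_types=0),
    AnalysisResult("gamma", compiled=True, own_methods=0, own_types=0),
    AnalysisResult("delta", compiled=True, own_methods=85, own_types=12),
]
valid, usable = filter_corpus(runs)
print(len(runs), len(valid), len(usable))  # 4 3 2
```

Keeping the two stages separate mirrors the 750 → 503 → 495 narrowing: the first count tells you how robust the ingestion is, the second how much of the corpus is actually analyzable.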

So the results here are the results of using NDepend for static analysis on 495 C# codebases.

Stats About the Codebases

Alright. So what happened with the analysis? I’ll start with some stats that interested me and will hopefully interest you, to give some perspective on the corpus.

Findings From Last Time

Here’s a quick recap of some of the findings from last time around.

I’ve omitted a few of the things I studied in the previous posts, both for the sake of brevity and in order to focus on what I think of as properties of clean codebases. Generally speaking, you want code with fewer lines, less complexity, fewer parameters, fewer overloads, and less nesting per method. In terms of types, you want a flat inheritance hierarchy and more cohesion.

What a Difference 400 Codebases Makes

So let’s look at what happens now that we’ve substantially increased the sample size. I’ll summarize here and add a couple of screenshots below that.

Average Cyclomatic Complexity Per Method

Average Method Nesting Depth

Lines of Code Per Method

Number of Overloads Per Method

Unit Tests and Clean Code

If I circle back to my original hypotheses, they hold up better and better as I add more codebases to the study.

With roughly 500 codebases in the mix, the results line up with my hypotheses considerably better, though I’m not entirely sure why. Perhaps a few outliers skewed the original, smaller study, or perhaps the corpus as a whole became more “unit-test heavy.” Whatever the reason, a sample about five times larger is starting to show some pretty definitive results.
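To illustrate how much a single outlier can skew a small study, here is a sketch comparing Pearson’s r with Spearman’s rank correlation on hypothetical data. Spearman is swapped in purely as an illustration of a common robustness check; the post doesn’t say which correlation measure the study used.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical metric pairs with a clean negative trend...
xs = list(range(1, 11))
ys = list(range(10, 0, -1))
# ...plus one extreme codebase.
xs_out = xs + [11]
ys_out = ys + [100]

print(round(pearson_r(xs, ys), 3), round(spearman_rho(xs, ys), 3))          # -1.0 -1.0
print(round(pearson_r(xs_out, ys_out), 3), round(spearman_rho(xs_out, ys_out), 3))  # 0.411 -0.5
```

One extreme point flips Pearson’s r from -1.0 to about +0.41, while the rank-based measure only weakens to -0.5. In a 100-codebase sample, a handful of such points can matter a lot; in a 500-codebase sample, each one carries less weight.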

The properties that we associate with clean code (cohesion, minimal complexity, and overall thematic simplicity) seem to show up more often as unit tests show up more often.

The one exception that truly surprised me then, and still does, is lines of code per method. I wonder whether this reflects a higher prevalence of C# properties in codebases that aren’t test-heavy (if property accessors are counted as very short methods, they pull the per-method average down), or some other confounding factor that influences both metrics. In any case, it’s interesting.
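The property hypothesis is easy to see with a little arithmetic. This sketch assumes, hypothetically, that each property accessor counts as a one-line method; the two codebases below are invented and share exactly the same “real” methods.

```python
def average_loc_per_method(method_locs):
    """Mean lines of code across every method-like member of a codebase."""
    return sum(method_locs) / len(method_locs)


# Hypothetical codebases with identical "real" methods: 100 methods of 12 lines each.
real_methods = [12] * 100

# Assumption: each property accessor is counted as a 1-line method.
accessor_heavy = real_methods + [1] * 300   # lots of properties
accessor_light = real_methods + [1] * 20    # few properties

print(average_loc_per_method(accessor_heavy))            # 3.75
print(round(average_loc_per_method(accessor_light), 2))  # 10.17
print(average_loc_per_method(real_methods))              # 12.0
```

The accessor-heavy codebase looks almost three times “cleaner” on this metric even though its substantive methods are identical, which is why a per-method average can diverge from the other clean-code properties.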

But with 500 codebases analyzed automatically and the results synthesized with statistical modeling software, I feel pretty good about where this study stands. And while it doesn’t paint a “unit tests make everything rainbows and unicorns” picture, this study now demonstrates, pretty definitively, that codebases with unit tests also tend to have other desirable properties.

What’s Next?

I’m going to keep working with the person building the statistical models to study more properties of codebases. For now, though, I think I’m going to wrap up this unit test study and move on to other things, satisfied that we’ve given it a pretty good treatment.

One thing that gives me pause is how much the results shifted between 100 codebases and 500. Maybe I should grow the corpus to 1,000 or even 2,500 to make sure I don’t see a similar reversal. The trouble is, that’s a lot of codebases, and I’ve already nearly exhausted the “awesome lists,” so I’m worried about diminishing returns. Rest assured, though, I’ll keep slurping down codebases, and if I find myself with significantly more at some point, we’ll redo the analysis.
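One way to reason about whether a larger corpus is worth the effort is to look at how a correlation’s confidence interval narrows as the sample grows. This sketch uses Fisher’s z-transform, a standard approximation for Pearson correlations. The r value of -0.15 is hypothetical and not a result from this study.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation r
    estimated from n samples, via Fisher's z-transform."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("need more than 3 samples")
    z = math.atanh(r)            # Fisher transform: approximately normal
    se = 1 / math.sqrt(n - 3)    # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical: the same observed r at different corpus sizes.
for n in (100, 495, 1000, 2500):
    lo, hi = correlation_ci(-0.15, n)
    print(f"n={n}: [{lo:.3f}, {hi:.3f}]")
```

With 100 codebases, the interval for r = -0.15 straddles zero, so a weak relationship could plausibly vanish or even reverse. With 495 it no longer does. Because the standard error shrinks only with the square root of n, going from 500 to 2,500 narrows the interval by a factor of roughly 2.2, which is where the diminishing returns come from.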

So what’s next? I’ve had a few ideas and am brainstorming more of them.

These are just some ideas. Weigh in below in the comments with your own, if you’d like. Hopefully you all find this stuff as interesting as I do!
