[RFC] Debug info coverage tool

September 19, 2024, 5:29pm 21

FTR this measurement is not especially difficult to implement, given unoptimized and optimized builds of the same source. I had tooling like this at a previous job for a different compiler. We looked specifically at lines with is_stmt set, because that compiler did a decent job of picking is_stmt locations. LLVM’s heuristic is lame, so just identifying the set of source locations regardless of is_stmt is probably the way to go. But again, it’s just collecting data from two .debug_line sections and diffing them; it’s not hard. For all I know, llvm-debuginfo-analyzer already has a mode like that.
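To make the "collect and diff" idea concrete, here is a minimal sketch. It assumes the two .debug_line sections have already been decoded into rows by some other tool; the `Row` shape and the function names are hypothetical, not part of any LLVM tool. Line 0 is skipped because it denotes "no source line" in the line table.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Row(NamedTuple):
    """One decoded line-table row (hypothetical shape, for illustration only)."""
    address: int
    file: str
    line: int
    is_stmt: bool


def source_locations(rows: Iterable[Row], stmt_only: bool = False) -> Set[Tuple[str, int]]:
    """Collect the distinct (file, line) pairs that some instruction maps to."""
    return {
        (r.file, r.line)
        for r in rows
        if r.line != 0 and (r.is_stmt or not stmt_only)
    }


def line_coverage(o0_rows: Iterable[Row], opt_rows: Iterable[Row],
                  stmt_only: bool = False) -> Tuple[float, List[Tuple[str, int]]]:
    """Return (fraction of O0 locations still present, sorted missing locations)."""
    baseline = source_locations(o0_rows, stmt_only)
    optimized = source_locations(opt_rows, stmt_only)
    missing = sorted(baseline - optimized)
    ratio = len(baseline & optimized) / len(baseline) if baseline else 1.0
    return ratio, missing


if __name__ == "__main__":
    o0 = [Row(0, "a.c", 1, True), Row(4, "a.c", 2, True),
          Row(8, "a.c", 3, True), Row(12, "a.c", 4, True)]
    opt = [Row(0, "a.c", 1, True), Row(4, "a.c", 3, False), Row(8, "a.c", 0, True)]
    print(line_coverage(o0, opt))
    print(line_coverage(o0, opt, stmt_only=True))
```

The `stmt_only` switch corresponds to the two options above: restrict to is_stmt rows when the compiler picks them well, or ignore the flag and compare all source locations.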

I will take exception to the unstated premise that it is a reasonable goal to get to 100% parity with unoptimized statement coverage. I believe that is unlikely, and that there is an “unachievable” part of the graph, just as you discovered for variables. I admit I don’t have hard data, but if you think about code-deleting optimizations such as CSE, dead-store removal, and unreachable code removal, there will be no instructions in the final object file for those source statements/expressions. Given that the DWARF line table is a mapping from instructions to source locations, if there are no instructions there can be no mapping.

Suppose an optimized line table (continuing your example) has, say, 70% line-table coverage; that is, maybe 30% of the original source lines do not exist in even the most perfect optimized line table. Then a variable-coverage metric that excluded those non-existent lines from its calculations would clearly still have high value. A metric that considered those non-existent lines to be “missing” from variable coverage would be less valuable.

So, in order to separate out the concerns about line coverage from concerns about variable coverage, I’d want a variable-coverage metric to compare against the lines as they exist in the compiled object file, rather than against some idealized object where all lines seen at O0 are still present in the optimized object, because that idealized object may well not be theoretically achievable. This does mean that improvements to line coverage might alter the variable-coverage metric, but it should not be difficult for users of the metrics to understand that.
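A rough sketch of the difference between the two denominators, under the assumption that variable coverage is counted per source line within a variable's scope. The function name and the line sets are made up for illustration; the actual metric in the proposal may be defined differently.

```python
from typing import Optional, Set


def variable_coverage(scope_lines: Set[int], covered_lines: Set[int],
                      present_lines: Set[int]) -> Optional[float]:
    """Fraction of a variable's scope lines that have a location for it,
    counting only lines that exist in the optimized line table.

    Returns None when none of the scope lines survived optimization.
    """
    reachable = scope_lines & present_lines
    if not reachable:
        return None
    return len(covered_lines & reachable) / len(reachable)


if __name__ == "__main__":
    scope = set(range(10, 20))               # variable is in scope on lines 10..19
    present = {10, 11, 12, 14, 15, 17, 18}   # lines with instructions after optimization
    covered = {10, 11, 12, 14, 15}           # lines where the variable has a location

    # Measured against the lines that actually exist in the object file.
    print(variable_coverage(scope, covered, present))
    # Measured against the idealized object where every O0 line is still present.
    print(variable_coverage(scope, covered, scope))
```

In this made-up example the variable is covered on 5 of the 7 surviving lines, but only 5 of the 10 lines in the idealized object, so the second number penalizes variable coverage for lines that the line table could never have contained.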