[RFC] Debug info coverage tool

Yes, but it would also be impossible to observe the difference, since you cannot stop at the source location that the compiler dropped. Now it’s possible the dropped debug value could turn into a correctness problem at a later source location, but that would be a separate issue.

I.e., I wonder if we could separate out the concerns of determining metrics for source location (coverage, correctness) and variable location (coverage, correctness) and treat them as 4 different problems?

On the other hand, if we’re confident in the line table and want the tool to focus only on the variable info, then using the optimised line table makes sense.

I’m not confident in it at all, but I think

It’s really an LLVM call as to what extent the “indiscriminate elimination” thing is a real hazard. I know the !dbg metadata nodes for the line table are distinct from the dbg.value and dbg.declare intrinsics. If the treatment is sufficiently orthogonal in practice, then that would be an argument that it’s OK to trust the optimised line table.

I wonder if there’s a quick way to rustle up an experiment that sheds light on this. E.g., if across profiling runs of an O0 binary we rarely see a bigger set of reached lines than when running the same test/input at O2, that would build trust in the optimised line table.
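If it helps, here is a very rough sketch of what such an experiment could look like, driving lldb’s Python API (untested; the function name and the O0/O2 binary paths are made up, it assumes the lldb Python bindings are importable, and single-stepping every instruction is only feasible for small tests):

```python
# Rough experiment sketch (untested; names are hypothetical). Single-steps a
# small test binary under lldb and records every (file, line) the line table
# reports, so the sets for -O0 and -O2 builds can be compared.
import lldb

def reached_lines(binary, argv):
    dbg = lldb.SBDebugger.Create()
    dbg.SetAsync(False)
    target = dbg.CreateTarget(binary)
    target.BreakpointCreateByName("main")  # stop once, then step from here
    process = target.LaunchSimple(argv, None, ".")
    thread = process.GetSelectedThread()
    lines = set()
    while process.GetState() == lldb.eStateStopped:
        entry = thread.GetFrameAtIndex(0).GetLineEntry()
        if entry.IsValid() and entry.GetLine() != 0:
            lines.add((entry.GetFileSpec().GetFilename(), entry.GetLine()))
        thread.StepInstruction(False)  # instruction-step, stepping into calls
    lldb.SBDebugger.Destroy(dbg)
    return lines

o0 = reached_lines("./test_O0", [])
o2 = reached_lines("./test_O2", [])
print("lines reached at O0 but not in the O2 run:", sorted(o0 - o2))
```

This only compares dynamically reached lines, so it conflates “line not reached at O2” with “line not present in the O2 line table”; a real experiment would probably also dump the static line tables to separate the two, but it might be enough for a first signal.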

I guess that would be interesting to know, but IMO it’s an orthogonal problem that needs to be addressed separately. In the end we want both high-quality source locations and high-quality variables, because without source locations you can’t meaningfully inspect the variables. Without a source location you cannot set a breakpoint, so you also can’t inspect the variables at that missing location. And if you, e.g., instruction-step through an entire function with missing source locations, then you don’t know where you are, which renders the variable values meaningless (at least in the general case; constants might still be interesting).

I take the point that a non-Clang frontend may generate lexical scopes differently, but unless I’m missing something, I don’t think that changes the fundamentals? (It could maybe eliminate the liveness analysis if we assume a variable’s scope is precisely matched to its lifetime, but that’s not safe to assume in general.)

It doesn’t change the fundamentals; this was more a comment about the relative importance of this subproblem, since it only affects some (admittedly important) languages.

I’m not sure I understand what you said about “a variable can only be inspected at a break point”… surely that’s not true in general? E.g. we could always have is_stmt = 0, but still somehow step through our code inspecting variables as we go. I’m pretty sure my debugging workflows often involve inspecting variables at non-breakpoint instructions, e.g. maybe directly on return from a call.

I used the word “break point” without defining it first. What I meant is a distinct source location (one you could set a breakpoint on or step to). Unless they are constants, variable values are only meaningful together with a source location. A call site typically has a source location.

So perhaps it’s worth thinking separately about two aspects of the line table (and the trustworthiness of each): the full collection of embodied source lines, and the subset that has is_stmt set at some PC. Both of these could be used to provide our baseline, of course, and the trustworthiness of each may be different.
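FWIW, both sets should be cheap to extract statically. A rough sketch of how (untested; it scrapes llvm-dwarfdump’s textual output, whose row format can vary between versions, and the binary name is a placeholder):

```python
# Rough sketch (untested): split a binary's DWARF line table into the full set
# of source locations vs. the subset flagged is_stmt, via llvm-dwarfdump.
import subprocess

def line_table_sets(binary):
    dump = subprocess.run(["llvm-dwarfdump", "--debug-line", binary],
                          capture_output=True, text=True, check=True).stdout
    all_locs, stmt_locs = set(), set()
    for row in dump.splitlines():
        tok = row.split()
        # Line-table rows start with the address; skip prologues and headers.
        if not tok or not tok[0].startswith("0x"):
            continue
        file_idx, line, col = tok[3], int(tok[1]), int(tok[2])
        if line == 0 or "end_sequence" in tok:
            continue  # line-0 and end_sequence rows carry no source location
        loc = (file_idx, line, col)
        all_locs.add(loc)
        if "is_stmt" in tok:
            stmt_locs.add(loc)  # location has is_stmt set at some PC
    return all_locs, stmt_locs

all_locs, stmt_locs = line_table_sets("./test_O2")
print(f"{len(stmt_locs)}/{len(all_locs)} source locations are is_stmt")
```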

As I said above, I was not necessarily thinking about is_stmt (but I understand why you arrived at that conclusion). I’ll try to use the term “source location” going forward, because I also find the word “line” problematic for modern programming languages, which often have very complex control flow and even multiple closures/lambdas on a single line.