
It's worth remembering that there are two syntactically similar but semantically different kinds of "expression" in DWARF.

A DWARF expression computes a value; if the available value is a pointer, you add DW\_OP\_deref to express the pointed-to value. A DWARF location expression computes a location, and location expressions add various operators, such as DW\_OP\_regx, to express locations that a (value) expression cannot. You also have DW\_OP\_stack\_value to say "just kidding, this location expression is a value expression."

So whether we want to start throwing around deref or stack\_value or regx (implicit or explicit) really depends on whether we are going to use value expressions or location expressions. Let's not mix them up; it will just make the discussion more confusing.
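To make the value-expression side of this concrete, here is a minimal C sketch of a stack machine that evaluates a tiny subset of DWARF operators as a value expression. It is a toy model under stated assumptions: the names `eval\_value`, `struct machine`, and the `OP\_*` constants are hypothetical, operands are stored inline rather than LEB128-encoded, and memory is word-addressed with addresses used as array indices. It shows that a register-plus-offset computation yields a pointer, and that appending a deref yields the pointed-to value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for DW_OP_bregN, DW_OP_plus_uconst, DW_OP_deref.
   Operands are stored inline, not LEB128-encoded as in real DWARF. */
enum op_kind { OP_BREG, OP_PLUS_UCONST, OP_DEREF };

struct op {
  enum op_kind kind;
  unsigned reg;    /* register number, used by OP_BREG */
  int64_t operand; /* offset for OP_BREG, addend for OP_PLUS_UCONST */
};

/* Toy machine state: word-addressed memory (address == index) and registers. */
struct machine {
  uint64_t mem[16];
  uint64_t regs[8];
};

/* Evaluates ops as a value expression and returns the top of the stack. */
uint64_t eval_value(const struct machine *m, const struct op *ops, size_t n) {
  uint64_t stack[16];
  size_t sp = 0;
  for (size_t i = 0; i < n; i++) {
    switch (ops[i].kind) {
    case OP_BREG: /* push register contents plus offset */
      assert(sp < 16 && ops[i].reg < 8);
      stack[sp++] = m->regs[ops[i].reg] + (uint64_t)ops[i].operand;
      break;
    case OP_PLUS_UCONST: /* add a constant to the top of the stack */
      assert(sp > 0);
      stack[sp - 1] += (uint64_t)ops[i].operand;
      break;
    case OP_DEREF: /* replace an address with the word stored there */
      assert(sp > 0 && stack[sp - 1] < 16);
      stack[sp - 1] = m->mem[stack[sp - 1]];
      break;
    }
  }
  assert(sp > 0);
  return stack[sp - 1];
}
```

With register 7 holding 4 and `mem[6] == 42`, the expression `{OP\_BREG, 7, 2}` evaluates to the pointer 6, while `{OP\_BREG, 7, 2}, {OP\_DEREF}` evaluates to 42.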

--paulr

From: llvm-dev \[mailto:llvm-dev-bounces@lists.llvm.org\] On Behalf Of David Blaikie via llvm-dev
Sent: Wednesday, September 06, 2017 10:02 AM
To: Reid Kleckner; llvm-dev
Subject: Re: \[llvm-dev\] RFC: Introduce DW\_OP\_LLVM\_memory to describe variables in memory with dbg.value

On Tue, Sep 5, 2017 at 1:00 PM Reid Kleckner via llvm-dev <llvm-dev@lists.llvm.org> wrote:

Debug info today handles two cases reasonably well:

1\. At -O0, dbg.declare does a good job of describing variables that live at some known stack offset.

2\. With optimizations, variables promoted to SSA can be described with dbg.value.

This leaves behind a large hole in our optimized debug info: variables that cannot be promoted, typically because they are address-taken. This is https://llvm.org/pr34136, and this RFC is mostly about addressing that.

The status today is that instcombine removes all dbg.declares and heuristically inserts dbg.values where it can identify the value of the variable in question. This prevents us from having misleading debug info, but it throws away information about the variable’s location in memory.

Part of the reason that instcombine discards dbg.declares is that we can’t mix and match dbg.value with dbg.declare. If the backend sees a dbg.declare, it accepts that information as more reliable and discards all DBG\_VALUE instructions associated with that variable. So we need something we can mix: a way to say that the variable lives in memory \*at this program point\* and might live somewhere else later on. I propose that we introduce DW\_OP\_LLVM\_memory for this purpose, and then we transition from dbg.declare to dbg.value+DW\_OP\_LLVM\_memory.

Initially I believed that DW\_OP\_deref was the way to say this with existing DWARF expression opcodes, but I implemented that in https://reviews.llvm.org/D37311 and learned more about how DWARF expressions work. When a debugger begins evaluating a DWARF expression, it assumes that the resulting value will be a pointer to the variable in memory. For a debugger, this makes sense, because debug builds put things in memory and even after optimization many variables must be spilled. Only the special DW\_OP\_regN and DW\_OP\_stack\_value expression opcodes change the location of the value from memory to register or stack value.
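The default-to-memory rule can be sketched as a small classifier over a simplified opcode list. This is an illustrative C model only: `classify`, `enum loc\_kind`, and the `OP\_*` names are hypothetical stand-ins, and composite locations built with DW\_OP\_piece are ignored. An empty expression is treated as "no location", a lone register operator as a register location, a trailing DW\_OP\_stack\_value as an implicit value, and everything else as the address of the variable in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few DWARF opcodes (not their real encodings). */
enum op_kind { OP_BREG, OP_PLUS_UCONST, OP_DEREF, OP_REG, OP_STACK_VALUE };

/* What the result of a location expression denotes. */
enum loc_kind {
  LOC_UNAVAILABLE,   /* empty expression: no location (e.g. optimized out) */
  LOC_MEMORY,        /* default: the result is the variable's address */
  LOC_REGISTER,      /* DW_OP_regN / DW_OP_regx: the variable is in a register */
  LOC_IMPLICIT_VALUE /* trailing DW_OP_stack_value: the result is the value */
};

enum loc_kind classify(const enum op_kind *ops, size_t n) {
  assert(n == 0 || ops != NULL);
  if (n == 0)
    return LOC_UNAVAILABLE;
  /* A register location is a lone register op (pieces ignored in this sketch). */
  if (n == 1 && ops[0] == OP_REG)
    return LOC_REGISTER;
  if (ops[n - 1] == OP_STACK_VALUE)
    return LOC_IMPLICIT_VALUE;
  return LOC_MEMORY;
}
```

Under this model a bare `OP\_BREG` classifies as `LOC\_MEMORY`, which is exactly the assumption that makes a plain dbg.value expression read as an address.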

LLVM SSA values obviously do not have an address that we can take, and they don’t live in registers, so neither the default memory location model nor DW\_OP\_regN makes sense for LLVM’s dbg.value. We could hypothetically repurpose DW\_OP\_stack\_value to indicate that the SSA value passed to llvm.dbg.value \*is\* the variable’s value, and if the expression lacks DW\_OP\_stack\_value, it must be the address of the value. However, that is backwards incompatible and it seems like quite a stretch.

Seems like a stretch in what sense? The backwards incompatibility is certainly something to consider (though we went through that with DW\_OP\_bit\_piece too), but this seems like the design I'd go to first, so if there's more detail about that aspect of the design choice, I'd like to better understand why it's not the path forward.

I guess you described this already, but talking it through for myself/maybe others will find this useful:

So since we don't have DW\_OP\_regN for LLVM registers, we could sort of assume the implicit first value on the stack is a pseudo-OP\_regN of the LLVM SSA register.

To support that, existing uses would need no changes, since they already match the DWARF model of registers being implicitly direct values.

Code that wanted to describe the register as containing the memory address of the interesting thing would use DW\_OP\_stack\_value to say "this location description that is a register is really an address you should follow to find the value, not a direct value itself"?

But code that wanted to describe a variable as being 3 bytes ahead of a pointer in an LLVM SSA register would have only "plus 3" in the expression stack, since then it's no longer a direct value but is treated as a pointer to the value. I guess this is where the ambiguity would come in: how does "plus 3" currently get interpreted when seen in LLVM IR? I guess it's meant to describe reg value + 3 as being the immediate value of the variable? (So it's implicitly OP\_stack\_value, and OP\_stack\_value is added somewhere in the DWARF backend?)
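The two possible readings of "plus 3" can be contrasted in a few lines of C. This is a hypothetical sketch, not LLVM code: `read\_variable` and `enum convention` are illustrative names, memory is modeled as an array whose indices stand in for addresses, and the `IMPLICIT\_VALUE` branch encodes the reading Dave guesses at (an implicit DW\_OP\_stack\_value), while `IMPLICIT\_ADDRESS` encodes the alternative convention in which an expression without DW\_OP\_stack\_value yields the variable's address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two hypothetical ways to read dbg.value(%r, !DIExpression(DW_OP_plus_uconst, 3)). */
enum convention {
  IMPLICIT_VALUE,  /* r + 3 is the variable's value (implicit stack_value) */
  IMPLICIT_ADDRESS /* r + 3 is the variable's address in memory */
};

/* Returns what a debugger would show for the variable under each reading.
   mem models memory; array indices stand in for addresses. */
uint64_t read_variable(enum convention c, uint64_t r, uint64_t addend,
                       const uint64_t *mem, size_t mem_size) {
  uint64_t result = r + addend;
  if (c == IMPLICIT_VALUE)
    return result;
  assert(mem != NULL && result < mem_size);
  return mem[result];
}
```

With `r == 5` and `mem[8] == 99`, the same expression shows the variable as 8 under one convention and 99 under the other, which is why the default interpretation has to be settled before choosing between stack\_value and a new opcode.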

Thanks,
\- Dave

DW\_OP\_LLVM\_memory would be very similar to DW\_OP\_stack\_value, though. It would only be valid at the end of a DIExpression. The backend would always remove it, because the debugger assumes the variable lives in memory unless it is told otherwise.
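The two rules stated for the proposed DW\_OP\_LLVM\_memory (valid only at the end of an expression, always removed by the backend) can be sketched as a validation-and-strip step. This is a hypothetical C model: `strip\_llvm\_memory` and the opcode values are made up for illustration, and operands are omitted from the opcode list so that no operand can be mistaken for an opcode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode values for illustration only; operands are omitted. */
enum { OP_PLUS_UCONST = 1, OP_DEREF, OP_STACK_VALUE, OP_LLVM_MEMORY };

/* Returns false if OP_LLVM_MEMORY appears anywhere except the last position.
   Otherwise sets *out_len to the expression length with a trailing
   OP_LLVM_MEMORY dropped, and *in_memory to whether it was present. Dropping
   it is enough because a location expression already defaults to memory. */
bool strip_llvm_memory(const int *ops, size_t len, size_t *out_len,
                       bool *in_memory) {
  assert(out_len != NULL && in_memory != NULL);
  assert(len == 0 || ops != NULL);
  for (size_t i = 0; i + 1 < len; i++)
    if (ops[i] == OP_LLVM_MEMORY)
      return false;
  *in_memory = len > 0 && ops[len - 1] == OP_LLVM_MEMORY;
  *out_len = *in_memory ? len - 1 : len;
  return true;
}
```

The design mirrors DW\_OP\_stack\_value: both are trailing markers that describe what the computed result means rather than operating on the stack.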

For the original problem of improving optimized debug info while avoiding inaccurate information in the presence of dead store elimination, consider this C example:

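```c
// Functions used by the example (declarations only).
void dostuff(int);
int computation(void);
void escape(int *);

void example(void) {
  int x = 42;        // Can DSE
  dostuff(x);        // Can propagate 42
  x = computation(); // Post-dominates `x = 42` store
  escape(&x);
}
```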

We should be able to do this:

```
int x;                                               // eliminate `x = 42` store
dbg.value(!x, 42, !DIExpression())                   // mark x as the constant 42 in debug info
dostuff(42);                                         // propagate 42
dbg.value(!x, &x, !DIExpression(DW_OP_LLVM_memory))  // x is in memory again
x = computation();
escape(&x);
```

Passes that delete stores would be responsible for checking if the store destination is part of an alloca with associated dbg.value instructions. They would emit a new dbg.value instruction for that variable with the stored value, and clone the dbg.value instruction that puts the variable back in memory before the killing store. If the store is dead because variable lifetime is ending, the second dbg.value is unnecessary.
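The bookkeeping described above can be modeled on a straight-line instruction list. This C sketch is hypothetical throughout (`delete\_dead\_store`, `struct inst`, and the instruction kinds are illustrative, not LLVM APIs): it replaces the dead store with a constant dbg.value for the stored value, and, unless the store died because the variable's lifetime ended (`killing\_idx == -1`), clones the existing in-memory dbg.value to just before the killing store. It assumes the killing store comes after the dead one in the same block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical straight-line IR model, for illustration only. */
enum inst_kind { STORE, DBG_VALUE_CONST, DBG_VALUE_MEMORY, CALL };

struct inst {
  enum inst_kind kind;
  int var;   /* variable id for STORE and DBG_VALUE_* */
  int value; /* stored or described constant */
};

#define MAX_INSTS 32

struct block {
  struct inst insts[MAX_INSTS];
  size_t len;
};

/* Deletes the dead store at dead_idx. killing_idx is the index of the store
   that overwrites it, or -1 if the variable's lifetime simply ends. */
void delete_dead_store(struct block *b, size_t dead_idx, long killing_idx) {
  assert(dead_idx < b->len);
  struct inst dead = b->insts[dead_idx];
  assert(dead.kind == STORE);

  /* 1. The variable now holds the stored value as a constant. */
  b->insts[dead_idx] = (struct inst){DBG_VALUE_CONST, dead.var, dead.value};

  if (killing_idx < 0)
    return; /* lifetime ends: no need to put the variable back in memory */
  size_t k = (size_t)killing_idx;
  assert(k > dead_idx && k < b->len && b->len < MAX_INSTS);

  /* 2. Find the dbg.value that says the variable lives in memory. */
  struct inst marker = {0};
  int found = 0;
  for (size_t i = 0; i < b->len && !found; i++) {
    if (b->insts[i].kind == DBG_VALUE_MEMORY && b->insts[i].var == dead.var) {
      marker = b->insts[i];
      found = 1;
    }
  }
  assert(found);

  /* 3. Clone it immediately before the killing store. */
  memmove(&b->insts[k + 1], &b->insts[k], (b->len - k) * sizeof b->insts[0]);
  b->insts[k] = marker;
  b->len++;
}
```

Applied to the example above (memory marker, `x = 42`, `dostuff`, `x = computation()`, `escape`), this yields the same sequence as the hand-written dbg.value version.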

This will also allow us to fix debug info for px in this example:

```c
void __attribute__((optnone, noinline)) usevar(int *x) {}

int main(int argc, char **argv) {
  int x = 42;
  int *px = &x;
  usevar(&x);
  if (argc) usevar(px);
}
```

Today, we emit a location for px like \`DW\_OP\_breg7 RSP+12\`, which gives it the incorrect value 42. This is because our DBG\_VALUE instruction for px’s location uses a frame index, which we assume is in memory. That is not the case: px is not in memory; its value is a pointer to a stack object.
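Why px shows up as 42 can be checked with a toy model of what a debugger does with `DW\_OP\_breg7 <offset>` (register 7 is RSP in the x86-64 DWARF register numbering). This C sketch is illustrative only: `displayed\_value`, `load`, and `struct mem\_cell` are hypothetical, and memory is a small table of address/value pairs. Read as a memory location, RSP+12 is the slot holding x, so px appears to be 42; with DW\_OP\_stack\_value appended, RSP+12 itself is px's value, i.e. the address of x.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy memory: a table of (address, contents) pairs. */
struct mem_cell {
  uint64_t addr;
  uint64_t value;
};

/* Returns the simulated contents at addr (0 if never written). */
static uint64_t load(const struct mem_cell *mem, size_t n, uint64_t addr) {
  for (size_t i = 0; i < n; i++)
    if (mem[i].addr == addr)
      return mem[i].value;
  return 0;
}

/* What a debugger shows for a variable described by "DW_OP_breg7 offset",
   optionally followed by DW_OP_stack_value. */
uint64_t displayed_value(uint64_t rsp, int64_t offset, bool stack_value,
                         const struct mem_cell *mem, size_t n) {
  assert(mem != NULL || n == 0);
  uint64_t addr = rsp + (uint64_t)offset; /* DW_OP_breg7: RSP + offset */
  if (stack_value)
    return addr; /* implicit location: the computed address IS the value */
  return load(mem, n, addr); /* memory location: read the variable at addr */
}
```

With RSP at 0x1000 and x (42) stored at 0x100c, the memory reading shows 42 for px, while the stack-value reading shows 0x100c, the address px actually holds.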

Please reply if you have any thoughts on this proposal. Adrian and I hashed this out over Bugzilla, IRC, and in person, so it shouldn’t be too surprising. Let me know if you want to be CC’d on the patches.

\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
LLVM Developers mailing list
llvm-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev