[LLVMdev] First-class debug info IR: MDLocation

Duncan P. N. Exon Smith dexonsmith at apple.com
Mon Oct 27 10:49:44 PDT 2014

On 2014-Oct-27, at 00:58, Chandler Carruth <chandlerc at google.com> wrote:

> I haven't been able to follow all of the thread that got us here, but your patch below has distilled the result enough for me to at least ask questions.

Always takes a patch to draw people in :).

> I'm sorry if some of the justification is buried in the thread and I'm just making you repeat it, but I suspect I'm not the only one who would benefit from having the rationale summarized here.

> On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> Using Value instead of MDNode
>> =============================
>>
>> A number of APIs expect MDNode -- previously, the only referenceable type of metadata -- but this patch (and the ones that will follow) have referenceable metadata that do not inherit from MDNode. Metadata APIs such as Instruction::getMetadata() and NamedMDNode::getOperand() need to return non-MDNode metadata.
>
> To me, this change is a red flag

This bothers me too -- which is why I highlighted it -- but I don't see any alternatives. It seems like a natural fallout of the rest of the proposal.

> and points out a bit of a lie in the subject line: this is not actually first-class debug-info IR. This is just making debug info become special metadata with special encoding properties.

How special does it have to be to be labeled "first-class"?

IMO, the label makes sense here: custom C++ type, bitcode, assembly, uniquing, and ownership. Doesn't seem any less "special" than, say, AddInst, but I don't really care what we call it.
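As a concrete picture of what "uniquing" gives a first-class node type: the context hands out exactly one node per distinct key, so comparing two locations becomes a pointer comparison. This is a hypothetical, self-contained sketch: `Scope`, `Location`, and `Context` are invented stand-ins (not LLVM's MDLocation or LLVMContext API), and it assumes a location is keyed by line, column, scope, and inlined-at location.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a debug-info scope; not LLVM's class.
struct Scope {};

// Hypothetical uniqued location node. Instances can only be created by
// Context::getLocation(), which guarantees one node per distinct key.
class Location {
public:
  unsigned line() const { return Line; }
  unsigned column() const { return Column; }
  const Scope *scope() const { return S; }
  const Location *inlinedAt() const { return InlinedAt; }

private:
  friend class Context;
  Location(unsigned L, unsigned C, const Scope *Sc, const Location *IA)
      : Line(L), Column(C), S(Sc), InlinedAt(IA) {}
  unsigned Line, Column;
  const Scope *S;
  const Location *InlinedAt;
};

// Hypothetical owner of uniqued nodes, playing the role of a context.
class Context {
public:
  const Location *getLocation(unsigned Line, unsigned Column, const Scope *S,
                              const Location *InlinedAt = nullptr) {
    Key K{Line, Column, S, InlinedAt};
    auto It = Uniqued.find(K);
    if (It != Uniqued.end())
      return It->second.get(); // Same key: reuse the existing node.
    // The constructor is private, so allocate with new (we are a friend).
    std::unique_ptr<Location> Node(new Location(Line, Column, S, InlinedAt));
    const Location *Result = Node.get();
    Uniqued.emplace(K, std::move(Node)); // The context owns every node.
    return Result;
  }
  std::size_t size() const { return Uniqued.size(); }

private:
  using Key = std::tuple<unsigned, unsigned, const Scope *, const Location *>;
  std::map<Key, std::unique_ptr<Location>> Uniqued;
};
```

Calling `getLocation(10, 3, &S)` twice returns the same pointer, while `getLocation(11, 3, &S)` creates a second node. A `std::map` keeps the sketch short; a hash-based set would be a natural alternative.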

Have I missed your point? (Are you suggesting metadata is inherently second-class? How so?)

> Note, I'm actually ok with us having special metadata that has special encoding properties. But if we're going that route, I don't think that there is anything "debug info" centric about it, and it shouldn't be described as such.

The infrastructure won't be debug info centric, but the scope of the project certainly is.

I'll be making "first-class" types for each type of debug info we use, and changing their schema, ownership, and uniquing along the way.

I'm not going to touch any other metadata, although certainly if we find that (e.g.) profile metadata has become a source of pain, we could customize it in the future.

> I also think the relationship of MDUser, MDNode, and MDString needs to be clarified a great deal. Why doesn't getMetadata return an 'MDUser*', for example?

In the class hierarchy:

Value -> MDString
Value -> MDNode
Value -> NamedMDNode
Value -> User -> MDUser -> ...

I named MDUser by its relationship with User, but maybe a better name is CustomMD? (Any suggestions?)

Here's a breakdown:

MDString is an arbitrary string that can be used as an operand. It's owned by an LLVMContext and treated like a constant.
MDNode is a generic node with arbitrary operands. It can itself be used as an operand, and it can be attached to Instructions. Its operands don't "know" that it's using them. It's owned by an LLVMContext and treated like a constant.
NamedMDNode is a generic node with arbitrary operands. It cannot be used as an operand, and its operands must all be MDNode (or MDUser/CustomMD). It's owned by a Module.
MDUser/CustomMD is a parent class for specific types of metadata. Its subclasses can be used as operands and attached to Instructions. Their metadata operands "know" that they're being used, but they may also hold handles to non-metadata Values (which don't know). Their ownership is customized per subclass.

Also relevant (and kind of implied by the above): Instruction can have arbitrary metadata attached to it, but it must be an MDNode (or an MDUser/CustomMD).
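The hierarchy and the rules in the breakdown above fit in a few lines of C++. This is a hypothetical miniature: the class names follow the email, but the `Kind` enum and the helpers `canBeMDOperand`, `isMDNodeOrMDUser`, and `addNamedOperand` are invented for illustration and are not LLVM API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of the hierarchy; members are invented.
class Value {
public:
  enum Kind { MDStringKind, MDNodeKind, NamedMDNodeKind, MDUserKind };
  virtual ~Value() = default;
  Kind getKind() const { return K; }

protected:
  explicit Value(Kind K) : K(K) {}

private:
  Kind K;
};

class MDString : public Value {
public:
  explicit MDString(std::string S) : Value(MDStringKind), Str(std::move(S)) {}
  const std::string &getString() const { return Str; }

private:
  std::string Str;
};

class MDNode : public Value {
public:
  MDNode() : Value(MDNodeKind) {}
};

class NamedMDNode : public Value {
public:
  NamedMDNode() : Value(NamedMDNodeKind) {}
  std::vector<const Value *> Operands; // Filled via addNamedOperand().
};

// Value -> User -> MDUser, as in the proposal.
class User : public Value {
protected:
  explicit User(Kind K) : Value(K) {}
};

class MDUser : public User {
public:
  MDUser() : User(MDUserKind) {}
};

// MDString, MDNode, and MDUser can be operands; NamedMDNode cannot.
bool canBeMDOperand(const Value &V) {
  return V.getKind() != Value::NamedMDNodeKind;
}

// Only MDNode or MDUser can be attached to an Instruction, and only
// they are accepted as NamedMDNode operands.
bool isMDNodeOrMDUser(const Value &V) {
  return V.getKind() == Value::MDNodeKind || V.getKind() == Value::MDUserKind;
}

void addNamedOperand(NamedMDNode &N, const Value &Op) {
  assert(isMDNodeOrMDUser(Op) && "NamedMDNode operands must be MDNode or MDUser");
  N.Operands.push_back(&Op);
}
```

Because MDNode and MDUser share no base class below Value here, any function that may return either of them (such as an attachment lookup) can only be typed in terms of Value, which is the API change raised earlier in the thread.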

> It feels as though you really want to sink the current functionality of MDNode down to some subclass of a more generic metadata IR type? Maybe I'm misunderstanding?

IMO, the overlap in functionality between User and MDNode precludes having a (sane) inheritance relationship between MDNode and MDUser/CustomMD.

Both User and MDNode implement support for an arbitrary number of operands, but using completely incompatible mechanisms.

Since MDNode can reference an arbitrary number of arbitrary Values, it uses a subclass of CallbackVH called MDNodeOperand that costs 32B per operand (x86-64). Moreover, RAUW is expensive.

MDUser/CustomMD inherits from User so that its subclasses can leverage the use-list infrastructure (8B per operand, fast RAUW).
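To show why a use list makes RAUW cheap, here is a toy version: every Value keeps an intrusive doubly-linked list of the operand slots (`Use`s) that point at it, so RAUW touches exactly those slots and relinks each one in constant time. `Value`, `Use`, and `User` are hypothetical toys loosely modeled on LLVM's design, not its actual classes, and the sketch does not try to reproduce the byte counts quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Value;

// One operand slot. Next/Prev thread it into its Value's use list.
struct Use {
  Value *Val = nullptr;
  Use *Next = nullptr;  // Next Use of the same Value.
  Use **Prev = nullptr; // Address of the pointer that points at this Use.
  void set(Value *V);
};

struct Value {
  Value() = default;
  Value(const Value &) = delete; // A copied list head would be corrupt.
  Value &operator=(const Value &) = delete;
  Use *UseList = nullptr;
  unsigned getNumUses() const;
  void replaceAllUsesWith(Value *New);
};

void Use::set(Value *V) {
  if (Val) { // Unlink from the old Value's list in O(1).
    *Prev = Next;
    if (Next)
      Next->Prev = Prev;
  }
  Val = V;
  if (V) { // Push onto the front of the new Value's list in O(1).
    Next = V->UseList;
    if (Next)
      Next->Prev = &Next;
    Prev = &V->UseList;
    V->UseList = this;
  }
}

unsigned Value::getNumUses() const {
  unsigned N = 0;
  for (const Use *U = UseList; U; U = U->Next)
    ++N;
  return N;
}

void Value::replaceAllUsesWith(Value *New) {
  assert(New != this && "RAUW with itself would never terminate");
  // Each set() moves the head Use to New, so the loop drains our list.
  while (UseList)
    UseList->set(New);
}

// Toy User with a fixed number of operands. The vector is never resized,
// so Use addresses stay stable. Values must outlive the Users of them.
class User {
public:
  explicit User(std::size_t NumOps) : Ops(NumOps) {}
  User(const User &) = delete;
  User &operator=(const User &) = delete;
  ~User() {
    for (Use &U : Ops)
      U.set(nullptr);
  }
  Value *getOperand(std::size_t I) const { return Ops[I].Val; }
  void setOperand(std::size_t I, Value *V) { Ops[I].set(V); }

private:
  std::vector<Use> Ops;
};
```

The cost of RAUW here is proportional to the number of uses being rewritten, with no side-table lookup per operand.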

> I also have to ask because I can't currently see it: what does debug info being metadata buy us?

I suppose it buys us:

the guarantee that debug info doesn't modify optimizations or code generation, and
the flexibility for optimizations to ignore/drop it when they're not smart enough to update it.
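Both properties can be illustrated with a toy CSE-style transform: instruction equality ignores metadata attachments, so debug info cannot change which instructions get merged, and when merged instructions disagree on an attachment, the transform simply drops it. `Instruction`, `MetadataNode`, `isIdenticalIgnoringMetadata`, and `mergeInto` are hypothetical, as is the keep-only-if-equal policy; the kind names "dbg" and "prof" are just labels.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical opaque metadata node; only its address matters here.
struct MetadataNode {};

// Hypothetical toy instruction: an opcode, operand ids, and metadata
// attachments keyed by kind name. None of this is LLVM's API.
struct Instruction {
  std::string Opcode;
  std::vector<int> Operands;
  std::map<std::string, const MetadataNode *> Metadata;
};

// Equality for a CSE-like transform. Attachments are deliberately not
// compared, so metadata cannot affect which instructions match.
bool isIdenticalIgnoringMetadata(const Instruction &A, const Instruction &B) {
  return A.Opcode == B.Opcode && A.Operands == B.Operands;
}

// Merge Dead into Keep. An attachment survives only if both agree on it;
// anything the transform cannot reconcile is dropped, not guessed.
void mergeInto(Instruction &Keep, const Instruction &Dead) {
  assert(isIdenticalIgnoringMetadata(Keep, Dead));
  for (auto It = Keep.Metadata.begin(); It != Keep.Metadata.end();) {
    auto Other = Dead.Metadata.find(It->first);
    if (Other == Dead.Metadata.end() || Other->second != It->second)
      It = Keep.Metadata.erase(It);
    else
      ++It;
  }
}
```

Dropping is always a safe fallback in this model: the merged instruction computes the same result whether or not any attachment survives.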

> How much code is simplified by that, and at what cost?

I think that's hard to quantify.

I suppose the obvious alternative is to rewrite debug info from the ground up, without using metadata at all. I considered this, but didn't find a compelling argument for it. Main arguments against it: it would be harder to implement incrementally, and it would increase the amount of non-code IR.

Moreover, once we have specific subclasses and bitcode support for debug info types, moving away from metadata (or even the Value hierarchy entirely) would be an incremental step.

Do you have any specific alternatives in mind?

More information about the llvm-dev mailing list