[LLVMdev] First-class debug info IR: MDLocation (original) (raw)

Duncan P. N. Exon Smith dexonsmith at apple.com
Fri Oct 24 16:16:28 PDT 2014


I've attached a preliminary patch for MDLocation as a follow-up to the RFC 1 last week. It's not commit-ready -- in particular, it squashes a bunch of commits together and doesn't pass make check -- but I think it's close enough to indicate the direction and work toward consensus.

IMO, the files to focus on are:

include/llvm/IR/DebugInfo.h
include/llvm/IR/DebugLoc.h
include/llvm/IR/Metadata.h
include/llvm/IR/Value.h
lib/AsmParser/LLLexer.cpp
lib/AsmParser/LLParser.cpp
lib/AsmParser/LLParser.h
lib/AsmParser/LLToken.h
lib/Bitcode/Reader/BitcodeReader.cpp
lib/Bitcode/Writer/BitcodeWriter.cpp
lib/Bitcode/Writer/ValueEnumerator.cpp
lib/Bitcode/Writer/ValueEnumerator.h
lib/IR/AsmWriter.cpp
lib/IR/AsmWriter.h
lib/IR/DebugInfo.cpp
lib/IR/DebugLoc.cpp
lib/IR/LLVMContextImpl.cpp
lib/IR/LLVMContextImpl.h
lib/IR/Metadata.cpp

Using Value instead of MDNode

A number of APIs expect MDNode -- previously, the only referenceable type of metadata -- but this patch (and the ones that will follow) have referenceable metadata that do not inherit from MDNode. Metadata APIs such as Instruction::getMetadata() and NamedMDNode::getOperand() need to return non-MDNode metadata.

I plan to commit the API changes incrementally so we can fix any issues there before pushing the functionality changes. Unfortunately, this currently adds a lot of noise to the (squashed) patch.

Introducing MDLocation

Of course, this adds MDLocation, the first subclass of MDUser. This is a first-class IR type that has two other representations: DILocation (which now trivially wraps MDLocation instead of MDNode) and DebugLoc.

I've genericised the code in LLParser (and elsewhere) to sketch out how adding other MDUser subclasses will go. Perhaps I used the wrong axis, but we can adjust it as we go.

Usage examples:

!6 = metadata MDLocation(line: 43, column: 7, scope: !4)
!7 = metadata MDLocation(scope: !5, line: 67, inlinedAt: !6)

The fields can be listed in any order. The scope: field is required, but the others are optional (line: and column: default to 0, inlinedAt: defaults to null).

(Note that in the RFC I referred to this as an MDLineTable, but MDLocation is a better name. If/when this work supersedes the DIDescriptor hierarchy, it'll likely get renamed to DILocation, but for now there's a name clash.)

Where this is heading

Let's look at a concrete example. Here's some simple C++ code:

$ cat t.h
struct T { short a; short b; };
$ cat foo.cpp
#include "t.h"
int foo(T t) { return t.a + t.b; }
$ cat bar.cpp
#include "t.h"
int foo(T t);
int bar(T t) { return foo(t) * 2; }

Looking forward, after refactoring ownership and uniquing and fixing up a few schema issues, I'd expect the above to link into something like the following:

!0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")
!1 = metadata DIFile(filename: "./t.h", directory: "/path/to")
!2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")
!3 = metadata DIBaseType(name: "short", size: 16, align: 16)
!5 = metadata DIBaseType(name: "int", size: 32, align: 32)
!6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",
                              file: !1, line: 1, size: 32, align: 16)
!7 = metadata DIMember(line: 1, file: !1, type: !3,
                       name: "a", size: 16, align: 16, context: !6)
!8 = metadata DIMember(line: 1, file: !1, type: !3,
                       name: "b", size: 16, align: 16, context: !6)
!9 = metadata DISubroutineType(args: [ !5, !6 ])
!10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
                             producer: "clang version 3.6.0 ",
                             retainedUniqueTypes: [ !6 ])
!11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",
                            handle: i32(i32)* @_Z3foo1T, file: !0,
                            type: !9, context: !10)
!12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6,
                             context: !11)
!13 = metadata DILocation(line: 2, column: 11, scope: !11)
!14 = metadata DILocation(line: 2, column: 16, scope: !11)
!15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
                             producer: "clang version 3.6.0 ",
                             retainedUniqueTypes: [ !6 ])
!16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",
                            handle: i32 (i32)* @_Z3bar1T, file: !2,
                            type: !9, context: !15)
!17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6,
                             context: !16)
!18 = metadata DILocation(line: 3, column: 11, scope: !16)
!19 = metadata DILocation(line: 3, column: 23, scope: !16)

Notice that only the links to parents (i.e., context:) are explicit here -- backlinks are implied. For example, !7 and !8 point to !6, but not the reverse.

This has the interesting property of removing all cycles from serialization (assembly and bitcode).

Making debug info assembly readable and writable

Moreover, we're now in a place where it's trivial to express the "context" pointer structurally. Here's the same debug info as above, using syntactic sugar to fill the "context" pointers:

!0 = metadata DIFile(filename: "foo.cpp", directory: "/path/to")
!1 = metadata DIFile(filename: "./t.h", directory: "/path/to")
!2 = metadata DIFile(filename: "bar.cpp", directory: "/path/to")
!3 = metadata DIBaseType(name: "short", size: 16, align: 16)
!5 = metadata DIBaseType(name: "int", size: 32, align: 32)
!6 = metadata DICompositeType(tag: 0x13, name: "T", uniqued: "_ZTS1T",
                              file: !1, line: 1, size: 32, align: 16) {
  !7 = metadata DIMember(line: 1, file: !1, type: !3,
                         name: "a", size: 16, align: 16)
  !8 = metadata DIMember(line: 1, file: !1, type: !3,
                         name: "b", size: 16, align: 16)
} ; !6
!9 = metadata DISubroutineType(args: [ !5, !6 ])
!10 = metadata DICompileUnit(file: !0, language: 4, kind: FullDebug,
                             producer: "clang version 3.6.0 ",
                             retainedUniqueTypes: [ !6 ]) {
  !11 = metadata DISubprogram(name: "foo", linkageName: "_Z3foo1T",
                              handle: i32(i32)* @_Z3foo1T, file: !0,
                              type: !9) {
    !12 = metadata DIArgVariable(name: "t", arg: 1, line: 2, type: !6)
    !13 = metadata DILocation(line: 2, column: 11)
    !14 = metadata DILocation(line: 2, column: 16)
  } ; !11
} ; !10
!15 = metadata DICompileUnit(file: !2, language: 4, kind: FullDebug,
                             producer: "clang version 3.6.0 ",
                             retainedUniqueTypes: [ !6 ]) {
  !16 = metadata DISubprogram(name: "bar", linkageName: "_Z3bar1T",
                              handle: i32 (i32)* @_Z3bar1T, file: !2,
                              type: !9) {
    !17 = metadata DIArgVariable(name: "t", arg: 2, line: 3, type: !6)
    !18 = metadata DILocation(line: 3, column: 11)
    !19 = metadata DILocation(line: 3, column: 23)
  } ; !16
} ; !15

This assembly has the following advantages over the status quo:

Bike sheds to paint

  1. Should we trim some boilerplate? E.g., it would be trivial to change:

    !6 = metadata MDLocation(line: 43, column: 7, scope: !4)

    to:

    !6 = MDLocation(line: 43, column: 7, scope: !4)

    This would not complicate LLParser. Thoughts?

  2. Which of the two "end goal" syntaxes is better: flat, or hierarchical? Better for what? Why?

    The flat one might be better for FileCheck-ing (not sure), but IMO the hierarchical one is much saner for us humans, and that's the main point of assembly. It wouldn't be hard to default to one and write the other based on a command-line flag -- is that a good idea?

  3. Assembly syntax is pretty easy to change, so this doesn't have to be perfect now. Nevertheless, is there a magical syntax that would be easier to read/write/FileCheck?

-------------- next part -------------- A non-text attachment was scrubbed... Name: MDLocation-preview.patch Type: application/octet-stream Size: 608893 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141024/2296c85c/attachment.obj> -------------- next part --------------



More information about the llvm-dev mailing list