[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

David Blaikie dblaikie at gmail.com
Mon Oct 13 15:47:00 PDT 2014

On Mon, Oct 13, 2014 at 3:37 PM, Reid Kleckner <rnk at google.com> wrote:

I think making debug info more of a first-class IR citizen is probably the way to go. Right now debug info is completely unreadable and is downright opposed to the design goals of the IR as I understand them.

I'm still not sure this would produce particularly more legible, let alone writeable, debug info IR. It's certainly possible: if the schema were baked into IR reading and writing, we could pretty-print it with annotated field names and allow writing debug info with omitted fields (the parser would know that this was, say, a subprogram record, and could reorder fields to match the required schema or add default values for omitted ones). But I'm not sure we'd get that far, nor whether it would really tip debug info to the point of writeability. It's still necessarily a format that describes code, which tends to be more ungainly than the code itself ("this thing is on line 42" rather than "thing" written on line 42).

I'd have to see examples & promises of where this would go/what value it would add, but I'd still be fairly concerned about the ongoing costs.
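To make the "schema baked into the parser" idea concrete, here is a minimal standalone sketch of reading named fields in any order and filling in defaults for omitted ones. It is not LLVM code: `LineTableFields`, `parseLineTableFields`, and the choice of `line` and `scope` as required fields are assumptions made for illustration, loosely modelled on the MDLineTable syntax proposed further down the thread.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical record; field names follow the proposed MDLineTable syntax.
struct LineTableFields {
  unsigned Line = 0;
  unsigned Column = 0;    // Defaults to 0 when omitted.
  std::string Scope;      // Kept as text, e.g. "!9".
  std::string InlinedAt;  // Empty when the location is not inlined.
};

static std::string trim(const std::string &S) {
  const char *WS = " \t";
  std::string::size_type B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::string::size_type E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Converts a string of decimal digits; rejects anything else.
static bool parseUnsigned(const std::string &S, unsigned &Out) {
  if (S.empty() || S.size() > 9)
    return false;
  unsigned V = 0;
  for (char C : S) {
    if (!std::isdigit(static_cast<unsigned char>(C)))
      return false;
    V = V * 10 + static_cast<unsigned>(C - '0');
  }
  Out = V;
  return true;
}

// Parses "name: value, name: value" with fields in any order.
// Returns false on malformed input, duplicate or unknown field names,
// or a missing required field (assumed here to be "line" and "scope").
bool parseLineTableFields(const std::string &Body, LineTableFields &Out) {
  std::map<std::string, std::string> KV;
  std::stringstream SS(Body);
  std::string Item;
  while (std::getline(SS, Item, ',')) {
    std::string::size_type Colon = Item.find(':');
    if (Colon == std::string::npos)
      return false;
    std::string Key = trim(Item.substr(0, Colon));
    std::string Val = trim(Item.substr(Colon + 1));
    if (Key.empty() || Val.empty() || KV.count(Key))
      return false;
    KV[Key] = Val;
  }
  if (!KV.count("line") || !KV.count("scope"))
    return false;

  LineTableFields F;  // Starts with the schema's default values.
  for (const auto &P : KV) {
    if (P.first == "line") {
      if (!parseUnsigned(P.second, F.Line))
        return false;
    } else if (P.first == "column") {
      if (!parseUnsigned(P.second, F.Column))
        return false;
    } else if (P.first == "scope") {
      F.Scope = P.second;
    } else if (P.first == "inlinedAt") {
      F.InlinedAt = P.second;
    } else {
      return false;  // Field not in the schema.
    }
  }
  Out = F;
  return true;
}
```

With this sketch, `"scope: !9, line: 45"` parses to line 45, column 0, scope `!9`, even though the fields are reordered and `column` is omitted. The point is only that a parser needs to know the record kind before it can reorder or default anything, which is the cost being weighed above.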

Our backwards compatibility policy should give you the flexibility you need to update the debug info representation as you go along: http://llvm.org/docs/DeveloperPolicy.html#id18

It's a rather heavy burden to carry. Currently we have a much lighter cost to changing the debug info schema (rev the version number - any debug info with an older version number is dropped on sight).
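The "drop on sight" policy can be pictured with a toy model like the one below. It deliberately avoids the real LLVM API: `ToyModule`, `CurrentDebugInfoVersion`, and `dropStaleDebugInfo` are hypothetical names, and the version value is made up. The email only mentions older versions; treating any mismatch the same way is an assumption of this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical current schema version; the real value is not taken
// from the thread.
constexpr unsigned CurrentDebugInfoVersion = 2;

// Toy stand-in for a module: code plus optional debug info records.
struct ToyModule {
  unsigned DebugInfoVersion;
  std::vector<std::string> DebugRecords;
  std::vector<std::string> Functions;
};

// Discards all debug records if the module's debug info version does
// not match the current one. The code itself is left untouched.
// Returns true if anything was dropped.
bool dropStaleDebugInfo(ToyModule &M) {
  if (M.DebugInfoVersion == CurrentDebugInfoVersion)
    return false;
  M.DebugRecords.clear();
  M.DebugInfoVersion = CurrentDebugInfoVersion;
  return true;
}
```

The reader never has to understand an old schema under this policy, which is why changing the schema is cheap. An auto-upgrade path would instead need one translation routine per historical version.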

On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

In r219010, I merged integer and string fields into a single header field. By reducing the number of metadata operands used in debug info, this saved 2.2GB on an llvm-lto bootstrap. I've done some profiling of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and I've concluded that they will be insufficient.

Instead, I'd like to implement a more aggressive plan, which as a side-effect cleans up the much "loved" debug info IR assembly syntax.

At a high-level, the idea is to create distinct subclasses of Value for each debug info concept, starting with line table entries and moving on to the DIDescriptor hierarchy. By leveraging the use-list infrastructure for metadata operands -- i.e., only using value handles for non-metadata operands -- we'll improve memory usage and increase RAUW speed.

My rough plan follows. I quote some numbers for memory savings below based on an -flto -g bootstrap of llvm-lto (i.e., running llvm-lto on llvm-lto.lto.bc, an already-linked bitcode file dumped by ld64's -save-temps option) that currently peaks at 15.3GB.

1. Introduce MDUser, which inherits from User, and whose Uses must all be metadata. The cost per operand is 1 pointer, vs. 4 pointers in an MDNode.

2. Create MDLineTable as the first subclass of MDUser. Use normal fields (not Values) for the line and column, and use Use operands for the metadata operands.

   On x86-64, this will save 104B / line table entry. Linking llvm-lto uses ~7M line-table entries, so this on its own saves ~700MB.

   Sketch of class definition:

       class MDLineTable : public MDUser {
         unsigned Line;
         unsigned Column;
       public:
         static MDLineTable *get(unsigned Line, unsigned Column,
                                 MDNode *Scope);
         static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);
         static MDLineTable *getBase(MDLineTable *Inlined);

         unsigned getLine() const { return Line; }
         unsigned getColumn() const { return Column; }
         bool isInlined() const { return getNumOperands() == 2; }
         MDNode *getScope() const { return getOperand(0); }
         MDNode *getInlinedAt() const { return getOperand(1); }
       };

   Proposed assembly syntax:

       ; Not inlined.
       !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)

       ; Inlined.
       !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
                                  inlinedAt: metadata !10)

       ; Column defaulted to 0.
       !7 = metadata !MDLineTable(line: 45, scope: metadata !9)

   (What colour should that bike shed be?)

3. (Optional) Rewrite DebugLoc lookup tables. My profiling shows that we have 3.5M entries in the DebugLoc side-vectors for 7M line table entries. The cost of these is ~180B each, for another ~600MB.

   If we integrate a side-table of MDLineTables into its uniquing, the overhead is only ~12B / line table entry, or ~80MB. This saves 520MB.

   This is somewhat perpendicular to redesigning the metadata format, but IMO it's worth doing as soon as it's possible.

4. Create GenericDebugMDNode, a transitional subclass of MDUser through an intermediate class DebugMDNode with an allocation-time-optional CallbackVH available for referencing non-metadata. Change DIDescriptor to wrap a DebugMDNode instead of an MDNode.

   This saves another ~960MB, for a running total of ~2GB.

   Proposed assembly syntax:

       !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
                                         fields: "0\00clang 3.6\00...",
                                         operands: { metadata !8, ... })

       !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
                                         fields: "globalvar\00...",
                                         operands: { metadata !8, ... },
                                         handle: i32* @globalvar)

   This syntax pulls the tag out of the current header-string, calls the rest of the header "fields", and includes the metadata operands in "operands".

5. Incrementally create subclasses of DebugMDNode, such as MDCompileUnit and MDSubprogram. Sub-classed nodes replace the "fields" and "operands" catch-alls with explicit names for each operand.

   Proposed assembly syntax:

       !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",
                                   linkageName: "_Z3foov", file: metadata !8,
                                   function: i32 (i32)* @foo)

6. Remove the dead code for GenericDebugMDNode.

7. (Optional) Refactor DebugMDNode sub-classes to minimize RAUW traffic during bitcode serialization. Now that metadata types are known, we can write debug info out in an order that makes it cheap to read back in.

   Note that using MDUser will make RAUW much cheaper, since we're using the use-list infrastructure for most of them. If RAUW isn't showing up in a profile, I may skip this.

Does this direction seem reasonable? Any major problems I've missed?
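The memory figures quoted in the plan above can be cross-checked with a few lines of compile-time arithmetic. The inputs are the approximate counts and byte costs from the email, taken at face value and using decimal megabytes; the constant names are made up for this sketch. The products come out slightly above the rounded totals in the text, which is expected given that every input is marked "~".

```cpp
#include <cstdint>

constexpr std::uint64_t MB = 1000 * 1000;  // Decimal megabyte (assumption).

// Step 2: ~7M line table entries, 104B saved per entry.
constexpr std::uint64_t LineTableEntries = 7000000;
constexpr std::uint64_t BytesSavedPerEntry = 104;
constexpr std::uint64_t Step2Saving = LineTableEntries * BytesSavedPerEntry;

// Step 3: 3.5M side-vector entries at ~180B each, replaced by a
// side-table costing ~12B per line table entry.
constexpr std::uint64_t SideVectorEntries = 3500000;
constexpr std::uint64_t SideVectorBytesEach = 180;
constexpr std::uint64_t SideTableBytesPerEntry = 12;
constexpr std::uint64_t Step3Before = SideVectorEntries * SideVectorBytesEach;
constexpr std::uint64_t Step3After = LineTableEntries * SideTableBytesPerEntry;
constexpr std::uint64_t Step3Saving = Step3Before - Step3After;

// Step 4: quoted directly as ~960MB.
constexpr std::uint64_t Step4Saving = 960 * MB;

constexpr std::uint64_t RunningTotal = Step2Saving + Step3Saving + Step4Saving;

static_assert(Step2Saving == 728 * MB, "step 2: quoted as ~700MB");
static_assert(Step3Before == 630 * MB, "step 3 before: quoted as ~600MB");
static_assert(Step3After == 84 * MB, "step 3 after: quoted as ~80MB");
static_assert(Step3Saving == 546 * MB, "step 3 saving: quoted as 520MB");
static_assert(RunningTotal == 2234 * MB, "running total: quoted as ~2GB");
```

The quoted 520MB for step 3 matches the difference of the rounded figures (600MB - 80MB) rather than the unrounded product, and either way the three steps sum to a little over 2GB against the 15.3GB peak.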
