[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
Sean Silva chisophugis at gmail.com
Mon Oct 13 18:59:48 PDT 2014
For those interested, I've attached some pie charts based on Duncan's data in one of the other posts; successive slides break down the usage increasingly finely. To my understanding, they represent the number of Values (and subclasses) allocated.
On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote:
In r219010, I merged integer and string fields into a single header field. By reducing the number of metadata operands used in debug info, this saved 2.2GB on an llvm-lto bootstrap. I've done some profiling of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next, and I've concluded that they will be insufficient. Instead, I'd like to implement a more aggressive plan, which as a side-effect cleans up the much "loved" debug info IR assembly syntax.

At a high level, the idea is to create distinct subclasses of Value for each debug info concept, starting with line table entries and moving on to the DIDescriptor hierarchy. By leveraging the use-list infrastructure for metadata operands -- i.e., only using value handles for non-metadata operands -- we'll improve memory usage and increase RAUW speed.

My rough plan follows. I quote some numbers for memory savings below based on an -flto -g bootstrap of llvm-lto (i.e., running llvm-lto on llvm-lto.lto.bc, an already-linked bitcode file dumped by ld64's -save-temps option) that currently peaks at 15.3GB.
Stupid question, but when I was working on LTO last summer, the primary culprit for excessive memory use was that we weren't being smart when linking the IR together (Espindola would know more details). Do we still have that problem? For starters, how does the memory usage of just llvm-link compare to the memory usage of the actual LTO run? If the issue I was seeing last summer is still there, you should see that the invocation of llvm-link is actually the most memory-intensive part of the LTO step, by far.
Also, you seem to really like saying "peak" here. Is there a definite peak? When does it occur?
1. Introduce MDUser, which inherits from User, and whose Uses must all be metadata. The cost per operand is 1 pointer, vs. 4 pointers in an MDNode.

2. Create MDLineTable as the first subclass of MDUser. Use normal fields (not Values) for the line and column, and use Use operands for the metadata operands.

   On x86-64, this will save 104B / line table entry. Linking llvm-lto uses ~7M line-table entries, so this on its own saves ~700MB.

   Sketch of class definition:

       class MDLineTable : public MDUser {
         unsigned Line;
         unsigned Column;

       public:
         static MDLineTable *get(unsigned Line, unsigned Column,
                                 MDNode *Scope);
         static MDLineTable *getInlined(MDLineTable *Base, MDNode *Scope);
         static MDLineTable *getBase(MDLineTable *Inlined);

         unsigned getLine() const { return Line; }
         unsigned getColumn() const { return Column; }
         bool isInlined() const { return getNumOperands() == 2; }
         MDNode *getScope() const { return getOperand(0); }
         MDNode *getInlinedAt() const { return getOperand(1); }
       };

   Proposed assembly syntax:

       ; Not inlined.
       !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9)

       ; Inlined.
       !7 = metadata !MDLineTable(line: 45, column: 7, scope: metadata !9,
                                  inlinedAt: metadata !10)

       ; Column defaulted to 0.
       !7 = metadata !MDLineTable(line: 45, scope: metadata !9)

   (What colour should that bike shed be?)
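To make the operand-cost claim in step 1 concrete, here is a toy, self-contained sketch. These are not LLVM's real Use or MDNodeOperand classes; the layouts simply mirror the figures quoted above (a callback-style value handle carrying a vtable pointer plus the value and list links, vs. a single pointer per metadata operand on an MDUser):

    // Toy illustration of the per-operand numbers in steps 1-2 above.
    // NOT LLVM's real classes; the layouts just mirror the RFC's figures.
    #include <cstdio>

    struct Value;

    // Roughly what each MDNode operand costs today: a callback-style value
    // handle that must be able to find and update itself when the referenced
    // Value is RAUW'd or deleted.
    struct MDNodeOperandLike {
      virtual ~MDNodeOperandLike() {}  // callback hooks => vtable pointer
      Value *Val;                      // the referenced value
      MDNodeOperandLike **Prev;        // links into the list of handles
      MDNodeOperandLike *Next;         //   watching Val
    };

    // The RFC's target for an MDUser metadata operand: a single pointer.
    struct MDUserOperandLike {
      Value *Val;
    };

    int main() {
      std::printf("MDNode-style operand: %zu bytes\n",
                  sizeof(MDNodeOperandLike));
      std::printf("MDUser-style operand: %zu bytes\n",
                  sizeof(MDUserOperandLike));
      return 0;
    }

On x86-64 this prints 32 vs. 8 bytes per operand; presumably the rest of the 104B / line-table-entry savings in step 2 comes from storing the line and column as plain integer fields rather than as Value operands, as described above.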
3. (Optional) Rewrite DebugLoc lookup tables. My profiling shows that we have 3.5M entries in the DebugLoc side-vectors for 7M line table entries. The cost of these is ~180B each, for another ~600MB.

   If we integrate a side-table of MDLineTables into its uniquing, the overhead is only ~12B / line table entry, or ~80MB. This saves 520MB.

   This is somewhat perpendicular to redesigning the metadata format, but IMO it's worth doing as soon as it's possible.

4. Create GenericDebugMDNode, a transitional subclass of MDUser through an intermediate class DebugMDNode, with an allocation-time-optional CallbackVH available for referencing non-metadata. Change DIDescriptor to wrap a DebugMDNode instead of an MDNode. This saves another ~960MB, for a running total of ~2GB.
2GB (out of 15.3GB, i.e. ~13%) seems like pretty pathetic savings when we have a single pie slice near 40% of the number of Values allocated and another at 21%. Especially since this is "step 4".
As a rough back-of-the-envelope calculation, dividing 15.3GB by ~24 million Values gives roughly 640 bytes per Value. That seems sort of excessive (but is it realistic?). All of the data types that you are proposing to shrink fall far short of this "average size", meaning that if you are trying to reduce memory usage, you might be looking in the wrong place. Something smells fishy. At the very least, this would indicate that the real memory usage is elsewhere.
A pie chart breaking down the total memory usage seems essential to have here.
Proposed assembly syntax:

    !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
                                      fields: "0\00clang 3.6\00...",
                                      operands: { metadata !8, ... })

    !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
                                      fields: "global_var\00...",
                                      operands: { metadata !8, ... },
                                      handle: i32* @global_var)

This syntax pulls the tag out of the current header-string, calls the rest of the header "fields", and includes the metadata operands in "operands".
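For context on the "fields" strings above: the header introduced in r219010 is a single string whose fields are separated by \00 (NUL) bytes. Here is a minimal, hypothetical sketch of splitting such a header into its fields; this is not the actual DIDescriptor API, just an illustration of the format:

    // Hypothetical helper: split a "\00"-separated debug info header string.
    #include <cstdio>
    #include <string>
    #include <vector>

    static std::vector<std::string> splitHeader(const std::string &Header) {
      std::vector<std::string> Fields;
      std::string::size_type Start = 0;
      while (true) {
        std::string::size_type End = Header.find('\0', Start);
        if (End == std::string::npos) {
          Fields.push_back(Header.substr(Start));
          return Fields;
        }
        Fields.push_back(Header.substr(Start, End - Start));
        Start = End + 1;
      }
    }

    int main() {
      // "0\00clang 3.6\00..." from the first example above, as a C++ literal.
      const std::string Header("0\0clang 3.6\0...", 15);
      for (const std::string &F : splitHeader(Header))
        std::printf("field: '%s'\n", F.c_str());
      return 0;
    }

Run on the first example's header, this prints the fields '0', 'clang 3.6', and '...'.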
5. Incrementally create subclasses of DebugMDNode, such as MDCompileUnit and MDSubprogram. Sub-classed nodes replace the "fields" and "operands" catch-alls with explicit names for each operand.

   Proposed assembly syntax:

       !7 = metadata !MDSubprogram(line: 45, name: "foo", displayName: "foo",
                                   linkageName: "_Z3foov", file: metadata !8,
                                   function: i32 (i32)* @foo)

6. Remove the dead code for GenericDebugMDNode.

7. (Optional) Refactor DebugMDNode sub-classes to minimize RAUW traffic during bitcode serialization. Now that metadata types are known, we can write debug info out in an order that makes it cheap to read back in.

   Note that using MDUser will make RAUW much cheaper, since we're using the use-list infrastructure for most of them. If RAUW isn't showing up in a profile, I may skip this.

Does this direction seem reasonable? Any major problems I've missed?
You need more data. Right now you have essentially one data point, and it's not even clear what you really measured. If your goal is saving memory, I would expect at least a pie chart that breaks down LLVM's memory usage (not just the number of allocations of each sort; an approximation is fine, as long as you explain how you arrived at it and in what sense it approximates the true number).
Do the numbers change significantly for different projects (e.g., Chromium, Firefox, a kernel, or a large app you have handy to compile with LTO)? If you have specific data you want (and a suggestion for how to gather it), I can also get you numbers for one of our internal games.
Once you have some more data, then as a first step, I would like to see an analysis of how much we can "ideally" expect to gain (back of the envelope calculations == win).
-- Sean Silva
[Attachment: DebugInfoSize.pdf (application/pdf, 108040 bytes) -- the pie charts referenced above: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141013/b1da4b87/attachment.pdf>]