[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR
Eric Christopher echristo at gmail.com
Wed Oct 15 14:31:28 PDT 2014
On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <chisophugis at gmail.com> wrote:
On Mon, Oct 13, 2014 at 7:01 PM, Eric Christopher <echristo at gmail.com> wrote:
On Mon, Oct 13, 2014 at 6:59 PM, Sean Silva <chisophugis at gmail.com> wrote:
> For those interested, I've attached some pie charts based on Duncan's
> data in one of the other posts; successive slides break down the usage
> increasingly finely. To my understanding, they represent the number of
> Values (and subclasses) allocated.
>
> On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>>
>> In r219010, I merged integer and string fields into a single header
>> field. By reducing the number of metadata operands used in debug
>> info, this saved 2.2GB on an llvm-lto bootstrap. I've done some
>> profiling of DW_TAGs to see what parts of PR17891 and PR17892 to
>> tackle next, and I've concluded that they will be insufficient.
>>
>> Instead, I'd like to implement a more aggressive plan, which as a
>> side-effect cleans up the much "loved" debug info IR assembly syntax.
>>
>> At a high level, the idea is to create distinct subclasses of Value
>> for each debug info concept, starting with line table entries and
>> moving on to the DIDescriptor hierarchy. By leveraging the use-list
>> infrastructure for metadata operands -- i.e., only using value
>> handles for non-metadata operands -- we'll improve memory usage and
>> increase RAUW speed.
>>
>> My rough plan follows. I quote some numbers for memory savings below
>> based on an -flto -g bootstrap of llvm-lto (i.e., running llvm-lto
>> on llvm-lto.lto.bc, an already-linked bitcode file dumped by ld64's
>> -save-temps option) that currently peaks at 15.3GB.
>
> Stupid question, but when I was working on LTO last summer, the
> primary culprit for excessive memory use was that we were not being
> smart when linking the IR together (Espindola would know more
> details). Do we still have that problem? For starters, how does the
> memory usage of just llvm-link compare to the memory usage of the
> actual LTO run? If the issue I was seeing last summer is still there,
> you should see that the invocation of llvm-link is actually the most
> memory-intensive part of the LTO step, by far.

This is vague. Could you be more specific on where you saw all of the
memory?

Running llvm-link *.bc would OOM a machine with 64GB of RAM (with -g;
without -g it completed with much less). The increase could easily be
watched in the system process monitor in real time.
This is likely what we've already discussed and was handled a long while ago now.
-eric
-- Sean Silva
-eric

> Also, you seem to really like saying "peak" here. Is there a definite
> peak? When does it occur?
>
>> 1. Introduce MDUser, which inherits from User, and whose Uses
>>    must all be metadata. The cost per operand is 1 pointer, vs. 4
>>    pointers in an MDNode.
>>
>> 2. Create MDLineTable as the first subclass of MDUser. Use normal
>>    fields (not Values) for the line and column, and use Use
>>    operands for the metadata operands.
>>
>>    On x86-64, this will save 104B / line table entry. Linking
>>    llvm-lto uses ~7M line-table entries, so this on its own saves
>>    ~700MB.
>>
>>    Sketch of class definition:
>>
>>        class MDLineTable : public MDUser {
>>          unsigned Line;
>>          unsigned Column;
>>        public:
>>          static MDLineTable *get(unsigned Line, unsigned Column,
>>                                  MDNode *Scope);
>>          static MDLineTable *getInlined(MDLineTable *Base,
>>                                         MDNode *Scope);
>>          static MDLineTable *getBase(MDLineTable *Inlined);
>>
>>          unsigned getLine() const { return Line; }
>>          unsigned getColumn() const { return Column; }
>>          bool isInlined() const { return getNumOperands() == 2; }
>>          MDNode *getScope() const { return getOperand(0); }
>>          MDNode *getInlinedAt() const { return getOperand(1); }
>>        };
>>
>>    Proposed assembly syntax:
>>
>>        ; Not inlined.
>>        !7 = metadata !MDLineTable(line: 45, column: 7,
>>                                   scope: metadata !9)
>>
>>        ; Inlined.
>>        !7 = metadata !MDLineTable(line: 45, column: 7,
>>                                   scope: metadata !9,
>>                                   inlinedAt: metadata !10)
>>
>>        ; Column defaulted to 0.
>>        !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
>>
>>    (What colour should that bike shed be?)
>>
>> 3. (Optional) Rewrite DebugLoc
>>    lookup tables. My profiling shows that we have 3.5M entries in
>>    the DebugLoc side-vectors for 7M line table entries. The cost of
>>    these is ~180B each, for another ~600MB.
>>
>>    If we integrate a side-table of MDLineTables into its uniquing,
>>    the overhead is only ~12B / line table entry, or ~80MB. This
>>    saves 520MB.
>>
>>    This is somewhat perpendicular to redesigning the metadata
>>    format, but IMO it's worth doing as soon as it's possible.
>>
>> 4. Create GenericDebugMDNode, a transitional subclass of MDUser
>>    through an intermediate class DebugMDNode with an
>>    allocation-time-optional CallbackVH available for referencing
>>    non-metadata. Change DIDescriptor to wrap a DebugMDNode instead
>>    of an MDNode.
>>
>>    This saves another ~960MB, for a running total of ~2GB.
>
> 2GB (out of 15.3GB, i.e. ~13%) seems like pretty pathetic savings
> when we have a single pie slice near 40% of the number of Values
> allocated and another at 21%. Especially this being "step 4".
>
> As a rough back-of-the-envelope calculation, dividing 15.3GB by ~24
> million Values gives about 600 bytes per Value. That seems sort of
> excessive (but is it realistic?). All of the data types that you are
> proposing to shrink fall far short of this "average size", meaning
> that if you are trying to reduce memory usage, you might be looking
> in the wrong place. Something smells fishy. At the very least, this
> would indicate that the real memory usage is elsewhere.
>
> A pie chart breaking down the total memory usage seems essential to
> have here.
>
>>    Proposed assembly syntax:
>>
>>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
>>                                          fields: "0\00clang 3.6\00...",
>>                                          operands: { metadata !8, ... })
>>
>>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
>>                                          fields: "global_var\00...",
>>                                          operands: { metadata !8, ... },
>>                                          handle: i32* @global_var)
>>
>>    This syntax pulls the tag out of the current header-string, calls
>>    the rest of the header "fields", and includes the metadata
>>    operands in "operands".
>>
>> 5. Incrementally create subclasses of DebugMDNode, such as
>>    MDCompileUnit and MDSubprogram. Sub-classed nodes replace the
>>    "fields" and "operands" catch-alls with explicit names for each
>>    operand.
>>
>>    Proposed assembly syntax:
>>
>>        !7 = metadata !MDSubprogram(line: 45, name: "foo",
>>                                    displayName: "foo",
>>                                    linkageName: "_Z3foov",
>>                                    file: metadata !8,
>>                                    function: i32 (i32)* @foo)
>>
>> 6. Remove the dead code for GenericDebugMDNode.
>>
>> 7. (Optional) Refactor DebugMDNode sub-classes to minimize RAUW
>>    traffic during bitcode serialization. Now that metadata types are
>>    known, we can write debug info out in an order that makes it
>>    cheap to read back in.
>>
>>    Note that using MDUser will make RAUW much cheaper, since we're
>>    using the use-list infrastructure for most of them. If RAUW isn't
>>    showing up in a profile, I may skip this.
>>
>> Does this direction seem reasonable? Any major problems I've missed?
>
> You need more data. Right now you have essentially one data point,
> and it's not even clear what you measured really. If your goal is
> saving memory, I would expect at least a pie chart that breaks down
> LLVM's memory usage (not just the number of allocations of different
> sorts; an approximation is fine, as long as you explain how you
> arrived at it and in what sense it approximates the true number).
>
> Do the numbers change significantly for different projects? (e.g.
> Chromium or Firefox or a kernel or a large app you have handy to
> compile with LTO?) If you have specific data you want (and a
> suggestion for how to gather it), I can also get you numbers for one
> of our internal games as well.
>
> Once you have some more data, then as a first step, I would like to
> see an analysis of how much we can "ideally" expect to gain (back of
> the envelope calculations == win).
>
> -- Sean Silva
>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev