[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

Sean Silva chisophugis at gmail.com
Wed Oct 15 14:32:48 PDT 2014


On Wed, Oct 15, 2014 at 2:31 PM, Eric Christopher <echristo at gmail.com> wrote:

> On Wed, Oct 15, 2014 at 2:30 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > On Mon, Oct 13, 2014 at 7:01 PM, Eric Christopher <echristo at gmail.com>
> > wrote:
> >>
> >> On Mon, Oct 13, 2014 at 6:59 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >> > For those interested, I've attached some pie charts based on Duncan's
> >> > data in one of the other posts; successive slides break down the usage
> >> > increasingly finely. To my understanding, they represent the number of
> >> > Values (and subclasses) allocated.
> >> >
> >> > On Mon, Oct 13, 2014 at 3:02 PM, Duncan P. N. Exon Smith
> >> > <dexonsmith at apple.com> wrote:
> >> >>
> >> >> In r219010, I merged integer and string fields into a single header
> >> >> field. By reducing the number of metadata operands used in debug info,
> >> >> this saved 2.2GB on an llvm-lto bootstrap. I've done some profiling
> >> >> of DW_TAGs to see what parts of PR17891 and PR17892 to tackle next,
> >> >> and I've concluded that they will be insufficient.
> >> >>
> >> >> Instead, I'd like to implement a more aggressive plan, which as a
> >> >> side-effect cleans up the much "loved" debug info IR assembly syntax.
> >> >>
> >> >> At a high level, the idea is to create distinct subclasses of Value
> >> >> for each debug info concept, starting with line table entries and
> >> >> moving on to the DIDescriptor hierarchy. By leveraging the use-list
> >> >> infrastructure for metadata operands -- i.e., only using value
> >> >> handles for non-metadata operands -- we'll improve memory usage and
> >> >> increase RAUW speed.
> >> >>
> >> >> My rough plan follows. I quote some numbers for memory savings below
> >> >> based on an -flto -g bootstrap of llvm-lto (i.e., running llvm-lto
> >> >> on llvm-lto.lto.bc, an already-linked bitcode file dumped by ld64's
> >> >> -save-temps option) that currently peaks at 15.3GB.
> >> >
> >> > Stupid question, but when I was working on LTO last summer, the
> >> > primary culprit for excessive memory use was that we were not being
> >> > smart when linking the IR together (Espindola would know more
> >> > details). Do we still have that problem? For starters, how does the
> >> > memory usage of just llvm-link compare to the memory usage of the
> >> > actual LTO run? If the issue I was seeing last summer is still there,
> >> > you should see that the invocation of llvm-link is actually the most
> >> > memory-intensive part of the LTO step, by far.
> >>
> >> This is vague. Could you be more specific on where you saw all of the
> >> memory?
> >
> > Running llvm-link *.bc would OOM a machine with 64GB of RAM (with -g;
> > without -g it completed with much less). The increase in memory usage
> > could easily be watched in the system "process monitor" in real time.
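(A side note for anyone skimming: the "use-list infrastructure" point in
Duncan's plan quoted above is what makes RAUW cheap. Below is a minimal toy
model I put together for illustration -- not LLVM's actual Use/Value
classes -- showing why replaceAllUsesWith() touches only the uses of the
value being replaced, with no side-table lookups:)

    // Toy model of an intrusive use-list (illustrative only; LLVM's real
    // Use/Value classes differ in detail). Each Use is a node in a
    // doubly-linked list owned by the Value it points at, so RAUW walks
    // exactly the uses of the old value.
    #include <cassert>
    #include <cstdio>

    struct Value;

    struct Use {
      Value *Val = nullptr;
      Use *Next = nullptr;  // next Use of the same Value
      Use **Prev = nullptr; // address of whichever pointer points at us
      void set(Value *V);
      void removeFromList() {
        if (!Prev)
          return;
        *Prev = Next;
        if (Next)
          Next->Prev = Prev;
        Next = nullptr;
        Prev = nullptr;
      }
    };

    struct Value {
      Use *UseListHead = nullptr;
      void replaceAllUsesWith(Value *New) {
        assert(New != this && "RAUW with self would loop forever");
        while (UseListHead)
          UseListHead->set(New); // unlinks from our list, relinks into New's
      }
    };

    void Use::set(Value *V) {
      removeFromList();
      Val = V;
      if (V) { // push onto the front of V's use-list
        Next = V->UseListHead;
        if (Next)
          Next->Prev = &Next;
        V->UseListHead = this;
        Prev = &V->UseListHead;
      }
    }

    int main() {
      Value A, B;
      Use Operands[3];
      for (Use &U : Operands)
        U.set(&A); // three "operands" all reference A
      A.replaceAllUsesWith(&B); // cost is O(#uses of A)
      for (Use &U : Operands)
        assert(U.Val == &B);
      std::puts("all operands now reference B");
    }

CallbackVH-based operands, by contrast, carry extra per-operand state and go
through the value-handle machinery on RAUW, which is the overhead the plan
wants to reserve for non-metadata operands only.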

> This is likely what we've already discussed and was handled a long while
> ago now.

I was reading the thread in sequential order (and replying without
finishing). derp.

-- Sean Silva

> -eric

> > -- Sean Silva
> >
>
> >> -eric
> >>
> >> > Also, you seem to really like saying "peak" here. Is there a definite
> >> > peak? When does it occur?
> >> >
> >> >> 1. Introduce MDUser, which inherits from User, and whose Uses must
> >> >>    all be metadata. The cost per operand is 1 pointer, vs. 4
> >> >>    pointers in an MDNode.
> >> >>
> >> >> 2. Create MDLineTable as the first subclass of MDUser. Use normal
> >> >>    fields (not Values) for the line and column, and use Use operands
> >> >>    for the metadata operands.
> >> >>
> >> >>    On x86-64, this will save 104B / line table entry. Linking
> >> >>    llvm-lto uses ~7M line-table entries, so this on its own saves
> >> >>    ~700MB.
> >> >>
> >> >>    Sketch of class definition:
> >> >>
> >> >>        class MDLineTable : public MDUser {
> >> >>          unsigned Line;
> >> >>          unsigned Column;
> >> >>        public:
> >> >>          static MDLineTable *get(unsigned Line, unsigned Column,
> >> >>                                  MDNode *Scope);
> >> >>          static MDLineTable *getInlined(MDLineTable *Base,
> >> >>                                         MDNode *Scope);
> >> >>          static MDLineTable *getBase(MDLineTable *Inlined);
> >> >>
> >> >>          unsigned getLine() const { return Line; }
> >> >>          unsigned getColumn() const { return Column; }
> >> >>          bool isInlined() const { return getNumOperands() == 2; }
> >> >>          MDNode *getScope() const { return getOperand(0); }
> >> >>          MDNode *getInlinedAt() const { return getOperand(1); }
> >> >>        };
> >> >>
> >> >>    Proposed assembly syntax:
> >> >>
> >> >>        ; Not inlined.
> >> >>        !7 = metadata !MDLineTable(line: 45, column: 7,
> >> >>                                   scope: metadata !9)
> >> >>
> >> >>        ; Inlined.
> >> >>        !7 = metadata !MDLineTable(line: 45, column: 7,
> >> >>                                   scope: metadata !9,
> >> >>                                   inlinedAt: metadata !10)
> >> >>
> >> >>        ; Column defaulted to 0.
> >> >>        !7 = metadata !MDLineTable(line: 45, scope: metadata !9)
> >> >>
> >> >>    (What colour should that bike shed be?)
> >> >>
> >> >> 3. (Optional) Rewrite DebugLoc lookup tables. My profiling shows
> >> >>    that we have 3.5M entries in the DebugLoc side-vectors for 7M
> >> >>    line table entries. The cost of these is ~180B each, for another
> >> >>    ~600MB.
> >> >>
> >> >>    If we integrate a side-table of MDLineTables into its uniquing,
> >> >>    the overhead is only ~12B / line table entry, or ~80MB. This
> >> >>    saves 520MB.
> >> >>
> >> >>    This is somewhat perpendicular to redesigning the metadata
> >> >>    format, but IMO it's worth doing as soon as it's possible.
> >> >>
> >> >> 4. Create GenericDebugMDNode, a transitional subclass of MDUser
> >> >>    through an intermediate class DebugMDNode with an
> >> >>    allocation-time-optional CallbackVH available for referencing
> >> >>    non-metadata. Change DIDescriptor to wrap a DebugMDNode instead
> >> >>    of an MDNode.
> >> >>
> >> >>    This saves another ~960MB, for a running total of ~2GB.
> >> >
> >> > 2GB (out of 15.3GB, i.e. ~13%) seems like pretty pathetic savings
> >> > when we have a single pie slice near 40% of the # of Values allocated
> >> > and another at 21%. Especially with this being "step 4".
> >> >
> >> > As a rough back-of-the-envelope calculation, dividing 15.3GB by ~24
> >> > million Values gives about 600 bytes per Value. That seems sort of
> >> > excessive (but is it realistic?). All of the data types that you are
> >> > proposing to shrink fall far short of this "average size", meaning
> >> > that if you are trying to reduce memory usage, you might be looking
> >> > in the wrong place. Something smells fishy. At the very least, this
> >> > would indicate that the real memory usage is elsewhere.
> >> >
> >> > A pie chart breaking down the total memory usage seems essential to
> >> > have here.
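To spell out that back-of-the-envelope math (all inputs are figures already
quoted in this thread, not new measurements; I'm treating GB as 10^9 bytes
for round numbers):

    // Re-running the arithmetic quoted above.
    #include <cstdio>

    int main() {
      const double GB = 1e9, M = 1e6;

      // 15.3GB peak over ~24 million Values.
      std::printf("average bytes/Value:   %.0f\n", 15.3 * GB / (24 * M));

      // Step 2: 104B saved per line-table entry, ~7M entries.
      std::printf("MDLineTable savings:   %.2f GB\n", 104 * 7 * M / GB);

      // Step 3: 3.5M DebugLoc side-vector entries at ~180B each,
      // vs. ~12B per line-table entry (7M entries) after the rework.
      std::printf("DebugLoc tables (now): %.2f GB\n", 180 * 3.5 * M / GB);
      std::printf("DebugLoc tables (new): %.2f GB\n", 12 * 7 * M / GB);
    }

This prints ~637 bytes/Value, ~0.73GB, ~0.63GB, and ~0.08GB, so the per-step
claims are internally consistent with the quoted totals. The suspicious
number is the ~600B average itself, which is why a breakdown of total memory
usage, not just allocation counts, is the thing to get.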
> >> >
> >> >> Proposed assembly syntax:
> >> >>
> >> >>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_compile_unit,
> >> >>                                          fields: "0\00clang 3.6\00...",
> >> >>                                          operands: { metadata !8, ... })
> >> >>
> >> >>        !7 = metadata !GenericDebugMDNode(tag: DW_TAG_variable,
> >> >>                                          fields: "global_var\00...",
> >> >>                                          operands: { metadata !8, ... },
> >> >>                                          handle: i32* @global_var)
> >> >>
> >> >>    This syntax pulls the tag out of the current header-string, calls
> >> >>    the rest of the header "fields", and includes the metadata
> >> >>    operands in "operands".
> >> >>
> >> >> 5. Incrementally create subclasses of DebugMDNode, such as
> >> >>    MDCompileUnit and MDSubprogram. Sub-classed nodes replace the
> >> >>    "fields" and "operands" catch-alls with explicit names for each
> >> >>    operand.
> >> >>
> >> >>    Proposed assembly syntax:
> >> >>
> >> >>        !7 = metadata !MDSubprogram(line: 45, name: "foo",
> >> >>                                    displayName: "foo",
> >> >>                                    linkageName: "_Z3foov",
> >> >>                                    file: metadata !8,
> >> >>                                    function: i32 (i32)* @foo)
> >> >>
> >> >> 6. Remove the dead code for GenericDebugMDNode.
> >> >>
> >> >> 7. (Optional) Refactor DebugMDNode sub-classes to minimize RAUW
> >> >>    traffic during bitcode serialization. Now that metadata types are
> >> >>    known, we can write debug info out in an order that makes it
> >> >>    cheap to read back in.
> >> >>
> >> >>    Note that using MDUser will make RAUW much cheaper, since we're
> >> >>    using the use-list infrastructure for most of them. If RAUW isn't
> >> >>    showing up in a profile, I may skip this.
> >> >>
> >> >> Does this direction seem reasonable? Any major problems I've missed?
> >> >
> >> > You need more data. Right now you have essentially one data point,
> >> > and it's not even clear what you measured, really. If your goal is
> >> > saving memory, I would expect at least a pie chart that breaks down
> >> > LLVM's memory usage (not just the # of allocations of different
> >> > sorts; an approximation is fine, as long as you explain how you
> >> > arrived at it and in what sense it approximates the true number).
> >> >
> >> > Do the numbers change significantly for different projects (e.g.
> >> > Chromium or Firefox or a kernel or a large app you have handy to
> >> > compile with LTO)? If you have specific data you want (and a
> >> > suggestion for how to gather it), I can also get you numbers for one
> >> > of our internal games as well.
> >> >
> >> > Once you have some more data, then as a first step, I would like to
> >> > see an analysis of how much we can "ideally" expect to gain (back of
> >> > the envelope calculations == win).
> >> >
> >> > -- Sean Silva
> >> >
> >> >> _______________________________________________
> >> >> LLVM Developers mailing list
> >> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
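P.S. To make step 5 a bit more concrete: below is a rough sketch of what an
MDSubprogram subclass might look like, written in the same style as the
MDLineTable sketch quoted above. This is my own extrapolation from the
proposed assembly syntax, not code from Duncan's patches; the field/operand
split and all names here are guesses.

    // Hypothetical subclass for step 5, mirroring the MDLineTable sketch.
    // Scalar and string data live in ordinary fields; metadata references
    // are Use operands (cheap RAUW); the function pointer is non-metadata,
    // so it would go through the allocation-time-optional CallbackVH from
    // step 4.
    class MDSubprogram : public MDUser {
      unsigned Line;
      std::string Name;
      std::string DisplayName;
      std::string LinkageName;

    public:
      static MDSubprogram *get(unsigned Line, StringRef Name,
                               StringRef DisplayName, StringRef LinkageName,
                               MDNode *File, Value *Function);

      unsigned getLine() const { return Line; }
      StringRef getName() const { return Name; }
      StringRef getDisplayName() const { return DisplayName; }
      StringRef getLinkageName() const { return LinkageName; }

      MDNode *getFile() const { return getOperand(0); }
      Value *getFunction() const; // via the optional CallbackVH
    };

One design question such a subclass surfaces: should strings like the
linkage name be owned by the node or interned in the uniquing table? The
answer feeds directly into the memory numbers being debated above.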


