[LLVMdev] GlobalsModRef (and thus LTO) is completely broken (original) (raw)

Xinliang David Li xinliangli at gmail.com
Tue Jul 21 15:50:59 PDT 2015


reg_values is a file static variable in cselib.c -- so you might be able to reproduce the issue with a smaller reproducible.

David

On Tue, Jul 21, 2015 at 3:43 PM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:

> On Jul 21, 2015, at 3:34 PM, Daniel Berlin <dberlin at dberlin.org> wrote: > > Based on function names and structures, this is some version of GCC :) Yep, it’s 403.gcc from SPEC2006 (I thought I mentioned it - but probably I only did that on IRC). > Any way you can post the entire .ll file? It’s an LTO build, so it’d be troublesome.. I tried to print the module in lldb, but after a several minutes it hasn’t even finished printing globals (which I assume is the very beginning). > > Because it's globalsmodref, it's hard to debug without the other > functions, since it goes over all the functions to determine address > takenness, etc :) Yep, I understand that - I’m still in debugger though, so if you’re interested in some particular data, I can try collecting it. I can try to dump the module too, but it might be not-practical in the end:) Michael > > > On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin > <mzolotukhin at apple.com> wrote: >> Hi Chandler, >> >> We observed some regressions in our regular testing (despite I saw nothing >> suspicious in my runs). I did accurate investigation and was able to >> reproduce and track down the regression. >> I found the exact request to GlobalsModRef that results in the performance >> loss (I added a limit on number of requests into the implementation and >> bisected the number to find the interesting request). >> >> Here are the details: >> >> We’re checking following two locations: >> >> (lldb) p ((llvm::Instruction*)(LocA.Ptr))->dump() >> %arrayidx.i = getelementptr inbounds [1 x %struct.eltlist*], [1 x >> %struct.eltlist*]* %te.i, i64 0, i64 %indvars.iv.i >> (lldb) p ((llvm::Instruction*)(LocB.Ptr))->dump() >> @regvalues = internal unnamedaddr global %struct.varrayheadtag* null, >> align 8 >> >> and the function in question is “cselibinit”: >> (lldb) p >> ((llvm::Instruction*)(LocA.Ptr))->getParent()->getParent()->getName() >> (llvm::StringRef) $3 = (Data = "cselibinit", Length = 11) >> >> Corresponding underlying values: >> (lldb) p UV2->dump() >> @regvalues = internal unnamedaddr global %struct.varrayheadtag* null, >> align 8 >> (lldb) p UV1->dump() >> %32 = load %struct.varrayheadtag*, %struct.varrayheadtag** @regvalues, >> align 8, !tbaa !2 >> >> Backtrace: >> (lldb) bt >> * thread #1: tid = 0x120baaf, 0x00000001038b752a libLTO.dylib(anonymous_ _>> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10,_ _>> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at_ _>> GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason =_ _step_ _>> over_ _>> * frame #0: 0x00000001038b752a libLTO.dylib(anonymous >> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10, >> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at >> GlobalsModRef.cpp:519 >> frame #1: 0x00000001038b82f7 libLTO.dylibnon-virtual thunk to_ _>> (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30,_ _>> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at_ _>> GlobalsModRef.cpp:562_ _>> frame #2: 0x00000001038d6aa8_ _>> libLTO.dylibllvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30, >> S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at AliasAnalysis.cpp:288 >> frame #3: 0x0000000103a0a814 >> libLTO.dylibllvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0,_ _>> MemLoc=0x00007fff5fbf6268, isLoad=true,_ _ScanIt=llvm::BasicBlock::iterator at_ _>> 0x00007fff5fbf4390, BB=0x000000010a19ffa0,_ _QueryInst=0x000000010a1a20c8) +_ _>> 1908 at MemoryDependenceAnalysis.cpp:570_ _>> frame #4: 0x0000000103a0ffa5_ _>>_ _libLTO.dylibllvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0, >> QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true, >> BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) + 2165 >> at MemoryDependenceAnalysis.cpp:965 >> frame #5: 0x0000000103a0e3a9 >> libLTO.dylibllvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0,_ _>> QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8,_ _>> Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0,_ _>> Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208,_ _SkipFirstBlock=false)_ _>> + 5897 at MemoryDependenceAnalysis.cpp:1200_ _>> frame #6: 0x0000000103a0cb3b_ _>>_ _libLTO.dylibllvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0, >> QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at >> MemoryDependenceAnalysis.cpp:911 >> frame #7: 0x000000010340c5b5 libLTO.dylib(anonymous_ _>> namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680,_ _>> LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706_ _>> frame #8: 0x0000000103408eef libLTO.dylib(anonymous >> namespace)::GVN::processLoad(this=0x000000010e6ce680, L=0x000000010a1a20c8) >> + 1551 at GVN.cpp:1905 >> frame #9: 0x00000001034080fd libLTO.dylib(anonymous_ _>> namespace)::GVN::processInstruction(this=0x000000010e6ce680,_ _>> I=0x000000010a1a20c8) + 397 at GVN.cpp:2220_ _>> frame #10: 0x0000000103407d1b libLTO.dylib(anonymous >> namespace)::GVN::processBlock(this=0x000000010e6ce680, >> BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394 >> frame #11: 0x0000000103401755 libLTO.dylib(anonymous_ _>> namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680,_ _>> F=0x00000001085f69f8) + 1541 at GVN.cpp:2677_ _>> frame #12: 0x0000000103400fef libLTO.dylib(anonymous >> namespace)::GVN::runOnFunction(this=0x000000010e6ce680, >> F=0x00000001085f69f8) + 623 at GVN.cpp:2352 >> frame #13: 0x00000001027cd05b >> libLTO.dylibllvm::FPPassManager::runOnFunction(this=0x000000010eba6810,_ _>> F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520_ _>> frame #14: 0x00000001027cd375_ _>> libLTO.dylibllvm::FPPassManager::runOnModule(this=0x000000010eba6810, >> M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540 >> frame #15: 0x00000001027cdda1 libLTO.dylib(anonymous_ _>> namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0,_ _>> M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596_ _>> frame #16: 0x00000001027cd636_ _>> libLTO.dylibllvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740, >> M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698 >> frame #17: 0x00000001027ce521 >> libLTO.dylib`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8, >> M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729 >> >> >> The function body is in the attached file. >> >> >> >> GlobalsModRef reports NoAlias for this pair, here: >> if (GV1 || GV2) { >> // If the global's address is taken, pretend we don't know it's a >> pointer to >> // the global. >> if (GV1 && !NonAddressTakenGlobals.count(GV1)) >> GV1 = nullptr; >> if (GV2 && !NonAddressTakenGlobals.count(GV2)) >> GV2 = nullptr; >> >> // If the two pointers are derived from two different non-addr-taken >> // globals, or if one is and the other isn't, we know these can't alias. >> if ((GV1 || GV2) && GV1 != GV2) >> return NoAlias; >> >> // Otherwise if they are both derived from the same addr-taken global, >> we >> // can't know the two accesses don't overlap. >> } >> >> >> Thanks, >> Michael >> >> On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc at gmail.com> wrote: >> >> On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich >> <evgeny.astigeevich at arm.com> wrote: >>> >>> It’s Dhrystone. >> >> Dhrystone has historically not been a good indicator of real-world >> performance fluctuations, especially at this small of a shift. >> >> I'd like to see if we see any fluctuation on larger and more realistic >> application benchmarks. One advantage of the flag being set is that we >> should get runs from folks who have automatic builds and runs periodically >> from trunk. Those should help give an accurate picture. >> >>> >>> >>> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com] >>> Sent: 17 July 2015 16:10 >>> >>> >>> To: Evgeny Astigeevich; Chandler Carruth >>> Cc: LLVM Developers Mailing List >>> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>> >>> >>> >>> Can you say what Benchmark or give a test case so we understand the nature >>> of the regression? As Gerolf said, that will be important to understand what >>> is best to do. >>> >>> >>> >>> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich >>> <Evgeny.Astigeevich at arm.com> wrote: >>> >>> Yes, the regression is stable. I double checked this. A full benchmark run >>> consists of at least 10 sub-runs to validate the score. >>> >>> I also checked if there were regressions of this benchmark across >>> different ARM hardware versions. I found all regressions of this benchmark >>> were in range 1.6%-2%. >>> >>> >>> >>> Kind regards, >>> >>> Evgeny Astigeevich >>> >>> >>> >>> From: Chandler Carruth [mailto:chandlerc at gmail.com] >>> Sent: 17 July 2015 07:52 >>> To: Evgeny Astigeevich; Chandler Carruth >>> Cc: LLVM Developers Mailing List; Michael Zolotukhin >>> >>> >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>> >>> >>> >>> Hey, thanks for benchmarking. >>> >>> >>> >>> How stable is the 2% regression? >>> >>> >>> >>> Michael ran some benchmarks with GlobalsModRef completely disabled and the >>> only differences were in the noise. This was a complete spec2k6 run along >>> with some others. Based on the number of benchmarks run there, I'm going to >>> go ahead and submit these patches, but if you can clarify the impact here, >>> we can look at potentially some other tradeoff. I'm not particularly set on >>> one set of defaults, etc, I just don't want to keep patches held up based on >>> that. We can flip the default back and forth as new data arrives. >>> >>> >>> >>> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich >>> <evgeny.astigeevich at arm.com> wrote: >>> >>> Hi Chandler, >>> >>> >>> >>> I ran couple benchmarks with LTO turned on and your patches on ARM >>> hardware. >>> >>> There were no performance degradation of one benchmark and 2% slowdown of >>> another benchmark. >>> >>> >>> >>> Kind regards, >>> >>> Evgeny Astigeevich >>> >>> >>> >>> >>> >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >>> Behalf Of Evgeny Astigeevich >>> Sent: 15 July 2015 15:12 >>> >>> >>> To: 'Chandler Carruth'; Gerolf Hoflehner >>> Cc: LLVM Developers Mailing List >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>> >>> >>> >>> Hi Chandler, >>> >>> >>> >>> I would like to run some benchmarks on ARM hardware and to look at impact >>> of your patches on LTO. >>> >>> >>> >>> Kind regards, >>> >>> Evgeny Astigeevich >>> >>> >>> >>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >>> Behalf Of Chandler Carruth >>> >>> >>> Sent: 15 July 2015 10:45 >>> To: Chandler Carruth; Gerolf Hoflehner >>> Cc: LLVM Developers Mailing List >>> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken >>> >>> >>> >>> I've fixed the obvious bugs I spotted in r242281. These should be pure >>> correctness improvements. >>> >>> >>> >>> I've sent the two patches I'm imagining to address the core issue here: >>> >>> http://reviews.llvm.org/D11213 >>> >>> http://reviews.llvm.org/D11214 >>> >>> >>> >>> Currently, I have the unsafe alias results disabled by default, but with a >>> flag that can re-enable them if needed. I don't feel really strongly about >>> which way the default is set -- but that may be because I don't have lots of >>> users relying on LTO. I'll let others indicate which way they would be most >>> comfortable. >>> >>> >>> >>> Some IRC conversations indicated that early benchmark results with GMR >>> completely disabled weren't showing really significant swings, so maybe this >>> relatively small reduction in power of GMR won't be too problematic for >>> folks. Either way, I'm open to the different approaches. It's D11214 that I >>> care a lot about. =] >>> >>> >>> >>> >>> >>> Thanks for all the thoughts here! >>> >>> -Chandler >>> >>> >>> >>> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth <chandlerc at gmail.com_ _> >>> wrote: >>> >>> Replying here, but several of the questions raised boil down to "couldn't >>> you make the usage of GetUnderlyingObject conservatively correct?". I'll try >>> and address that. >>> >>> >>> >>> I think this is the right approach, but I think it is very hard to do >>> without effectively disabling this part of GlobalsModRef. That is, the easy >>> ways are likely to fire very frequently IMO. >>> >>> >>> >>> The core idea is to detect a "no information" state coming out of >>> GetUnderlyingObject (likely be providing a custom version just for >>> GlobalsModRef and tailored to its use cases). This is particularly effective >>> at avoiding the problems with the recursion limit. But let's look at what >>> cases we wouldn't return that. Here are the cases I see when I thought >>> about this last night with Hal, roughly in descending likelihood I would >>> guess: >>> >>> >>> >>> 1) We detect some global or an alloca. In that case, even BasicAA would be >>> sufficient to provide no-alias. GMR shouldn't be relevant. >>> >>> >>> >>> 2) We detect a phi, select, or inttoptr, and stop there. >>> >>> >>> >>> 3) We detect a load and stop there. >>> >>> >>> >>> 4) We detect a return from a function. >>> >>> >>> >>> 5) We detect an argument to the function. >>> >>> >>> >>> I strongly suspect the vast majority of queries hit #1. That's why BasicAA >>> is so effective. Both #4 and #5 I think are actually reasonable places for >>> GMR to potentially say "no-alias" and provide useful definitive information. >>> But I also suspect these are the least common. >>> >>> >>> >>> So let's look at #2 and #3 because I think they're interesting. For these, >>> I think it is extremely hard to return "no-alias". It seems extremely easy >>> for a reasonable and innocuous change to the IR to introduce a phi or a >>> select into one side of the GetUnderlyingObject but not the other. If that >>> ever happens, we can't return "no-alias" for #2, or we need to add really >>> expensive updates. It also seems reasonable (if not as likely) to want >>> adding a store and load to the IR to not trigger a miscompile. If it is >>> possible for a valid optimization pass to do reg2mem on an SSA value, then >>> that could happen to only one side of the paired GetUnderlyingObject and >>> break GMR with #3. If that seems like an unreasonable thing to do, consider >>> loop re-rolling or other transformations which may need to take things in >>> SSA form at move them out of SSA form. Even if we then try immediately to >>> put it back into SSA form, before we do that we create a point where GMR >>> cannot correctly return no-alias. >>> >>> >>> >>> So ultimately, I don't think we want to rely on GMR returning "no-alias" >>> for either #2 or #3 because of the challenge of actually updating it in all >>> of the circumstances that could break them. That means that only #4 and #5 >>> are going to return "no-alias" usefully. And even then, function inlining >>> and function outlining both break #4 and #5, so you have to preclude those >>> transforms while GMR is active. And I have serious doubts about these >>> providing enough value to be worth the cost. >>> >>> >>> >>> >>> >>> I think the better way to approach this is the other way around. Rather >>> than doing a backwards analysis to see if one location reaches and global >>> and the other location doesn't reach a global, I think it would be much more >>> effective to re-cast this as a forward analysis that determines all the >>> memory locations in a function that come from outside the function, and use >>> that to drive the no-alias responses. >>> >>> >>> >>> >>> >>> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner <_ _ghoflehner at apple.com> >>> wrote: >>> >>> I wouldn’t be willing to give up performance for hypothetical issues. >>> Please protect all your changes with options. For some of your concerns it >>> is probably hard to provide a test case that shows an/the actual issue. >>> >>> >>> >>> I certainly agree that it will be very hard to provide a test case and >>> extremely rare to see this in the wild for most of these issues. As long as >>> I can remove the problematic update API we currently have (which as its an >>> API change can't really be put behind flags), I'm happy to have flags >>> control whether or not GMR uses the unsound / stale information to try to >>> answer alias queries. Do you have any opinion about what the default value >>> of the flags should be? >>> >>> >>> >>> I'll go ahead and prepare the patches, as it seems like we're all ending >>> up in the same position, and just wondering about the precise tradeoffs we >>> want to settle on. >>> >>> >>> _>>> ________________________ >>> LLVM Developers mailing list >>> >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> _>>> ________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> _>>> ________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> _>>> ________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> _>>> ________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> _>> ________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> _>> ________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>


LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150721/0520ab1e/attachment.html>



More information about the llvm-dev mailing list