(original) (raw)

reg\_values is a file static variable in cselib.c -- so you might be able to reproduce the issue with a smaller reproducible.

David

On Tue, Jul 21, 2015 at 3:43 PM, Michael Zolotukhin <mzolotukhin@apple.com> wrote:

\> On Jul 21, 2015, at 3:34 PM, Daniel Berlin <dberlin@dberlin.org> wrote:
\>
\> Based on function names and structures, this is some version of GCC :)
Yep, it’s 403.gcc from SPEC2006 (I thought I mentioned it - but probably I only did that on IRC).

\> Any way you can post the entire .ll file?
It’s an LTO build, so it’d be troublesome.. I tried to print the module in lldb, but after a several minutes it hasn’t even finished printing globals (which I assume is the very beginning).

\>
\> Because it's globalsmodref, it's hard to debug without the other
\> functions, since it goes over all the functions to determine address
\> takenness, etc :)
Yep, I understand that - I’m still in debugger though, so if you’re interested in some particular data, I can try collecting it. I can try to dump the module too, but it might be not-practical in the end:)

Michael

>
\>
\> On Tue, Jul 21, 2015 at 3:23 PM, Michael Zolotukhin
\> <mzolotukhin@apple.com> wrote:
\>> Hi Chandler,
\>>
\>> We observed some regressions in our regular testing (despite I saw nothing
\>> suspicious in my runs). I did accurate investigation and was able to
\>> reproduce and track down the regression.
\>> I found the exact request to GlobalsModRef that results in the performance
\>> loss (I added a limit on number of requests into the implementation and
\>> bisected the number to find the interesting request).
\>>
\>> Here are the details:
\>>
\>> We’re checking following two locations:
\>>
\>> (lldb) p ((llvm::Instruction\*)(LocA.Ptr))->dump()
\>> %arrayidx.i = getelementptr inbounds \[1 x %struct.elt\_list\*\], \[1 x
\>> %struct.elt\_list\*\]\* %te.i, i64 0, i64 %indvars.iv.i
\>> (lldb) p ((llvm::Instruction\*)(LocB.Ptr))->dump()
\>> @reg\_values = internal unnamed\_addr global %struct.varray\_head\_tag\* null,
\>> align 8
\>>
\>> and the function in question is “cselib\_init”:
\>> (lldb) p
\>> ((llvm::Instruction\*)(LocA.Ptr))->getParent()->getParent()->getName()
\>> (llvm::StringRef) $3 = (Data = "cselib\_init", Length = 11)
\>>
\>> Corresponding underlying values:
\>> (lldb) p UV2->dump()
\>> @reg\_values = internal unnamed\_addr global %struct.varray\_head\_tag\* null,
\>> align 8
\>> (lldb) p UV1->dump()
\>> %32 = load %struct.varray\_head\_tag\*, %struct.varray\_head\_tag\*\* @reg\_values,
\>> align 8, !tbaa !2
\>>
\>> Backtrace:
\>> (lldb) bt
\>> \* thread #1: tid = 0x120baaf, 0x00000001038b752a libLTO.dylib\`(anonymous
\>> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10,
\>> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at
\>> GlobalsModRef.cpp:519, queue = 'com.apple.main-thread', stop reason = step
\>> over
\>> \* frame #0: 0x00000001038b752a libLTO.dylib\`(anonymous
\>> namespace)::GlobalsModRef::alias(this=0x000000010eba5c10,
\>> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 570 at
\>> GlobalsModRef.cpp:519
\>> frame #1: 0x00000001038b82f7 libLTO.dylib\`non-virtual thunk to
\>> (anonymous namespace)::GlobalsModRef::alias(this=0x000000010eba5c30,
\>> LocA=0x00007fff5fbf4198, LocB=0x00007fff5fbf6268) + 55 at
\>> GlobalsModRef.cpp:562
\>> frame #2: 0x00000001038d6aa8
\>> libLTO.dylib\`llvm::AliasAnalysis::getModRefInfo(this=0x000000010eba5c30,
\>> S=0x000000010a1a22e0, Loc=0x00007fff5fbf6268) + 120 at AliasAnalysis.cpp:288
\>> frame #3: 0x0000000103a0a814
\>> libLTO.dylib\`llvm::MemoryDependenceAnalysis::getPointerDependencyFrom(this=0x000000010e6cf0c0,
\>> MemLoc=0x00007fff5fbf6268, isLoad=true, ScanIt=llvm::BasicBlock::iterator at
\>> 0x00007fff5fbf4390, BB=0x000000010a19ffa0, QueryInst=0x000000010a1a20c8) +
\>> 1908 at MemoryDependenceAnalysis.cpp:570
\>> frame #4: 0x0000000103a0ffa5
\>> libLTO.dylib\`llvm::MemoryDependenceAnalysis::GetNonLocalInfoForBlock(this=0x000000010e6cf0c0,
\>> QueryInst=0x000000010a1a20c8, Loc=0x00007fff5fbf6268, isLoad=true,
\>> BB=0x000000010a19ffa0, Cache=0x0000000100f9d568, NumSortedEntries=0) + 2165
\>> at MemoryDependenceAnalysis.cpp:965
\>> frame #5: 0x0000000103a0e3a9
\>> libLTO.dylib\`llvm::MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(this=0x000000010e6cf0c0,
\>> QueryInst=0x000000010a1a20c8, Pointer=0x00007fff5fbf62a8,
\>> Loc=0x00007fff5fbf6268, isLoad=true, StartBB=0x000000010a19ffa0,
\>> Result=0x00007fff5fbf6bf0, Visited=0x00007fff5fbf6208, SkipFirstBlock=false)
\>> + 5897 at MemoryDependenceAnalysis.cpp:1200
\>> frame #6: 0x0000000103a0cb3b
\>> libLTO.dylib\`llvm::MemoryDependenceAnalysis::getNonLocalPointerDependency(this=0x000000010e6cf0c0,
\>> QueryInst=0x000000010a1a20c8, Result=0x00007fff5fbf6bf0) + 635 at
\>> MemoryDependenceAnalysis.cpp:911
\>> frame #7: 0x000000010340c5b5 libLTO.dylib\`(anonymous
\>> namespace)::GVN::processNonLocalLoad(this=0x000000010e6ce680,
\>> LI=0x000000010a1a20c8) + 101 at GVN.cpp:1706
\>> frame #8: 0x0000000103408eef libLTO.dylib\`(anonymous
\>> namespace)::GVN::processLoad(this=0x000000010e6ce680, L=0x000000010a1a20c8)
\>> + 1551 at GVN.cpp:1905
\>> frame #9: 0x00000001034080fd libLTO.dylib\`(anonymous
\>> namespace)::GVN::processInstruction(this=0x000000010e6ce680,
\>> I=0x000000010a1a20c8) + 397 at GVN.cpp:2220
\>> frame #10: 0x0000000103407d1b libLTO.dylib\`(anonymous
\>> namespace)::GVN::processBlock(this=0x000000010e6ce680,
\>> BB=0x000000010a19ffa0) + 251 at GVN.cpp:2394
\>> frame #11: 0x0000000103401755 libLTO.dylib\`(anonymous
\>> namespace)::GVN::iterateOnFunction(this=0x000000010e6ce680,
\>> F=0x00000001085f69f8) + 1541 at GVN.cpp:2677
\>> frame #12: 0x0000000103400fef libLTO.dylib\`(anonymous
\>> namespace)::GVN::runOnFunction(this=0x000000010e6ce680,
\>> F=0x00000001085f69f8) + 623 at GVN.cpp:2352
\>> frame #13: 0x00000001027cd05b
\>> libLTO.dylib\`llvm::FPPassManager::runOnFunction(this=0x000000010eba6810,
\>> F=0x00000001085f69f8) + 427 at LegacyPassManager.cpp:1520
\>> frame #14: 0x00000001027cd375
\>> libLTO.dylib\`llvm::FPPassManager::runOnModule(this=0x000000010eba6810,
\>> M=0x000000010115c5f0) + 117 at LegacyPassManager.cpp:1540
\>> frame #15: 0x00000001027cdda1 libLTO.dylib\`(anonymous
\>> namespace)::MPPassManager::runOnModule(this=0x000000010e6cbaf0,
\>> M=0x000000010115c5f0) + 1409 at LegacyPassManager.cpp:1596
\>> frame #16: 0x00000001027cd636
\>> libLTO.dylib\`llvm::legacy::PassManagerImpl::run(this=0x000000010e6cb740,
\>> M=0x000000010115c5f0) + 310 at LegacyPassManager.cpp:1698
\>> frame #17: 0x00000001027ce521
\>> libLTO.dylib\`llvm::legacy::PassManager::run(this=0x00007fff5fbf82b8,
\>> M=0x000000010115c5f0) + 33 at LegacyPassManager.cpp:1729
\>>
\>>
\>> The function body is in the attached file.
\>>
\>>
\>>
\>> GlobalsModRef reports NoAlias for this pair, here:
\>> if (GV1 || GV2) {
\>> // If the global's address is taken, pretend we don't know it's a
\>> pointer to
\>> // the global.
\>> if (GV1 && !NonAddressTakenGlobals.count(GV1))
\>> GV1 = nullptr;
\>> if (GV2 && !NonAddressTakenGlobals.count(GV2))
\>> GV2 = nullptr;
\>>
\>> // If the two pointers are derived from two different non-addr-taken
\>> // globals, or if one is and the other isn't, we know these can't alias.
\>> if ((GV1 || GV2) && GV1 != GV2)
\>> return NoAlias;
\>>
\>> // Otherwise if they are both derived from the same addr-taken global,
\>> we
\>> // can't know the two accesses don't overlap.
\>> }
\>>
\>>
\>> Thanks,
\>> Michael
\>>
\>> On Jul 17, 2015, at 12:18 PM, Chandler Carruth <chandlerc@gmail.com> wrote:
\>>
\>> On Fri, Jul 17, 2015 at 9:13 AM Evgeny Astigeevich
\>> <evgeny.astigeevich@arm.com> wrote:
\>>>
\>>> It’s Dhrystone.
\>>
\>> Dhrystone has historically not been a good indicator of real-world
\>> performance fluctuations, especially at this small of a shift.
\>>
\>> I'd like to see if we see any fluctuation on larger and more realistic
\>> application benchmarks. One advantage of the flag being set is that we
\>> should get runs from folks who have automatic builds and runs periodically
\>> from trunk. Those should help give an accurate picture.
\>>
\>>>
\>>>
\>>>
\>>> From: Chandler Carruth \[mailto:chandlerc@gmail.com\]
\>>> Sent: 17 July 2015 16:10
\>>>
\>>>
\>>> To: Evgeny Astigeevich; Chandler Carruth
\>>> Cc: LLVM Developers Mailing List
\>>>
\>>> Subject: Re: \[LLVMdev\] GlobalsModRef (and thus LTO) is completely broken
\>>>
\>>>
\>>>
\>>> Can you say what Benchmark or give a test case so we understand the nature
\>>> of the regression? As Gerolf said, that will be important to understand what
\>>> is best to do.
\>>>
\>>>
\>>>
\>>> On Fri, Jul 17, 2015, 06:43 Evgeny Astigeevich
\>>> <Evgeny.Astigeevich@arm.com> wrote:
\>>>
\>>> Yes, the regression is stable. I double checked this. A full benchmark run
\>>> consists of at least 10 sub-runs to validate the score.
\>>>
\>>> I also checked if there were regressions of this benchmark across
\>>> different ARM hardware versions. I found all regressions of this benchmark
\>>> were in range 1.6%-2%.
\>>>
\>>>
\>>>
\>>> Kind regards,
\>>>
\>>> Evgeny Astigeevich
\>>>
\>>>
\>>>
\>>> From: Chandler Carruth \[mailto:chandlerc@gmail.com\]
\>>> Sent: 17 July 2015 07:52
\>>> To: Evgeny Astigeevich; Chandler Carruth
\>>> Cc: LLVM Developers Mailing List; Michael Zolotukhin
\>>>
\>>>
\>>> Subject: Re: \[LLVMdev\] GlobalsModRef (and thus LTO) is completely broken
\>>>
\>>>
\>>>
\>>> Hey, thanks for benchmarking.
\>>>
\>>>
\>>>
\>>> How stable is the 2% regression?
\>>>
\>>>
\>>>
\>>> Michael ran some benchmarks with GlobalsModRef completely disabled and the
\>>> only differences were in the noise. This was a complete spec2k6 run along
\>>> with some others. Based on the number of benchmarks run there, I'm going to
\>>> go ahead and submit these patches, but if you can clarify the impact here,
\>>> we can look at potentially some other tradeoff. I'm not particularly set on
\>>> one set of defaults, etc, I just don't want to keep patches held up based on
\>>> that. We can flip the default back and forth as new data arrives.
\>>>
\>>>
\>>>
\>>> On Thu, Jul 16, 2015 at 12:23 PM Evgeny Astigeevich
\>>> <evgeny.astigeevich@arm.com> wrote:
\>>>
\>>> Hi Chandler,
\>>>
\>>>
\>>>
\>>> I ran couple benchmarks with LTO turned on and your patches on ARM
\>>> hardware.
\>>>
\>>> There were no performance degradation of one benchmark and 2% slowdown of
\>>> another benchmark.
\>>>
\>>>
\>>>
\>>> Kind regards,
\>>>
\>>> Evgeny Astigeevich
\>>>
\>>>
\>>>
\>>>
\>>>
\>>> From: llvmdev-bounces@cs.uiuc.edu \[mailto:llvmdev-bounces@cs.uiuc.edu\] On
\>>> Behalf Of Evgeny Astigeevich
\>>> Sent: 15 July 2015 15:12
\>>>
\>>>
\>>> To: 'Chandler Carruth'; Gerolf Hoflehner
\>>> Cc: LLVM Developers Mailing List
\>>> Subject: Re: \[LLVMdev\] GlobalsModRef (and thus LTO) is completely broken
\>>>
\>>>
\>>>
\>>> Hi Chandler,
\>>>
\>>>
\>>>
\>>> I would like to run some benchmarks on ARM hardware and to look at impact
\>>> of your patches on LTO.
\>>>
\>>>
\>>>
\>>> Kind regards,
\>>>
\>>> Evgeny Astigeevich
\>>>
\>>>
\>>>
\>>> From: llvmdev-bounces@cs.uiuc.edu \[mailto:llvmdev-bounces@cs.uiuc.edu\] On
\>>> Behalf Of Chandler Carruth
\>>>
\>>>
\>>> Sent: 15 July 2015 10:45
\>>> To: Chandler Carruth; Gerolf Hoflehner
\>>> Cc: LLVM Developers Mailing List
\>>> Subject: Re: \[LLVMdev\] GlobalsModRef (and thus LTO) is completely broken
\>>>
\>>>
\>>>
\>>> I've fixed the obvious bugs I spotted in r242281\. These should be pure
\>>> correctness improvements.
\>>>
\>>>
\>>>
\>>> I've sent the two patches I'm imagining to address the core issue here:
\>>>
\>>> http://reviews.llvm.org/D11213
\>>>
\>>> http://reviews.llvm.org/D11214
\>>>
\>>>
\>>>
\>>> Currently, I have the unsafe alias results disabled by default, but with a
\>>> flag that can re-enable them if needed. I don't feel really strongly about
\>>> which way the default is set -- but that may be because I don't have lots of
\>>> users relying on LTO. I'll let others indicate which way they would be most
\>>> comfortable.
\>>>
\>>>
\>>>
\>>> Some IRC conversations indicated that early benchmark results with GMR
\>>> completely disabled weren't showing really significant swings, so maybe this
\>>> relatively small reduction in power of GMR won't be too problematic for
\>>> folks. Either way, I'm open to the different approaches. It's D11214 that I
\>>> care a lot about. =\]
\>>>
\>>>
\>>>
\>>>
\>>>
\>>> Thanks for all the thoughts here!
\>>>
\>>> -Chandler
\>>>
\>>>
\>>>
\>>> On Tue, Jul 14, 2015 at 11:25 PM Chandler Carruth <chandlerc@gmail.com>
\>>> wrote:
\>>>
\>>> Replying here, but several of the questions raised boil down to "couldn't
\>>> you make the usage of GetUnderlyingObject conservatively correct?". I'll try
\>>> and address that.
\>>>
\>>>
\>>>
\>>> I think this \*is\* the right approach, but I think it is very hard to do
\>>> without effectively disabling this part of GlobalsModRef. That is, the easy
\>>> ways are likely to fire very frequently IMO.
\>>>
\>>>
\>>>
\>>> The core idea is to detect a "no information" state coming out of
\>>> GetUnderlyingObject (likely be providing a custom version just for
\>>> GlobalsModRef and tailored to its use cases). This is particularly effective
\>>> at avoiding the problems with the recursion limit. But let's look at what
\>>> cases we \*wouldn't\* return that. Here are the cases I see when I thought
\>>> about this last night with Hal, roughly in descending likelihood I would
\>>> guess:
\>>>
\>>>
\>>>
\>>> 1) We detect some global or an alloca. In that case, even BasicAA would be
\>>> sufficient to provide no-alias. GMR shouldn't be relevant.
\>>>
\>>>
\>>>
\>>> 2) We detect a phi, select, or inttoptr, and stop there.
\>>>
\>>>
\>>>
\>>> 3) We detect a load and stop there.
\>>>
\>>>
\>>>
\>>> 4) We detect a return from a function.
\>>>
\>>>
\>>>
\>>> 5) We detect an argument to the function.
\>>>
\>>>
\>>>
\>>> I strongly suspect the vast majority of queries hit #1\. That's why BasicAA
\>>> is \*so\* effective. Both #4 and #5 I think are actually reasonable places for
\>>> GMR to potentially say "no-alias" and provide useful definitive information.
\>>> But I also suspect these are the least common.
\>>>
\>>>
\>>>
\>>> So let's look at #2 and #3 because I think they're interesting. For these,
\>>> I think it is extremely hard to return "no-alias". It seems extremely easy
\>>> for a reasonable and innocuous change to the IR to introduce a phi or a
\>>> select into one side of the GetUnderlyingObject but not the other. If that
\>>> ever happens, we can't return "no-alias" for #2, or we need to add really
\>>> expensive updates. It also seems reasonable (if not as likely) to want
\>>> adding a store and load to the IR to not trigger a miscompile. If it is
\>>> possible for a valid optimization pass to do reg2mem on an SSA value, then
\>>> that could happen to only one side of the paired GetUnderlyingObject and
\>>> break GMR with #3\. If that seems like an unreasonable thing to do, consider
\>>> loop re-rolling or other transformations which may need to take things in
\>>> SSA form at move them out of SSA form. Even if we then try immediately to
\>>> put it back \*into\* SSA form, before we do that we create a point where GMR
\>>> cannot correctly return no-alias.
\>>>
\>>>
\>>>
\>>> So ultimately, I don't think we want to rely on GMR returning "no-alias"
\>>> for either #2 or #3 because of the challenge of actually updating it in all
\>>> of the circumstances that could break them. That means that \*only\* #4 and #5
\>>> are going to return "no-alias" usefully. And even then, function inlining
\>>> and function outlining both break #4 and #5, so you have to preclude those
\>>> transforms while GMR is active. And I have serious doubts about these
\>>> providing enough value to be worth the cost.
\>>>
\>>>
\>>>
\>>>
\>>>
\>>> I think the better way to approach this is the other way around. Rather
\>>> than doing a backwards analysis to see if one location reaches and global
\>>> and the other location doesn't reach a global, I think it would be much more
\>>> effective to re-cast this as a forward analysis that determines all the
\>>> memory locations in a function that come from outside the function, and use
\>>> that to drive the no-alias responses.
\>>>
\>>>
\>>>
\>>>
\>>>
\>>> On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner <ghoflehner@apple.com>
\>>> wrote:
\>>>
\>>> I wouldn’t be willing to give up performance for hypothetical issues.
\>>> Please protect all your changes with options. For some of your concerns it
\>>> is probably hard to provide a test case that shows an/the actual issue.
\>>>
\>>>
\>>>
\>>> I certainly agree that it will be very hard to provide a test case and
\>>> extremely rare to see this in the wild for most of these issues. As long as
\>>> I can remove the problematic update API we currently have (which as its an
\>>> API change can't really be put behind flags), I'm happy to have flags
\>>> control whether or not GMR uses the unsound / stale information to try to
\>>> answer alias queries. Do you have any opinion about what the default value
\>>> of the flags should be?
\>>>
\>>>
\>>>
\>>> I'll go ahead and prepare the patches, as it seems like we're all ending
\>>> up in the same position, and just wondering about the precise tradeoffs we
\>>> want to settle on.
\>>>
\>>>
\>>>
\>>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\>>> LLVM Developers mailing list
\>>>
\>>> LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
\>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
\>>>
\>>>
\>>>
\>>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\>>> LLVM Developers mailing list
\>>> LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
\>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
\>>>
\>>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\>>> LLVM Developers mailing list
\>>> LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
\>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
\>>>
\>>>
\>>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\>>> LLVM Developers mailing list
\>>> LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
\>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
\>>>
\>>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\>>> LLVM Developers mailing list
\>>> LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
\>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
\>>
\>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\>> LLVM Developers mailing list
\>> LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
\>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
\>>
\>>
\>>
\>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\>> LLVM Developers mailing list
\>> LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
\>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
\>>

\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev