[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
Ahmed Bougacha ahmed.bougacha at gmail.com
Fri Feb 27 15:45:40 PST 2015
On Fri, Feb 27, 2015 at 2:21 PM, Eric Christopher <echristo at gmail.com> wrote:
> Before making the disabling darwin only I'd like to see some analysis of
> the regressions/improvements. Has anyone looked at the code for those yet?
Yep, I put a quick analysis in my other reply. The LOH/ADRP bit?
> > > > As for other targets, as a first step, making the pass run under -O3
> > > > rather than -O1 is hopefully agreeable to everyone? After all, it is
> > > > "aggressive", and isn't always profitable. That's pretty much the
> > > > description of -O3.
> > > > We can still run into problematic cases under LTO, though.
> > >
> > > Seems reasonable to me, but probably want to see what happens with the
> > > above questions first.
> >
> > Fair enough. Bottom line is:
> > - disabling it without LTO is a slight win on the test-suite, a solid win
> >   everywhere else I've looked.
> > - disabling it with LTO regresses quite a few SPEC benchmarks, and is
> >   overall a slight regression on the test-suite.
>
> Ah, I meant an analysis of the code, not just the numbers. I think the
> ADRP/LOH commentary really helps. It might only be a decent LTOish
> optimization, but I'm still curious how it's helping there over other
> optimizations.
Basically - and I think this is what Renato asks as well - it doesn't really interact with later optimizations. Throughout most of the backend, we keep global references (e.g., adrp+add) together, as a pseudo instruction (MOVaddr, LOADgot, ...). Very late we expand it to adrp+add/.... So, the only thing that helps is the LOH linker optimizations, which try to simplify some of the adrp sequences. Really, the backend is oblivious to the fact that global references aren't trivial. We don't try to CSE the adrp's, for instance (I believe there was a patch for that, Quentin and Jiangning might know more). Does that clarify a bit?
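A minimal Python sketch of the address arithmetic behind an adrp+ldr/add pair. It assumes the usual AArch64 split: adrp materializes the 4 KiB page of the target, relative to the page of the adrp itself, and the following ldr/add immediate supplies the low 12 bits. The helper names and addresses are hypothetical. The addresses are chosen so that the page delta is 133 and the offsets are #3568 and #3576, as in the first example further down, assuming the `133` in those listings is the encoded page delta. The sketch also shows why two globals on the same page could share one adrp result, which is the adrp CSE the backend does not try to do today.

```python
PAGE = 4096  # adrp works in 4 KiB pages


def page_of(addr: int) -> int:
    """Base of the 4 KiB page containing addr (what adrp materializes)."""
    return addr & ~(PAGE - 1)


def lo12_of(addr: int) -> int:
    """Low 12 bits of addr (what the following add/ldr immediate supplies)."""
    return addr & (PAGE - 1)


def adrp_pages(pc: int, target: int) -> int:
    """Page delta encoded by an adrp at address pc that targets target."""
    return (page_of(target) - page_of(pc)) // PAGE


def can_share_adrp(a: int, b: int) -> bool:
    """Two globals can reuse one adrp result only if they sit on the same page."""
    return page_of(a) == page_of(b)


# Hypothetical addresses chosen to reproduce "adrp x8, 133" with #3568 and #3576.
pc = 0x1_0000_0F00
g1 = page_of(pc) + 133 * PAGE + 3568
g2 = g1 + 8
print(adrp_pages(pc, g1), lo12_of(g1), lo12_of(g2), can_share_adrp(g1, g2))
```

With these made-up addresses it prints `133 3568 3576 True`.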
Looking at the code, you have two main problematic situations:
- the register pressure tradeoff:
Consider:
    adrp x8, 133
    ldr  x8, [x8, #3568]
    ...
    adrp x8, 133
    ldr  x0, [x8, #3576]
Turning into:
    adrp x19, 133
    add  x19, x19, #3392
    ldr  x8, [x19, #192]
    ...
    ldr  x0, [x19, #200]
- an additional instruction when only one global from a merged set is accessed (or when the LOH optimizations fired)
Consider a similar case:
    adrp x20, 133
    ldr  x8, [x20, #3432]
    ...
    str  x0, [x20, #3432]
Turning into:
    adrp x20, 133
    add  x20, x20, #3392
    ldr  x8, [x20, #56]
    ...
    str  x0, [x20, #56]
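The two trade-offs can be reduced to a rough instruction count. This is a hypothetical Python cost model that assumes the code shapes in the listings. Unmerged accesses use adrp plus a load/store with the low 12 bits folded in, and they either keep the adrp result live or rematerialize it before each access. Merged accesses form the base once with adrp+add and then use one load/store each. The model deliberately ignores register pressure (the base register held live across the `...`), LOH rewrites and scheduling.

```python
def unmerged_cost(accesses: int, adrp_kept_live: bool) -> int:
    """adrp + one load/store per access; the adrp is either reused from a
    live register (x20 listing) or redone before each access (x8 listing)."""
    adrps = 1 if adrp_kept_live else accesses
    return adrps + accesses


def merged_cost(accesses: int) -> int:
    """adrp + add once to form the merged base, then one load/store per access."""
    return 2 + accesses


# First pair of listings: adrp rematerialized, same count but x19 stays live.
print(unmerged_cost(2, adrp_kept_live=False), merged_cost(2))
# Second pair: adrp already kept live, so merging costs one extra add.
print(unmerged_cost(2, adrp_kept_live=True), merged_cost(2))
```

This prints `4 4` and `3 4`. Under this model, merging only saves instructions once the unmerged code would need three or more separate adrp's. That is the base-address CSE effect of merging.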
One positive case is explained in the GlobalMerge.cpp comments: it reduces register pressure in a loop, by using a single base register for multiple globals.
Another positive is that merging globals effectively CSEs the base address computation.
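To make the single-base-register idea concrete, here is a hypothetical, greedy Python sketch that lays several globals out in one merged blob. Every access then becomes a load/store at `[base, #offset]` off one base register. This is not GlobalMerge's actual algorithm. The real pass (see GlobalMerge.cpp) has its own heuristics for which globals to group and its own size limits. `max_size` below is an assumed stand-in for whatever offset range the target allows, and the globals are made up.

```python
from dataclasses import dataclass


@dataclass
class Global:
    name: str
    size: int   # bytes
    align: int  # bytes, a power of two


def merge_layout(globals_, max_size):
    """Greedily pack globals into merged blobs of at most max_size bytes
    (a single global larger than that still gets a blob of its own).
    Returns one {name: offset} dict per blob; every global in a blob is
    addressed as [blob base, #offset]."""
    blobs, current, end = [], {}, 0
    for g in globals_:
        start = -(-end // g.align) * g.align  # round end up to g's alignment
        if current and start + g.size > max_size:
            blobs.append(current)  # blob is full: start a new one
            current, start = {}, 0
        current[g.name] = start
        end = start + g.size
    if current:
        blobs.append(current)
    return blobs


# Hypothetical file-static globals, e.g. scanner state used across one file.
state = [Global("yy_state", 8, 8), Global("yy_pos", 4, 4),
         Global("yy_flag", 1, 1), Global("yy_buf", 8, 8)]
for blob in merge_layout(state, max_size=4095):
    for name, off in blob.items():
        print(f"{name}: [base, #{off}]")
```

With `max_size=4095`, all four globals land in one blob at offsets 0, 8, 12 and 16, with padding before `yy_buf` for alignment. A tighter limit such as 16 splits `yy_buf` into a second blob, which needs its own base register.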
> Anyhow, FWIW I'm in favor of pulling it out of the non-LTO pipeline universally.
I tend to agree, but it's still sometimes useful in non-LTO. One case that came up in benchmarks was a bunch of file-static globals used pervasively in a single file (I believe lex/yacc can generate this kind of thing). There it's very beneficial, even without LTO. Hence, -O3 and -mno-global-merge, if necessary.
-Ahmed