
On Fri, Feb 27, 2015 at 2:13 PM Ahmed Bougacha <ahmed.bougacha@gmail.com> wrote:

On Fri, Feb 27, 2015 at 1:42 PM, Eric Christopher <echristo@gmail.com> wrote:
\>
\>
\> On Fri, Feb 27, 2015 at 1:38 PM Ahmed Bougacha <ahmed.bougacha@gmail.com>
\> wrote:
\>>
\>> On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <kristof.beyls@arm.com>
\>> wrote:
\>> >
\>> > Hi Ahmed,
\>> >
\>> > Did you run these experiments on a platform with a linker that makes
\>> > use of the AArch64CollectLOH-pass-produced information?
\>>
\>> As Jim says, I'm on iOS, so yes. However, I'm mostly running tests
\>> with the pass disabled.
\>>
\>> >
\>> > I'm guessing that the AArch64CollectLOH-pass information and a linker
\>> > that makes use of that information could affect the profitability of
\>> > the GlobalMerge pass?
\>>
\>> It could, and does, from what I've seen (beware anecdata):
\>> - reusing the adrp base prevents optimizing it (the various
\>> Adrp\*{ldr,str} LOHs).
\>> - reusing the adrp+add MergedGlobal pointer, with indexed addressing,
\>> doesn't prevent the AdrpAdd optimization.
\>>
\>> All in all, whether GlobalMerge is profitable or not (by increasing
\>> register pressure, or adding another indirection), whenever the LOH
\>> optimizations fire, they reduce its usefulness.
\>>
\>> AFAICT, the only case where LOHs help GlobalMerge is when the
\>> MergedGlobal base is closer to the adrp sequence than the actual
\>> global. Given that we only merge 4k of globals, on a 1MB range this
\>> doesn't happen very often.
\>>
\>>
\>>
\>> Which brings us to my fallback proposal: what about disabling the
\>> pass on darwin only? Various darwin-enabled features (e.g., LOHs)
\>> help mitigate the adrp problem, and global usage is usually frowned
\>> upon in those circles (except for singletons, class-/function-statics
\>> and whatnot, which I'm trying to address in an upcoming patch).
\>>
\>
\> Before disabling the pass on darwin only, I'd like to see some analysis of the
\> regressions/improvements. Has anyone looked at the code for those yet?

Yep, I put a quick analysis in my other reply.

The LOH/ADRP bit?

\>
\>>
\>> As for other targets, as a first step, making the pass run under -O3
\>> rather than -O1 is hopefully agreeable to everyone? After all, it is
\>> "aggressive", and isn't always profitable. That's pretty much the
\>> description of -O3.
\>> We can still run into problematic cases under LTO, though.
\>>
\>
\> Seems reasonable to me, but I'd probably want to see what happens with the
\> above questions first.

Fair enough. Bottom line is:
\- disabling it without LTO is a slight win on the test-suite, a solid
win everywhere else I've looked.
\- disabling it with LTO regresses quite a few SPEC benchmarks, and is
overall a slight regression on the test-suite.

Ah, I meant an analysis of the code, not just the numbers. I think the ADRP/LOH commentary really helps. It might only be a decent LTOish optimization, but I'm still curious how it's helping there over other optimizations.

Anyhow, FWIW I'm in favor of pulling it out of the non-LTO pipeline universally.

-eric

\-Ahmed

\> -eric
\>