I’ll summarize your responses as: The new pipeline produces better results than the old, and we currently have no good mechanism for reducing the compile-time overhead.
I’ll summarize my criticism as: In principle, there are better ways to clean up after the vectorizer without turning it into a complicated megapass, but no one has done the engineering. I don’t think cleaning up after the vectorizer should incur any noticeable overhead if the vectorizer never runs, and that overhead would be avoidable with sensibly designed passes that aren’t locked into the current pass manager design.
I don’t have the data right now to argue against enabling the new pipeline under O2. Hopefully others who care about clang compile time will jump in.
As for the long-term plan to improve compile-time, all I can do now is to advocate for a better approach.
-Andy
On Oct 14, 2014, at 10:56 AM, Chandler Carruth <chandlerc@google.com> wrote:
On Tue, Oct 14, 2014 at 10:11 AM, Andrew Trick <atrick@apple.com> wrote:
>> + correlated-propagation
A little worried about this.
>> + instcombine
I'm *very* concerned about rerunning instcombine, but understand it may help clean up the vectorized preheader.
Why are you concerned? Is instcombine that slow? I usually don't see huge overhead from re-running it on nearly-canonical code. (Oh, I see you just replied to Hal here, fair enough.)
>> + licm
>> + loop-unswitch
These should be limited to the relevant loop nest.
We have no way to do that currently. Do you think they will in practice be too slow? If so, why? I would naively expect unswitch to be essentially free unless it can do something, and LICM not much more expensive.
>> + simplifycfg
OK if the CFG actually changed.
Again, we have no mechanism to gate this. Frustratingly, the only thing I want here is to delete dead code formed by earlier passes. We just don't have anything cheaper (and I don't have any measurements indicating we need something cheaper).
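For readers following the thread, the gating idea under discussion (run cleanup passes only when an earlier transform actually changed something) can be sketched in a toy model. This is a minimal illustration, not LLVM code: the `Function` representation, `toy_vectorize`, `toy_cleanup`, and `run_gated` are all hypothetical names invented here, and the thread itself notes that the pass manager of the time had no such mechanism.

```python
from typing import Callable, List, Tuple

# Hypothetical toy model: a "function" is a list of instruction strings, and a
# pass returns (new_function, changed). None of these names are LLVM APIs.
Function = List[str]
Pass = Callable[[Function], Tuple[Function, bool]]


def toy_vectorize(fn: Function) -> Tuple[Function, bool]:
    """Pretend vectorizer: rewrites 'scalar_loop' and leaves a dead stub behind."""
    if "scalar_loop" not in fn:
        return fn, False
    out: Function = []
    for inst in fn:
        if inst == "scalar_loop":
            out += ["vector_loop", "dead_block"]
        else:
            out.append(inst)
    return out, True


def toy_cleanup(fn: Function) -> Tuple[Function, bool]:
    """Pretend cleanup: deletes instructions named 'dead_block'."""
    out = [inst for inst in fn if inst != "dead_block"]
    return out, len(out) != len(fn)


def run_gated(
    fn: Function, transform: Pass, cleanups: List[Pass]
) -> Tuple[Function, List[str]]:
    """Run `transform`; run `cleanups` only if it reported a change."""
    log = [transform.__name__]
    fn, changed = transform(fn)
    if changed:
        for cleanup in cleanups:
            log.append(cleanup.__name__)
            fn, _ = cleanup(fn)
    return fn, log
```

In this sketch, a function with no `scalar_loop` pays only for the transform itself; the cleanup list is never entered, which is the behavior being asked for when the vectorizer never runs.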
>> + instcombine
instcombine again! This can’t be good.
I actually have no specific reason to think we need this other than the fact that we run instcombine after simplifycfg in a bunch of other places. If you're looking for one to rip out, this would be the first one I would rip out because I'm doubtful of its value.
On a separate note:
>> + early-cse
Passes like loop-vectorize should be able to do their own CSE without much engineering effort.
>> slp-vectorize
>> + early-cse
SLP should do its own CSE.
I actually agree with you in principle, but I would rather run the pass now (and avoid hacks downstream to essentially do CSE in the backend) than hold up progress on the hope of advanced on-demand CSE layers being added to the vectorizers. I don't know of anyone actually working on that, and so I'm somewhat concerned it will never materialize.
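To make the "a pass does its own CSE" suggestion concrete, here is a minimal sketch of local common-subexpression elimination over a single straight-line block. It is an illustration under stated assumptions, not how the LLVM vectorizers or early-cse work: the four-field instruction tuple and the `local_cse` helper are hypothetical, every destination is assumed to be defined once, every opcode is assumed to be side-effect free, and commutativity is not considered.

```python
from typing import Dict, List, Tuple

# Hypothetical three-address form: (dest, opcode, operand1, operand2).
Inst = Tuple[str, str, str, str]


def local_cse(block: List[Inst]) -> List[Inst]:
    """Remove repeated pure expressions inside one straight-line block.

    Assumes every dest is defined exactly once (SSA-like) and every opcode is
    side-effect free, so a repeated (opcode, operands) key is always redundant.
    """
    available: Dict[Tuple[str, str, str], str] = {}  # expression -> first dest
    replaced: Dict[str, str] = {}  # removed dest -> surviving dest
    out: List[Inst] = []
    for dest, op, a, b in block:
        # Rewrite operands that referred to an already-removed duplicate.
        a, b = replaced.get(a, a), replaced.get(b, b)
        key = (op, a, b)
        if key in available:
            replaced[dest] = available[key]
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out
```

A transform could call a helper of this shape only on the instructions it just emitted, which is the kind of on-demand cleanup the thread contrasts with rerunning a whole-function pass.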