
I’ll summarize your responses as: The new pipeline produces better results than the old, and we currently have no good mechanism for reducing the compile time overhead.

I’ll summarize my criticism as: In principle, there are better ways to clean up after the vectorizer without turning it into a complicated megapass, but no one has done the engineering. I don’t think cleaning up after the vectorizer should incur any noticeable overhead if the vectorizer never runs, and it would be avoidable with sensibly designed passes that aren’t locked into the current pass manager design.

I don’t have the data right now to argue against enabling the new pipeline under O2. Hopefully others who care about clang compile time will jump in.

As for the long-term plan to improve compile-time, all I can do now is to advocate for a better approach.

-Andy

On Oct 14, 2014, at 10:56 AM, Chandler Carruth <chandlerc@google.com> wrote:


On Tue, Oct 14, 2014 at 10:11 AM, Andrew Trick <atrick@apple.com> wrote:
>> + correlated-propagation

A little worried about this.

>> + instcombine

I'm *very* concerned about rerunning instcombine, but understand it may help clean up the vectorized preheader.

Why are you concerned? Is instcombine that slow? I usually don't see huge overhead from re-running it on nearly-canonical code. (Oh, I see you just replied to Hal here, fair enough.)

>> + licm
>> + loop-unswitch

These should be limited to the relevant loop nest.

We have no way to do that currently. Do you think they will in practice be too slow? If so, why? I would naively expect unswitch to be essentially free unless it can do something, and LICM not much more expensive.

>> + simplifycfg

OK if the CFG actually changed.

Again, we have no mechanism to gate this. Frustratingly, the only thing I want here is to delete dead code formed by earlier passes. We just don't have anything cheaper (and I don't have any measurements indicating we need something cheaper).

>> + instcombine

instcombine again! This can’t be good.

I actually have no specific reason to think we need this other than the fact that we run instcombine after simplifycfg in a bunch of other places. If you're looking for one to rip out, this would be the first one I would rip out because I'm doubtful of its value.

On a separate note:


>> + early-cse

Passes like loop-vectorize should be able to do their own CSE without much engineering effort.

>> slp-vectorize
>> + early-cse

SLP should do its own CSE.

I actually agree with you in principle, but I would rather run the pass now (and avoid hacks downstream to essentially do CSE in the backend) than hold up progress on the hope of advanced on-demand CSE layers being added to the vectorizers. I don't know of anyone actually working on that, and so I'm somewhat concerned it will never materialize.