Are you referring to the X86TargetLowering::isFsqrtCheap hook?
\~Craig
On Fri, Nov 10, 2017 at 7:39 AM, Sanjay Patel <spatel@rotateright.com> wrote:
We can tie a user preference / override to a CPU model. We do something like that for square root estimates already (although it does use a SubtargetFeature currently for x86; ideally, we'd key that off of something in the CPU scheduler model).

On Thu, Nov 9, 2017 at 4:21 PM, Craig Topper <craig.topper@gmail.com> wrote:

I agree that a less x86-specific command-line option makes sense. I've been having internal discussions with gcc folks, and they're evaluating switching to something like -mprefer-vector-width=128/256/512/none.

Based on the current performance data we're seeing, we think we need to ultimately default skylake-avx512 to -mprefer-vector-width=256. If we go with a target-independent option/implementation, is there some way we could still affect the default behavior in a target-specific way?

\~Craig

On Tue, Nov 7, 2017 at 9:06 AM, Sanjay Patel <spatel@rotateright.com> wrote:

It's clear from the Intel docs how this has evolved, but from a compiler perspective, this isn't a Skylake "feature" :) ... nor an Intel feature, nor an x86 feature. It's a generic programmer hint for any target with multiple potential vector lengths.

On x86, there's already a potential use case for this hint with a different starting motivation: re-vectorization. That's where we take C code that uses 128-bit vector intrinsics and selectively widen it to 256- or 512-bit vector ops based on a newer CPU target than the code was originally written for.

Note that having a target-independent implementation in the optimizer doesn't preclude a flag alias in clang to maintain compatibility with gcc.

I think it's just a matter of time before a customer requests the same ability for another target (maybe they already have and I don't know about it). So we should have a solution that recognizes that possibility.

On Tue, Nov 7, 2017 at 2:02 AM, Tobias Grosser via llvm-dev <llvm-dev@lists.llvm.org> wrote:

On Fri, Nov 3, 2017, at 05:47, Craig Topper via llvm-dev wrote:
\> That's a very good point about the ordering of the command line options.
\> gcc's current implementation treats -mprefer-avx256 as "prefer 256 over
\> 512" and -mprefer-avx128 as "prefer 128 over 256". Which feels weird for
\> other reasons, but has less of an ordering ambiguity.
\>
\> -mprefer-avx128 has been in gcc for many years and predates the creation
\> of
\> avx512\. -mprefer-avx256 was added a couple months ago.
\>
\> We've had an internal conversation with the implementor of
\> -mprefer-avx256
\> in gcc about making -mprefer-avx128 affect 512-bit vectors as well. I'll
\> bring up the ambiguity issue with them.
\>
\> Do we want to be compatible with gcc here?
I certainly believe we would want to be compatible with gcc (if we use
the same names).
Best,
Tobias
\>
\> \~Craig
\>
\> On Thu, Nov 2, 2017 at 7:18 PM, Eric Christopher <echristo@gmail.com>
\> wrote:
\>
\> >
\> >
\> > On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <
\> > llvm-dev@lists.llvm.org> wrote:
\> >
\> >> On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <
\> >> llvm-dev@lists.llvm.org> wrote:
\> >>
\> >>> Hello all,
\> >>>
\> >>>
\> >>>
\> >>> I would like to propose adding the -mprefer-avx256 and -mprefer-avx128
\> >>> command line flags supported by latest GCC to clang. These flags will be
\> >>> used to limit the vector register size presented by TTI to the vectorizers.
\> >>> The backend will still be able to use wider registers for code written
\> >>> using the intrinsics in x86intrin.h. And the backend will still be able to
\> >>> use AVX512VL instructions and the additional XMM16-31 and YMM16-31
\> >>> registers.
\> >>>
\> >>>
\> >>>
\> >>> Motivation:
\> >>>
\> >>> -Using 512-bit operations on some Intel CPUs may cause a decrease in CPU
\> >>> frequency that may offset the gains from using the wider register size. See
\> >>> section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference
\> >>> Manual published October 2017.
\> >>>
\> >>
\> >> I note the doc mentions that 256-bit AVX operations also have the same
\> >> issue with reducing the CPU frequency, which is nice to see documented!
\> >>
\> >> There's also the issues discussed here
\> >> <http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere) related to warm-up time
\> >> for the 256-bit execution pipeline, which is another issue with using
\> >> wide-vector ops.
\> >>
\> >>
\> >>> -The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture
\> >>> are only 256-bits wide. 512-bit instructions using these ALUs must use both
\> >>> ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization
\> >>> Reference Manual published October 2017.
\> >>>
\> >>
\> >>
\> >>> Implementation Plan:
\> >>>
\> >>> -Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not
\> >>> mapped to any CPU.
\> >>>
\> >>> -Add mprefer-avx256 and mprefer-avx128 and the corresponding
\> >>> -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe
\> >>> this will allow clang to pass these straight through to the -target-feature
\> >>> attribute in IR.
\> >>>
\> >>> -Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is
\> >>> enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly return
\> >>> 256 if AVX is enabled and prefer-avx128 is not set.
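
The width selection described in the last item of the plan can be sketched roughly as follows. This is a minimal illustration, not the actual LLVM patch: `SubtargetSketch`, its field names, and `registerBitWidthSketch` are hypothetical stand-ins rather than LLVM's API. It also assumes that `HasAVX` is set whenever `HasAVX512` is, and that 128 bits is the fallback width.

```cpp
// Hypothetical stand-in for the subtarget state discussed in the thread.
// These field names are illustrative and are not LLVM's X86Subtarget API.
struct SubtargetSketch {
  bool HasAVX = false;       // assumed to be true whenever HasAVX512 is true
  bool HasAVX512 = false;
  bool PreferAVX256 = false; // models the proposed prefer-avx256 feature
  bool PreferAVX128 = false; // models the proposed prefer-avx128 feature
};

// Returns the vector register width (in bits) that would be reported to the
// vectorizers under the rules stated in the proposal.
constexpr unsigned registerBitWidthSketch(const SubtargetSketch &ST) {
  // 512 only if AVX512 is enabled and neither preference is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 if AVX is enabled and prefer-avx128 is not set.
  if (ST.HasAVX && !ST.PreferAVX128)
    return 256;
  // Assumed fallback for this sketch: 128-bit vectors.
  return 128;
}
```

In this sketch, setting both preferences at once resolves to 128, because prefer-avx128 blocks both wider widths.
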
\> >>>
\> >>
\> >> Instead of multiple flags that have difficult to understand intersecting
\> >> behavior, one flag with a value would be better. E.g., what should
\> >> "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the
\> >> answer, it's confusing. (Similarly with other such combinations). Just a
\> >> single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier
\> >> to understand to me (keeping the same behavior as you mention: asking to
\> >> prefer a larger width than is supported by your architecture should be fine
\> >> but ignored).
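
One way to picture the single-valued form suggested above is as a clamp: the preference can narrow the width but never widen it past what the target supports. This is only a sketch under assumptions. `effectiveVectorWidth` is a hypothetical helper name, and using 0 to mean "no preference given" is a convention invented for this example.

```cpp
#include <algorithm>

// Hypothetical helper: combine the widest vector width the target supports
// with an optional user preference. PreferredBits == 0 means "no preference"
// (a convention made up for this sketch).
constexpr unsigned effectiveVectorWidth(unsigned MaxSupportedBits,
                                        unsigned PreferredBits) {
  // Asking for more than the target supports is accepted but has no effect,
  // matching the "fine but ignored" behavior described above.
  return PreferredBits == 0 ? MaxSupportedBits
                            : std::min(MaxSupportedBits, PreferredBits);
}
```

With one value there is no combination of positive and negative flags to reason about; the only remaining question is which occurrence wins when the option is repeated.
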
\> >>
\> >>
\> > I agree with this. It's a little more plumbing as far as subtarget
\> > features etc (represent via an optional value or just various "set the avx
\> > width" features - the latter being easier, but uglier), however, it's
\> > probably the right thing to do.
\> >
\> > I was looking at this myself just a couple weeks ago and think this is the
\> > right direction (when and how to turn things off) - and probably makes
\> > sense to be a default for these architectures? We might end up needing to
\> > check a couple of additional TTI places, but it sounds like you're on top
\> > of it. :)
\> >
\> > Thanks very much for doing this work.
\> >
\> > -eric
\> >
\> >
\> >>
\> >>
\> >>> There may be some other backend changes needed, but I plan to address
\> >>> those as we find them.
\> >>>
\> >>>
\> >>> At a later point, consider making -mprefer-avx256 the default for
\> >>> Skylake Server due to the above mentioned performance considerations.
\> >>>
\> >>
\> >>
\> >>
\> >>
\> >>
\> >>>
\> >>> Does this sound reasonable?
\> >>>
\> >>>
\> >>>
\> >>> \*Latest Intel Optimization manual available here:
\> >>> https://software.intel.com/en-us/articles/intel-sdm#optimization
\> >>>
\> >>>
\> >>> -Craig Topper
\> >>>
\> >>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\> >>> LLVM Developers mailing list
\> >>> llvm-dev@lists.llvm.org
\> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
\> >>>