[llvm-dev] RFC: [X86] Introducing command line options to prefer narrower vector instructions even when wider instructions are available

Craig Topper via llvm-dev llvm-dev at lists.llvm.org
Thu Nov 2 21:47:29 PDT 2017

That's a very good point about the ordering of the command line options. gcc's current implementation treats -mprefer-avx256 as "prefer 256 over 512" and -mprefer-avx128 as "prefer 128 over 256", which feels weird for other reasons but has less of an ordering ambiguity.

-mprefer-avx128 has been in gcc for many years and predates the creation of AVX512. -mprefer-avx256 was added a couple of months ago.

We've had an internal conversation with the implementor of -mprefer-avx256 in gcc about making -mprefer-avx128 affect 512-bit vectors as well. I'll bring up the ambiguity issue with them.

Do we want to be compatible with gcc here?

~Craig

On Thu, Nov 2, 2017 at 7:18 PM, Eric Christopher <echristo at gmail.com> wrote:

On Thu, Nov 2, 2017 at 7:05 PM James Y Knight via llvm-dev <llvm-dev at lists.llvm.org> wrote:

On Wed, Nov 1, 2017 at 7:35 PM, Craig Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hello all,

I would like to propose adding the -mprefer-avx256 and -mprefer-avx128 command line flags supported by the latest GCC to clang. These flags will be used to limit the vector register size presented by TTI to the vectorizers. The backend will still be able to use wider registers for code written using the intrinsics in x86intrin.h. And the backend will still be able to use AVX512VL instructions and the additional XMM16-31 and YMM16-31 registers.

Motivation:

-Using 512-bit operations on some Intel CPUs may cause a decrease in CPU frequency that may offset the gains from using the wider register size. See section 15.26 of Intel® 64 and IA-32 Architectures Optimization Reference Manual published October 2017.

[James Y Knight]
I note the doc mentions that 256-bit AVX operations also have the same issue with reducing the CPU frequency, which is nice to see documented! There are also the issues discussed here <http://www.agner.org/optimize/blog/read.php?i=165> (and elsewhere) related to warm-up time for the 256-bit execution pipeline, which is another issue with using wide-vector ops.

[Craig Topper]
-The vector ALUs on ports 0 and 1 of the Skylake Server microarchitecture are only 256 bits wide. 512-bit instructions using these ALUs must use both ports. See section 2.1 of Intel® 64 and IA-32 Architectures Optimization Reference Manual published October 2017.

Implementation Plan:

-Add prefer-avx256 and prefer-avx128 as SubtargetFeatures in X86.td not mapped to any CPU.

-Add mprefer-avx256 and mprefer-avx128 and the corresponding -mno-prefer-avx128/256 options to clang's driver Options.td file. I believe this will allow clang to pass these straight through to the -target-feature attribute in IR.

-Modify X86TTIImpl::getRegisterBitWidth to only return 512 if AVX512 is enabled and neither prefer-avx256 nor prefer-avx128 is set. Similarly return 256 if AVX is enabled and prefer-avx128 is not set.

[James Y Knight]
Instead of multiple flags that have difficult-to-understand intersecting behavior, one flag with a value would be better.

E.g., what should "-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the answer, it's confusing. (Similarly with other such combinations.) Just a single arg "-mprefer-avx={128/256/512}" (with no "no" version) seems easier to understand to me (keeping the same behavior as you mention: asking to prefer a larger width than is supported by your architecture should be fine but ignored).

[Eric Christopher]
I agree with this. It's a little more plumbing as far as subtarget features etc. (represent via an optional value or just various "set the avx width" features - the latter being easier, but uglier); however, it's probably the right thing to do.

I was looking at this myself just a couple of weeks ago and think this is the right direction (when and how to turn things off) - and probably makes sense to be a default for these architectures? We might end up needing to check a couple of additional TTI places, but it sounds like you're on top of it. :)

Thanks very much for doing this work.

-eric

[Craig Topper]
There may be some other backend changes needed, but I plan to address those as we find them.

At a later point, consider making -mprefer-avx256 the default for Skylake Server due to the above mentioned performance considerations.
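The getRegisterBitWidth rule in the implementation plan can be sketched as a small standalone function. This is only a model of the selection logic as described in the RFC, not LLVM's actual code: the SubtargetModel struct, its field names, and the registerBitWidth function are hypothetical, and the 128-bit fallback for the remaining cases is an assumption that the RFC text does not spell out.

```cpp
// Hypothetical stand-in for the subtarget state. These names are
// illustrative only and are not LLVM's X86Subtarget API.
struct SubtargetModel {
  bool HasAVX;       // 256-bit vectors are available
  bool HasAVX512;    // 512-bit vectors are available
  bool PreferAVX256; // models the proposed prefer-avx256 feature
  bool PreferAVX128; // models the proposed prefer-avx128 feature
};

// Width the vectorizers would be told about under the proposed rule.
unsigned registerBitWidth(const SubtargetModel &ST) {
  // Treat AVX512 as implying AVX so the model stays consistent even if
  // a caller sets only HasAVX512.
  const bool HasAVX = ST.HasAVX || ST.HasAVX512;

  // 512 only if AVX512 is enabled and neither preference is set.
  if (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
    return 512;
  // 256 if AVX is enabled and prefer-avx128 is not set.
  if (HasAVX && !ST.PreferAVX128)
    return 256;
  // Assumed fallback: 128-bit (SSE-sized) vectors.
  return 128;
}
```

Under this rule prefer-avx128 overrides prefer-avx256 whenever both are set, regardless of the order in which they were given, which is one way to see why the interaction of the separate flags is hard to reason about.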

Does this sound reasonable?

*Latest Intel Optimization manual available here: https://software.intel.com/en-us/articles/intel-sdm#optimization

-Craig Topper
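For comparison, the single-valued alternative suggested in the thread ("-mprefer-avx={128/256/512}") can be sketched as follows. This is a minimal illustration under stated assumptions, not clang driver code: the function names are hypothetical, "last occurrence wins" is an assumed policy for repeated options, and unrecognized values are silently skipped here where a real driver would presumably diagnose them.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Scans the arguments for the hypothetical "-mprefer-avx=<N>" option.
// The last occurrence wins (assumed policy); 0 means "not specified".
unsigned parsePreferredWidth(const std::vector<std::string> &Args) {
  const std::string Prefix = "-mprefer-avx=";
  unsigned Preferred = 0;
  for (const std::string &Arg : Args) {
    if (Arg.compare(0, Prefix.size(), Prefix) != 0)
      continue; // not the option we are looking for
    const std::string Value = Arg.substr(Prefix.size());
    if (Value == "128")
      Preferred = 128;
    else if (Value == "256")
      Preferred = 256;
    else if (Value == "512")
      Preferred = 512;
    // Any other value is skipped in this sketch.
  }
  return Preferred;
}

// A preference wider than the hardware supports is accepted but has no
// effect, matching the behavior described in the thread.
unsigned effectiveWidth(unsigned MaxSupported, unsigned Preferred) {
  if (Preferred == 0)
    return MaxSupported;
  return std::min(MaxSupported, Preferred);
}
```

With one valued option there is a single piece of state, so a sequence such as "-mprefer-avx=256 -mprefer-avx=128" has only one plausible reading, and no "no" form is needed.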

LLVM Developers mailing list llvm-dev at lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

