[llvm-dev] Vectorizer has trouble with vpmovmskb and store
Johan Engelen via llvm-dev llvm-dev at lists.llvm.org
Sat Dec 1 04:38:34 PST 2018
Hello Craig,
Thank you for the quick response and fix.
However, the improvement turns out to be quite fragile. If I run opt on
the original testcase and then run the output through llc, the previous
very long assembly output results. (Things work for a bitcast from
<16 x i1> to i16, but not for a <16 x i1>* store.)
Godbolt link: https://llvm.godbolt.org/z/j1ob9w
regards, Johan
On Tue, Nov 27, 2018 at 4:00 AM Craig Topper <craig.topper at gmail.com> wrote:
We should handle this a lot better after r34763
~Craig
On Mon, Nov 26, 2018 at 3:13 PM Craig Topper <craig.topper at gmail.com> wrote:
Here's a quick patch that fixes this. I don't know how to avoid it in IR. I haven't checked any other tests, but it does fix your case. I'll try to put up a real Phabricator review tonight or tomorrow.
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index e31f2a6..d79c0be 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -4837,6 +4837,11 @@ bool X86TargetLowering::isCheapToSpeculateCtlz() const {
 bool X86TargetLowering::isLoadBitCastBeneficial(EVT LoadVT, EVT BitcastVT) const {
+  if (!LoadVT.isVector() && BitcastVT.isVector() &&
+      BitcastVT.getVectorElementType() == MVT::i1 &&
+      !Subtarget.hasAVX512())
+    return false;
+
   if (!Subtarget.hasDQI() && BitcastVT == MVT::v8i1)
     return false;
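The condition added by the patch can be read in isolation as a small predicate. The sketch below is only a standalone model of that one guard: `SimpleType` and `patch_blocks_load_bitcast_fold` are hypothetical names invented for illustration, not LLVM's `EVT` or any real LLVM API, and the rest of `isLoadBitCastBeneficial` is not modelled. The likely rationale, stated here as an assumption, is that without AVX-512 a vector of i1 is not a natural register type on x86, so retyping a scalar load or store to such a vector tends to produce worse code.

```cpp
// Hypothetical, simplified stand-in for a value type; NOT LLVM's EVT.
struct SimpleType {
    bool is_vector;        // true for vector types such as <16 x i1>
    unsigned element_bits; // width of one element (or of the scalar itself)
    unsigned num_elements; // 1 for scalars
};

// Models only the guard added by the patch: refuse to retype a scalar
// load/store as a vector-of-i1 access when AVX-512 is unavailable.
// Returns true when the patch would make the hook return false.
bool patch_blocks_load_bitcast_fold(const SimpleType& load_vt,
                                    const SimpleType& bitcast_vt,
                                    bool has_avx512) {
    return !load_vt.is_vector &&
           bitcast_vt.is_vector &&
           bitcast_vt.element_bits == 1 &&
           !has_avx512;
}
```

With this model, an i16 to <16 x i1> pair is rejected on a target without AVX-512, which matches the case in the original report. The same pair is left alone when AVX-512 is present.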
~Craig
On Mon, Nov 26, 2018 at 2:51 PM Johan Engelen via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Hi all,
I've run into a case where the optimizer seems to be having trouble doing the "obvious" thing. Consider this code:
define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
  %a1 = icmp slt <16 x i8> %a0, zeroinitializer
  %a2 = bitcast <16 x i1> %a1 to i16
  %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
  ;store i16 %a2, i16* %astore
  ret i16 %a2
}
The optimizer recognizes this and llc nicely outputs a vpmovmskb instruction:
foo:                                    # @foo
        vpmovmskb       eax, xmm0
        ret
Writing to the output vector also works well:
define void @writing(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
  %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
  store i16 123, i16* %astore
  ret void
}
outputs:
writing:                                # @writing
        mov     word ptr [rdi + 14], 123
        ret
Now, combining these two by uncommenting the store in foo() suddenly results in a very large function, instead of just:
        vpmovmskb       eax, xmm0
        mov     word ptr [rdi + 14], ax
        ret
Is there something wrong with my IR code, or is the optimizer somehow confused? Can I rewrite the code such that the optimizer does understand?
Godbolt link: https://llvm.godbolt.org/z/OgExDk
Thanks a lot for the help.
Cheers,
  Johan
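For readers less familiar with the IR, here is a scalar C++ sketch of what foo() computes once the store is uncommented. It is only an illustrative model, assuming little-endian x86 semantics where element 0 of the <16 x i1> vector becomes bit 0 of the i16. The names `sign_mask` and `foo_model` are made up for this sketch and do not appear in the testcase.

```cpp
#include <array>
#include <cstdint>

// Scalar model of `icmp slt <16 x i8> %a0, zeroinitializer` followed by
// `bitcast <16 x i1> %a1 to i16`: bit i of the result is set when byte i
// is negative, which is the same mask vpmovmskb produces.
std::uint16_t sign_mask(const std::array<std::int8_t, 16>& a0) {
    std::uint16_t mask = 0;
    for (unsigned i = 0; i < 16; ++i) {
        if (a0[i] < 0) {
            mask = static_cast<std::uint16_t>(mask | (1u << i));
        }
    }
    return mask;
}

// Model of foo() with the store enabled: write the mask into element 7 of
// the 8 x i16 output (byte offset 14) and also return it.
std::uint16_t foo_model(std::array<std::uint16_t, 8>& egress,
                        const std::array<std::int8_t, 16>& a0) {
    const std::uint16_t mask = sign_mask(a0);
    egress[7] = mask;
    return mask;
}
```

The whole computation is one mask extraction plus one 16-bit store, which is why the three-instruction sequence above is the expected output.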