[llvm-dev] Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'

Jon Chesterfield via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 15 12:16:44 PDT 2017

Previous message: [llvm-dev] Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'
Next message: [llvm-dev] Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'

Hi JinGu,

The initial selection DAG looks reasonable to me. Are you seeing a "Cannot select" error related to the extending load, or does the generated assembly fail to implement the semantics you expect?

Jon

On Fri, Sep 15, 2017 at 8:00 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:


Today's Topics:

   1. What should a truncating store do? (Jon Chesterfield via llvm-dev)
   2. Re: Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT' (jingu at codeplay.com via llvm-dev)
   3. DIVA - Debug Information Visual Analyser (Phil Camp via llvm-dev)
   4. Re: Changes to 'ADJCALLSTACK*' and 'callseq*' between LLVM v4.0 and v5.0 (Serge Pavlov via llvm-dev)
   5. Re: RFC: Trace-based layout. (Kyle Butt via llvm-dev)
   6. Re: Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT' (Demikhovsky, Elena via llvm-dev)
   7. Re: What should a truncating store do? (Friedman, Eli via llvm-dev)
   8. Re: What should a truncating store do? (Jon Chesterfield via llvm-dev)
   9. Re: What should a truncating store do? (Friedman, Eli via llvm-dev)

----------------------------------------------------------------------

Message: 1
Date: Fri, 15 Sep 2017 13:49:48 +0100
From: Jon Chesterfield via llvm-dev <llvm-dev at lists.llvm.org>
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] What should a truncating store do?

For example, truncating store of an i32 to i6. My assumption was that this should write the low six bits of the i32 to somewhere in memory.

Should the top 24 bits of a corresponding 32 bit region of memory be unchanged, zero, undefined?

Should the two bits that would round the i6 up to a byte be preserved, zero, undefined?

I can't write six bits directly so am trying to determine what set of bitwise ops to apply between a load and subsequent store to emulate the truncating store.

Thanks!

Jon

------------------------------

Message: 2
Date: Fri, 15 Sep 2017 15:45:05 +0100
From: "jingu at codeplay.com via llvm-dev" <llvm-dev at lists.llvm.org>
To: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>, elena.demikhovsky at intel.com, daniellsanders at apple.com
Subject: Re: [llvm-dev] Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'

Can someone give the comment about it please?

Thanks,
JinGu Kang

On 14/09/17 12:05, jingu at codeplay.com wrote:
> Hi All,
>
> I have a question about splitting 'EXTRACT_VECTOR_ELT' with 'v2i1'. I
> have a llvm IR code snippet as following:
>
> llvm IR code snippet:
>
> for.body:                                   ; preds = %entry, %for.cond
>   %i.022 = phi i32 [ 0, %entry ], [ %inc, %for.cond ]
>   %0 = icmp ne <2 x i32> %vecinit1, <i32 0, i32 -23>
>   %1 = extractelement <2 x i1> %0, i32 %i.022
>   %vecext4 = extractelement <2 x i32> %vecinit1, i32 %i.022
>   %vecext5 = extractelement <2 x i32> <i32 0, i32 -23>, i32 %i.022
>   %cmp6 = icmp ne i32 %vecext4, %vecext5
>   %cmp7 = xor i1 %1, %cmp6
>
> ...
>
> and the SelectionDAG before TypeLegalizer is like this.
>
>   t0: ch = EntryToken
>   t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
>   t3: ch = ValueType:i32
>   t5: i32,ch = CopyFromReg t2:1, Register:i32 %vreg1
>   t7: i32 = AssertZext t5, ValueType:ch:i1
>   t8: v2i32 = BUILD_VECTOR t2, t7
>   t11: v2i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<-23>
>   t15: i32,ch = CopyFromReg t0, Register:i32 %vreg2
>   t22: i32 = add t15, Constant:i32<1>
>   t24: ch = CopyToReg t0, Register:i32 %vreg3, t22
>   t27: ch = CopyToReg t0, Register:i32 %vreg8, Constant:i32<-1>
>   t31: ch = TokenFactor t24, t27
>   t13: v2i1 = setcc t8, t11, setne:ch
>   t16: i1 = extract_vector_elt t13, t15
>   t17: i32 = extract_vector_elt t8, t15
>   t18: i32 = extract_vector_elt t11, t15
>   t19: i1 = setcc t17, t18, setne:ch
>   t20: i1 = xor t16, t19
>
> ...
>
> I have not added any vector register class so 'DAGTypeLegalizer' tries
> to split the "t16: i1 = extract_vector_elt t13, t15" because t13's
> result type is 'v2i1'. If the size of vector element is less than
> 8bit, 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT()' function
> extends the elements to 8bit and stores them on stack. Finally, the
> function generates 'ExtLoad' to load specific element. But if the
> element's size is less than 8bit, I think it could be wrong. It looks
> it needs just 'Load' or "Load and Truncate" to match the result type
> of 'EXTRACT_VECTOR_ELT'. How do you think about it? If I missed
> something, please let me know.
>
> Thanks,
>
> JinGu Kang
>

------------------------------

Message: 3
Date: Fri, 15 Sep 2017 16:38:48 +0100
From: Phil Camp via llvm-dev <llvm-dev at lists.llvm.org>
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] DIVA - Debug Information Visual Analyser

DIVA, the Debug Information Visual Analyser, was presented at the 2017 European LLVM Developers Meeting (https://www.youtube.com/watch?v=SwtpXaCk2bE).

The DIVA binaries have been available since March, I am pleased to announce that the source code is now available on GitHub.

https://github.com/SNSystems/DIVA

DIVA is a command line tool that processes DWARF debug information contained within ELF files and prints the semantics of that debug information. The DIVA output is designed to be understandable by software programmers without any low-level compiler or DWARF knowledge; as such, it can be used to report debug information bugs to the compiler provider.

DIVA's output can also be used as the input to DWARF tests, to compare the debug information generated from multiple compilers, from different versions of the same compiler, from different compiler switches and from the use of different DWARF specifications (i.e. DWARF 3, 4 and 5).

DIVA will be used on the LLVM project to test and validate the output of clang to help improve the quality of the debug experience.

Phil Camp
SN Systems

------------------------------

Message: 4
Date: Fri, 15 Sep 2017 23:39:44 +0700
From: Serge Pavlov via llvm-dev <llvm-dev at lists.llvm.org>
To: "Martin J. O'Riordan" <MartinO at theheart.ie>
Cc: LLVM Developers <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Changes to 'ADJCALLSTACK*' and 'callseq*' between LLVM v4.0 and v5.0

Hi Martin,

Pseudo CALLSEQ_START was changed in r302527, commit message contains details on the changes. However CALLSEQ_END was not modified. If you made changes to ADJCALLSTACKUP to add additional argument, that may result in error.

Thanks,
--Serge

2017-09-15 19:09 GMT+07:00 Martin J. O'Riordan via llvm-dev <llvm-dev at lists.llvm.org>:
> Hi LLVM-Devs,
>
> I have managed to complete updating our sources from LLVM v4.0 to v5.0, but
> I am getting selection errors for 'callseq_end'. I am aware that the
> 'ADJCALLSTACKUP' and 'ADJCALLSTACKDOWN' patterns have changed, and have
> added an additional argument to the TD descriptions for these.
>
> There are interactions with 'ISD::CALL' and 'ISD::RETFLAG', but so far as I
> can tell I have revised these in the same way as the in-tree targets have
> adjusted their sources.
>
> The error I am seeing is:
>
> fatal error: error in backend: Cannot select: 0x15c9bbe00: ch,glue =
> callseq_end 0x15c9bbd98, TargetConstant:i32<0>,
> TargetGlobalAddress:i32<void (i8*, i32, i8*, i8*)* @__assert_func> 0, 0x15c9bbd98:1
> 0x15c9bb920: i32 = TargetConstant<0>
> 0x15c9bb8b8: i32 = TargetGlobalAddress<void (i8*, i32, i8*, i8*)* @__assert_func> 0
> 0x15c9bbd98: ch,glue = MYISD::CALL 0x15c9bbcc8,
> TargetGlobalAddress:i32<void (i8*, i32, i8*, i8*)* @__assert_func> 0,
> Register:i32 %I18, Register:i32 %I17, Register:i32 %I16, Register:i32 %I15,
> RegisterMask:Untyped, 0x15c9bbcc8:1
> 0x15c9bb8b8: i32 = TargetGlobalAddress<void (i8*, i32, i8*, i8*)* @__assert_func> 0
> 0x15c9bb9f0: i32 = Register %I18
> 0x15c9bbac0: i32 = Register %I17
> 0x15c9bbb90: i32 = Register %I16
> 0x15c9bbc60: i32 = Register %I15
> 0x15c9bbd30: Untyped = RegisterMask
> 0x15c9bbcc8: ch,glue = CopyToReg 0x15c9bbbf8, Register:i32 %I15,
> 0x15c9bb718, 0x15c9bbbf8:1
> 0x15c9bbc60: i32 = Register %I15
> 0x15c9bb718: i32,ch,glue = CopyFromReg 0x15c9bb648:1, Register:i32
> %vreg2, 0x15c9bb648:1
> 0x15c9bb6b0: i32 = Register %vreg2
> 0x15c9bbbf8: ch,glue = CopyToReg 0x15c9bbb28, Register:i32 %I16,
> Constant:i32<0>, 0x15c9bbb28:1
> 0x15c9bbb90: i32 = Register %I16
> 0x15c9bb850: i32 = Constant<0>
> 0x15c9bbb28: ch,glue = CopyToReg 0x15c9bba58, Register:i32 %I17,
> 0x15c9bb648, 0x15c9bba58:1
> 0x15c9bbac0: i32 = Register %I17
> 0x15c9bb648: i32,ch,glue = CopyFromReg 0x15c9bb578:1,
> Register:i32 %vreg1, 0x15c9bb578:1
> 0x15c9bb5e0: i32 = Register %vreg1
> 0x15c9bba58: ch,glue = CopyToReg 0x15c9bb988, Register:i32 %I18,
> 0x15c9bb578
> 0x15c9bb9f0: i32 = Register %I18
> 0x15c9bb578: i32,ch,glue = CopyFromReg 0x15c967b38,
> Register:i32 %vreg0
> 0x15c9bb510: i32 = Register %vreg0
>
> My TD for this has:
>
> def SDTMYCallSeqStart : SDCallSeqStart<[SDTCisVT<0, i32>, SDTCisVT<1, i32>]>;
> def SDTMYCallSeqEnd   : SDCallSeqStart<[SDTCisVT<0, i32>, SDTCisVT<1, i32>]>;
> def MYCallseqStart : SDNode<"ISD::CALLSEQ_START", SDTMYCallSeqStart,
>                             [SDNPHasChain, SDNPOutGlue]>;
> def MYCallseqEnd   : SDNode<"ISD::CALLSEQ_END", SDTMYCallSeqEnd,
>                             [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
>
> def SDTMYCall : SDTypeProfile<0, 1, [SDTCisVT<0, i32>]>;
> def SDTMYRet  : SDTypeProfile<0, 0, []>;
> def MYcall    : SDNode<"MYISD::CALL", SDTMYCall,
>                        [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue,
>                         SDNPVariadic]>;
> def MYret     : SDNode<"MYISD::RETFLAG", SDTNone,
>                        [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
>
> let hasCtrlDep = 1, hasSideEffects = 1 in {
> def ADJCALLSTACKDOWN : Pseudo<(outs), (ins i32imm:$amt1, i32imm:$amt2),
>                               [(MYCallseqStart timm:$amt1, timm:$amt2)]>;
> def ADJCALLSTACKUP   : Pseudo<(outs), (ins i32imm:$amt1, i32imm:$amt2),
>                               [(MYCallseqEnd timm:$amt1, timm:$amt2)]>;
> }
>
> def: Pat<(MYret), (JMPRet (i32 LR))>;
>
> The function that is failing does warn - "warning: function declared
> 'noreturn' should not return [-Winvalid-noreturn]", and it does seem to
> return. In fact it invokes a custom builtin which does not actually return.
> In the past I have just ignored this warning.
>
> Any hints that might help me to make the necessary adaptations to fix this?
>
> Thanks in advance,
>
> MartinO
>
> PS: I won't be able to reply until Monday as I will be away for the weekend

------------------------------

Message: 5
Date: Fri, 15 Sep 2017 10:00:11 -0700
From: Kyle Butt via llvm-dev <llvm-dev at lists.llvm.org>
To: Sean Silva <chisophugis at gmail.com>
Cc: LLVM Developers <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] RFC: Trace-based layout.

It is essentially block layout algorithm 2 here, with limited non-greedy lookahead. (The triangle detection)

https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=p16-pettis.pdf

On Thu, Sep 14, 2017 at 7:24 PM, Sean Silva <chisophugis at gmail.com> wrote:
> Is this an existing published algorithm? Do you have a link to a paper?
>
> -- Sean Silva
>
> On Thu, Sep 14, 2017 at 6:53 PM, Kyle Butt via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> I plan on rewriting the block placement algorithm to proceed by traces.
>>
>> A trace is a chain of blocks where each block in the chain may fall
>> through to the successor in the chain.
>>
>> The overall algorithm would be to first produce traces for a function,
>> and then order those traces to try and get cache locality.
>>
>> Currently block placement uses a greedy single step approach to layout. It
>> produces chains working from inner to outer loops. Unlike a trace, a chain may
>> contain non-fallthrough edges. This causes problems with loop layout. The main
>> problems with loop layout are: loop rotation and cold blocks in a loop.
>>
>> Overview of proposed solution:
>>
>> Phase 1:
>> Greedily produce a set of traces through the function. A trace is a list of
>> blocks with each block in the list falling through (possibly conditionally) to
>> the next block in the list. Loop rotation will occur naturally in this phase via
>> the triangle replacement algorithm below. Handling single trace loops requires a
>> tweak, see the detailed design.
>>
>> Phase 2:
>> After producing what we believe are the best traces, they need to be ordered.
>> They will be ordered topologically, except that traces that are cold enough (As
>> measured by their warmest block) will be floated later, This may push them out
>> of a loop or to the end of the function.
>>
>> Detailed Design
>>
>> Note whenever an edge is used as a number, I am referring to the edge frequency.
>>
>> Phase 1: Producing traces
>> Traces are produced according to the following algorithm:
>> * Sort the edges according to weight, stable-sorting them according the incoming
>>   block and edge ordering.
>> * Place each block in a trace of length 1.
>> * For each edge in order:
>>   * If the source is at the end of a trace, and the target is at the beginning
>>     of a trace, glue those 2 traces into 1 longer trace.
>>   * If an edge has a target or source in the middle of another trace, consider
>>     tail duplication. The benefit calculation is the same as the existing code.
>>   * If an edge has a source or target in the middle, check them to see if they
>>     can be replaced as a triangle. (Triangle replacement described below)
>>   * Compare the benefit of choosing the edge, along with any triangles
>>     found, with the cost of breaking the existing edges.
>>   * If it is a net benefit, perform the switch.
>> * Triangle checking:
>>   Consider a trace in 2 parts: A1->A2, and the current edge under consideration
>>   is A1->B (the case for C->A2 is mirror, and both may need to be done)
>>   * First find the best alternative C->B
>>   * Check for an alternative for A2: D->A2
>>   * Find D's best Alternative: D->E
>>   * Compare the frequencies: A1->A2 + C->B + D->E vs A1->B + D->A2
>>   * If the 2nd sum is bigger, do the switch.
>> * Loop Rotation Tweak:
>>   If A contains a backedge A2->A1, then when considering A1->B or C->A2, we
>>   can include that backedge in the gain:
>>   A1->A2 + C->D + E->B vs A1->B + C->A2 + A2->A
>>
>> Phase 2: Order traces.
>> First we compute the frequency of a trace by finding the max frequency of any of
>> its blocks.
>> Then we attempt to place the traces topologically. When a trace cannot be placed
>> topologically, we prefer warmer traces first.
>>
>> Questions and comments welcome.
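
For readers following the RFC, here is a minimal C++ sketch of the first Phase 1 rule (glue two traces when an edge runs from the end of one trace to the start of another), with the triangle comparison as a standalone predicate. It is an illustration under stated assumptions, not Kyle's implementation: `Edge`, `buildTraces` and `triangleSwitchWins` are hypothetical names, blocks are plain integers, ties keep their input order via `std::stable_sort`, and tail duplication, mid-trace triangle replacement and the loop-rotation tweak are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical CFG edge Src -> Dst weighted by its profile frequency.
struct Edge {
  int Src;
  int Dst;
  unsigned long Freq;
};

// Phase 1, gluing rule only: each block starts as a trace of length 1 and
// edges are visited hottest first; an edge glues two traces when its source
// ends one trace and its target starts another.
std::vector<std::vector<int>> buildTraces(int NumBlocks, std::vector<Edge> Edges) {
  // Hottest edges first; stable_sort keeps the input order for equal weights.
  std::stable_sort(Edges.begin(), Edges.end(),
                   [](const Edge &A, const Edge &B) { return A.Freq > B.Freq; });
  std::vector<int> TraceOf(NumBlocks);              // block -> owning trace id
  std::vector<std::vector<int>> Traces(NumBlocks);  // trace id -> blocks
  for (int B = 0; B < NumBlocks; ++B) {
    TraceOf[B] = B;
    Traces[B] = {B};
  }
  for (const Edge &E : Edges) {
    assert(E.Src >= 0 && E.Src < NumBlocks && E.Dst >= 0 && E.Dst < NumBlocks);
    int TS = TraceOf[E.Src], TD = TraceOf[E.Dst];
    if (TS == TD)
      continue;  // both ends already in one trace (e.g. a back edge)
    if (Traces[TS].back() != E.Src || Traces[TD].front() != E.Dst)
      continue;  // endpoint mid-trace: tail duplication/triangles not modelled
    for (int B : Traces[TD]) {  // append the target trace to the source trace
      Traces[TS].push_back(B);
      TraceOf[B] = TS;
    }
    Traces[TD].clear();
  }
  std::vector<std::vector<int>> Result;
  for (const auto &T : Traces)
    if (!T.empty())
      Result.push_back(T);
  return Result;
}

// Triangle check as stated in the RFC: switch from {A1->A2, C->B, D->E} to
// {A1->B, D->A2} when the second sum of edge frequencies is bigger.
bool triangleSwitchWins(unsigned long A1A2, unsigned long CB, unsigned long DE,
                        unsigned long A1B, unsigned long DA2) {
  return A1B + DA2 > A1A2 + CB + DE;
}
```

On a diamond with edges 0->1 (90), 0->2 (10), 1->3 (90) and 2->3 (10), this sketch yields the traces [0, 1, 3] and [2], leaving the cold block 2 as its own trace for Phase 2 to place.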

------------------------------

Message: 6
Date: Fri, 15 Sep 2017 17:42:23 +0000
From: "Demikhovsky, Elena via llvm-dev" <llvm-dev at lists.llvm.org>
To: "jingu at codeplay.com" <jingu at codeplay.com>, "daniellsanders at apple.com" <daniellsanders at apple.com>
Cc: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Question about 'DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT'

> extends the elements to 8bit and stores them on stack.

Store is responsible for zero-extend. This is the policy...

- Elena
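
The store-then-extending-load sequence discussed in this thread can be modelled in plain C++ to show why Elena's answer addresses JinGu's concern: if the store that spills each sub-byte element into its own byte zero-extends it, each byte holds exactly 0 or 1, so reading the requested byte back with a wider load and keeping bit 0 recovers the element. This is a hypothetical model (`spillV2I1`, `extLoadElt` and `truncToI1` are illustrative names), not the actual `DAGTypeLegalizer` code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Store side: each i1 element of a v2i1 is widened to its own byte on the
// stack slot. Per the reply above, the store zero-extends, so each byte is
// exactly 0 or 1.
std::vector<uint8_t> spillV2I1(bool E0, bool E1) {
  return {static_cast<uint8_t>(E0 ? 1 : 0), static_cast<uint8_t>(E1 ? 1 : 0)};
}

// Load side: a byte-wide extending load of element Idx into an i32.
uint32_t extLoadElt(const std::vector<uint8_t> &Slot, uint32_t Idx) {
  assert(Idx < Slot.size() && "element index out of range");
  return static_cast<uint32_t>(Slot[Idx]);  // zero-extend i8 -> i32
}

// Narrowing back to i1 keeps only bit 0; because the bytes were
// zero-extended when stored, no information is lost.
bool truncToI1(uint32_t V) { return (V & 1u) != 0; }
```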

------------------------------

Message: 7
Date: Fri, 15 Sep 2017 10:55:14 -0700
From: "Friedman, Eli via llvm-dev" <llvm-dev at lists.llvm.org>
To: Jon Chesterfield <jonathanchesterfield at gmail.com>, llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] What should a truncating store do?

On 9/15/2017 5:49 AM, Jon Chesterfield via llvm-dev wrote:
> For example, truncating store of an i32 to i6. My assumption was that
> this should write the low six bits of the i32 to somewhere in memory.
>
> Should the top 24 bits of a corresponding 32 bit region of memory be
> unchanged, zero, undefined?

Unchanged.

> Should the two bits that would round the i6 up to a byte be preserved,
> zero, undefined?

Zero. Legalization will normally handle this for you, though, by
transforming it to an i8 store.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

------------------------------

Message: 8
Date: Fri, 15 Sep 2017 19:30:20 +0100
From: Jon Chesterfield via llvm-dev <llvm-dev at lists.llvm.org>
To: "Friedman, Eli" <efriedma at codeaurora.org>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] What should a truncating store do?

Interesting, thank you. I expected both answers to be "unchanged" so was surprised by the zero extend in the legaliser.

The motivation here is that it's faster for us to load N bytes, apply whatever masks are necessary to reproduce the truncating store then store all N bytes. This is only a good plan if there's no change to the semantics :)

Are scalar integer types zero extended to the next multiple of 8 or to the next power of 2 greater than 7? For example, i17 => i24 or i17 => i32?

I think this means truncating stores of vector types will introduce zero bits at the end of each element instead grouping all the zeros at the end. For example, <i6 63, i6 63> writes to sixteen bits as 0b0011111100111111, not as 0b0000111111111111?

Thanks!

Jon

------------------------------

Message: 9
Date: Fri, 15 Sep 2017 11:41:14 -0700
From: "Friedman, Eli via llvm-dev" <llvm-dev at lists.llvm.org>
To: Jon Chesterfield <jonathanchesterfield at gmail.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] What should a truncating store do?

On 9/15/2017 11:30 AM, Jon Chesterfield wrote:
> Interesting, thank you. I expected both answers to be "unchanged" so
> was surprised by the zero extend in the legaliser.
>
> The motivation here is that it's faster for us to load N bytes, apply
> whatever masks are necessary to reproduce the truncating store then
> store all N bytes. This is only a good plan if there's no change to
> the semantics :)

See http://llvm.org/docs/LangRef.html#store-instruction . In general, you have to be careful to avoid data races, but that might not apply to your target.

> Are scalar integer types zero extended to the next multiple of 8 or to
> the next power of 2 greater than 7? For example, i17 => i24 or i17 => i32?

Multiple of 8.

> I think this means truncating stores of vector types will introduce
> zero bits at the end of each element instead grouping all the zeros at
> the end. For example, <i6 63, i6 63> writes to sixteen bits as
> 0b0011111100111111, not as 0b0000111111111111?

Vector types are tightly packed, so <8 x i1> is 1 byte, not 8 bytes.

-Eli
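
Putting Eli's answers together, a minimal C++ sketch of the load/mask/store emulation Jon describes might look like the following. It assumes a little-endian target with the i6 in the lowest-addressed byte of a 32-bit region, ignores the data-race caveat (the neighbouring bytes are rewritten with their old values), and uses hypothetical helper names; the vector helper additionally assumes element 0 lands in the least significant bits.

```cpp
#include <cassert>
#include <cstdint>

// Store size of an iN scalar: the bit width rounded up to a multiple of 8,
// per Eli's answer (so i17 occupies 3 bytes rather than 4).
constexpr unsigned storeSizeInBytes(unsigned Bits) { return (Bits + 7) / 8; }

// Hypothetical emulation of "truncating store of i32 to i6" as a 32-bit
// read-modify-write on a little-endian target (i6 at Mem[0]):
//  - the byte holding the i6 receives the low 6 bits, top 2 bits zeroed;
//  - the other 3 bytes of the 32-bit region are written back unchanged.
// Not safe if another thread may write Mem[1..3] concurrently.
void emulateTruncStoreI6(uint8_t *Mem, uint32_t Value) {
  assert(Mem != nullptr);
  // Load 4 bytes in little-endian order, independent of host endianness.
  uint32_t Word = uint32_t(Mem[0]) | uint32_t(Mem[1]) << 8 |
                  uint32_t(Mem[2]) << 16 | uint32_t(Mem[3]) << 24;
  // Clear the whole low byte, then insert the 6 data bits.
  Word = (Word & ~uint32_t(0xFF)) | (Value & 0x3Fu);
  // Store all 4 bytes back.
  for (int I = 0; I < 4; ++I)
    Mem[I] = static_cast<uint8_t>(Word >> (8 * I));
}

// Tightly packed <2 x i6>: 12 bits of data in a 2-byte store, assuming
// element 0 occupies the least significant bits.
uint16_t packV2I6(uint32_t E0, uint32_t E1) {
  return static_cast<uint16_t>((E0 & 0x3Fu) | ((E1 & 0x3Fu) << 6));
}
```

Under these assumptions `packV2I6(63, 63)` yields 0x0FFF, i.e. 0b0000111111111111, the second of Jon's two encodings, because tight packing groups all the padding zeros at the end.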


------------------------------

End of llvm-dev Digest, Vol 159, Issue 57
*****************************************
