[llvm-dev] Lowering llvm.memset for ARM target

Evgeny Astigeevich via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 8 08:22:08 PDT 2017


Hi Bharathi,

From the discussion you provided I found that the issue happens for a big-endian ARM target. For the little-endian target the intrinsic in your test case is lowered to store instructions. Some debugging is needed to figure out why it's not happening for big-endian.

-Evgeny

-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Evgeny Astigeevich via llvm-dev
Sent: Thursday, September 07, 2017 4:25 PM
To: Bharathi Seshadri
Cc: llvm-dev; nd
Subject: Re: [llvm-dev] Lowering llvm.memset for ARM target

Hi Bharathi,

MaxStoresPerMemset was changed from 16 to 8 in r169791. The commit comment:

"Some enhancements for memcpy / memset inline expansion.
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
   bytes. e.g. On [x]86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
   x86 and ARM.
3. When memcpy from a constant string, do not replace the load with a constant
   if it's not possible to materialize an integer immediate with a single
   instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicates they
   are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
   Also increase the threshold to something reasonable (8 for memset, 4 pairs
   for memcpy).

This significantly improves Dhrystone, up to 50% on ARM iOS devices.

rdar://12760078"

It's strange. According to the comment the threshold was increased, but it was actually decreased (from 16 to 8).
I think the code needs to be revisited and benchmarked. I'll do some benchmarking.

Thanks,
Evgeny Astigeevich | Arm Compiler Optimization Team Lead
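To make the effect of the threshold concrete, here is a rough sketch in C of the kind of decision it controls. This is a simplified model and not LLVM's actual code: the helper names stores_needed and would_inline_memset are hypothetical, and the store width is passed in as an assumption, since the width LLVM really picks depends on alignment and target features.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not an LLVM function: counts the stores needed to
 * cover `bytes` bytes when the widest available store is `max_store_bytes`
 * wide (assumed to be a power of two). It uses as many of the widest stores
 * as fit, then progressively narrower stores for the remaining tail. */
unsigned stores_needed(size_t bytes, size_t max_store_bytes)
{
    unsigned count = 0;
    for (size_t width = max_store_bytes; width > 0; width /= 2) {
        count += (unsigned)(bytes / width);
        bytes %= width;
    }
    return count;
}

/* Hypothetical model of the threshold check: expand inline only if the
 * store count does not exceed the limit (the role MaxStoresPerMemset plays
 * in the discussion above). */
bool would_inline_memset(size_t bytes, size_t max_store_bytes,
                         unsigned max_stores)
{
    return stores_needed(bytes, max_store_bytes) <= max_stores;
}
```

Under the assumption of 4-byte stores, the 36 bytes written by the test case with LIMIT 9 need 9 stores, so this model rejects inline expansion at a limit of 8 and accepts it at 16, which is consistent with the behaviour reported in this thread.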

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Bharathi Seshadri via llvm-dev
> Sent: Tuesday, September 05, 2017 8:24 PM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] Lowering llvm.memset for ARM target
>
> As reported in an earlier thread
> (http://clang-developers.42468.n3.nabble.com/Disable-memset-synthesis-tp4057810.html),
> we noticed in some cases that the llvm.memset intrinsic, if lowered to
> stores, could help with performance.
>
> Here's a test case: If LIMIT is > 8, I see that a call to memset is
> emitted for arm & aarch64, but not for x86 target.
>
> typedef struct {
>     int v0[100];
> } test;
> #define LIMIT 9
> void init(test *t)
> {
>     int i;
>     for (i = 0; i < LIMIT ; i++)
>         t->v0[i] = 0;
> }
> int main() {
>     test t;
>     init(&t);
>     return 0;
> }
>
> Looking at the llvm sources, I see that there are two key target
> specific variables, MaxStoresPerMemset and MaxStoresPerMemsetOptSize,
> that determine if the intrinsic llvm.memset can be lowered into store operations.
> For ARM, these variables are set to 8 and 4 respectively.
>
> I do not know as to how the default values for these two variables are
> arrived at, but doubling these values (similar to that for the x86
> target) seems to help our case and we observe a 7% increase in
> performance of our networking application. We use -O3 and -flto and 32-bit arm.
>
> I can prepare a patch and post for review if such a change, say under
> CodeGenOpt::Aggressive would be acceptable.
>
> Thanks,
> Bharathi
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
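For readers following the quoted test case, the sketch below spells out, at the C level, the three forms being discussed: the original loop, an explicit memset call, and straight-line stores. The function names init_loop, init_call and init_stores are made up for illustration. All three write the same bytes; a compiler is free to turn any one of them into another, so this only illustrates the semantics and says nothing about what a given LLVM version emits.

```c
#include <string.h>

#define LIMIT 9

typedef struct {
    int v0[100];
} test;

/* The loop from the test case: zero the first LIMIT elements. */
void init_loop(test *t)
{
    for (int i = 0; i < LIMIT; i++)
        t->v0[i] = 0;
}

/* The same effect written as a library call, which is what the backend
 * emits when the store count exceeds the target's threshold. */
void init_call(test *t)
{
    memset(t->v0, 0, LIMIT * sizeof t->v0[0]);
}

/* The same effect as straight-line stores, i.e. what "lowered to stores"
 * means: nine word-sized stores, one more than the ARM limit of 8. */
void init_stores(test *t)
{
    t->v0[0] = 0; t->v0[1] = 0; t->v0[2] = 0;
    t->v0[3] = 0; t->v0[4] = 0; t->v0[5] = 0;
    t->v0[6] = 0; t->v0[7] = 0; t->v0[8] = 0;
}
```

The straight-line version avoids the call overhead for small, fixed sizes at the cost of larger code, which is the trade-off the two thresholds mentioned above are meant to balance.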

