RFR: 8179444: AArch64: Put zero_words on a diet (original) (raw)

White, Derek Derek.White at cavium.com
Mon May 8 15:56:00 UTC 2017


Hi Andrew,

Looks OK then.

If I find a major performance issue on misaligned stp I'll propose a fix separately. In some HW I've seen a up to a 2x performance penalty on unaligned destinations in some memcpy-like code.

-----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Sunday, May 07, 2017 4:41 AM To: White, Derek <Derek.White at cavium.com>; Roland Westrelin <rwestrel at redhat.com>; hotspot-dev Source Developers <hotspot-dev at openjdk.java.net> Subject: Re: RFR: 8179444: AArch64: Put zero_words on a diet

Hi,

On 05/05/17 23:54, White, Derek wrote:

src/cpu/aarch64/vm/macroAssembleraarch64.cpp: - zerowords(reg, reg): - Comment mentions that it's used by the ClearArray pattern, but it's also used by generatefill().

OK, but there's no contradiction there. The comment just explains why zero_words must be small.

- The old zerowords(), via blockzero() and fillwords(), would align base to 16-bytes before doing stps, the new code doesn't. It may be worth conditionally pre-aligning if AvoidUnalignedAcesses is true. And could conditionally remove pre-aligning in zeroblocks.

But that'd bloat zero_words, surely. And I haven't see the new code running any slower on any hardware,

Line 4999: This is pre-existing, but it's confusing - could you rename "ShortArraySize" to "SmallArraySize"?

OK.

- zerowords(reg, int): - I don't see a great way to handle unaligned accesses without blowing up the code size. So no real request here. - It looks like worst case is 15 unaligned stp instructions. - With some benchmarking someday I might argue for calling this constant count version for smaller copies.

I don't understand this.

- Unless ClearArray only zero's from the first array element on - then we could guess if base is 16-byte unaligned by looking at the array header size.

- zerodcacheblocks():

- Suggest a new comment that mentions that it's only called from zeroblocks()? Or if it's meant to be more general then add comments about the requirements for base alignment, and that cnt has to be >= 2*zvalength.

OK.

- zeroblocks DOES align base to 16-bytes, so we don't need to check here? Or make it a runtime assert?

I did wonder about that, but left it in as a safety net because its cost is immeasurably small.

Thanks for such a detailed review.

Andrew.



More information about the hotspot-dev mailing list