Unsafe.{get,put}-X-Unaligned performance

Peter Levart peter.levart at gmail.com
Thu Mar 12 21:04:50 UTC 2015


On 03/12/2015 08:29 PM, Peter Levart wrote:

On 03/12/2015 07:37 PM, Andrew Haley wrote:

On 03/12/2015 05:15 PM, Peter Levart wrote:

...or are JIT+CPU smart enough and there would be no difference?

C2 always orders things based on profile counts, so there is no difference. Your suggestion would be better for interpreted code and I guess C1 also, so I agree it is worthwhile.

Thanks, Andrew. What about the following variant (or similar with ifs in case switch is sub-optimal):

 public final long getLongUnaligned(Object o, long offset) {
     switch ((int) offset & 7) {
         case 1:
         case 5:
             return (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0))
                  | (toUnsignedLong(getShort(o, offset + 1)) << pickPos(48, 8))
                  | (toUnsignedLong(getInt(o, offset + 3)) << pickPos(32, 24))
                  | (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
         case 2:
         case 6:
             return (toUnsignedLong(getShort(o, offset)) << pickPos(48, 0))
                  | (toUnsignedLong(getInt(o, offset + 2)) << pickPos(32, 16))
                  | (toUnsignedLong(getShort(o, offset + 6)) << pickPos(48, 48));
         case 3:
         case 7:
             return (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0))
                  | (toUnsignedLong(getInt(o, offset + 1)) << pickPos(32, 8))
                  | (toUnsignedLong(getShort(o, offset + 5)) << pickPos(48, 40))
                  | (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
         case 4:
             return (toUnsignedLong(getInt(o, offset)) << pickPos(32, 0))
                  | (toUnsignedLong(getInt(o, offset + 4)) << pickPos(32, 32));
         case 0:
         default:
             return getLong(o, offset);
     }
 }

...it may have more branches, but fewer instructions on average per call.

Peter
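For anyone who wants to check the alignment reasoning in the switch without touching Unsafe, here is a minimal sketch. It assumes a little-endian layout and uses a byte[] as a stand-in for memory, with index 0 treated as 8-byte aligned. The class name AlignedReadSketch, the checkAligned helper and the ByteBuffer backing are all hypothetical and not part of the proposed patch; the shifts are the little-endian values that pickPos would select.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical model, not the real Unsafe: every multi-byte load must be naturally aligned. */
class AlignedReadSketch {
    private final ByteBuffer mem; // little-endian view of the backing array (assumption)

    AlignedReadSketch(byte[] bytes) {
        this.mem = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Fails loudly if a load of the given size is not naturally aligned.
    private static void checkAligned(long offset, int size) {
        if ((offset & (size - 1)) != 0) {
            throw new IllegalStateException("misaligned " + size + "-byte load at " + offset);
        }
    }

    // Each load returns its value zero-extended to long, like toUnsignedLong in the mail.
    long getByte(long off)  { return mem.get((int) off) & 0xFFL; }
    long getShort(long off) { checkAligned(off, 2); return mem.getShort((int) off) & 0xFFFFL; }
    long getInt(long off)   { checkAligned(off, 4); return mem.getInt((int) off) & 0xFFFFFFFFL; }
    long getLong(long off)  { checkAligned(off, 8); return mem.getLong((int) off); }

    // Same case structure as the proposed switch, with little-endian shifts written out.
    long getLongUnaligned(long offset) {
        switch ((int) offset & 7) {
            case 1: case 5:
                return getByte(offset) | (getShort(offset + 1) << 8)
                     | (getInt(offset + 3) << 24) | (getByte(offset + 7) << 56);
            case 2: case 6:
                return getShort(offset) | (getInt(offset + 2) << 16)
                     | (getShort(offset + 6) << 48);
            case 3: case 7:
                return getByte(offset) | (getInt(offset + 1) << 8)
                     | (getShort(offset + 5) << 40) | (getByte(offset + 7) << 56);
            case 4:
                return getInt(offset) | (getInt(offset + 4) << 32);
            default:
                return getLong(offset);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[24];
        for (int i = 0; i < bytes.length; i++) bytes[i] = (byte) (i * 37 + 11);
        AlignedReadSketch m = new AlignedReadSketch(bytes);
        ByteBuffer ref = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        for (int off = 0; off <= 16; off++) {
            // Compare the piecewise read with a plain (unaligned-capable) heap buffer read.
            System.out.println(off + ": " + (m.getLongUnaligned(off) == ref.getLong(off)));
        }
    }
}
```

If any case used a load that was not naturally aligned for its offset, checkAligned would throw, so the sketch checks both the values and the alignment claim for every residue of offset modulo 8.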

... putLongUnaligned in the style of the above getLongUnaligned is trickier with the current code structure. But there may be a middle ground (or a sweet spot):

 public final void putLongUnaligned(Object o, long offset, long x) {
     if (((int) offset & 1) == 1) {
         putLongParts(o, offset,
             (byte) (x >>> 0),
             (short) (x >>> 8),
             (short) (x >>> 24),
             (short) (x >>> 40),
             (byte) (x >>> 56));
     } else if (((int) offset & 2) == 2) {
         putLongParts(o, offset,
             (short)(x >>> 0),
             (int)(x >>> 16),
             (short)(x >>> 48));
     } else if (((int) offset & 4) == 4) {
         putLongParts(o, offset,
             (int)(x >>> 0),
             (int)(x >>> 32));
     } else {
         putLong(o, offset, x);
     }
 }

...this has the same number of branches, but fewer instructions. You also need the following two:

 private void putLongParts(Object o, long offset,
                           byte i0, short i12, short i34, short i56, byte i7) {
     putByte(o, offset + 0, pick(i0, i7));
     putShort(o, offset + 1, pick(i12, i56));
     putShort(o, offset + 3, i34);
     putShort(o, offset + 5, pick(i56, i12));
     putByte(o, offset + 7, pick(i7, i0));
 }

 private void putLongParts(Object o, long offset,
                           short i0, int i12, short i3) {
     putShort(o, offset + 0, pick(i0, i3));
     putInt(o, offset + 2, i12);
     putShort(o, offset + 6, pick(i3, i0));
 }
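To see how the pick calls place the pieces for either byte order, here is a rough sketch on the same kind of byte[] model. Everything in it is an assumption for illustration: the class AlignedWriteSketch, the bigEndian flag (standing in for whatever the real class uses to detect byte order), the checkAligned helper, and the inlining of the putLongParts bodies, including the two-int case, which is assumed to mirror the existing overload.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical model of the proposed stores; not the real Unsafe class. */
class AlignedWriteSketch {
    private final ByteBuffer mem;
    private final boolean bigEndian; // assumption: stands in for the platform byte-order flag

    AlignedWriteSketch(byte[] bytes, ByteOrder order) {
        this.mem = ByteBuffer.wrap(bytes).order(order);
        this.bigEndian = (order == ByteOrder.BIG_ENDIAN);
    }

    // Fails loudly if a store of the given size is not naturally aligned.
    private static void checkAligned(long off, int size) {
        if ((off & (size - 1)) != 0) {
            throw new IllegalStateException("misaligned " + size + "-byte store at " + off);
        }
    }

    // pick(le, be): chooses the first argument on little-endian, the second on big-endian.
    private byte  pick(byte le, byte be)   { return bigEndian ? be : le; }
    private short pick(short le, short be) { return bigEndian ? be : le; }
    private int   pick(int le, int be)     { return bigEndian ? be : le; }

    void putByte(long off, byte v)   { mem.put((int) off, v); }
    void putShort(long off, short v) { checkAligned(off, 2); mem.putShort((int) off, v); }
    void putInt(long off, int v)     { checkAligned(off, 4); mem.putInt((int) off, v); }
    void putLong(long off, long v)   { checkAligned(off, 8); mem.putLong((int) off, v); }

    void putLongUnaligned(long offset, long x) {
        if ((offset & 1) == 1) {            // odd: byte, short, short, short, byte
            byte i0 = (byte) x, i7 = (byte) (x >>> 56);
            short i12 = (short) (x >>> 8), i34 = (short) (x >>> 24), i56 = (short) (x >>> 40);
            putByte(offset, pick(i0, i7));
            putShort(offset + 1, pick(i12, i56));
            putShort(offset + 3, i34);      // the middle piece is the same for both orders
            putShort(offset + 5, pick(i56, i12));
            putByte(offset + 7, pick(i7, i0));
        } else if ((offset & 2) == 2) {     // 2 mod 4: short, int, short
            short i0 = (short) x, i3 = (short) (x >>> 48);
            int i12 = (int) (x >>> 16);
            putShort(offset, pick(i0, i3));
            putInt(offset + 2, i12);
            putShort(offset + 6, pick(i3, i0));
        } else if ((offset & 4) == 4) {     // 4 mod 8: int, int
            int lo = (int) x, hi = (int) (x >>> 32);
            putInt(offset, pick(lo, hi));
            putInt(offset + 4, pick(hi, lo));
        } else {
            putLong(offset, x);             // fully aligned
        }
    }

    public static void main(String[] args) {
        long x = 0x0102030405060708L;
        for (ByteOrder order : new ByteOrder[] { ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN }) {
            for (int off = 0; off <= 8; off++) {
                byte[] actual = new byte[16], expected = new byte[16];
                new AlignedWriteSketch(actual, order).putLongUnaligned(off, x);
                ByteBuffer.wrap(expected).order(order).putLong(off, x);
                System.out.println(order + " @" + off + ": " + Arrays.equals(actual, expected));
            }
        }
    }
}
```

The if-chain only tests one bit at a time, so every odd offset takes the five-store path; that is the price of keeping the branch count the same as in the current code.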

Regards, Peter



