Switches currently don't profile well (if at all). John can shed more light on that, as this came up on the compiler list a few weeks ago.
On Mar 12, 2015 6:06 PM, "Peter Levart" <peter.levart@gmail.com> wrote:
On 03/12/2015 10:04 PM, Peter Levart wrote:
... putLongUnaligned in the style of the above getLongUnaligned is trickier with the current code structure. But there may be a middle ground (or a sweet spot):
    public final void putLongUnaligned(Object o, long offset, long x) {
        if (((int) offset & 1) == 1) {
            putLongParts(o, offset,
                         (byte) (x >>> 0),
                         (short) (x >>> 8),
                         (short) (x >>> 24),
                         (short) (x >>> 40),
                         (byte) (x >>> 56));
        } else if (((int) offset & 2) == 2) {
            putLongParts(o, offset,
                         (short) (x >>> 0),
                         (int) (x >>> 16),
                         (short) (x >>> 48));
        } else if (((int) offset & 4) == 4) {
            putLongParts(o, offset,
                         (int) (x >>> 0),
                         (int) (x >>> 32));
        } else {
            putLong(o, offset, x);
        }
    }
...this has the same number of branches, but fewer instructions. You also need the following two:
At least on Intel (with -XX:-UseUnalignedAccesses), the above code (Unaligned2) is not any faster than your code (Unaligned) according to a JMH random-access test. Neither is the variant with the if/else branches reversed (Unaligned1). Unaligned3 is a switch-based variant (get only) and is the slowest. Your variant seems to be the fastest by a hair:
Benchmark                             Mode  Samples   Score  Score error  Units
j.t.UnalignedTest.getLongUnaligned    avgt        5  16.375        0.837  ns/op
j.t.UnalignedTest.getLongUnaligned1   avgt        5  18.340        0.617  ns/op
j.t.UnalignedTest.getLongUnaligned2   avgt        5  16.784        0.969  ns/op
j.t.UnalignedTest.getLongUnaligned3   avgt        5  19.634        0.871  ns/op
j.t.UnalignedTest.putLongUnaligned    avgt        5  15.521        0.589  ns/op
j.t.UnalignedTest.putLongUnaligned1   avgt        5  16.676        1.042  ns/op
j.t.UnalignedTest.putLongUnaligned2   avgt        5  16.394        3.028  ns/op
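
For anyone who wants to play with the two shapes compared above outside of Unsafe, here is a minimal, self-contained sketch. It is not the code that was benchmarked. The class name UnalignedSketch and both method bodies are hypothetical, it works on a plain byte[] through java.nio.ByteBuffer, it assumes little-endian byte order, and it treats the array index as if it were the memory address when deciding alignment. The put method follows the if/else chain on the low offset bits shown earlier; the get method is one possible switch-based formulation of the same split.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; names and layout are assumptions, not the Unsafe code.
final class UnalignedSketch {
    private UnalignedSketch() {}

    // Writes x at buf[offset .. offset+7], little-endian, using the widest
    // pieces whose start index is naturally aligned (if/else chain variant).
    static void putLongUnaligned(byte[] buf, int offset, long x) {
        ByteBuffer b = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        if ((offset & 1) == 1) {
            // Odd offset: byte, short, short, short, byte.
            b.put(offset, (byte) x);
            b.putShort(offset + 1, (short) (x >>> 8));
            b.putShort(offset + 3, (short) (x >>> 24));
            b.putShort(offset + 5, (short) (x >>> 40));
            b.put(offset + 7, (byte) (x >>> 56));
        } else if ((offset & 2) == 2) {
            // 2-byte aligned only: short, int, short.
            b.putShort(offset, (short) x);
            b.putInt(offset + 2, (int) (x >>> 16));
            b.putShort(offset + 6, (short) (x >>> 48));
        } else if ((offset & 4) == 4) {
            // 4-byte aligned only: int, int.
            b.putInt(offset, (int) x);
            b.putInt(offset + 4, (int) (x >>> 32));
        } else {
            // 8-byte aligned: a single long store.
            b.putLong(offset, x);
        }
    }

    // Reads the same layout back, dispatching with a switch on the low three
    // offset bits instead of an if/else chain.
    static long getLongUnaligned(byte[] buf, int offset) {
        ByteBuffer b = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        switch (offset & 7) {
            case 0:
                return b.getLong(offset);
            case 4:
                return (b.getInt(offset) & 0xFFFFFFFFL)
                        | ((long) b.getInt(offset + 4) << 32);
            case 2:
            case 6:
                return (b.getShort(offset) & 0xFFFFL)
                        | ((b.getInt(offset + 2) & 0xFFFFFFFFL) << 16)
                        | ((long) b.getShort(offset + 6) << 48);
            default: // 1, 3, 5, 7: odd offsets
                return (b.get(offset) & 0xFFL)
                        | ((b.getShort(offset + 1) & 0xFFFFL) << 8)
                        | ((b.getShort(offset + 3) & 0xFFFFL) << 24)
                        | ((b.getShort(offset + 5) & 0xFFFFL) << 40)
                        | ((long) b.get(offset + 7) << 56);
        }
    }
}
```

A round trip such as putLongUnaligned(buf, 3, v) followed by getLongUnaligned(buf, 3) should return v for any offset that leaves eight bytes of room. The sketch says nothing about performance; the numbers in the table come from the real Unsafe-based variants, not from this code.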
Regards, Peter