Switches currently don't profile well (if at all) - John can shed more light on that as this came up on the compiler list a few weeks ago.

sent from my phone

On Mar 12, 2015 6:06 PM, "Peter Levart" <peter.levart@gmail.com> wrote:


On 03/12/2015 10:04 PM, Peter Levart wrote:
... putLongUnaligned in the style of the above getLongUnaligned is trickier with the current code structure. But there may be a middle ground (or a sweet spot):


    public final void putLongUnaligned(Object o, long offset, long x) {
        if (((int) offset & 1) == 1) {
            putLongParts(o, offset,
                (byte) (x >>> 0),
                (short) (x >>> 8),
                (short) (x >>> 24),
                (short) (x >>> 40),
                (byte) (x >>> 56));
        } else if (((int) offset & 2) == 2) {
            putLongParts(o, offset,
                (short) (x >>> 0),
                (int) (x >>> 16),
                (short) (x >>> 48));
        } else if (((int) offset & 4) == 4) {
            putLongParts(o, offset,
                (int) (x >>> 0),
                (int) (x >>> 32));
        } else {
            putLong(o, offset, x);
        }
    }


...this has the same number of branches, but fewer instructions. You also need the following two:


At least on Intel (with -XX:-UseUnalignedAccesses), the above code (Unaligned2) is not any faster than your code (Unaligned) according to a JMH random-access test. Neither is the reversal of the if/else branches (Unaligned1). Unaligned3 is a switch-based variant (get only) and is the slowest. Your variant seems to be the fastest by a hair:

Benchmark                             Mode  Samples   Score  Score error  Units
j.t.UnalignedTest.getLongUnaligned    avgt        5  16.375        0.837  ns/op
j.t.UnalignedTest.getLongUnaligned1   avgt        5  18.340        0.617  ns/op
j.t.UnalignedTest.getLongUnaligned2   avgt        5  16.784        0.969  ns/op
j.t.UnalignedTest.getLongUnaligned3   avgt        5  19.634        0.871  ns/op
j.t.UnalignedTest.putLongUnaligned    avgt        5  15.521        0.589  ns/op
j.t.UnalignedTest.putLongUnaligned1   avgt        5  16.676        1.042  ns/op
j.t.UnalignedTest.putLongUnaligned2   avgt        5  16.394        3.028  ns/op
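
For readers who want to try the alignment dispatch of putLongUnaligned without Unsafe, here is a minimal sketch on a plain byte[]. It is not the code that was benchmarked above. The class name UnalignedPutSketch, the byte[]-based putByte/putShort/putInt/putLong helpers and readLittleEndian are hypothetical stand-ins, and it assumes little-endian byte order and that the array index plays the role of the offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class UnalignedPutSketch {
    // Hypothetical stand-ins for naturally aligned stores (little-endian assumed).
    static void putByte(byte[] a, int off, byte v) {
        a[off] = v;
    }

    static void putShort(byte[] a, int off, short v) {
        a[off] = (byte) v;
        a[off + 1] = (byte) (v >>> 8);
    }

    static void putInt(byte[] a, int off, int v) {
        putShort(a, off, (short) v);
        putShort(a, off + 2, (short) (v >>> 16));
    }

    static void putLong(byte[] a, int off, long v) {
        putInt(a, off, (int) v);
        putInt(a, off + 4, (int) (v >>> 32));
    }

    // Same branch structure as the quoted putLongUnaligned: pick the widest
    // stores that are naturally aligned for the given offset.
    static void putLongUnaligned(byte[] a, int off, long x) {
        if ((off & 1) == 1) {
            // odd offset: byte, three shorts, byte
            putByte(a, off, (byte) x);
            putShort(a, off + 1, (short) (x >>> 8));
            putShort(a, off + 3, (short) (x >>> 24));
            putShort(a, off + 5, (short) (x >>> 40));
            putByte(a, off + 7, (byte) (x >>> 56));
        } else if ((off & 2) == 2) {
            // 2-byte aligned only: short, int, short
            putShort(a, off, (short) x);
            putInt(a, off + 2, (int) (x >>> 16));
            putShort(a, off + 6, (short) (x >>> 48));
        } else if ((off & 4) == 4) {
            // 4-byte aligned only: two ints
            putInt(a, off, (int) x);
            putInt(a, off + 4, (int) (x >>> 32));
        } else {
            // 8-byte aligned: a single long store
            putLong(a, off, x);
        }
    }

    // Reads the value back independently of the helpers above.
    static long readLittleEndian(byte[] a, int off) {
        return ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).getLong(off);
    }

    public static void main(String[] args) {
        long x = 0x0102030405060708L;
        for (int off = 0; off < 8; off++) {
            byte[] buf = new byte[16];
            putLongUnaligned(buf, off, x);
            System.out.println("offset " + off + " round-trips: "
                    + (readLittleEndian(buf, off) == x));
        }
    }
}
```

Every offset from 0 to 7 takes exactly one of the four branches, so the loop in main exercises each path at least once.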


Regards, Peter

Peter