Bit set intrinsic

B. Blaser bsrbnd at gmail.com
Mon Nov 5 21:21:02 UTC 2018


On Wed, 31 Oct 2018 at 15:51, B. Blaser <bsrbnd at gmail.com> wrote:

Last but not least, I implemented the C2 part (using the 8-bit AND/OR variant) to allow sharper comparisons in non-concurrent execution as well: http://cr.openjdk.java.net/~bsrbnd/boolpack/webrev.02/ With 10e6 iterations, the lock latency seems to be more or less negligible, and removing it would make the intrinsic about 10% faster than BitSet without synchronization.

Which actually seems to be due to the following missing ANDB/ORB patterns in x86_64.ad:

instruct andB_mem_rReg(memory dst, rRegI src, rFlagsReg cr)
%{
  match(Set dst (StoreB dst (AndI (LoadB dst) src)));
  effect(KILL cr);

  ins_cost(150);
  format %{ "andb    $dst, $src\t# byte" %}
  opcode(0x20);
  ins_encode(REX_breg_mem(src, dst), OpcP, reg_mem(src, dst));
  ins_pipe(ialu_mem_reg);
%}

instruct orB_mem_rReg(memory dst, rRegI src, rFlagsReg cr)
%{
  match(Set dst (StoreB dst (OrI (LoadB dst) src)));
  effect(KILL cr);

  ins_cost(150);
  format %{ "orb    $dst, $src\t# byte" %}
  opcode(0x08);
  ins_encode(REX_breg_mem(src, dst), OpcP, reg_mem(src, dst));
  ins_pipe(ialu_mem_reg);
%}
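As a sanity check on why folding the load, the int-wide operation and the byte store into a single byte-wide instruction should be safe: a byte store keeps only the low 8 bits, so the upper bits produced by sign extension never reach memory. The sketch below is only an illustration in plain Java; the class and method names are hypothetical and are not part of the webrev. It compares the two forms over every byte value and a range of int operands.

```java
public class ByteOpFolding {
    // Shape matched by the patterns: LoadB (sign-extend), int-wide op, StoreB (truncate).
    static byte orViaInt(byte mem, int src) {
        int loaded = mem;               // LoadB sign-extends to int
        return (byte) (loaded | src);   // OrI, then StoreB keeps the low 8 bits
    }

    static byte andViaInt(byte mem, int src) {
        int loaded = mem;
        return (byte) (loaded & src);
    }

    // Model of a byte-wide memory operation: only the low 8 bits of src take part.
    static byte orByteWide(byte mem, int src) {
        return (byte) ((mem & 0xFF) | (src & 0xFF));
    }

    static byte andByteWide(byte mem, int src) {
        return (byte) ((mem & 0xFF) & (src & 0xFF));
    }

    // Returns true if both forms agree for all bytes and the sampled int operands.
    static boolean allMatch() {
        int[] extremes = { Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (int m = Byte.MIN_VALUE; m <= Byte.MAX_VALUE; m++) {
            byte mem = (byte) m;
            for (int src = -512; src <= 512; src++) {
                if (orViaInt(mem, src) != orByteWide(mem, src)) return false;
                if (andViaInt(mem, src) != andByteWide(mem, src)) return false;
            }
            for (int src : extremes) {
                if (orViaInt(mem, src) != orByteWide(mem, src)) return false;
                if (andViaInt(mem, src) != andByteWide(mem, src)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("forms agree: " + allMatch());
    }
}
```

This says nothing about the encoding or the flags behaviour of the real instructions; it only checks the arithmetic equivalence that the match rules rely on.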

The next two lines:

  1. bits[index>>>3] |= (byte)(1 << (index & 7));
  2. bits[index>>>3] &= (byte)~(1 << (index & 7));

were assembled as:

1)
024 movsbl R8, [RSI + #16 + R10] # byte
02a movl R11, #1 # int
030 sall R11, RCX
033 movsbl R11, R11 # i2b
037 orl R11, R8 # int
03a movb [RSI + #16 + R10], R11 # byte

2)
024 movsbl R8, [RSI + #16 + R10] # byte
02a movl R11, #1 # int
030 sall R11, RCX
033 not R11
036 movsbl R11, R11 # i2b
03a andl R8, R11 # int
03d movb [RSI + #16 + R10], R8 # byte

instead of:

1)
024 movl R11, #1 # int
02a sall R11, RCX
02d movsbl R11, R11 # i2b
031 orb [RSI + #16 + R10], R11 # byte

2)
024 movl R11, #1 # int
02a sall R11, RCX
02d not R11
030 movsbl R11, R11 # i2b
034 andb [RSI + #16 + R10], R11 # byte
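For anyone who wants to reproduce the listings, the two Java statements above can be wrapped in a small byte-packed bit set like the one below. This is a minimal sketch: the class name BytePackedBits and the get method are hypothetical additions for illustration and are not taken from the webrev; only the bodies of set and clear come from the two lines quoted above.

```java
public class BytePackedBits {
    // Eight bits per byte; bit i lives in bits[i >>> 3] at position (i & 7).
    private final byte[] bits;

    public BytePackedBits(int nbits) {
        bits = new byte[(nbits + 7) >>> 3];
    }

    // Statement 1 from the mail: OR the single-bit mask into the byte.
    public void set(int index) {
        bits[index >>> 3] |= (byte) (1 << (index & 7));
    }

    // Statement 2 from the mail: AND the byte with the inverted mask.
    public void clear(int index) {
        bits[index >>> 3] &= (byte) ~(1 << (index & 7));
    }

    // Hypothetical helper, added only so the result can be observed.
    public boolean get(int index) {
        return (bits[index >>> 3] & (1 << (index & 7))) != 0;
    }

    public static void main(String[] args) {
        BytePackedBits b = new BytePackedBits(16);
        b.set(3);
        b.set(7);
        b.set(9);
        b.clear(3);
        System.out.println(b.get(3) + " " + b.get(7) + " " + b.get(9)); // prints "false true true"
    }
}
```

Running set and clear in a hot loop with a disassembler plugin and -XX:+PrintOptoAssembly (on a debug build) should show whether the load/op/store sequence or the single byte-wide instruction is emitted.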

So, as a first step, I would probably create a JBS issue and send out an RFR on hotspot-dev for this simple enhancement, if there are no objections?

Any opinion is welcome.

Thanks, Bernard



More information about the compiler-dev mailing list