[9] RFR (S): 8161720: Better byte behavior for off-heap data

John Rose john.r.rose at oracle.com
Thu Aug 25 20:25:57 UTC 2016


On Aug 25, 2016, at 11:00 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:

Would you mind adding some comments to byte2bool and bool2byte saying this is consistent with the behaviour in HotSpot, e.g. that reads work for T_BOOLEAN or JNI values, and writes normalize?

+1

Enclosed is some background, FTR.

— John

The JVM converts ints to booleans using two different conventions: testing a byte against zero, and truncating to the least-significant bit.

The JNI documents specify that, at least for returning values from native methods, a Java boolean (T_BOOLEAN) value is converted to the value-set 0..1 by first truncating to a byte (0..255 or maybe -128..127) and then testing against zero. The present change (JDK-8161720) extends this behavior to loading a byte from off-heap data, which is nice and consistent. Thus, Java booleans in non-Java data structures are by convention represented as 8-bit containers containing either zero (for false) or any non-zero value (for true).

(The choice of convention is not highly constrained, since C does not share the boolean type with Java. C data structures contain Java booleans only when they hide under other types.)
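As a rough illustration of the byte-test convention described above, here is a minimal Java sketch. The names byte2bool and bool2byte come from the review thread; the class name and the int2bool helper are made up for this example, and none of this is the actual HotSpot or Unsafe source.

```java
// Sketch of the "test the byte against zero" convention.
// OffHeapBooleans and int2bool are hypothetical names for illustration only.
final class OffHeapBooleans {
    private OffHeapBooleans() {}

    /** Read side: any non-zero 8-bit container counts as true. */
    static boolean byte2bool(byte b) {
        return b != 0;
    }

    /** Write side: always emit the normalized value 0 or 1. */
    static byte bool2byte(boolean z) {
        return (byte) (z ? 1 : 0);
    }

    /** Wider values are truncated to a byte first, then tested against zero. */
    static boolean int2bool(int x) {
        return byte2bool((byte) x);
    }

    public static void main(String[] args) {
        System.out.println(byte2bool((byte) 2));                // true
        System.out.println(int2bool(0x100));                    // false: low byte is 0
        System.out.println(bool2byte(byte2bool((byte) -1)));    // 1
    }
}
```

Reads tolerate any non-zero byte, while writes always produce 0 or 1, so a value that round-trips through Java comes back normalized.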

Meanwhile, Java booleans in the heap are also stored in bytes, but are strongly normalized to the value-set 0..1. If you happen to use Unsafe to load such an on-heap boolean as if it were off-heap, the compare-against-zero normalization will be a no-op.

(If the compiler can prove that in fact the load only applies to on-heap data, then the compare-against-zero can be elided. That's what Zoltan has done here at my request—thanks!)

(Note that Unsafe is carefully designed so that a single instruction can point, dynamically, to either on-heap or off-heap data. This allows certain hot loops to be devirtualized.)

People who look closely will notice that compilers (and MethodHandles) use a different convention for normalizing on-heap and in-JVM values, in which byte2bool(x) := (x & 1) ? true : false. This is compatible with the reverse bool2byte(x) := x ? 1 : 0, but allows slightly better code, since the single low bit gets copied through any number of back-and-forth conversions, with never any testing against zero.

(Testing against zero requires a little extra help from the CPU carry propagator, and thus an occasional extra instruction. Also, if a 32-bit value is converted from JVM stack to a boolean, it's better to truncate directly to the least significant bit, instead of truncating to a byte, and then testing those eight bits against zero. As I said, it's slightly better just to copy the bit around, and ignore the size of the container.)

So, for values that are part of pure bytecode execution, a boolean is defined as a one-bit field of a byte container. If an integral value is stored into a boolean variable, it is truncated in one step by discarding all but the least significant bit (LSB).
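For comparison, the truncate-to-LSB convention can be sketched the same way. The class name and the normalize helper are hypothetical; the two conversion bodies follow the definitions given earlier in this message.

```java
// Sketch of the "keep only the least significant bit" convention.
// LsbBooleans and normalize are hypothetical names for illustration only.
final class LsbBooleans {
    private LsbBooleans() {}

    /** byte2bool(x) := (x & 1) ? true : false */
    static boolean byte2bool(int x) {
        return (x & 1) != 0;
    }

    /** bool2byte(x) := x ? 1 : 0 */
    static int bool2byte(boolean z) {
        return z ? 1 : 0;
    }

    /** A round trip through both conversions is just a one-bit mask. */
    static int normalize(int x) {
        return x & 1;
    }

    public static void main(String[] args) {
        System.out.println(byte2bool(2));        // false: bit 0 is clear
        System.out.println(byte2bool(0x101));    // true: bit 0 is set
        System.out.println(normalize(3));        // 1
    }
}
```

Note that the value 2 is false under this rule but true under the byte-test rule, which is why it matters which convention applies to a given load.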

Since the Java type system does not allow free conversion between boolean and other types, this aspect of booleans does not usually show up, but the truncation is happening in there.

The JVM user can see boolean truncation in two places. First, MethodHandles.explicitCastArguments is specified to use truncation-to-LSB when converting numeric values to boolean. Second, for directly generated bytecodes, certain bytecodes quietly mask off all but the LSB of a stacked value before it reaches a location with a type descriptor of "Z". These bytecodes include "ireturn", "putfield", and "bastore" (the instruction that stores into boolean arrays), so that clients of methods and data structures that produce booleans can be assured that these booleans will be clean (either 0 or 1).

(This surprises people sometimes, as does the fact that the JVM verifier concentrates primarily on the distinctions among int/long/float/double/reference, and doesn't distinguish among the various subrange-types carried by int.)

The truncations in ireturn/putfield/bastore were added as a security fix to the JVM fairly recently (I won't comment on why that might have been) but they are a no-op for all Java code, and in fact for all honest bytecodes.
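The MethodHandles case can be observed from plain Java. The sketch below adapts the boolean identity handle so that it accepts an int; the class and method names are invented for the example, and the expected results rest on the documented rule that the conversion tests the low-order bit.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// ExplicitCastDemo and intToBoolean are hypothetical names for illustration only.
final class ExplicitCastDemo {
    private ExplicitCastDemo() {}

    /** Converts an int to a boolean via MethodHandles.explicitCastArguments. */
    static boolean intToBoolean(int x) {
        // identity has type (boolean)boolean; the adapter has type (int)boolean.
        MethodHandle identity = MethodHandles.identity(boolean.class);
        MethodHandle cast = MethodHandles.explicitCastArguments(
                identity, MethodType.methodType(boolean.class, int.class));
        try {
            return (boolean) cast.invokeExact(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(intToBoolean(2));   // false: low bit is 0
        System.out.println(intToBoolean(3));   // true: low bit is 1
    }
}
```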

The bottom line of all this is that on-heap booleans are stored in normalized form, but off-heap booleans are not presumed to be so normalized.

Sometimes on-heap data is used under a different type, as when byte buffer viewing operations inspect the bytes of a byte array as if they were little-endian or big-endian longs. (Unsafe allows this also.) This is a useful feature, which is important to Project Panama, where native structures (think "struct stat") can be captured as bitwise snapshots in on-heap long arrays.

In those cases of on-heap type punning, when Unsafe.getBoolean grabs a boolean from a non-boolean on-heap variable (or part of a variable), it will normalize the loaded byte just as if it came from off-heap.

When a boolean is stored to a byte in this way, the JVM will be working with a pre-normalized value (since booleans on heap and in the JVM stack are strongly normalized), and will just store the byte without testing it. Of course, the apparent test (z?1:0) is optimized to a simple copy when the JIT can deduce that the tested value is normalized.
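The type-punning case can be approximated without Unsafe. In the sketch below a heap ByteBuffer stands in for a bitwise snapshot held in a long; the class and helper names are hypothetical, and this only imitates the normalize-on-read, clean-on-write behavior described above rather than calling Unsafe.getBoolean itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// PunnedBooleans is a hypothetical stand-in; it does not use Unsafe.
final class PunnedBooleans {
    private PunnedBooleans() {}

    /** Reads one byte of the view as a boolean, testing it against zero. */
    static boolean getBoolean(ByteBuffer view, int offset) {
        return view.get(offset) != 0;
    }

    /** Stores a boolean as a clean 0 or 1 byte. */
    static void putBoolean(ByteBuffer view, int offset, boolean z) {
        view.put(offset, (byte) (z ? 1 : 0));
    }

    public static void main(String[] args) {
        ByteBuffer view = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        view.putLong(0, 0xFE00L);                    // byte 0 = 0x00, byte 1 = 0xFE
        System.out.println(getBoolean(view, 0));     // false
        System.out.println(getBoolean(view, 1));     // true, although 0xFE has a clear low bit
        putBoolean(view, 0, true);
        System.out.println(Long.toHexString(view.getLong(0)));   // fe01
    }
}
```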




More information about the hotspot-compiler-dev mailing list