[lworld] Handling of missing ValueTypes attributes

John Rose john.r.rose at oracle.com
Thu Jul 19 22:13:42 UTC 2018


On Jul 13, 2018, at 2:45 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:

Hi Karen,

On 11.07.2018 21:42, Karen Kinnear wrote:

2a. flattening in containers - flattenable fields and arrays - require check of value type vs. ACTUAL loaded type - this is easy, we preload types

Yes, no issues here.

2b. JIT scalarization of field access - must be an ACTUAL value type and must be flattenable. This will only work for fields that the JIT'd caller believes are value types, the declarer believes are value types, and the declarer does an ACTUAL check. Need caller-callee agreement for a JIT'd caller.

I don't understand what you mean by caller/callee in the context of a field access? Also, I'm not sure what you mean by "scalarization of field access"?

Here's an obtuse answer, although I'm probably missing the acuteness of your point:

(caller : callee : method) :: (accessing object : containing object : field)

The representation choices, and the negotiation of a shared understanding of those choices, are similar on the two sides of the "::".

A resolved field consists of an offset (small integer) into its container, plus an indication of its type (if necessary).

Scalarization of field access means that the resolved field is loaded or stored as contiguous subfields. A non-scalarized value-type field is secretly stored as a buffer pointer. Resolution of an accessing class's field access involves a check to determine the true status of the field in the containing class. To avoid crashes, we must either throw an error if resolution detects a mismatch, or prepare the accessing class to access the field in its correct format. In LW1 we take the former option, which is very strict but fine for initial experimentation.
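To make the distinction concrete, here is a rough Java-level model (not actual JVM internals); "Point" stands in for a value type and "Holder" for a class with a flattenable Point field, both invented names:

```java
public class FieldAccessModel {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static final class Holder {
        final Point pt = new Point(3.0, 4.0);
    }

    // Buffered access: the resolved field is read as a single hidden reference.
    static Point bufferedRead(Holder h) {
        return h.pt;                        // one pointer load
    }

    // Scalarized access: the JIT loads the components as contiguous sub-fields,
    // never materializing a Point object in the compiled code.
    static double scalarizedUse(Holder h) {
        double x = h.pt.x;                  // conceptually: load at offset(pt.x)
        double y = h.pt.y;                  // conceptually: load at offset(pt.y)
        return Math.hypot(x, y);
    }

    public static void main(String[] args) {
        System.out.println(scalarizedUse(new Holder()));  // prints 5.0
    }
}
```

The mismatch case is when the accessing class compiled for one of these shapes while the containing class actually uses the other; that is exactly what resolution must detect.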

The rest of this note is squarely post-LW1; here goes:

Several different formats can be conceived of for value type fields:

The techniques separate broadly into a choice between inlined components vs. (hidden) references to buffered values. There are other choices after that.

IR graphs of course mainly use the exploded format, allocating component values independently to registers, stack, and/or nowhere. The exploded format is also good for out-of-line calling sequences.

The souvenir format uses the most storage, but is very helpful in IR graphs, since it facilitates conversion between exploded and buffered forms. It is also sometimes used in calling sequences, for the same reason.

In memory, the exploded format minimizes fragmentation overhead by treating sub-fields exactly like fields, reordering them so that bytes are next to bytes, longs next to longs, etc., without regard for the boundaries of distinct values stored within the same container. The trade-off is that the fields must be separately located in the container, which means the CP cache must be equipped with multiple offsets, perhaps one for each value type field. It also makes a hash of the Unsafe API. But if we want maximum packing, we could work out the details, without changing any higher-level APIs or specifications.

The exploded format also has potential for highly packed flat arrays, since fragmentation overheads are multiplied by the length of the array, and thus potentially worth reducing. A two-tiered flat array would distribute blocks of objects tightly packed into cache lines, packing each block the same way, but as tightly as possible, as if the block were an object containing a fixed number of values. Indexing would compute first the block (using a divide) and then the index within the block (using a modulo). Individual fields would be picked up at varying computed offsets within the block.
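The indexing arithmetic above can be sketched as follows; the value layout (two double fields), the values-per-block count, and the within-block grouping (all x's, then all y's) are made-up parameters, not a proposed layout:

```java
public class TwoTierIndex {
    static final int VALUES_PER_BLOCK = 7;       // hypothetical: sized to fill cache lines
    static final int FIELD_X = 0, FIELD_Y = 1;
    static final int[] FIELD_BASE = {0, 7 * 8};  // within a block: all x's, then all y's
    static final int[] FIELD_SIZE = {8, 8};      // both fields are doubles
    static final int BLOCK_BYTES = 2 * 7 * 8;    // total bytes per block

    // Byte offset of element i's field f, relative to the array payload.
    static long offsetOf(int i, int f) {
        int block  = i / VALUES_PER_BLOCK;       // first tier: which block (divide)
        int within = i % VALUES_PER_BLOCK;       // second tier: slot inside the block (modulo)
        return (long) block * BLOCK_BYTES
             + FIELD_BASE[f]
             + (long) within * FIELD_SIZE[f];
    }

    public static void main(String[] args) {
        // element 9 lives in block 1, slot 2
        System.out.println(offsetOf(9, FIELD_Y));  // prints 184
    }
}
```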

We aren't considering ref-based storage other than object-like and thread-local, but I put them there for the record. The GC might be able to eliminate headers and retain locality with a federated format. It could do this secretly and at its own discretion, like we do today for thread-locals. Perhaps it would federate when the buffered value is unique to its container. But the complexity would be high, and historically we've had plenty of trouble, and enough payoff, implementing the simpler techniques.

Any ref-based format has the physical potential to hold a null, or to be part of a reference cycle. The JVM should (a) enforce invariants that exclude such things, and (b) be robust if a bug breaks such an invariant. Probably value types will be able to contain cycles, but only through explicit Java references. Consider a buffered value that has an Object field that just happens to refer back to the same value. There's no way to exclude such cycles systematically.
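The kind of cycle described above can be built in plain Java today; "Box" below is an invented stand-in for a value type with an explicit Object field:

```java
public class CycleDemo {
    static final class Box {            // stands in for a value with an Object field
        final Object ref;
        Box(Object ref) { this.ref = ref; }
    }

    // Builds a cycle: box.ref -> holder, holder[0] -> box.
    static boolean makesCycle() {
        Object[] holder = new Object[1];
        Box b = new Box(holder);
        holder[0] = b;                  // the cycle goes through an explicit reference
        return ((Object[]) b.ref)[0] == b;
    }

    public static void main(String[] args) {
        System.out.println(makesCycle());  // prints true
    }
}
```

No field-by-field check at construction time can rule this out, since the cycle is completed only after the value is built, through the mutable holder.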

Nulls can be systematically excluded from "secret" references, by simply taking special action when a null is observed, before a bytecode can get hold of it. Two special actions are relevant: (1) throw NPE, and (2) substitute a default value.

The first is useful when legacy code might have sent a null, and we decide that the user model must exclude this with an exception. The second is also useful with legacy code, in the more unlikely case where we decide to substitute the default value for null. (This is sometimes a requested feature, but not necessarily one we should agree to pay for.)

The second is useful for bootstrapping, sometimes: If a hidden reference is part of an object's layout, and the object is initialized to all-zero-bits, then we want to bootstrap the hidden reference to a non-zero pointer to its buffered default value. But if we can't reliably do that in all cases, then any read of that field must be prepared to zero-check the field and substitute the missing default. Perhaps we can avoid this in all cases, but I think we currently use this trick, to simplify object initialization. We might also need it to bootstrap the defaultvalue bytecode itself, by lazily creating the canonical buffered default the first time it is actually used.
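The two special actions can be modeled at the Java level like this; the real JVM logic works on raw field slots, and "Point" and "DEFAULT" are invented stand-ins:

```java
public class NullPolicy {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // canonical buffered default: the all-zero-bits value, pre-buffered once
    static final Point DEFAULT = new Point(0.0, 0.0);

    // (1) Throw NPE: legacy code sent a null and the user model excludes it.
    static Point readStrict(Point hidden) {
        if (hidden == null)
            throw new NullPointerException("null where a value type is required");
        return hidden;
    }

    // (2) Substitute the default: also covers bootstrapping, where a read of an
    // all-zero-bits field slot must yield the default value, never a null.
    static Point readWithDefault(Point hidden) {
        return (hidden != null) ? hidden : DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(readWithDefault(null) == DEFAULT);  // prints true
    }
}
```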

If compiled code loads a value type from a field (no matter if the field is flattenable or not), it will scalarize the value type if it's not null (i.e., pass it on in registers or on the stack). This relies on the fact that at compile time, the field type is loaded if it's a value type.

This is a case where the accessing/using/calling code wants a value, and intends to immediately explode it into components. In that case, if the container/used/callee object contains a reference, any null must be handled by (1) or (2) above, depending on user model. If the null is supposed to be impossible, the JVM should probably still fail gracefully if the bad thing appears.

If the caller and callee differ on value-ness, then we have a tug-of-war between opinions. For starters, we should do what's convenient if the problem arises, and document it. Those are the points where we need to elevate the first implementation to a robust user model (post-LW1).

2c. JIT calling convention - scalarization of arguments. Need either the caller and callee in agreement if both compiled, OR, for a caller that calls by reference, an adapter that can scalarize arguments it knows are ACTUAL value types. Today the adapter is created at callee link time, so we explicitly load types named in local methods in the ValueTypes attribute so they can be scalarized.

Yes, but we don't support that for LW1 because there are still lots of other issues to sort out before we can re-enable -XX:+ValueTypePassFieldsAsArgs (for example, the issue with lambda forms and linkTo* calls).

The LFs and linkTo calls will have to use buffered values uniformly. That implies that the linkTo calls will have to be given the ability to scalarize on the fly. This probably implies pre-generated adapters. Post-LW1 (or intra-LW1 at best) if we don't have it now. I don't know a better approach, though I think Roland and I were on the verge of something in this discussion:

http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-May/004273.html

One idea would be to expand ref-based calling sequences in place by exploding into additional argument positions, obtained by sneaking more space from the caller's top-of-stack area.

We could really use a handshake which allows a callee to borrow more space in the caller's argument list, but secretly pay it back on return. This is necessary for tail-calls also: The chain of tail calls can require an unbounded amount of extra TOS to store parameters of the various tail calls, all without the cooperation of the original non-tail caller of the tail-call chain. A good problem to think about when not otherwise occupied.

2d. JIT returning a value type. I do not know our plans for value type return optimizations.

The plan is to re-enable -XX:+ValueTypeReturnedAsFields for lworld once we have sorted out the calling convention issues.

+100

The adapters for returns are stored off of the return type, so they know the ACTUAL value.

Returns do not use any adapters, but we do some kind of handshaking between the caller and the callee to make sure that they agree on the type (see page 26/27 of [1]).

In general we can check caller-callee consistency, so we can be in agreement about whether a type is a value type. The exception is the JavaCalls::call_helper path used by Reflection, JNI (and others internally) - I assume we will always return a reference here (I have not studied the details yet, so I don't know where that is handled).

I'm not sure about the reflection/jni case but the short answer would be "we don't support scalarization of the return value for LW1". And I don't see any problems with the current code.

Like the linkTo* methods, JNI access will have to adapt exploded (scalarized) arguments and return values to buffered (ref) ones.

If we can solve the TOS-borrowing problem (described above), we can arrange some pretty reasonable adapters that can shift between ref-based and exploded formats.

For bonus points, consider designing a calling sequence which includes souvenirs. They are useful, and the exercise might simplify the adapter logic: When going from references-only to exploded-with-souvenirs, you don't change any reference arguments at all, just push in the exploded components after them in the argument list. (This works fine because of the way we allocate calling sequences left-to-right.)

Going from exploded-with-souvenirs to references-only is a no-op (actually, a tail-call) since the callee can just ignore the extra trailing arguments. Actually, any null souvenirs would have to be fixed up to hold non-null references to buffered values, just like in local IR.

Thus, one calling sequence subsumes the other. This might make LF-based calls (using linkTo*) easier to generate, since they would just send exploded-with-souvenir arguments, and let the callee decide independently whether to ignore or use the components.
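A toy Java model of the subsumption, with invented names; the "souvenir" is the buffered reference that travels alongside the exploded components:

```java
public class Conventions {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // references-only callee: sees just the buffered value
    static double lenRef(Point p) {
        return Math.hypot(p.x, p.y);
    }

    // exploded-with-souvenir callee: the same reference argument first, then
    // the components pushed after it in the argument list
    static double lenExploded(Point souvenir, double x, double y) {
        // a null souvenir would be fixed up by re-buffering, as in local IR
        return Math.hypot(x, y);
    }

    public static void main(String[] args) {
        Point p = new Point(3.0, 4.0);
        // exploded-with-souvenirs -> references-only: a references-only callee
        // just ignores the trailing components, so the conversion is a no-op
        System.out.println(lenRef(p) == lenExploded(p, p.x, p.y));  // prints true
    }
}
```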

As Roland pointed out, one hard part about on-the-fly explosion is borrowing the extra TOS from the caller, but in the case of LFs that could be hardwired, since the full size is known before the call.

HTH

— John

Details: 1. MethodHandles - invocation and field access always goes through LinkResolver at this point. There are two exceptions here:
- one is when the MethodHandle creation does NOT pass in the calling class information - in that case there is no check for caller-callee consistency, we need to look at this independently
- one is invokespecial indirect superclass (ACC_SUPER), which performs selection in the java code. That is a rathole I won't follow here - we should fix that anyway - multiple potential approaches.

2. Reflection: optimized reflection generates bytecodes, so goes through the bytecode path, so goes through LinkResolver. Initial reflection calls JavaCalls::call -> JavaCalls::call_helper.

3. JNI: also goes through JavaCalls::call_helper. JavaCalls::call_helper calls the call stub to invoke the entry point, which is: normally method->from_interpreted_entry, debug method->interpreter_entry. For argument passing, my assumption is that we are ok with the JavaCalls::call_helper path because it always passes by reference and uses the callee adapter from the interpreter, which knows the declared value types that can be scalarized. So the same adapter that works for interpreted code works for call_helper, where the caller always assumes everything is a reference and passes by reference. JIT folks - does this work in practice?

Yes, that seems reasonable but it's very hard to figure this all out by code inspection. I think what we need is more tests to find bugs and/or gain confidence in our current design/implementation. That said, current JIT optimizations do not rely on the value types attribute but of course there might be bugs or implicit assumptions that do not hold. However, Ioi's optimization (8206140) relies on the fact that an interpreted callee always knows when it's returning null for a value type (and can then deoptimize the compiled caller). It seems that the attribute consistency checks cannot guarantee that but I need to take a closer look. My take on this is to defer all optimizations that rely on the consistency checks to after we have got these right (and it's okay if that is after LW1).

Best regards, Tobias

[1] http://cr.openjdk.java.net/~thartmann/talks/2018-ValueTypesCompilerOffsite.pdf


