Moving from VVT to the L-world value types (LWVT)

Remi Forax forax at univ-mlv.fr
Fri Jan 19 21:10:39 UTC 2018


I think there is an alternative encoding for Q types which is worth considering, see below ...

----- Original message -----

From: "Frederic Parain" <frederic.parain at oracle.com>
To: "valhalla-dev" <valhalla-dev at openjdk.java.net>
Sent: Tuesday, January 16, 2018 21:56:11
Subject: Moving from VVT to the L-world value types (LWVT)

Here’s an attempt to bootstrap the L-world exploration, where java.lang.Object is the top type of all value classes (as discussed during the November meetings in Burlington).

This proposal tries to evolve the JVMS with a small set of changes to have an implementable specification of the L-world. Instead of trying to add Q/R/U-types to the JVMS, the approach is to extend the JVMS notion of “reference” to cover both regular classes and value classes. The notion of “class” has also been extended to cover both, but when needed, it is possible to specify an “object class” or a “value class”, or respectively, “an instance of an object class” vs “an instance of a value class”. The “Q…;” format is still used for value class types, but the “;Q” trick is gone.

The VM needs to know whether a class is a value type or not at the time we compile something. A Q-type is one way to encode that, but it's not the only way.

The attached document contains sections of the JVMS that have been modified to implement the L-world. The text doesn’t have change bars, so people are encouraged to read each modified section entirely to check that it is consistent and covers all cases of the L-world.

Here’s a quick summary of the changes with some consequences on the HotSpot code:
- all v-bytecodes are removed except vdefault and vwithfield
- all bytecodes operating on an object receiver are updated to support values as well, except putfield and new
- single carrier type for both instances of object classes and instances of value classes
  - this carrier type maps to the T_OBJECT BasicType
  - T_VALUETYPE still exists but its usage is limited (same purpose as T_ARRAY)
- qtos TosState is removed
- JNI: the jobject type can be used to carry either a reference to an object or an array or a value. The type jvaluetype, sub-type of jobject, is used when only a value class instance is expected
- Q…; remains the way to encode value classes in signatures (fields and methods)
- in the constant pool, the CONSTANT_Class_info entry type is used to store a symbolic reference to either an object class or a value class
- the ;Q escape sequence is not used anymore in value class names

agree, let's remove ;Q encoding, it was a hack.

One important point of this exercise is to ensure that the migration of Value Based Classes into Value Classes is possible, and doable with a reasonable complexity and cost. In addition to the JVMS update (and consistent with the JVMS modifications), here’s a set of proposals on how to deal with the VBC migration.

Migration of Value Based Classes into Value Classes:
- challenges:
  - signature mismatch
  - null
  - change in behavior

- proposal for signature mismatch:
  - with LWVT, value class types in signatures use the Q…; format
  - legacy code uses signatures with the L…; format (because VBCs are object classes)
  - methods will have two signatures:
    - the true signature, which could include Q…; elements
    - an L-ified signature where all Q…; elements are rewritten with the L…; format
  - method lookup still works by signature string comparison
  - the signature of the method being looked up will be compared against both the true and the L-ified signatures; if the looked-up signature matches the L-ified signature but not the true signature, legacy code trying to invoke migrated code has been detected, and additional work might be required for the invocation (actions to be taken have to be defined)
  - signature mismatch can also occur for fields; this is still being investigated, and the proposal will be updated as soon as we have a solution ready to be published
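The two-signature lookup described in this proposal can be sketched in a few lines of Java. This is only an illustration under assumptions: the class name `SignatureMatch`, the `Match` enum and both method names are hypothetical and do not come from the proposal or from HotSpot. The rewrite only touches the leading `Q` of a class type, so a `Q` inside a class name (as in `Ljava/util/Queue;`) is left alone.

```java
// Hypothetical model of the proposed lookup; not HotSpot code.
final class SignatureMatch {
    enum Match { TRUE_SIGNATURE, LIFIED_ONLY, NONE }

    // Rewrites every Q...; class type in a descriptor to the L...; format.
    static String lify(String descriptor) {
        StringBuilder out = new StringBuilder(descriptor.length());
        int i = 0;
        while (i < descriptor.length()) {
            char c = descriptor.charAt(i);
            if (c == 'L' || c == 'Q') {
                // A class type runs up to and including the next ';'.
                int end = descriptor.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated class type: " + descriptor);
                }
                out.append('L').append(descriptor, i + 1, end + 1);
                i = end + 1;
            } else {
                // Primitive type, '[', '(' or ')': copied unchanged.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Compares the looked-up signature with the true one first, then with its L-ified form.
    static Match match(String lookedUp, String trueSignature) {
        if (lookedUp.equals(trueSignature)) {
            return Match.TRUE_SIGNATURE;
        }
        if (lookedUp.equals(lify(trueSignature))) {
            return Match.LIFIED_ONLY; // legacy caller invoking migrated code
        }
        return Match.NONE;
    }
}
```

In this model, `LIFIED_ONLY` is the case where the proposal says additional work might be required for the invocation.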

legacy code => legacy classes, each of them in a different state, so you will end up with more than two signatures.

Suppose you have two classes V and W that will become value types, but they come from different modules managed by different companies. You can have 3 classes like this:
  class A { LW; m(LV;) }
  class B extends A { LW; m(LV;) }
  class C extends B { LW; m(LV;) }

Then you migrate B to declare V as a value type:
  class B extends A { LW; m(QV;) }
then you migrate A to declare W as a value type:
  class A { QW; m(LV;) }
Here, there are more than two true signatures.

Let's take a step back. In Java (the language), we have already introduced features like generics or varargs that behave the same way; the only difference here is that this is something the VM has to deal with, not the Java compiler.

How varargs works in the Java compiler is easy: the varargs parameter appears in the method descriptor as an array, and an access flag says whether the method is varargs or not. I think we should encode Q-types the same way. As I said, for a class, we want to know which of the types used by this class were value types at the time the class was compiled.
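The varargs analogy can be observed with plain reflection. The sketch below compares `String.format(String, Object...)` with `Arrays.sort(Object[])`: both take an array as their last parameter, and only the flag reported by `Method.isVarArgs()` tells them apart. The class `VarargsEncoding` and its helper names are made up for this illustration.

```java
import java.lang.reflect.Method;

// Illustration only: the descriptor shape is the same, the flag carries the difference.
final class VarargsEncoding {
    // Looks up a public method, turning the checked exception into an unchecked one.
    static Method find(Class<?> owner, String name, Class<?>... params) {
        try {
            return owner.getMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the last declared parameter is an array type.
    static boolean lastParameterIsArray(Method m) {
        Class<?>[] p = m.getParameterTypes();
        return p.length > 0 && p[p.length - 1].isArray();
    }

    public static void main(String[] args) {
        Method format = find(String.class, "format", String.class, Object[].class);
        Method sort = find(java.util.Arrays.class, "sort", Object[].class);
        // Same array shape for both; only the varargs flag differs.
        System.out.println(lastParameterIsArray(format) + " " + format.isVarArgs());
        System.out.println(lastParameterIsArray(sort) + " " + sort.isVarArgs());
    }
}
```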

So I propose to introduce a new class attribute named ValueTypes that contains the set of all value types used by that class. For the VM, a type is by default a reference type (an L-type) unless it is listed in the ValueTypes attribute, in which case it's a value type. Basically, this gives one bit per type at class level that says whether the type behaved as a value type or not when the class was compiled.
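A rough model of that idea, using the A/B/C example from earlier in this message: every class keeps the same L-format descriptors, and each class carries its own set of type names it was compiled against as value types. Everything here is an assumption made for illustration: `ClassFileView` and `treatsAsValue` are invented names, and the real layout of a ValueTypes attribute is not specified in this mail.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-class view of a ValueTypes attribute; not an actual class-file API.
final class ClassFileView {
    final String name;
    final Set<String> valueTypes; // types this class saw as value types when it was compiled

    ClassFileView(String name, String... valueTypes) {
        this.name = name;
        this.valueTypes = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(valueTypes)));
    }

    // A type is a reference type (L-type) by default, a value type only if listed.
    boolean treatsAsValue(String typeName) {
        return valueTypes.contains(typeName);
    }

    public static void main(String[] args) {
        ClassFileView a = new ClassFileView("A", "W"); // migrated: knows W is a value type
        ClassFileView b = new ClassFileView("B", "V"); // migrated: knows V is a value type
        ClassFileView c = new ClassFileView("C");      // legacy: no ValueTypes entries
        for (ClassFileView view : new ClassFileView[] { a, b, c }) {
            System.out.println(view.name + ": V=" + view.treatsAsValue("V")
                    + " W=" + view.treatsAsValue("W"));
        }
    }
}
```

With this shape, the descriptor `(LV;)LW;` stays identical in all three classes, and the question of how each class sees V and W is answered by its own set.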

In terms of implementation,

- proposal for null references leaking to migrated code:
  - having a null reference for a Value Based Class variable or field is valid in legacy code, but it becomes invalid when the Value Based Class has been migrated to a Value Class
  - trying to prevent all references with a value class type from getting a null value would be very expensive (it would require looking at the stackmap for each assignment to a local variable)
  - the proposed solution is to allow null references for local variables and expression stack slots, but forbid them for fields and array elements (bytecodes operating on fields and arrays have to be updated to throw an NPE whenever a null reference is provided instead of a value class instance)
  - null references are likely to be an issue for JIT optimizations like passing values in registers when a method is invoked. The proposed solution is to only allow null references for value classes in legacy code, by detecting them and blocking them when leaking to migrated code. The detection can be done at invocation time, when a mismatch between the signature expected by the caller and the real signature of the callee is detected (see signature mismatch proposal above)
  - the null reference should also be detected and blocked when it is used as a return value and the type of the value to be returned is a value class type
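The null rules in this proposal can be summarized as a small policy table. The Java sketch below is only a model of those rules for a reference whose static type is a value class; the names `NullPolicy`, `Slot`, `nullAllowed` and `store` are invented here, and nothing in it reflects how the interpreter or JIT would actually implement the checks.

```java
// Hypothetical model of the proposed null policy for value class types.
final class NullPolicy {
    // Places where a reference of value class type can show up.
    enum Slot {
        LOCAL_VARIABLE,
        EXPRESSION_STACK,
        FIELD,
        ARRAY_ELEMENT,
        ARGUMENT_TO_MIGRATED_CODE,
        RETURN_VALUE
    }

    // Null is tolerated only in local variables and on the expression stack.
    static boolean nullAllowed(Slot slot) {
        return slot == Slot.LOCAL_VARIABLE || slot == Slot.EXPRESSION_STACK;
    }

    // Models a store or hand-off: throws NPE where the proposal blocks null.
    static Object store(Object ref, Slot slot) {
        if (ref == null && !nullAllowed(slot)) {
            throw new NullPointerException("null is not a valid value class instance for " + slot);
        }
        return ref;
    }
}
```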

I believe what I'm proposing above matches all these points.

In addition to the JVMS update, here’s a chart trying to summarize the new checks that will have to be added to existing bytecodes when moving the v-bytecode semantics into the a* bytecodes. The categories in the chart are not very precise, but we can use it as a starting point for our discussions. The chart can also help define which experiments could be done to estimate the costs of the different additional checks that need to be added to existing bytecodes.

All of this is preliminary work for a proposal to implement the L-world value types, not a definitive specification. It has to be analyzed and discussed before any attempt to implement it starts. Feel free to send feedback, comments, other proposals, etc.

Thank you,
Fred

Rémi


