Wildcards -- Models 4 and 5

Remi Forax forax at univ-mlv.fr
Thu Jun 2 10:21:30 UTC 2016

Previous message: Compatibility goals
Next message: Species-static members vs singletons

There is another model (model 6). To support species, we need a way to represent them at runtime, so that species-static state can be stored in a location that is neither alongside the instance fields nor alongside the static fields.

Currently, in the VM, an instance is represented like this:

    header ----> class
                 ------
                 vtable1
    field1       vtable2
    field2       ...
    ...          static fields

This can be a little different if the java.lang.Class object and the VM-internal class of an instance are two different objects (for the JIT, it's better for the class to be a constant pointer, but java.lang.Class is a Java object that may need to be moved in memory).

Now, we want something like this:

    header ----> species ----------> class
                 ------
                 vtable1
    field1       vtable2
    field2       ...
    ...          species fields

So at runtime, the header of an object no longer points to a class but to a species (the runtime representation of a species).

This allows the VM to answer questions like obj instanceof ArrayList, and everybody cheers...

No, because with …, obj instanceof ArrayList may or may not work.

In fact, there is little reason to allow a user to see species at runtime*:

it makes ArrayList reified only some of the time, so with a Map<String, List> map, map.get("foo") will sometimes throw an exception and sometimes not: because of erasure, the compiler inserts a cast to List in front of the call to map.get(), and depending on whether E is a String or an int, the behavior will be different.
The erasure concept (the runtime part) is entrenched in the minds of millions of developers; changing that is a recipe for disaster.

So, IMO, even if the VM reifies species at runtime, a developer should not be able to see it; it's better to let developers keep thinking that erasure at runtime works the same way in Java 10 as it did in Java 5.
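The runtime erasure that developers rely on can be observed in today's Java. The following minimal sketch (class and method names are illustrative) shows that two parameterizations share one runtime class, and that an unchecked cast does not fail where it is written but at the cast the compiler inserts where the element is used.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Under erasure, every parameterization of ArrayList shares one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // The unchecked cast succeeds at runtime; the failure is deferred to the
    // checkcast the compiler inserts where the element is used as a String.
    @SuppressWarnings("unchecked")
    static String failureSite() {
        List<Object> objects = new ArrayList<>();
        objects.add(42);
        List<String> polluted = (List<String>) (List<?>) objects; // no exception here
        try {
            String s = polluted.get(0); // compiler-inserted checkcast to String fails
            return "no exception: " + s;
        } catch (ClassCastException e) {
            return "ClassCastException at use site";
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
        System.out.println(failureSite());      // ClassCastException at use site
    }
}
```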

This model has a runtime cost: a checkcast/instanceof/array store check against ArrayList may be polymorphic where it was monomorphic before**, or a dynamic type check requires a double indirection (header -> species -> class) if the class is anyfied.
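The layout and its cost can be illustrated with a toy model. This is a hypothetical sketch: RtClass, Species, Obj and the "count" species field are made-up names, not HotSpot data structures. It shows the header pointing to a species, the extra load needed to reach the class, and a species field read through the header without exposing the species itself.

```java
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the species layout; not real VM data structures.
public class SpeciesLayout {
    // Per-class data, shared by every species of the class.
    record RtClass(String name, RtClass superclass) {}

    // Per-species data: type arguments, a (specialized) vtable and the species
    // fields, plus a pointer back to the class the species was derived from.
    record Species(RtClass rtClass, List<String> typeArgs,
                   List<String> vtable, Map<String, Object> speciesFields) {}

    // An object header now points to a species instead of directly to a class.
    record Obj(Species header, Object[] fields) {}

    // Check against an effectively final class: previously one load and a compare
    // (header == class); now two loads (header, then species -> class).
    static boolean isExactly(Obj obj, RtClass target) {
        return obj.header().rtClass() == target;
    }

    // General instanceof against a class: same double indirection, then a superclass walk.
    static boolean isInstanceOf(Obj obj, RtClass target) {
        for (RtClass c = obj.header().rtClass(); c != null; c = c.superclass()) {
            if (c == target) return true;
        }
        return false;
    }

    // A species field is read through the header; the species itself is never returned.
    static Object speciesField(Obj obj, String name) {
        return obj.header().speciesFields().get(name);
    }

    public static void main(String[] args) {
        RtClass object = new RtClass("Object", null);
        RtClass arrayList = new RtClass("ArrayList", object);
        Species ofInt = new Species(arrayList, List.of("int"), List.of("get", "add"), Map.of("count", 1));
        Species erased = new Species(arrayList, List.of("erased"), List.of("get", "add"), Map.of("count", 2));
        Obj a = new Obj(ofInt, new Object[0]);
        Obj b = new Obj(erased, new Object[0]);
        // Same class, two different header values: a check site that used to see one
        // class pointer now sees one species pointer per parameterization.
        System.out.println(a.header() != b.header());                            // true
        System.out.println(isExactly(a, arrayList) && isExactly(b, arrayList));  // true
        System.out.println(isInstanceOf(a, object));                             // true
        System.out.println(speciesField(a, "count"));                            // 1
    }
}
```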

And for a wildcard, ArrayList<?> is mapped to ArrayList (from the runtime class point of view) as usual, so no big deal.

IMO it's a far better model, simply because from the user's point of view nothing changes.

regards, Rémi

* You can access a species field without seeing the species itself.
** Let's suppose that ArrayList is effectively final here.

----- Original Message -----

From: "Brian Goetz" <brian.goetz at oracle.com>
To: valhalla-spec-experts at openjdk.java.net
Sent: Friday, May 20, 2016 20:33:00
Subject: Wildcards -- Models 4 and 5

In the 4/20 mail “Wildcards and raw types: story so far”, we outlined our explorations for fitting wildcard types into the first several prototypes. The summary was:

 * Model 1: no wildcards at all
 * Model 2: A pale implementation of wildcards, with lots of problems that stem from trying to fake wildcards via interfaces
 * Model 3: basically the same as Model 2, except members are accessed via indy (which mitigated some of the problems but not all)

The conclusion was: compiler-driven translation tricks are not going to cut it (as we suspected all along). We’ve since explored two other models (call them 4 and 5) which explore a range of options for VM support for wildcards. The below is a preliminary analysis of these options.

Reflection, classes, and runtime types

While it may not be immediately obvious that this subject is deeply connected to reflection, consider a typical implementation of |equals()|:

    class Box<T> {
        T t;

        public boolean equals(Object o) {
            if (!(o instanceof Box))
                return false;
            Box other = (Box) o;
            // null-safe: a null t must not be dereferenced
            return (t == null) ? (other.t == null) : t.equals(other.t);
        }
    }

Some implementations use raw types (|Box|) for the |instanceof| and cast target; others use wildcards (|Box<?>|). While the latter is recommended, both are in wide circulation. In any case, as observed in the last mail, were we to interpret |Box| or |Box<?>| as only including erased boxes, then this code would silently break.

The term “class” is horribly overloaded, used to describe the source class (|class Foo { ... }|), the binary classfile, the runtime type derived from the classfile, and the reflective mirror for that runtime type. In the past these existed in 1:1 correspondence, but no more: a single source class now gives rise to a number of runtime types. Having poor terminology causes confusion, so let’s refine these terms:

 * /class/ refers to a source-level class declaration
 * /classfile/ refers to the binary classfile
 * /template/ refers to the runtime representation of a classfile
 * /runtime type/ refers to a primitive, value, class, or interface type managed by the VM

So historically, all objects had a class, which equally described the source class, the classfile, and the runtime type. Going forward, the class and the runtime type of an object are distinct concepts. So an |ArrayList| has a /class/ of |ArrayList|, but a /runtime type/ of |ArrayList|. Our code name for runtime type is /crass/ (obviously a better name is needed, but we’ll paint that bikeshed later). This allows us to untangle a question that’s been bugging us: what should |Object.getClass()| return on an |ArrayList|?
If we return |ArrayList|, then we can’t distinguish between an erased and a specialized object (bad); if we return |ArrayList|, then existing code that depends on |(x.getClass() == List.class)| may break (bad). The answer is, of course, that there are two questions the user can ask an object: what is your /class/, and what is your /crass/? These need to be disentangled. The existing method |getClass()| will continue to return the class mirror; a new method (|getCrass()|) will return a runtime type mirror of some form for the runtime type. Similarly, a class literal will evaluate to a class, and some other form of literal / reflective lookup will be needed for crass.

The reflective features built into the language (|instanceof|, casting, class literals, |getClass()|) are mostly tilted towards classes, not types. (Some exceptions: you can use a wildcard type in an |instanceof|, and you can do unchecked static casts to generic types, which are erased.) We need to extend these to deal in both classes /and/ crasses. For |getClass()| and literals, there’s an obvious path: have two forms. For casting, we are mostly there (except for the treatment of raw types for any-generic classes, which we need to work out separately). For instanceof, it seems a forced move that |instanceof Foo| is interpreted as “an instance of any runtime type projected from class Foo”, but we would also want to apply it to any reifiable type.

Wildcard types

In Model 3, we express a parameterized type with a |ParamType| constant, which names a template class and a set of type parameters, which include both valid runtime types as well as the special type parameter token |erased|. One natural way to express a wildcard type is to introduce a new special type parameter token, |wild|, so we’d translate |Foo| as |ParamType[Foo,wild]|.
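The class/crass split and the two special tokens can be modeled in a few lines. This is a hypothetical sketch: Crass, getCrass, ERASED/WILD and the ParamType[...] rendering mirror the mail's informal notation and are not a real Java API; instanceOfClass also ignores superclasses for brevity.

```java
import java.util.List;

// Hypothetical model of the class/crass distinction; not a real reflection API.
public class CrassDemo {
    // Special type-parameter tokens from the mail.
    static final String ERASED = "erased";
    static final String WILD = "wild";

    // A runtime type: a template (from the classfile) plus type parameters.
    record Crass(String template, List<String> typeArgs) {
        @Override
        public String toString() {
            return "ParamType[" + template + "," + String.join(",", typeArgs) + "]";
        }
    }

    // A toy object that knows its runtime type.
    record Instance(Crass crass) {
        String getClassName() { return crass.template(); } // the class: same for all parameterizations
        Crass getCrass() { return crass; }                 // the crass: distinguishes them
    }

    // instanceof Foo read as "any runtime type projected from class Foo".
    static boolean instanceOfClass(Instance x, String template) {
        return x.getClassName().equals(template);
    }

    public static void main(String[] args) {
        Instance specialized = new Instance(new Crass("ArrayList", List.of("int")));
        Instance erased = new Instance(new Crass("ArrayList", List.of(ERASED)));
        System.out.println(specialized.getClassName().equals(erased.getClassName())); // true
        System.out.println(specialized.getCrass().equals(erased.getCrass()));         // false
        System.out.println(new Crass("Foo", List.of(WILD)));                          // ParamType[Foo,wild]
    }
}
```

The design point is that getClass-style questions keep their old answers while a separate query distinguishes parameterizations.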
In order for wildcard types to work seamlessly, the minimum functionality we’d need from the VM is to manage subtyping (which is used by the VM for |instanceof|, |checkcast|, verification, array store checks, and array covariance). The wildcard must be seen to be a “top” type for all parameterizations:

    ParamType[Foo,T] <: ParamType[Foo,wild]     // for all valid T

And, wildcard parameterizations must be seen to be subtypes of their wildcard-parameterized supertypes. If we have

    class Foo extends Bar implements I { ... }
    class Moo extends Goo { }

then we expect

    ParamType[Foo,wild] <: ParamType[Bar,wild]
    ParamType[Foo,wild] <: ParamType[I,wild]
    ParamType[Moo,wild] <: Goo

Wildcards must also support method invocation and field access to the members that are in the intersection of the members of all parameterizations (these are the total members, those not restricted to particular instantiations, whose member descriptors do not contain any type variables). We can continue to implement member access via invokedynamic (as we do in Model 3), or alternately, the VM can support |invoke*| bytecodes on wildcard receivers.

We can apply these wildcard behaviors to any of the wildcard models (i.e., retrofit them onto Model 2/3).

Partial wildcards

With multiple type variables, the rules for wildcards generalize cleanly, but the number of wildcard types that are a supertype of any given parameterized type grows exponentially in the number of type variables.
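The subtyping rules and the exponential growth can be checked with a toy model. This is a hypothetical sketch: the ParamType record, the SUPERS table (Foo extends Bar implements I) and the "Triple" template are illustrative, the check assumes a subclass passes its type arguments through unchanged, and the non-generic case (ParamType[Moo,wild] <: Goo) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical checker for the wildcard subtyping rules; not the VM's algorithm.
public class WildcardSubtyping {
    static final String WILD = "wild";

    record ParamType(String template, List<String> args) {}

    // Direct supertypes per template: Foo -> Bar, I.
    static final Map<String, List<String>> SUPERS = Map.of(
            "Foo", List.of("Bar", "I"),
            "Bar", List.of(),
            "I", List.of());

    // ParamType[C,T...] <: ParamType[D,A...] if D is C or one of its supertypes
    // and every A is either wild or equal to the corresponding T.
    static boolean isSubtype(ParamType s, ParamType t) {
        if (!reaches(s.template(), t.template())) return false;
        if (s.args().size() != t.args().size()) return false;
        for (int i = 0; i < s.args().size(); i++) {
            String a = t.args().get(i);
            if (!a.equals(WILD) && !a.equals(s.args().get(i))) return false;
        }
        return true;
    }

    static boolean reaches(String from, String to) {
        if (from.equals(to)) return true;
        for (String sup : SUPERS.getOrDefault(from, List.of())) {
            if (reaches(sup, to)) return true;
        }
        return false;
    }

    // Replace every non-empty subset of the arguments with wild:
    // 2^n - 1 wildcard supertypes for n type variables.
    static List<ParamType> wildcardSupertypes(ParamType p) {
        List<ParamType> out = new ArrayList<>();
        int n = p.args().size();
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> args = new ArrayList<>(p.args());
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) args.set(i, WILD);
            }
            out.add(new ParamType(p.template(), List.copyOf(args)));
        }
        return out;
    }

    public static void main(String[] args) {
        ParamType fooInt = new ParamType("Foo", List.of("int"));
        ParamType fooWild = new ParamType("Foo", List.of(WILD));
        System.out.println(isSubtype(fooInt, fooWild));                                    // true
        System.out.println(isSubtype(fooWild, new ParamType("Bar", List.of(WILD))));       // true
        ParamType three = new ParamType("Triple", List.of("int", "long", "String"));
        System.out.println(wildcardSupertypes(three).size());                              // 7
    }
}
```

Erasing all partial wildcards to the total wildcard, as proposed next, collapses those 2^n - 1 types to a single runtime supertype.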
We are considering adopting the simplification of erasing all partial wildcards in the source type system to a total wildcard in the runtime type system (the costs of this are: some additional boxing on access paths where boxing might not be necessary, and unchecked casts when casting a broader wildcard to a narrower one).

Model 4

A constraint we are under is: existing binaries translate the types |Foo| (raw type), |Foo| (erased parameterization), and |Foo<?>| all as |LFoo;| (or its equivalent, |CONSTANT_Class[Foo]|); since existing code treats this as meaning an erased class, the natural path would be to continue to interpret |LFoo;| as an erased class. Model 4 asks the question: “can we reinterpret legacy |LFoo;| in classfiles, and |Foo<?>| in source files, as |any Foo|?” (restoring the interpretation of |Foo<?>| to be more in line with user intuition).

Not surprisingly, the cost of reinterpreting the binaries is extensive. Many bytecodes would have to be reinterpreted, including |new|, |{get,put}field|, and |invoke*|, to make up the difference between the legacy meaning of these constructs and the desired new meaning. Worse, while boxing provides us a means to have a common representation of signatures involving |T| (namely, T’s bound), in order to get to a common representation for signatures involving |T[]|, we’d need to either (a) make |int[]| a subtype of |Object[]| or (b) have a “boxing conversion” from |int[]| to |Object[]| (which would be a proxy box; the data would still live in the original |int[]|). Both are intrusive into the |aaload| and |aastore| bytecodes and still are not anomaly-free. So, overall, while this seems possible, the implementation cost is very high, and all of it is for the sake of migration; these costs would remain as legacy constraints long after the old code has been migrated.
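Option (b) above can be sketched as a proxy view over an int[]. This is a hypothetical illustration: IntArrayAsObjects is a made-up name, with get/set standing in for aaload/aastore. It also shows two anomalies: only non-null Integers can be stored, and reads return boxes created on demand by Integer.valueOf, not necessarily the objects that were stored.

```java
// Hypothetical "boxing conversion" from int[] to an Object[]-like view.
// The view is a proxy; the data stays in the original int[].
public final class IntArrayAsObjects {
    private final int[] data;

    IntArrayAsObjects(int[] data) { this.data = data; }

    int length() { return data.length; }

    // Stands in for aaload: each read boxes the current int value.
    Object get(int i) { return Integer.valueOf(data[i]); }

    // Stands in for aastore: anything other than a non-null Integer is rejected,
    // much like an ArrayStoreException on a covariant array today.
    void set(int i, Object value) {
        if (!(value instanceof Integer boxed)) {
            throw new ArrayStoreException(String.valueOf(value));
        }
        data[i] = boxed; // unboxes and writes through to the original int[]
    }

    public static void main(String[] args) {
        int[] ints = {1, 2, 3};
        IntArrayAsObjects view = new IntArrayAsObjects(ints);
        view.set(0, 42);
        System.out.println(ints[0]); // 42: writes go through to the int[]
        try {
            view.set(1, "two");
        } catch (ArrayStoreException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: two
        }
    }
}
```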
Model 5

Model 5 asks the simpler question: can we continue to interpret |LFoo;| as erased in legacy classfiles, but upgrade to treating |Foo<?>| as is expected in source code? This entails changing the compilation translation of |Foo<?>| from “erased foo” to |ParamType[Foo,wild]|. This is far less intrusive into the bytecode behavior: legacy code would continue to mean what it did at compile time. It does require some migration support for handling the fact that field and method descriptors have changed (but this is a problem we’re already working on for managing the migration of reference classes to value classes). There are also some possible source incompatibilities in the face of separate compilation (to be quantified separately).

Model 5 allows users to keep their |Foo<?>| and have it mean what they think it should mean. So we don’t need to introduce a confusing |Foo| wildcard, but we will need a way of saying “erased Foo”, which might be |Foo<? extends Object>| or might be something more compact like |Foo|.

Comparison

Comparing the three models for wildcards (2, 4, 5):

 * Model 2 defines the source construct |Foo<?>| to permanently mean |Foo|, even when |Foo| is anyfied, and introduces a new wildcard |Foo|, but maintains source and binary compatibility.
 * Model 4 lets us keep |Foo<?>|, and retroactively redefines bytecode behavior, so an old binary can still interoperate with a reified generic instance, and will think a |Foo| is really a |Foo|.
 * Model 5 redefines the /source/ meaning of |Foo<?>| to be what users expect, but because we don’t reinterpret old binaries, allows some source incompatibility during migration.

I think this pretty much explores the solution space. Our choices are: break the user model of what |Foo<?>| means, take a probably prohibitive hit to distort the VM to apply new semantics to old bytecode, or accept some limited source incompatibility under separate compilation but rescue the source form that users want.
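Why Model 5 needs descriptor migration support can be shown with a toy linker. This is a hypothetical sketch: methods are looked up by exact name plus descriptor, and the "ParamType[Foo,wild]" descriptor text follows the mail's informal notation rather than any real classfile encoding. A client compiled against the legacy |LFoo;| form no longer finds a method whose parameter is now translated as a wildcard.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy linker; descriptor syntax for ParamType is illustrative only.
public class DescriptorLinkage {
    enum Model { LEGACY, MODEL_5 }

    // How the source type Foo<?> is translated into a descriptor component.
    static String translateWildcard(String template, Model model) {
        return switch (model) {
            case LEGACY -> "L" + template + ";";                // erased Foo
            case MODEL_5 -> "ParamType[" + template + ",wild]"; // wildcard Foo
        };
    }

    // Methods declared by a class, keyed by name + descriptor.
    static Map<String, String> declare(String name, String paramDescriptor) {
        Map<String, String> methods = new HashMap<>();
        methods.put(name + "(" + paramDescriptor + ")V", "body of " + name);
        return methods;
    }

    // A call site resolves only on an exact name + descriptor match.
    static boolean links(Map<String, String> methods, String name, String paramDescriptor) {
        return methods.containsKey(name + "(" + paramDescriptor + ")V");
    }

    public static void main(String[] args) {
        // Library recompiled under Model 5: m(Foo<?>) gets the new descriptor.
        Map<String, String> lib = declare("m", translateWildcard("Foo", Model.MODEL_5));
        // A client compiled earlier still refers to m(LFoo;)V.
        System.out.println(links(lib, "m", translateWildcard("Foo", Model.LEGACY)));  // false
        System.out.println(links(lib, "m", translateWildcard("Foo", Model.MODEL_5))); // true
    }
}
```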
In my opinion, the Model 5 direction offers the best balance of costs and benefits. While there is some short-term migration pain (in relatively limited cases, which can be mitigated with compiler help), in the long run it gets us to the world we want without permanently burdening either the language (creating confusion between |Foo<?>| and |Foo|) or the VM implementation.

In all these cases, we still haven’t defined the semantics of /raw types/. Raw types existed for migration between pre-generic and generic code; we still have that migration problem, plus the new migration problems of generic to any-generic, and of pre-generic to any-generic. So in any case, we’re going to need to define suitable semantics for raw types corresponding to any-generic classes.


More information about the valhalla-spec-observers mailing list