JEP 401: Value Classes and Objects (Preview)
Summary
Enhance the Java Platform with value objects, class instances that have only final
fields and lack object identity. This is a preview language and VM feature.
Goals
- Allow developers to opt in to a programming model for simple values in which objects are distinguished solely by their field values, much as the int value 3 is distinguished from the int value 4.
- Migrate popular classes that represent simple values in the JDK, such as Integer, to this programming model. Support compatible migration of user-defined classes.
- Maximize the freedom of the JVM to encode simple values in ways that improve memory footprint, locality, and garbage collection efficiency.
Non-Goals
- It is not a goal to introduce a struct feature in the Java language. Java continues to operate on just two kinds of data: primitives and objects.
- It is not a goal to change the treatment of primitive types. Primitive types behave like value classes in many ways, but are a distinct concept. A separate JEP will provide enhancements to make primitive types more class-like and compatible with generics.
- It is not a goal to guarantee any particular optimization strategy or memory layout. This JEP enables many potential optimizations; only some will be implemented initially. Future JEPs will pursue optimizations related to null exclusion and generic specialization.
- It is not a goal to automatically treat existing classes as value classes, even if they meet the requirements for how value classes are declared and used. The behavioral changes require an explicit opt-in.
- It is not a goal to "fix" the == operator so that programmers can use it in place of equals. This JEP redefines == only as much as necessary to cope with a new kind of identity-free object. The usual advice to compare objects in most contexts using the equals method still applies.
Motivation
Java developers often need to represent simple domain values: the shipping address of an order, a log entry from an application, and so on. To do this, developers typically declare classes whose main purpose is to "wrap" data, stored in final fields. For example, a simple RGB color value could be represented with a record, whose fields are final by default:
var orange = new Color(237, 139, 0);
var blue = new Color(0, 115, 150);
record Color(byte red, byte green, byte blue) {
public Color(int r, int g, int b) {
this(checkByte(r), checkByte(g), checkByte(b));
}
private static byte checkByte(int x) {
if (x < 0 || x > 255) throw new IllegalArgumentException();
return (byte) (x & 0xff);
}
// Provided automatically: red(), green(), blue(),
// toString(), equals(Object), hashCode()
public Color mix(Color that) {
return new Color(avg(red, that.red),
avg(green, that.green),
avg(blue, that.blue));
}
private static byte avg(byte b1, byte b2) {
return (byte) (((b1 & 0xff) + (b2 & 0xff)) / 2);
}
}
Developers will regard the "essence" of a Color
object as a red-green-blue triple, but to Java, the essence of an object is its identity. Each execution of new Color(...)
creates an object with a unique identity, making it distinguishable from every other object in the system. An object's identity means that developers can share references to an object between different parts of a program, and changes to an object's fields in one part of the program can be observed in other parts.
Object identity is problematic for simple domain values
Object identity is at best irrelevant and at worst harmful to simple domain values.
The ==
operator can be used to compare object identities. While normal programs should avoid using this operator, those that do so will observe that two objects with the same "essence" may have distinct identities. For example, two Color
objects that represent the same red-green-blue triple are not ==
if they were created by different executions of new Color(...)
. This inconsistency is a frequent source of confusion for developers.
var c = new Color(255, 0, 0);
var d = c.mix(c); // creates a new Color for the same red-green-blue triple
if (c == d) ... // false, even though c.equals(d)
Confusion around == for objects is so widespread that Java gives special treatment to objects of fundamental classes:
- String literals are interned automatically. This means that a string literal with a given character sequence always produces the same String object, no matter where the string literal is used. For example, given String s = "hello"; and String t = "hello";, only one String object for "hello" is created, so s == t is true.
- Small integer literals are autoboxed in a predictable way. This means that a given integer literal always produces the same Integer object, no matter where the integer literal is used. For example, given Integer x = 5; and Integer y = 5;, only one Integer object for 5 is created, so x == y is true.
This special treatment minimizes the role of object identity for string literals and integer literals, but fails to address the confusion around == for strings and integers in general. The most viewed Java question on StackOverflow concerns the use of == with String objects, and another high-visibility question concerns the use of == with Integer objects.
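The special cases described above can be observed on any current JDK. The following is a minimal sketch (the class and method names are illustrative and do not come from this JEP): literals share one object, while a separately created String with the same contents is distinct under == yet equal under equals. The language only guarantees a shared Integer object for boxed values from -128 to 127, so the sketch makes no claim about larger values.

```java
// Illustrative only: the class and method names are assumptions, not part of the JEP.
class LiteralIdentityDemo {
    // String literals with the same characters are interned to one object.
    static boolean literalsShareIdentity() {
        String s = "hello";
        String t = "hello";
        return s == t;
    }

    // Small integer literals box to the same cached Integer object.
    static boolean smallBoxesShareIdentity() {
        Integer x = 5;
        Integer y = 5;
        return x == y;
    }

    // A String created with 'new' gets a fresh identity, even though its
    // contents match the literal it was copied from.
    static boolean copyHasNewIdentity() {
        String s = "hello";
        String copy = new String(s);
        return s != copy && s.equals(copy);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareIdentity());
        System.out.println(smallBoxesShareIdentity());
        System.out.println(copyHasNewIdentity());
    }
}
```

All three methods return true, which is exactly the inconsistency the text describes: whether == agrees with equals depends on how the object happened to be created.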
This sort of confusion could be avoided for simple domain values if the language did not insist that separately-created objects with the same "essence" have distinct identities.
Object identity is expensive at run time
Java's requirement that every object has identity, even if simple domain values don't want it, means worse performance. Typically, the JVM has to allocate memory for each newly created object, distinguishing it from every object already in the system, and reference that memory location whenever the object is used or stored. This causes the garbage collector to work harder, taking cycles away from the application, and it means worse locality of reference: for example, an array may refer to objects scattered around memory, frustrating the CPU cache as the program iterates over the array.
Modern JVMs have an optimization called escape analysis that can mitigate these performance concerns. For example, instead of allocating memory for a Color x with three byte fields, the JVM can pass the three byte values around the program directly. An inlined call to x.mix(...) could run without any memory being allocated, even though the mix method performs new Color(...). This optimization is valid as long as the code never depends on the identity of the object in question. Unfortunately, if the program performs an identity-sensitive operation such as x == y, or if the object might "escape" into code that the optimization can't observe, the optimization must be unraveled.
In some application domains, developers routinely program for speed by creating as few objects as possible, thus de-stressing the garbage collector and improving locality. For example, they might encode their RGB colors as three byte values rather than as Color objects. Unfortunately, this approach gives up the functionality of classes that makes Java code so maintainable: meaningful names, private state, data validation by constructors, convenience methods, etc. A developer operating on colors represented as byte values might accidentally interpret the bits with a BGR encoding, swapping the red and blue components and corrupting the resulting image.
Programming without identity
Trillions of Java objects are created every day, each one bearing a unique identity. We believe the time has come to let Java developers choose which objects in the program need identity, and which do not. A class like Color
that represents simple domain values could opt out of identity, so that there would never be two distinct Color
objects representing the HTML color purple, just as there are never two distinct int
values that both represent the number 4
.
By opting out of identity, developers are opting in to a programming model that provides the best of both worlds: the abstraction of classes with the simplicity and performance benefits of primitives.
Important classes in the JDK, such as the wrapper classes used for boxing, are already designed to be "value-based", meaning they discourage depending on the identity of instances. With this JEP, these classes can opt out of identity entirely. For example, in the case of the class Integer
, instances will have no identity, ==
will compare all Integer
objects by value, and the run-time overhead of the Integer
type can dramatically shrink. Even when stored in arrays, Integer[]
can approach the efficiency of int[]
.
Description
A value object is an object that does not have identity. A value object is an instance of a value class. Two value objects are the same according to ==
if they have the same field values, regardless of when or how they were created. Two variables of a value class type may hold references that point to different memory locations, but refer to the same value object—much like two variables of type int
may hold the same int
value.
An identity object is an object that does have identity—a unique property associated with the object when it is created. Prior to value classes, every object in Java was an identity object. Two identity objects are the same according to ==
if they have the same identity. Two variables of an identity class type refer to the same identity object only if they hold references pointing to the same memory location.
At run time, the use of value objects may be optimized in ways that are difficult or impossible for identity objects. This is because value objects, untethered from any canonical memory location, can be duplicated, re-encoded, or re-used whenever it is convenient for the JVM to do so. This freedom allows for smaller memory footprint, fewer memory allocations, and better data locality.
Existing classes that represent simple domain values and that have followed best practices to avoid identity dependencies can be easily migrated to be value classes, with minimal compatibility impact. This JEP migrates a handful of commonly-used classes in the Java Platform, including the primitive wrapper classes such as Integer
.
Enabling preview features
Value classes are a preview language feature, disabled by default.
To try the examples below in JDK NN you must enable preview features:
- Compile the program with javac --release NN --enable-preview Main.java and run it with java --enable-preview Main; or,
- When using the source code launcher, run the program with java --enable-preview Main.java; or,
- When using jshell, start it with jshell --enable-preview.
Programming with value objects
Programs create value objects by instantiating a class that has been declared with the value
modifier. In most respects, value objects behave just like any other object, but there are some special behaviors that programmers should be aware of.
Value classes
A class that has no need for identity-related features can opt out of those features with the value
modifier. Classes with the value
modifier are value classes; classes without the modifier are identity classes.
The Color
record introduced earlier could be declared a value record. Nothing else about the declaration changes.
value record Color(byte red, byte green, byte blue) {
public Color(int r, int g, int b) {
this(checkByte(r), checkByte(g), checkByte(b));
}
private static byte checkByte(int x) {
if (x < 0 || x > 255) throw new IllegalArgumentException();
return (byte) (x & 0xff);
}
public Color mix(Color that) {
return new Color(avg(red, that.red),
avg(green, that.green),
avg(blue, that.blue));
}
private static byte avg(byte b1, byte b2) {
return (byte) (((b1 & 0xff) + (b2 & 0xff)) / 2);
}
}
A simple class representing US dollar currency values (to two decimal places) might also be a good value class candidate. In this case, the author might prefer to declare a regular (non-record) class to more closely control the internal state. But because the class does not depend on identity-sensitive features like unique instance creation, field mutation, or synchronization, it can be declared a value class.
value class USDCurrency implements Comparable<USDCurrency> {
    private int cs; // implicitly final
    private USDCurrency(int cs) { this.cs = cs; }
    public USDCurrency(int dollars, int cents) {
        this(dollars * 100 + (dollars < 0 ? -cents : cents));
    }
    public int dollars() { return cs/100; }
    public int cents() { return Math.abs(cs%100); }
    public USDCurrency plus(USDCurrency that) {
        return new USDCurrency(cs + that.cs);
    }
    public int compareTo(USDCurrency that) { ... }
    public String toString() { ... }
}
The instance fields of a value class are implicitly final
. (Special rules apply to the initialization of value class fields in constructors, as described later.) The instance methods of a value class must not be synchronized
.
Many abstract classes have no need for identity-related features and so are also good value class candidates. The class java.lang.Number
, for example, has no fields, nor any code that depends on identity-sensitive features.
abstract value class Number implements Serializable {
public abstract int intValue();
public abstract long longValue();
public byte byteValue() { return (byte) intValue(); }
...
}
The following rules apply to subclassing relationships involving value classes:
- A concrete value class is implicitly final and may have no subclasses.
- An abstract value class has chosen not to depend on identity, but this choice does not constrain its subclasses: the abstract class may have both value and identity subclasses. (And so a variable of the abstract value class type may or may not refer to a value object.)
- Identity classes may only be extended by other identity classes. Once a class has expressed a dependency on object identity, its subclasses cannot undo this dependency. (Thus, a variable of an identity class type always refers to an identity object.)
- Interfaces may be extended by both value and identity classes, and have no way to express a dependency on object identity.
- The class Object, which sits at the top of the class hierarchy, is considered an identity class and has identity instances, but in most respects behaves more like an interface and permits value subclasses.
Beyond the constraints outlined in this section, a value class declaration is just like any other class declaration. The class can declare methods and implement interfaces. Users of the class will not typically notice anything unusual about the class—aside from identity-sensitive behaviors, everything about the objects is the same.
// value objects are created with 'new'
USDCurrency d1 = new USDCurrency(100,25);
// value class types may be 'null'
USDCurrency d2 = null;
// method invocations work as usual
if (d1.dollars() >= 100)
d2 = d1.plus(new USDCurrency(-100,0));
// objects can be viewed as superclass instances
Object o = d2;
String s = o.toString(); // "$0.25"
// objects can be viewed as interface instances
Comparable<USDCurrency> c = d2;
int i = c.compareTo(d1); // -1
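The compareTo and toString bodies of USDCurrency are elided above. The following is a sketch of one way such a class could be completed, written as an ordinary identity class so that it runs without preview features. The name UsdAmount and the bodies of compareTo, equals, hashCode, and toString are assumptions chosen to match the commented results in the usage example ("$0.25" and a negative comparison); they are not taken from this JEP.

```java
// A sketch as an ordinary (identity) class so it runs without preview features.
// The compareTo, equals, hashCode, and toString bodies are assumptions.
final class UsdAmount implements Comparable<UsdAmount> {
    private final int cs; // total amount in cents

    private UsdAmount(int cs) { this.cs = cs; }

    UsdAmount(int dollars, int cents) {
        // The sign of the dollars argument applies to the cents as well.
        this(dollars * 100 + (dollars < 0 ? -cents : cents));
    }

    int dollars() { return cs / 100; }
    int cents() { return Math.abs(cs % 100); }

    UsdAmount plus(UsdAmount that) { return new UsdAmount(cs + that.cs); }

    @Override public int compareTo(UsdAmount that) {
        return Integer.compare(cs, that.cs);
    }

    // Without the value modifier, == still compares identities, so equality
    // of amounts has to be expressed through equals and hashCode.
    @Override public boolean equals(Object o) {
        return o instanceof UsdAmount && ((UsdAmount) o).cs == cs;
    }

    @Override public int hashCode() { return Integer.hashCode(cs); }

    @Override public String toString() {
        int c = cents();
        return (cs < 0 ? "-" : "") + "$" + Math.abs(dollars()) + "." + (c < 10 ? "0" : "") + c;
    }
}
```

With this sketch, new UsdAmount(100, 25).plus(new UsdAmount(-100, 0)) has the string form $0.25 and compares below the original amount. Because UsdAmount is an identity class, two separately created amounts remain != even when equals returns true; only the value modifier changes that.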
References between objects
Value class types are reference types. In Java, any code that operates on an object is really operating on a reference to that object; member accesses must resolve the reference to locate the object (throwing an exception in the case of a null
reference). Value objects are no different in this respect.
It might seem odd to talk about references to objects that have no identity, since it is natural to think of an object's memory address as the run time representation of its identity. Indeed, stable memory addresses are not essential for value objects, and JVM implementations will often try to optimize away any indirections to the object data. However, when reasoning about a Java program, it's best to imagine all objects continuing to be handled and operated on via references.
Objects can store references to other objects in their fields, creating complex relationship graphs. There is no restriction on the types of references between value and identity objects. The following value class, for example, stores one reference to an identity object and two references to value objects. The third field, predecessor
, recursively references another object of the same value class type (or stores null
).
value class Item {
private String name; // identity class type
private USDCurrency cost; // value class type
private Item predecessor; // this value class type
public Item(String n, USDCurrency c) {
this(n, c, null);
}
public Item(String n, USDCurrency c, Item p) {
...
}
...
}
There is, however, one important limitation on references between objects: due to value classes' construction requirements (covered later), when a value object's fields are initialized, they cannot refer back to the object itself. So it is impossible, for example, to create an Item
whose predecessor
is that same Item
. More generally, the instance fields of a value object can never be used to create a cycle—at least one object in any cycle would have to be an identity object.
Comparing value objects with ==
The ==
operator traditionally tests whether two references are the same. But this capability depends on object identity: only identity objects can be reliably referenced at a stable location.
With the introduction of value objects, the ==
operator must instead test whether two referenced objects are the same—that is, one is "substitutable" for the other. For identity objects, this is just a different way of describing the same test. But in the case of value objects, this means testing that the objects, wherever located, represent the same value. The result is true
if the objects being compared belong to the same class and have the same field values, and false
otherwise. (Fields with primitive types are compared by their bit patterns. Other field values—both identity and value objects—are recursively compared with ==
.)
// value objects with the same field values are the same
USDCurrency d1 = new USDCurrency(3,95);
USDCurrency d2 = new USDCurrency(3,95).plus(new USDCurrency(0,0));
assert d1 == d2; // true
// objects are still the same when viewed as supertypes
Object o1 = d1;
Object o2 = d2;
assert o1 == o2; // true
// identity objects are unique when created separately
String s1 = "hamburger";
String s2 = new String(s1); // new identity
assert s1 != s2; // true
// == recursively compares identity object fields
assert new Item(s1, d1) != new Item(s2, d1); // true
// == recursively compares value object fields
assert new Item(s1, d1) == new Item(s1, d2); // true
Notice three things about the recursive use of ==:
- Recursion on identity objects does not perform a "deep" equality test. It compares identities. The referenced identity object may even be mutated (by, say, adding a value to a referenced List), but if two value objects are ==, the nested mutation would not impact the == test.
- Recursion on value objects does perform a deep comparison of the nested objects' fields. The resulting number of comparisons is unbounded: if an Item has a predecessor, and that Item has a predecessor, and so on, using == on the Item may require a full traversal of the chain of references. (Fortunately, as noted in the previous section, this chain will never be cyclical.)
- The ability to compare value objects' fields means that a value object's private data is a little more exposed than it might be in an identity object: someone who wants to determine a value object's field values can (with sufficient time and access) guess at those values, create a new object wrapping their guess, and use == to test whether the guess was correct.
When declaring a value class, it's important to keep each of these factors in mind. In some cases, an identity class may be a better fit.
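The comparison rules above can be modeled by hand with ordinary records. This is only a sketch of the rules as stated in this section, not of how a JVM implements ==. The names Money, Part, and Substitutable are hypothetical, and comparing a double through its raw bits is this sketch's reading of "compared by their bit patterns".

```java
// Models the == rules for value objects using ordinary records.
// 'Money', 'Part', and 'Substitutable' are hypothetical names for illustration.
record Money(int cents) {}
record Part(String name, Money cost, Part predecessor, double weight) {}

class Substitutable {
    // Stands in for == on a value class with a single primitive field.
    static boolean same(Money a, Money b) {
        if (a == null || b == null) return a == b;
        return a.cents() == b.cents();
    }

    // Stands in for == on a value class with mixed field types.
    static boolean same(Part a, Part b) {
        if (a == null || b == null) return a == b;
        return a.name() == b.name()                    // identity object: compare identities
            && same(a.cost(), b.cost())                // value object: compare its fields
            && same(a.predecessor(), b.predecessor())  // may walk the whole chain
            && Double.doubleToRawLongBits(a.weight())  // primitive: compare bit patterns
               == Double.doubleToRawLongBits(b.weight());
    }
}
```

Under this model, two Part objects built from the same String reference and equal Money amounts are the same, while swapping in new String(name) makes them different even though the characters match. Two NaN weights with identical bits compare as the same, and 0.0 and -0.0 do not, which differs from the behavior of == on double values.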
The equals
method
While ==
tests whether two value objects are the same object, the equals
method tests whether two objects represent the same data. As for identity classes, two value objects may be !=
, but still be considered by the class author to be equal.
// distinct identity objects may be 'equals'
String s1 = "hamburger";
String s2 = new String(s1); // new identity
assert s1 != s2; // true
assert s1.equals(s2); // true
// distinct value objects may be 'equals'
assert new Item(s1, d1) != new Item(s2, d1); // true
assert new Item(s1, d1).equals(new Item(s2, d1)); // should be true
The problem of defining what constitutes "the same data" is left to the class author when they implement their equals method. For convenience, the default Object.equals implementation aligns with ==, testing whether two objects are the same; for simple value classes, this is often good enough. Value records are able to provide an even more convenient default implementation, comparing record components recursively with equals. But these are just starting points, and it's ultimately up to the class author to provide an appropriate equals implementation.
When thinking about equals and ==, it's important to remember that a value object's internal state (the data it stores) is not always the same as its external state (the data it represents). An == test compares internal state. This is often not what you're after. Instead, the best advice for developers in most cases is to use equals whenever they need to compare objects.
In the following example, the value class Substring
implements CharSequence
. A Substring
represents a string lazily, without allocating a char[]
in memory. Naturally, then, two Substring
objects should be considered equal
if they represent the same string, regardless of differences in their internal state.
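value class Substring implements CharSequence {
    private String str;
    private int start, end;
    // constructor used by the examples below
    public Substring(String str, int start, int end) {
        this.str = str;
        this.start = start;
        this.end = end;
    }
    public int length() {
        return end - start;
    }
    public char charAt(int i) {
        return str.charAt(start + i);
    }
    // required by CharSequence
    public CharSequence subSequence(int from, int to) {
        return new Substring(str, start + from, start + to);
    }
    public String toString() {
        return str.substring(start, end);
    }
    public boolean equals(Object o) {
        return o instanceof Substring && toString().equals(o.toString());
    }
    // keeps hashCode consistent with the overridden equals
    public int hashCode() {
        return toString().hashCode();
    }
}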
Substring s1 = new Substring("ionization", 0, 3);
Substring s2 = new Substring("ionization", 7, 10);
assert s1 != s2; // true
assert s1.equals(s2); // true
The distinction between internal state and external state helps to explain why not all value classes are records, and not all records are value classes: records are used to opt out of separate internal state, while value classes are used to opt out of identity. Each of these choices can be made orthogonally.
Other identity-sensitive operations
In addition to ==, a handful of specialized operations supported by the Java platform have historically relied on object identity. When encountering a value object, these operations behave as follows:
- System.identityHashCode: The "identity hash code" of a value object is computed by combining the hash codes of the value object's fields. The default implementation of Object.hashCode continues to return the same value as identityHashCode. (Note that, like ==, this hash code exposes information about a value object's private fields that might otherwise be hidden by an identity object. Developers should be cautious about storing sensitive secrets in value object fields.)
- Synchronization: Value objects do not have synchronization monitors. At compile time, the operand of a synchronized statement must not have a concrete value class type. At run time, if an attempt is made to synchronize on a value object (for example, where the operand of a synchronized statement has type Object), an IdentityException is thrown. Invocations of the wait and notify methods of Object will similarly fail at run time, because they require callers to first synchronize on the object's monitor.
- Garbage collection: Value objects do not have a traditional life cycle: an object may already exist before new, and may appear again after it becomes unreachable. So operations that manage the end of an object's lifetime are not relevant to value objects. A garbage collector will never call the finalize method of a value object. The classes of java.lang.ref throw an IdentityException when asked to wrap or operate on a value object.
For developers who need to dynamically require identity in their own code, an IdentityException may be thrown, and the java.util.Objects class provides convenience methods hasIdentity and requireIdentity.
Safe construction
Constructors initialize newly-created objects, including setting the values of the objects' fields. Because value objects do not have identity, their initialization requires special care.
Larval object leakage
An object being constructed is "larval"—it has been created, but it is not yet fully-formed. Larval objects must be handled carefully, because the expected properties and invariants of the object may not yet hold.
For example, in the following class, name is expected to hold a valid String, and length is expected to hold the length of that string. But when the constructor begins, the larval object's name field is null; immediately after name gets set, the larval object's length is incorrect.
class Name {
final String name;
final int length;
Name(String n) {
name = n;
length = computeLength();
}
int computeLength() {
return name.length();
}
}
Notice that the computeLength method is asked to run with a larval object as a receiver. The larval object has "leaked" out of the constructor and might be expected to behave like a fully-initialized Name. Fortunately, the larval object's name field has already been set, and the computeLength method doesn't depend on the length field, so an appropriate value is returned. But if, say, the fields were initialized in the opposite order, an exception would occur (a NullPointerException, because name would still be null when computeLength runs).
Also notice that the name
and length
fields are marked final
—yet despite this modifier, if a larval Name
is leaked to unsuspecting code, that code may be surprised to observe these final
fields mutating!
In a toy example, these risks may seem minor. But in a complex initialization process involving multiple constructors and class hierarchies that span maintenance domains, larval object leakage can become a significant risk to correctness and security.
Early & late construction
Traditionally, a constructor begins the initialization process by invoking a superclass constructor, super(...)
. After the superclass returns, the subclass then proceeds to set its declared instance fields and perform other initialization tasks. This pattern exposes a completely uninitialized subclass to any larval object leakage occurring in a superclass constructor.
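The exposure described here can be reproduced on any current JDK. In the following sketch (LeakyBase and LeakyName are hypothetical classes, not part of this JEP), the superclass constructor calls an overridable method, so the subclass's code runs against a larval object whose final field has not yet been assigned.

```java
// Hypothetical classes showing larval object leakage through super().
class LeakyBase {
    LeakyBase() {
        describe(); // runs the subclass override before subclass fields are set
    }

    void describe() {}
}

class LeakyName extends LeakyBase {
    final String name;
    String seenDuringSuper; // no initializer, so the value recorded below survives

    LeakyName(String n) {
        super();  // the larval object leaks here
        name = n; // too late for describe() to see it
    }

    @Override
    void describe() {
        // Reads the final field before the constructor has assigned it.
        seenDuringSuper = String.valueOf(name);
    }
}
```

After new LeakyName("Ada"), the field seenDuringSuper holds the text "null" while name holds "Ada": the same final field was observed with two different values.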
The Flexible Constructor Bodies preview feature enables an alternative approach to initialization, in which fields can be set and other code executed before the super(...)
invocation. There is a two-phase initialization process: early construction before the super(...)
invocation, and late construction afterwards.
class Name {
final String name;
final int length;
Name(String n) {
// early construction:
name = n;
super();
// late construction:
length = computeLength();
}
int computeLength() {
return name.length();
}
}
During the early construction phase, larval object leakage is impossible: the constructor may set the fields of this
, but may not invoke instance methods or otherwise make use of this
. Fields that are initialized early are set before they can ever be read, even if a superclass leaks the larval object. Final fields, in particular, can never be observed to mutate.
Value object initialization
Early initialization of instance fields is mandatory for value classes.
Value objects lack identity, so there is no canonical memory location in which the late mutation of a field could be observed. JVM implementations need to be free to make copies of value objects—including leaked larval value objects—whenever it is convenient for them to do so. Thus, the values of the object's fields must be provided early, before there is any risk of larval object leakage.
To facilitate early initialization of fields, construction code in value classes prefers early construction wherever possible:
- If a value class constructor has no super(...) or this(...) call, an implicit super() call is placed at the end of the constructor body, and the entire body is part of early construction (in identity classes, the implicit super() is placed at the start).
- Each value class instance field initializer is placed at the start of the constructor, as part of early construction (in identity classes, instance field initializers run after the super(...) call).
- Instance initializer blocks (a rarely-used feature) continue to run in the late phase, and so may not assign to value class instance fields.
- For convenience, fields that have been assigned may be read by subsequent early construction code (that is, early construction may freely access the fields of this, but may not invoke instance methods or share this with other code).
In practice, value class authors may notice errors when they attempt to use this in a value class constructor or field initializer. These errors can be addressed by either (i) refactoring the code so that it no longer depends on this, or (ii) placing the code that depends on this after an explicit super(...) call in a constructor.
In the following example, the Name
class has become a value class, and the author has eliminated the dependency on this
by making the computeLength
method static
and passing the input string as an argument.
value class Name {
String name;
int length;
Name(String n) {
// early construction:
name = n;
length = computeLength(name);
}
static int computeLength(String n) {
return n.length();
}
}
Encouraging early initialization of identity classes
Ultimately, we think developers should shift as much of their construction code as possible to the early phase. This is especially important for value classes, but many identity classes would also benefit.
In the future, we anticipate that identity classes will have a way to adopt the constructor timing of value classes: field initializers run first, in the early phase, and implicit super()
calls run last. (Unlike value classes, identity classes would not be required to initialize all of their fields before an explicit super(...)
call.)
In the meantime, for this JEP, javac provides lint warnings indicating this dependencies in instance field initializers and implicit-super() constructors of identity classes. These warnings can be addressed, as for value classes, by (i) refactoring the code so that it no longer depends on this, or (ii) placing the code that depends on this after an explicit super(...) call in a constructor. A class that compiles without warning will likely be able to cleanly transition to the constructor timing of value classes in the future.
(If there are indirect timing dependencies between a subclass and a superclass—say both classes must interact with a mutable static field in a specific order—javac
will not warn about that dependency, but as a best practice, the class author should place code that must run late after an explicit super(...)
call.)
As a special case, in an identity record class, a constructor dependency on this is likely a bug, and this JEP specifies either an error or a mandatory warning (TBD) to address the issue. The constraints on record constructors are relaxed so that a constructor can use an explicit super() call to indicate code that must run in the late phase.
The enhancement that allows fields to be read during early construction applies to both value classes and identity classes.
Run-time optimizations for value objects
Because there is no need to preserve identity, Java Virtual Machine implementations have a lot of freedom to encode value objects at run time in ways that optimize memory footprint, locality, and garbage collection efficiency. Optimization techniques will typically duplicate, re-encode, or re-use value objects to achieve these goals. Re-encoding might be useful, for example, to copy a value object into a variable that requires fewer memory loads to access the object's data.
This section describes abstractly some of the JVM optimization techniques implemented by HotSpot. It is not comprehensive or prescriptive, but offers a taste of how value objects enable improved performance.
Value object scalarization
Scalarization is one important optimization enabled by the lack of identity. A scalarized reference to a value object is reduced to its "essence", a set of the object's field values without any enclosing container. A scalarized object is essentially "free" at run time, having no impact on the normal object allocation and garbage collection processes.
In HotSpot, scalarization is a JIT compilation technique, affecting the representation of references to value objects in the bodies and signatures of JIT-compiled methods.
The following illustrates how the JIT compiler might translate the Color.mix method to scalarize its input and output. The "essence" of a Color reference is 3 bytes, r, g, and b, along with a boolean to indicate whether the reference is null, in which case the other 3 bytes can be ignored. (In this pseudocode, the notation { ... } refers to a vector of multiple values that can be returned from a scalarized method. Importantly, this is purely notational: there is no wrapper at run time.)
// original method:
public Color mix(Color that) {
return new Color(avg(red, that.red),
avg(green, that.green),
avg(blue, that.blue));
}
// effectively:
static { boolean, byte, byte, byte }
$mix(boolean this_null, byte this_r,
byte this_g, byte this_b,
boolean that_null, byte that_r,
byte that_g, byte that_b) {
$nullCheck(this_null);
$nullCheck(that_null);
return { false,
avg(this_r, that_r),
avg(this_g, that_g),
avg(this_b, that_b) };
}
// original invocation:
new Color(237, 139, 0).mix(new Color(0, 0, 0));
// effectively:
$mix(false, 237, 139, 0, false, 0, 0, 0);
JVMs have used similar techniques to scalarize identity objects in local code when the JVM is able to prove that an object's identity is never used. But scalarization of value objects is more predictable and far-reaching, even across non-inlinable method invocation boundaries.
One limitation of scalarization is that it is not typically applied to a variable with a type that is a supertype of a value class type. Notably, this includes method parameters of generic code whose erased type is Object. Instead, when an assignment to a supertype occurs, a scalarized value object must be converted to an ordinary heap object encoding. But this allocation occurs only when necessary, and as late as possible.
Value object heap flattening
Heap flattening is another important optimization enabled by value objects' lack of identity. The "essence" of a reference to a value object is encoded as a compact bit vector, without any pointer to a different memory location. This bit vector can then be stored directly in heap storage, in a field or an array of a value class type.
Heap flattening is useful because a flattened value object requires less memory than an ordinary object on the heap, and because the data is stored locally, avoiding expensive cache misses. These benefits can significantly improve some programs' memory footprint and execution time.
To illustrate, an array of Color references could directly store 32-bit encodings of the referenced objects. Note that, as for scalarization, an extra flag is needed to keep track of null references.
// original code:
Color[] cs = new Color[100];
cs[5] = new Color(237, 139, 0);
Color c1 = cs[5];
Color c2 = cs[6];
// effectively:
int[] cs = new int[100];
cs[5] = $flatten(false, 237, 139, 0);
{ boolean c1_null, byte c1_r, byte c1_g, byte c1_b } =
$inflate(cs[5]);
{ boolean c2_null, byte c2_r, byte c2_g, byte c2_b } =
$inflate(cs[6]);
// where:
int $flatten(boolean val_null, byte val_r,
byte val_g, byte val_b) {
if (val_null) return 0;
else return (1 << 24) | ((val_r & 0xff) << 16) |
((val_g & 0xff) << 8) | (val_b & 0xff);
}
{ boolean, byte, byte, byte } $inflate(int vector) {
if (vector == 0) return { true, 0, 0, 0 };
else return { false,
vector >> 16 & 0xff,
vector >> 8 & 0xff,
vector & 0xff };
}
The details of heap flattening will vary, of course, at the discretion of the JVM implementation.
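The encoding above can be imitated by hand in ordinary Java, which makes the bit layout easy to check. This is only an illustrative sketch: FlattenSketch and its nested Color class are hypothetical stand-ins, and an actual JVM is free to choose a different layout.

```java
final class FlattenSketch {
    // Hypothetical stand-in for the Color value class used in the pseudocode.
    static final class Color {
        final byte r, g, b;

        Color(int r, int g, int b) {
            this.r = (byte) r;
            this.g = (byte) g;
            this.b = (byte) b;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Color)) return false;
            Color c = (Color) o;
            return r == c.r && g == c.g && b == c.b;
        }

        @Override public int hashCode() {
            return flatten(this);
        }
    }

    // Packs a nullable Color into 32 bits: bit 24 is the "non-null" flag.
    static int flatten(Color c) {
        if (c == null) return 0;
        // Each byte is masked before shifting, because << binds tighter than &.
        return (1 << 24) | ((c.r & 0xff) << 16) | ((c.g & 0xff) << 8) | (c.b & 0xff);
    }

    // Reverses flatten: 0 means null, anything else carries r, g, b.
    static Color inflate(int bits) {
        if (bits == 0) return null;
        return new Color((bits >> 16) & 0xff, (bits >> 8) & 0xff, bits & 0xff);
    }

    public static void main(String[] args) {
        int[] cs = new int[100];                          // plays the role of Color[100]
        cs[5] = flatten(new Color(237, 139, 0));
        System.out.println(Integer.toHexString(cs[5]));   // 1ed8b00
        System.out.println(inflate(cs[6]) == null);       // true: untouched slots read as null
    }
}
```

Because the flag bit is set for every non-null value, even new Color(0, 0, 0) encodes to a non-zero int and stays distinguishable from null.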
Heap flattening must maintain the integrity of objects. For example, the flattened data must be small enough to read and write atomically, or else it may become corrupted. On common platforms, "small enough" may mean as few as 64 bits, including the null flag. So while many small value classes can be flattened, classes that declare, say, 2 int fields or a double field, might have to be encoded as ordinary heap objects.
In the future, 128-bit flattened encodings may be possible on platforms that support atomic reads and writes of that size. And the Null-Restricted Value Types JEP will enable heap flattening for even larger value classes in use cases that are willing to opt out of atomicity guarantees.
Migration of existing classes
Existing classes that represent simple domain values and that have followed best practices to avoid identity dependencies can be easily migrated to be value classes, with minimal compatibility impact. When preview features are enabled, a handful of commonly-used classes in the JDK, outlined below, are migrated to be value classes.
Preparing for migration
Developers are encouraged to identify and eventually migrate value class candidates in their own code. Records and other classes that represent "simple domain values" are potential candidates, along with interface-like abstract classes.
The author of an identity class that is intended to become a value class in a future release should consider the following:
- On migration, all instance fields of the class will implicitly be made final and will need to be initialized without any reference to this. If that presents difficulties, the class may not be a good migration candidate. If there are any non-private, non-final fields, the change will need to be coordinated with any users who might attempt to mutate the fields.
- Similarly, a concrete, non-final class will become final on migration. If users have been allowed to both extend and create instances of the class, the author must choose to either break subclasses (by adding final), break instance creations (by adding abstract along with, say, factory methods and a private implementation class), or conclude that the class is not a good migration candidate.
- The equals and hashCode methods should be overridden by the class so that their results are consistent before and after migration.
- Users of the class will be able to observe different == behavior after migration. If this is a concern, an ideal migration candidate might declare private constructors and provide a factory method that explicitly advertises the possibility of results that are == to a previous result. (See, for example, the Integer.valueOf factory method.)
- As described in previous sections, the == and identityHashCode operations may allow users to guess or infer the values of private fields, and may be noticeably slow for value objects that (probably recursively) encode very large structures. If these are concerns for the class, it may not be a good migration candidate.
- Attempts to synchronize on instances or use the java.lang.ref API will fail after migration. Of course, the class itself should not declare synchronized methods or otherwise use these features. There's not much that can be done to prevent users from doing so, but it may be helpful to advertise the risk in the class's documentation.
- If the superclass is not Object, it must be made a value class before this class can be migrated. All of the considerations in this section apply to the superclass.
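As a minimal sketch of a class prepared along these lines, the hypothetical Distance class below is final, has only final fields, hides its constructor behind a factory method, and overrides equals and hashCode. It is an ordinary identity class today; the name and API are assumptions made for illustration.

```java
// Hypothetical migration candidate: compiles as an identity class today.
final class Distance {
    private final long meters;

    // Private constructor: it only assigns fields and never passes `this` elsewhere.
    private Distance(long meters) {
        this.meters = meters;
    }

    /**
     * Returns a Distance of the given length. The result may be == to a
     * previously returned instance; callers should compare with equals.
     */
    static Distance ofMeters(long meters) {
        return new Distance(meters);
    }

    long meters() {
        return meters;
    }

    // equals and hashCode depend only on field values, so their results
    // would be the same before and after adding the value modifier.
    @Override public boolean equals(Object o) {
        return o instanceof Distance && ((Distance) o).meters == meters;
    }

    @Override public int hashCode() {
        return Long.hashCode(meters);
    }

    @Override public String toString() {
        return meters + " m";
    }

    public static void main(String[] args) {
        System.out.println(Distance.ofMeters(5).equals(Distance.ofMeters(5))); // true
    }
}
```

The class declares no synchronized methods and never uses the java.lang.ref API on its own instances, so nothing in it relies on identity.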
Impact of migration
In most respects, an identity class that has addressed the risks outlined in the previous section can be compatibly made a value class by simply adding the value modifier.
All existing binaries will continue to link successfully. The only new compiler errors will be attempts to synchronize on the value class type.
There are some behavioral changes that users of the migrated classes may notice:
- The == operator may treat two instances as the same, where previously they were considered different
- Attempts to synchronize on an instance or use the java.lang.ref API will fail with an exception
- Assumptions about unique ownership of an instance may be violated (for example, an identical instance may be created at two different program points)
- Performance will generally improve, but may have different characteristics that are surprising
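Client code can be written so that it behaves the same before and after such a migration. The sketch below, with the hypothetical names IdentityFreeUsage, sameValue, and add, compares instances with equals instead of == and synchronizes on a dedicated lock object instead of on an instance of a migrated class.

```java
import java.util.Objects;

final class IdentityFreeUsage {
    // A dedicated lock: a plain Object always has identity.
    private static final Object LOCK = new Object();
    private static long total = 0;

    // Value comparison that does not depend on how == treats the arguments.
    static boolean sameValue(Object a, Object b) {
        return Objects.equals(a, b);
    }

    // Adds a boxed amount to a shared total without synchronizing on the box.
    static long add(Integer amount) {
        synchronized (LOCK) {   // not synchronized (amount)
            total += amount;
            return total;
        }
    }

    public static void main(String[] args) {
        Integer a = Integer.valueOf(1000);
        Integer b = Integer.valueOf(1000);
        // Whether a == b holds depends on the JVM and on whether Integer is a
        // value class; the equals-based result does not.
        System.out.println(sameValue(a, b)); // true
        System.out.println(add(a));          // 1000
    }
}
```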
Value classes in the standard library
Some classes in the standard library have been designated value-based, with the understanding that they would become value classes in a future release.
Under this JEP, when preview features are enabled, the following standard library classes are considered to be value classes, despite not having been declared or compiled with the value
modifier:
- java.lang.Number and the 8 primitive wrapper classes used for boxing
- java.lang.Record
- java.util.Optional, java.util.OptionalInt, etc.
- Most of the public classes of java.time, including java.time.LocalDate and java.time.ZonedDateTime
The migration of the primitive wrapper classes should significantly reduce boxing-related overhead.
Alternatives
As discussed, JVMs have long performed escape analysis to identify objects that never rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization, including storage in fields and arrays.
Hand-coded optimizations via primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.
The C language and its relatives support flattened storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.
Risks and Assumptions
The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. We expect such disruptions to be rare and tractable.
Some changes could potentially affect the performance of identity objects. The if_acmpeq
test, for example, typically only costs one instruction cycle, but will now need an additional check to detect value objects. But the identity class case can be optimized as a fast path, and we believe we have minimized any performance regressions.
There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==, potentially a DoS attack risk. Developers need to understand these risks.
Dependencies
Prerequisites:
- In anticipation of this feature we already added warnings about potential behavioral incompatibilities for value class candidates in javac and HotSpot, via Warnings for Value-Based Classes and Warnings for Identity-Sensitive Libraries.
- Flexible Constructor Bodies (Third Preview) allows constructors to execute statements before a super(...) call and allows assignments to instance fields in this context. These changes facilitate the construction protocol required by value classes.
- Strict Field Initialization in the JVM (Preview) provides the JVM mechanism necessary to require, through verification, that value class instance fields are initialized during early construction.
Future work:
- Null-Restricted Value Class Types (Preview) will build on this JEP, allowing programmers to manage the storage of nulls and enable more dense heap flattening in fields and arrays.
- Enhanced Primitive Boxing (Preview) will enhance the language's use of primitive types, taking advantage of the lighter-weight characteristics of boxing to value objects.
- JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.