Lazy Finals for Java

John Rose

April 2014

Summary

Introduction

In present-day Java, a final variable is a variable declared with the final keyword. A final variable can be read by any code that can access it, but cannot be written, except under very narrow circumstances. In effect, it is a constant from the moment it is initialized. Finals often have initializer expressions, and if the initializer expression is a compile-time constant, the final is also a compile-time constant.

Otherwise, every non-constant final variable is created with a type-specific default value (such as null or zero), and later on initialized. The initialized value might be the same as the default value, but it is usually some different, more “interesting”, value. The uninitialized default value (null or zero) can sometimes be observed by code which has access to the containing object (or class, if the variable is static). But normally, a final variable is initialized shortly after it is created.

The special case of a blank final variable is one which has no initializer expression. The Java language has special definite assignment rules for blank finals, requiring them to be initialized exactly once. The Java virtual machine (“JVM”) does not distinguish between blank and non-blank non-constant final variables, since at the bytecode level a constructor simply executes code to store values into all final fields, whether blank or not.

Final variables are subject to some restrictions: Any final variable (including blank finals) must be initialized before the constructor of the containing object returns normally. (For statics, the class initializer serves as the constructor of the containing object.) In addition, the language does not permit a blank final to be initialized twice.

In exchange for these restrictions, JVMs are often able to make improvements in code which works with final variables.

Users derive certain benefits from final variables. The restrictions on writes to final variables sometimes make such variables more useful as public API elements, although this is debatable. In any case, the extra checking performed on finals is a useful hedge against running code containing some simple programming errors. Since final variables are not subject to race conditions, they may always be “captured” by inner classes and lambdas nested in their scopes.

The most important benefit to users from final variables is the provision of safe publication for non-varying data structures. In simple terms, if a data structure is accessed only via a final variable, and the data structure is initialized before it is published through that final variable, then the contents of that data structure are equally observable to all threads. Specifically, pre-initialization states of the data structure (such as default null or zero values in freshly created fields) cannot be observed.

Proposal

We will extend the notion of a blank final to include lazy final variables. A lazy final variable (or “lazy final” for short) is a kind of blank final variable whose initialization can be delayed indefinitely, even after its associated constructor returns.

In order to emphasize this special delay, we will use a special term “completion” for the initialization of a lazy final. The delayed initialization of a lazy final is completed when it is assigned its permanent value. Before that point, the variable may be thought of as incomplete.

Note: By analogy, an object containing lazy finals may also be thought of as completed when all its finals are initialized. I have discussed object completion before, using the metaphor of larval objects.

Until completion, a lazy final can be read, but this read does not report the eventual permanent value, but only the default initial value (such as null or zero) of the variable’s type.

Completion is expressed as simple assignment (JVM “putfield”), as for blank final initialization inside constructors. A lazy final may only be completed by the final’s own class.

Completion is enforced, by the JVM, to occur at most once per lazy final variable. The enforcement is global and not subject to races. An exception is used to report a redundant attempt to complete a lazy final.

Completed states are distinguished to all parties by the presence of non-default values. Lazy finals (unlike other kinds of final variables) may never be set to their default initial values.

Safe publication is extended to values reached through lazy finals.

This extension is backward compatible because lazy finals will be distinguished by a new variation of the syntax for blank finals, probably a modifier or pseudo-modifier used in conjunction with the final modifier, visible in both the Java language and VM.

Example: a set-once cache

Before we give further details, let’s try out a code example, of a class which implements lazy evaluation using the JDK 8 Supplier API.

class CachingSupplier<T> implements Supplier<T> {
  private final Supplier<T> supplier;
  private __LazyFinal final T value;
  public CachingSupplier(Supplier<T> supplier) {
    this.supplier = supplier;
    //this.value is left null, for now
  }
  @Override public T get() throws NullPointerException {
    if (value == null)
      trySetValue(supplier.get());
    return value;
  }
  private void trySetValue(T possibleValue) {
    //the JVM must check this even if we do not:
    //possibleValue = Objects.requireNonNull(possibleValue);
    try {
      value = possibleValue;  // implicit CAS
    } catch (FinalStoreException ignore) {
      // another racer won
    }
  }
  // example of public access to incomplete state:
  public Optional<T> getOptional() {
    // read may return either null or the unique ultimate non-null value
    return Optional.ofNullable(value);  // Optional.of would throw on the incomplete (null) state
  }
}

Except for the placeholder keyword __LazyFinal and the assignment value = possibleValue, this is valid Java code. Except for race conditions, it operates as advertised if the value field is made non-final and the catch is adjusted (say, to NullPointerException).

Syntax Note: The keyword __LazyFinal may be spelled volatile without much violence to pre-existing specifications.
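
For readers who want to experiment with the example today, the following is a minimal sketch that emulates it in present-day Java. It is not the proposed mechanism: the class name EmulatedCachingSupplier is hypothetical, and an AtomicReference stands in for the lazy final field, with compareAndSet playing the role of the implicit CAS.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Emulation only: an AtomicReference stands in for the proposed lazy final field.
final class EmulatedCachingSupplier<T> implements Supplier<T> {
  private final Supplier<T> supplier;
  // null means "incomplete", as in the proposal
  private final AtomicReference<T> value = new AtomicReference<>();

  EmulatedCachingSupplier(Supplier<T> supplier) {
    this.supplier = Objects.requireNonNull(supplier);
  }

  @Override public T get() {
    T v = value.get();
    if (v == null) {
      T candidate = Objects.requireNonNull(supplier.get());
      // At most one racer succeeds; a loser silently adopts the winner's value.
      value.compareAndSet(null, candidate);
      v = value.get();
    }
    return v;
  }

  // Public access to the possibly incomplete state.
  Optional<T> getOptional() {
    return Optional.ofNullable(value.get());
  }
}
```

Note that AtomicReference imposes volatile access semantics on every read, which is stronger (and potentially costlier) than what the proposal asks of a lazy final.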

The Details

Here is the rest of the proposal.

Lazy finals can be of any type, including references and primitives.

A lazy final may not be declared with an initializer expression. (But see below for a possible meaning of such a thing.)

Every assignment to a lazy final reference logically includes two implicit null checks, on the value being stored, and the value being overwritten.

If the reference value being stored is null, the assignment must complete abnormally with a NullPointerException, because if the update were allowed to complete normally, the updated variable’s state would then be indistinguishable from its non-updated state.

Otherwise, if the reference value being overwritten is not null, the variable has already been updated once, and a second update must not be accepted. In this case the assignment must complete abnormally with a FinalStoreException. Note that this will happen even if the second value is identical with the first.

FinalStoreException is defined to be a subclass of Exception.

User Note: Since FinalStoreException is checked, programmers are forced to deal with update conflicts explicitly. The programmer can examine both the losing and winning values, and then decide whether to ignore the exception (accept the “winner”), throw a different exception, or somehow modify the “winner” to take account of the “loser”.
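
The two checks described above, and one possible way of handling a lost race, can be sketched in present-day Java as follows. This is an emulation under stated assumptions: SetOnceRef and SetOnceDemo are hypothetical names, the FinalStoreException class here is a local stand-in for the proposed exception, and an AtomicReference replaces the JVM-level enforcement.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the proposed checked exception.
class FinalStoreException extends Exception {
  FinalStoreException(String message) { super(message); }
}

// Emulates one lazy final reference variable; not the proposed JVM mechanism.
final class SetOnceRef<T> {
  private final AtomicReference<T> cell = new AtomicReference<>();

  // Returns null until the variable has been completed.
  T read() { return cell.get(); }

  void complete(T newValue) throws FinalStoreException {
    // First check: the value being stored must not be the default (null).
    if (newValue == null)
      throw new NullPointerException("a lazy final cannot be completed with null");
    // Second check: the value being overwritten must still be the default.
    if (!cell.compareAndSet(null, newValue))
      throw new FinalStoreException("already completed");
  }
}

final class SetOnceDemo {
  // Returns the value that ends up in the cell, whether or not this caller won.
  static String completeOrAcceptWinner(SetOnceRef<String> ref, String candidate) {
    try {
      ref.complete(candidate);
      return candidate;
    } catch (FinalStoreException lost) {
      return ref.read();  // one policy among several: accept the winner
    }
  }
}
```

As in the proposal, a second completion fails even when it offers the same value as the first.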

In the same way, every assignment to a lazy final non-reference variable includes two implicit equality checks, against the default initial value of the variable. This zero value is either a numeric zero, the null character, or the boolean false.

If the non-reference value being stored is equal to the default initial value, an IllegalArgumentException must be thrown. Otherwise, if the non-reference value being overwritten is not equal to the default initial value, a FinalStoreException must be thrown.

With floating point values, non-numbers are never deemed to be equal to a default initial value. That is, despite the oddities of comparison with NaNs, an incomplete lazy final of floating point type may be completed (but once only) with a NaN.
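
The primitive rules, including the NaN case, can be illustrated with a small emulation for double. This is only a sketch: EmulatedLazyDouble and its nested FinalStoreException are hypothetical stand-ins, the value is kept as raw bits in an AtomicLong, and the treatment of negative zero (rejected here, because -0.0 == 0.0 is true) is an assumption that the text above does not settle.

```java
import java.util.concurrent.atomic.AtomicLong;

// Emulates a lazy final double by storing its raw bits.
// 0L (the bits of +0.0) means "incomplete".
final class EmulatedLazyDouble {
  // Hypothetical stand-in for the proposed checked exception.
  static class FinalStoreException extends Exception {}

  private final AtomicLong bits = new AtomicLong();  // starts at 0L

  // Returns 0.0 until the variable has been completed.
  double read() { return Double.longBitsToDouble(bits.get()); }

  void complete(double newValue) throws FinalStoreException {
    // NaN == 0.0 is false, so a NaN passes this check.
    // -0.0 == 0.0 is true, so negative zero is rejected (an assumption of this sketch).
    if (newValue == 0.0)
      throw new IllegalArgumentException("a lazy final cannot be completed with zero");
    // Every value that reaches this point has nonzero raw bits.
    long raw = Double.doubleToRawLongBits(newValue);
    if (!bits.compareAndSet(0L, raw))
      throw new FinalStoreException();
  }
}
```

Because the completed state is detected from the bit pattern rather than by a floating point comparison, a NaN completion is recorded like any other, and a later attempt fails with FinalStoreException.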

Lazy finals can be either static or non-static. Both kinds of lazy finals can be used for safe publication, meaning that an observer who reaches an object only by a final variable observes that object’s contents in their current state, if that current state was assigned before the object was “published” via the final variable.

Lazy finals can have any access permission (public, private, package, or protected). But outside of the defining class or code of equivalent private access rights, they may only be read.

User Note: Designers of APIs must take into account that a public lazy final may be observable in its null state, depending on the life cycle of the object.

Although the Java language tracks definite assignment and definite un-assignment of other final variables, and restricts their reads and writes to follow certain patterns, lazy finals are not so restricted. Where those restrictions are specified, a distinction will be made between lazy finals and other finals.

User Note: For example, a method called from a constructor can complete a lazy final which the constructor itself might then read, contrary to the requirements of definite assignment.

Note: If a source code program defines a lazy final but does not assign it a value anywhere, it deserves a compile-time warning. This is the moral equivalent of the error at the end of a constructor in which a blank final has not been assigned a value. As with blank finals, the JVM should not make such a check.

Since compound assignment is not allowed for other kinds of finals, we disallow it for lazy finals also:

value += 1;  // ERROR; does not compile
value = 1;   // use this instead

As a consequence of this design, the following statement will always be legal but never complete normally if the variable is a lazy final:

value = value;  // compiles but throws one of two exceptions

Lazy finals do not incorporate the semantics of volatile non-final variables. Reading or writing a lazy final does not create volatile synchronization events in the Java Memory Model, except perhaps as required by the semantics described above. If the volatile keyword is used in the syntax of lazy finals, a distinction between lazy finals and non-final volatiles will be made in the relevant specifications.

In classfiles loaded by the JVM, the modifier bit 0x0040 (ACC_VOLATILE) is used to signal the presence of a lazy final. Although this bit was previously illegal in combination with the modifier bit 0x0010 (ACC_FINAL), the rules will be relaxed in a new classfile version to allow both bits to be set at once.

Indirect access via reflection and method handles follows the same rules as normal access. The setAccessible permission does not allow changes to completed lazy finals. (Perhaps this restriction could be extended to non-lazy finals.)

Although it is an error (raising a checked exception) to store a value into a lazy final twice, there are no rules (in either the Java language or VM) which attempt before execution to reject programs which appear to perform redundant stores. Programmers who store into lazy finals are required by the language to take account of the checked exceptions which arise from these operations.

Lazy finals allow something previously impossible in the language: Persistent (immutable, all-final) data structures can include reference cycles.
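
As an illustration of the shape such a structure might take, here is a sketch of a two-node ring in which each link is written at most once. It is an emulation in present-day Java, not the proposed feature: RingNode is a hypothetical class, and an AtomicReference stands in for a lazy final field named next.

```java
import java.util.concurrent.atomic.AtomicReference;

// Emulation only: 'next' stands in for a lazy final field of type RingNode.
final class RingNode {
  final String name;
  private final AtomicReference<RingNode> next = new AtomicReference<>();

  RingNode(String name) { this.name = name; }

  // Returns null until the link has been completed.
  RingNode next() { return next.get(); }

  // Completes the link at most once; returns false if it was already completed.
  boolean linkTo(RingNode target) {
    if (target == null) throw new NullPointerException();
    return next.compareAndSet(null, target);
  }

  // Builds a two-node cycle: first -> second -> first.
  static RingNode pair(String first, String second) {
    RingNode a = new RingNode(first);
    RingNode b = new RingNode(second);
    a.linkTo(b);
    b.linkTo(a);  // this store happens after both constructors have returned
    return a;
  }
}
```

Once both links are completed, neither node can be changed again, so the ring behaves as an immutable structure from that point on.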

Example: A Builder Pattern

Lazy finals can help create a “builder” design pattern.

The simplest possible lazy final variable is a boolean. There are only two valid operations on such a boolean, reading its value, and setting it to true. This provides a simple basis for an API which creates an object in an incomplete state, fills it in via some method calls, and then completes it.

Methods which may only be used before an object is completed, or afterwards, can test the flag to determine valid usage.

The lifecycle of an object containing such a flag runs like this:

  1. Create the object with some initial parameters. The constructor controls this step, and creates an incomplete object.
  2. Run some configuration methods on the incomplete object. These may set normal variables or other lazy finals. Configuration methods may test the boolean to detect invalid usage, such as an attempt to reconfigure a completed object.
  3. When the object is ready to complete, run a completion method. This method can run error checks to ensure that the object is fully and correctly configured. It then sets the lazy final boolean.
  4. In its complete state, the object may be published safely, at least if all of its variables are final.
  5. Clients may run query methods on the completed object. Query methods may test the boolean to detect invalid usage, such as an attempt to query an incomplete object.

A separate boolean is not strictly needed if the object is configured only through the setting of lazy finals.

This pattern allows an object to be its own builder, which is normally impossible. Normally, a separate builder accumulates configuration information and, upon completion, creates a fresh object populated with blank final fields. The lazy final pattern can also be used in this style, with separate builders. An advantage of lazy finals is that the builder does not need to keep a separate copy (as mutable state) of the configuration information, but can configure directly into the hidden (“larval”) object, before publication. In this usage, the builder can become a stateless “early visitor” which stores values into the final product before it is completed and published.
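
The lifecycle above can be sketched in present-day Java roughly as follows. This is an emulation under stated assumptions: Route is a hypothetical class, an AtomicBoolean stands in for the lazy final boolean, and configuration is assumed to be confined to a single thread until the object is completed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Emulation only: 'completed' stands in for a lazy final boolean (false to true, once).
final class Route {
  private final String name;
  private final List<String> stops = new ArrayList<>();  // filled in before completion
  private final AtomicBoolean completed = new AtomicBoolean();

  // Step 1: the constructor creates an incomplete object.
  Route(String name) { this.name = name; }

  String name() { return name; }

  // Step 2: configuration methods reject use on a completed object.
  Route addStop(String stop) {
    if (completed.get()) throw new IllegalStateException("already completed");
    stops.add(stop);
    return this;
  }

  // Step 3: the completion method checks the configuration, then sets the flag once.
  Route complete() {
    if (stops.isEmpty()) throw new IllegalStateException("no stops configured");
    if (!completed.compareAndSet(false, true))
      throw new IllegalStateException("already completed");
    return this;
  }

  // Step 5: query methods reject use on an incomplete object.
  List<String> stops() {
    if (!completed.get()) throw new IllegalStateException("not yet completed");
    return Collections.unmodifiableList(new ArrayList<>(stops));
  }
}
```

A caller would write something like new Route("coast").addStop("A").addStop("B").complete() and only then hand the object to other threads.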

Appendix: Possible Upgrades

Although array elements and fields are both Java variables, only fields can be final. The syntax and semantics for managing final fields are not available for managing final array elements, since arrays do not have constructors. (Arrays do support both initialized and “blank” element variables, so both blank and non-blank final array elements are possible in principle.) If arrays were somehow given the ability to contain final elements, then the same conventions could be extended to support lazy finals also.

It might be reasonable to assign a meaning to a lazy final with an initializer. All reads of the lazy final would have to be desugared into an access method which would execute the initializer expression.

private __LazyFinal final int hashCodeCache = computeHashCode();
public int hashCode() { return hashCodeCache; }

…might desugar to:

private __LazyFinal final int hashCodeCache;
private int lazy$hashCodeCache() {
  int x = hashCodeCache;
  if (x == 0) {
    try {
      hashCodeCache = x = computeHashCode();
    } catch (FinalStoreException race) {
      if (x != hashCodeCache)  // race + conflict
        throw new IllegalArgumentException();
    }
  }
  return hashCodeCache;
}
public int hashCode() { return lazy$hashCodeCache(); }

The expression would be executed racily, and only one racer would win, with losers discarded silently, or perhaps compared to the winner, as in the example above. Races and conflicts could be controlled by running the initializer code inside suitable locking, specified by the user (via some other syntax).

One problem with this proposal, as well as with some other uses of lazy finals, is the need to “hook” reads from external clients, to prevent the escape of incomplete (default zero) values. This can be controlled if the lazy final is private, but otherwise separate compilation allows other classes to issue getfield requests against the lazy final. The JVM has no mechanism (at present) for “hooking” simple field reads. This issue may require that, at least at first, initialized lazy finals would need to be marked private.

Appendix: Implementation Notes

It is likely that JVMs can support lazy finals without much new infrastructure. A read of a lazy final may be implementable as a simple read, and a write of a lazy final by a compare-and-swap or exclusive load and store. Some platforms may require additional fencing. In particular, platforms which require a safe publication fence in constructors which set normal finals may also require a similar fence near (probably before) a set of a lazy final.
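
A library-level approximation of this read/write split can be written with the java.lang.invoke.VarHandle API (Java 9 and later). The sketch below is only an analogy for what a JVM might do internally: CasCell is a hypothetical class, and using an acquire-mode load for the read is an assumption made to keep the example well defined under the Java Memory Model, not a statement about the fencing a real implementation would need.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Library-level approximation; a JVM implementation would not need a VarHandle.
final class CasCell {
  private static final VarHandle VALUE;
  static {
    try {
      VALUE = MethodHandles.lookup().findVarHandle(CasCell.class, "value", Object.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private Object value;  // plain (non-volatile) field; null means "incomplete"

  // Read side: a single load, here in acquire mode (an assumption of this sketch).
  Object read() {
    return VALUE.getAcquire(this);
  }

  // Write side: a compare-and-swap from null to the new value.
  // Returns true if this call completed the cell, false if it was already complete.
  boolean tryComplete(Object newValue) {
    if (newValue == null) throw new NullPointerException();
    return VALUE.compareAndSet(this, null, newValue);
  }
}
```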

It is likely that JVMs can make good use of lazy finals to optimize field values at JIT compile time. A compiler can observe a lazy final field and, if it differs from the default value of the field’s type, that value can be substituted as a compile-time constant for the field, with no synchronization. (This is why it is important that lazy finals do not incorporate unnecessary volatile semantics.)

Appendix: Observation of Default Values

It is sometimes surprising to users that default values of final variables can be observed at all. The Java language discourages access to uninitialized finals by enforcing definite assignment conditions on final fields in the constructor. But it does not absolutely prevent such access.

Here is an example of the default value of a blank final in one class being easily read by another class:

class Writer {
  final int amount;  // blank final
  Writer(int amount) {
      Reader.readAmount(this);  // amount = 0
      this.amount = amount;
      Reader.readAmount(this);  // amount = ...
  }
}
class Reader {
  static void readAmount(Writer w) {
    System.out.println("amount = "+w.amount);
  }
  public static void main(String... av) {
    readAmount(new Writer(42));
    /* output:
    amount = 0
    amount = 42
    amount = 42
    */
  }
}

In other more complex cases, a superclass constructor might call a subclass method (via virtual override), and that subclass method might observe an unexpected default value. As in the example above, a later call to the subclass method, after the constructor finishes, will observe the expected non-default value.
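
A minimal sketch of that superclass case, in present-day Java, might look like this. The class names Shape and Square and the static LOG list are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class Shape {
  // Records what describe() returned while the constructor was still running.
  static final List<String> LOG = new ArrayList<>();

  Shape() {
    LOG.add(describe());  // virtual call: runs the subclass override too early
  }

  String describe() { return "shape"; }
}

class Square extends Shape {
  private final int side;  // blank final, assigned after super() returns

  Square(int side) {
    super();
    this.side = side;
  }

  @Override String describe() { return "side = " + side; }
}
```

After new Square(5), the entry recorded in Shape.LOG is "side = 0", while a later call to describe() on the same object returns "side = 5".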

The above example can be reworked in terms of static variables also. Here is what the writer would look like in that case:

class Writer {
  static final int amount;  // blank final
  static {
      Reader.readAmount();  // amount = 0
      amount = 42;
      Reader.readAmount();  // amount = 42
  }
}

In the case of static variables, default values are sometimes accidentally observed because of cycles in class initialization dependencies.
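
Such a cycle can be sketched with two small classes. The names Limits and Defaults are hypothetical, and the outcome described below assumes that Limits is the class whose initialization is triggered first.

```java
// Two classes whose static initializers depend on each other.
class Limits {
  // Calling Defaults.initial() triggers initialization of Defaults.
  static final int MAX = Defaults.initial() * 2;
}

class Defaults {
  // If Limits is mid-initialization on this thread, this read sees the default 0.
  static final int SEEN_MAX = Limits.MAX;

  static int initial() { return 21; }
}
```

If the first use is a read of Limits.MAX, then Defaults is initialized while Limits.MAX still holds its default, so Defaults.SEEN_MAX is 0 even though Limits.MAX ends up as 42. If Defaults happens to be initialized first instead, both values are 42, which is part of what makes such bugs hard to track down.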

Appendix: Roads Not Taken

It would be possible to allow compound assignment on a lazy final, with the caveat that it could only execute once. There seems to be no gain from this, since the assignment can only execute once, and on a default value:

value += 1;  // either sets value to 1 or throws an exception

Given that lazy final completion can fail two different ways, it may seem odd to introduce a new checked exception for one way (the final is already complete) and recycle an unchecked exception for the other way (the proposed completion value is a default value). Why not unify the two exceptions? The answer is that the two failure modes are very different: One is a race condition and one is a simple domain error. The two different conditions have different root causes and require different handling, in all use cases so far. Simple domain errors are handled in Java with unchecked exceptions, so we continue this. But the race conditions seem to need, in many use cases, some extra logic to investigate the race, hence a checked exception is more helpful.

It might be reasonable to introduce a second, unchecked exception (ZeroValueException or IllegalCompletionException) to use instead of NullPointerException and IllegalArgumentException.

We could apply some of the new rules to existing final fields. Currently, the JVM does not enforce the language-level restriction that each blank final must be initialized exactly once. It simply restricts blank final writes to occur only in their respective object constructors (or class initializers). For compatibility reasons, we do not propose to change these rules, except for lazy finals. The JVM enforces unique initialization of lazy finals by throwing FinalStoreExceptions as needed.

It might be reasonable to allow any code which has access to a lazy final variable to complete it. Thus, if a lazy final were public, any client of its containing object would be able not only to read it, but also to write a value into it, if it were incomplete. We believe it is desirable on balance to preserve the existing documented restriction that a final variable (of any kind) can only be set from within its own defining class. This will preserve the useful property that final variables can (sometimes) be useful public members of classes, without giving clients the ability to “scribble” on them.

We could allow virtual machines a little more room for optimization if redundant assignments of the same value were allowed. But the extra latitude does not appear to buy anything on any real hardware, and it would create a useless irregularity in the single assignment rule for finals. There would also be tricky complications to the safe publication rule for finals, since there might be relevant updates between the first and second assignments.

To avoid the special rules about nulls and zeros, we could require the JVM to allocate an additional boolean flag associated with each lazy final variable, and to update both the flag and the value in tandem. The standard would require additional syntaxes to query and update the flag and variable in tandem. This does not seem desirable, since it would require extra complexity, without much gain except the debatable merits of allowing nulls in reference containers, and of distinguishing between zeroes which result from completion and those which do not. There is an easy workaround: Simply use a lazy final reference to a boxed value; the boxed value can then be a zero. And with value types, an explicit boolean could be added in software.

As proposed here, lazy finals can only be assigned once, so the state diagram for a lazy final has only two states, initial and complete, and one transition. More complex acyclic state graphs are sometimes useful, to represent data which increases monotonically according to some rule. For example, a JVM method may go from loaded to linked, and then later to compiled. A state diagram may even have cycles, in which case the data structure is not wholly monotonic. For example, a method might be deoptimized back to its linked state, perhaps with some high-cost operation like a safepoint. Lazy finals are a useful component for monotonic data structures with only a few transitions. A lazy final can be bound to an object containing further lazy finals that can represent successor states. But they do not appear to readily supply a way to register the ultimate state of a multi-step transition, nor do they “go backwards”. It is possible to imagine enhanced versions of finals with additional operations to adjust a completed final to some other state, in some manner compatible with the abstraction it represents. Of course, plain variables do all of that quite well, and so they also deserve compiler optimization work.