Primitive streams and optional

Doug Lea dl at cs.oswego.edu
Sat Nov 24 09:13:14 PST 2012


Just in case anyone is interested in re-deciding some basics in light of the continuing saga of unappealing API choices, here's one last push for adopting the j.u.c null policies in streams.

Sorry that I can't think of a good way to present this without stepping back into prehistory!

Long ago (1950s), people noticed that there are two basic flavors of data: values and pointers. A value is just, um, a value. A pointer differs conceptually in that it might not point to anything. Hence the invention of null, as a special state of a pointer, that for economy, is encoded as the special value zero if null, else a (possibly virtualized etc) memory address. (One disadvantage of this encoding is that it loses type information -- an early form of "erasure". A null pointer to an int looks the same as a null pointer to a double, etc.)

Only slightly less long ago (late 1960s), people noticed that pointer-like notions could be elevated to the idea of "references to objects" (in early forms, an object's pointer address was its identity). But still with the notion that a reference might not point anywhere.

So, now we have four different concepts:

  1. values
  2. possibly null pointers to values
  3. objects
  4. possibly null references to objects

The possibly-null case naturally occurs with partial functions and methods, often related to lookup/search: get the thing at some uninitialized array position, or in a hash map without a binding, etc. Also for terminals in linked data structures. You need some way to say that there is no such thing there.

The FP (and ADT) folks had an arguably easier time of this, since they only encountered cases (1) and (2). Still they had the notion of a compound value, which is like an object, but has no defined identity. Any partial function that "should" return an X but need not can instead return an Optional. And the most common technique for implementing this notion is to "box" the value when present, else return null. The programmer is never exposed to this though. For example, using "==" on a boxed vs unboxed int does the same thing (comparing values, not the "invisible" pointers).

The pure OO (Smalltalk etc) folks also in principle had an easier time, since they conceptually dealt only with cases (3) and (4). Since everything is an object, everything worked uniformly. (Although many people now think in retrospect that "nullable" should have been a required part of any method return type spec so that programmers know when nulls might legitimately vs accidentally appear. JSR 308 might help with this though.) However people don't appreciate it when "==" always compares pointers (among other issues) for integers, so special rules were made for these cases, that are basically the inverse of the FP approach. That is, in FP, pointerness is hidden; in OO, pointerless-valueness is hidden. But less hidden: for example, Integers are objects with identity, monitors, etc (and so are unlike "Optional" if such a thing existed), and you can readily tell if you have an Integer vs an int. On the other hand, you can still use ints as (autoboxed) objects inside collections etc without needing to have a special implementation just for ints (at the price of now-famous space bloats).
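To make the Integer-vs-int point concrete, here is a minimal Java sketch. The class and method names (BoxedIdentityDemo, sameIntValue, sameBoxedValue, asObjects) are made up for illustration; only int, Integer, equals() and autoboxing come from the language itself.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxedIdentityDemo {
    // An int is a plain value: == compares the numbers themselves.
    public static boolean sameIntValue(int a, int b) {
        return a == b;
    }

    // An Integer is an object: equals() compares the wrapped values,
    // whereas == on two Integer references would compare identity.
    public static boolean sameBoxedValue(Integer a, Integer b) {
        return a.equals(b);
    }

    // Because Integer is an ordinary object type, ints can sit in a
    // general-purpose collection without an int-only implementation.
    public static List<Object> asObjects(int... values) {
        List<Object> out = new ArrayList<>();
        for (int v : values) {
            out.add(v); // autoboxing: int -> Integer
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sameIntValue(1000, 1000));
        System.out.println(sameBoxedValue(Integer.valueOf(1000), Integer.valueOf(1000)));
        System.out.println(asObjects(1, 2, 3).get(0) instanceof Integer);
    }
}
```

The sketch deliberately avoids printing the result of == on two Integer references, since whether two boxed values share an identity depends on caching in the runtime.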

Any language/library that embraces both of these notions together has to do something that is not identical to either pure FP or OO approaches. Some languages get a foothold by distinguishing object types from value types. Thus, nullness applies to objects, optionalness applies to values. So, Scala, Lime, etc have variants of:

  1. value types: int, double etc
  2. Optional: the result of partial functions on value types
  3. object types (Object and subclasses)
  4. refs: possibly null references to objects

We don't have this foothold. Arguably, because of this, we should not be creating such frameworks. But we are.

So the choices are:

A. Pretend we have value types. Introduce Optional for use with any value-like things, along with some set of conventions about how they interact with objects and possibly null refs.

B. Don't pretend we have value types unless/until we have them. Use the standard OO conventions, in which boxing classes like Integers are used when you need to elevate a value to objecthood. And when you have one, you have a full-fledged object, not just an invisible pointer. And when you don't have one, you just have null.
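As a rough illustration of (B) as it already works in java.util.concurrent, the sketch below uses ConcurrentHashMap, which rejects null keys and values, so a null from get() can only mean "no binding". The wrapper class NullMeansAbsentDemo and its methods are hypothetical names chosen for this example.

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullMeansAbsentDemo {
    // Returns the boxed count for a word, or null if there is no binding.
    // With ConcurrentHashMap, null cannot be a stored value, so the
    // "nothing there" reading of null is unambiguous.
    public static Integer lookup(ConcurrentHashMap<String, Integer> counts, String word) {
        return counts.get(word);
    }

    // Shows that the map refuses to store null as a value.
    public static boolean rejectsNullValue() {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        try {
            counts.put("x", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        counts.put("a", 1);
        Integer hit = lookup(counts, "a");   // a full-fledged Integer object
        Integer miss = lookup(counts, "b");  // null: nothing there
        System.out.println(hit + " " + miss + " " + rejectsNullValue());
    }
}
```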

Choice A is tempting because of its familiarity to programmers with an FP background. But doing so forces a never-ending set of bandaids (as we've seen lately) because none of the rules for interoperating with Object conventions make much sense.

Sticking with (B) is less tempting to some people not only because they like to think of some of their classes in value-like ways, but also because streams (like java.util.concurrent) would need to relentlessly maintain the "null means nothing there" policy. So, emptyStream.reduce(f) must return null, null elements appearing in streams must be skipped, etc. But not only is this the most defensible policy to use in the absence of true value types, it is best suited to kludgelessly evolve to embrace value types if they are ever supported.
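A minimal sketch of what that policy could look like for reduce, written against a plain Iterable rather than any actual streams API. The helper reduceOrNull and the class NullPolicyReduce are hypothetical, and the sketch assumes the combining function itself never returns null.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.function.BinaryOperator;

public class NullPolicyReduce {
    // Hypothetical helper, not part of any real library.
    // Skips null elements and returns null when nothing was there.
    // Assumes f never returns null for non-null arguments.
    public static <T> T reduceOrNull(Iterable<T> source, BinaryOperator<T> f) {
        T acc = null;
        for (T t : source) {
            if (t == null) {
                continue; // null elements are skipped
            }
            acc = (acc == null) ? t : f.apply(acc, t);
        }
        return acc; // null if the source was empty or held only nulls
    }

    public static void main(String[] args) {
        Integer sum = reduceOrNull(Arrays.asList(1, null, 2), Integer::sum);
        Integer none = reduceOrNull(Collections.<Integer>emptyList(), Integer::sum);
        System.out.println(sum + " " + none);
    }
}
```

Callers then test the result against null instead of unwrapping a separate container type.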

There is also a choice C: always throw exceptions for partial functions / nothing-there cases. The logic of this is fine, and is completely reasonable when nothing-there-ness is accidental or exceptional. But the world voted against the painfulness and inefficiency of everyday programming under this encoding of nothing-there decades ago.
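The two encodings already sit side by side in java.util.Deque: pollFirst() returns null on an empty deque, while removeFirst() throws NoSuchElementException. The sketch below contrasts them; the wrapper names (NothingThereEncodings, takeOrNull, takeOrThrow, throwsWhenEmpty) are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

public class NothingThereEncodings {
    // Null encoding: an empty deque yields null.
    public static String takeOrNull(Deque<String> q) {
        return q.pollFirst();
    }

    // Exception encoding (choice C): an empty deque throws.
    public static String takeOrThrow(Deque<String> q) {
        return q.removeFirst();
    }

    // Reports whether the exception encoding fired on an empty deque.
    public static boolean throwsWhenEmpty() {
        try {
            takeOrThrow(new ArrayDeque<String>());
            return false;
        } catch (NoSuchElementException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Deque<String> q = new ArrayDeque<>();
        // Everyday code under the null encoding: one test, no try/catch.
        String head = takeOrNull(q);
        System.out.println(head == null ? "nothing there" : head);
        System.out.println(throwsWhenEmpty());
    }
}
```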

Summary: get rid of Optional. Use null consistently to mean nothing there (plus exceptions in exceptional cases). Use the standard boxed types for numerics. Until/unless there are value types, create intStream etc as a separate set of classes with merely analogous APIs. (And while we are at it, add LongKeyHashMap and a few others!) Don't worry about people who used null as "meaningful" elements, map keys or map values. No one is forcing them to use streams.
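For a sense of what "a separate set of classes with merely analogous APIs" might mean, here is a toy sketch. IntSeq is a hypothetical class invented for this example and is not any proposed or actual stream type; it works on an int[] directly and falls back to a boxed Integer (or null) only at the point where "nothing there" has to be expressible.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

// Hypothetical int-only sequence with an API shaped like an object-based one.
public final class IntSeq {
    private final int[] values;

    private IntSeq(int[] values) {
        this.values = values;
    }

    public static IntSeq of(int... values) {
        return new IntSeq(values.clone());
    }

    // Element-wise transformation, with no boxing along the way.
    public IntSeq map(IntUnaryOperator f) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = f.applyAsInt(values[i]);
        }
        return new IntSeq(out);
    }

    // Returns the boxed result, or null if the sequence is empty.
    public Integer reduce(IntBinaryOperator f) {
        if (values.length == 0) {
            return null; // nothing there
        }
        int acc = values[0];
        for (int i = 1; i < values.length; i++) {
            acc = f.applyAsInt(acc, values[i]);
        }
        return acc; // autoboxed only at the boundary
    }

    public static void main(String[] args) {
        System.out.println(IntSeq.of(1, 2, 3).map(x -> x * 2).reduce(Integer::sum));
        System.out.println(IntSeq.of().reduce(Integer::sum));
    }
}
```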

-Doug



More information about the lambda-libs-spec-observers mailing list