Streams design strawman (original) (raw)

Howard Lovatt howard.lovatt at gmail.com
Thu Apr 26 01:09:22 PDT 2012

Previous message: Streams design strawman
Next message: hg: lambda/lambda/langtools: Fix regression in lambda-compatibility check.
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

A couple of years ago I wrote my own parallel functional collections library and in many ways it is similar to the strawman, since the libraries are similar maybe some of my practical experience will be of interest. Below I have made some brief comments into three categories similarities/differences between strawman and mine, user programmable short cut functions, simplifications, bi-streams, and more radical proposals. I have written below as a series of short notes, happy to expand the notes if people are interested.

Similarities/Differences Between Strawman and My Libraries
1. Similarities
  1. Compatible with existing Collections
  2. Can copy in from an existing collection
  3. Can copy out into a collection
  4. Mixture of lazy and eager operations, split is similar. I use List's methods, reduce, copy, and clone as eager and rest as lazy
2. Differences
  1. I use a tuple library, your choice of a map probably better (see 4.1 below)
  2. My terminology is different, I have used your terminology in this email
  3. I don't support nulls, I find this really useful since I don't have to check for null in Mappers etc.
  4. My ParallelList extends AbstractSequentialList and implements Cloneable, List operations are eager and immutable. This is useful since in general you don't need to copy out into a List for an old API, since you already have a List. Concurrent modifications via an second eager evaluation fail fast.
  5. I copy into ParallelList using an Iterator, Iterable, Collection, array, or value and skip nulls, I also use use the idioms of a new ParallelList(seed).flatMap(mapper) and ParallelList.range(from,to).map(mapper) to populate the ParallelList (also see 2.7 below). I find it helpful to copy in since my internal representation is nothing like any of the existing collections and it also means I don't have to watch for mutations.
  6. My Mapper etc, get passed the index as well as the value (I find this really useful!). One of the reasons why I don't find sequential versions worth while (see 2.8 below).
  7. I have a static range method that creates a new ParallelList
  8. I only have parallel versions, I find the overhead of parallel small and since I pass the index (2.6) I don't have much use for sequential versions
  9. I have static factories called in that mimic constructors but return a pre-made empty list rather than a new empty list as a constructor would,
3. User Programmable Short Cut Functions
4. You can break my parallel loop by throwing a pre-made BreakException
5. You can continue my parallel loop by throwing a pre-made ContinueException or returning null
6. Break and continue are useful for translating existing loops into a ParallelList
7. Break and continue are useful for short cut operators, e.g. Or and And
Simplifications (reduction in API 'surface area")
1. The 'surface area' of my API is small since the only public class is ParallelList and it has constructors and factories for Iterable and Iterator, therefore you could reduce your API to defender methods called parallel in Iterable and Iterator and classes ParallelList and ParallelMap (aslo see point 5.1 below)
Bi-Streams
1. I use a tuple library, but the Bi-Stream idea is interesting and well worth pursuing since I have noticed some overhead with tuples
More Radical Proposals
1. Instead of basing everything on ParallelList you could base everything on ParallelMap<K,V>:

class ParallelMap<K,V> extends AbstractMap<K,V> implements Cloneable, Serializable {

// lazy parallel operations

// eager reduce and clone operations

// Map operations are all eager, only immutable operations supported

}

class ParallelIndexed extends ParallelMap<Long,V> implements NavigableMap<Long,V> {

V get(long index);

// other get like methods with long index, but not put like methods (immutable)

// NavigableMap and hence SortedMap operations are all eager, only immutable operations supported // indices must be continuous from firstKey to lastKey

}

class ParallelGroup<K,Vs> extends ParallelMap<K,ParallelIndexed> { ... }

This reduces the API 'surface area' further, since you now only have maps that are sub-classes of ParallelMap. This technique fits well with groupBy since it would return a ParallelGroup<U,V>.

Hope you find these comments useful.

-- Howard.

On 19 April 2012 12:49, Brian Goetz <brian.goetz at oracle.com> wrote:

I've posted a strawman writeup of the key concepts we're thinking about for the libraries at:

http://cr.openjdk.java.net/~briangoetz/lambda/collections-overview.html The design is still rough, and we've started checking in an even rougher implementation into the lambda repository. There are lots of omissions, errors, and open questions. But it is at a point where we've got a consistent enough strawman for people to start playing with it, and share their experience. As the document suggests, there are a large number of open issues, and we've temporarily ignored these for the purposes of pulling together something that captures the basic functionality. We've deliberately deferred addressing many sticky issues including attachment to collections, primitive specialization, checked exceptions, infinite streams, IO-backed streams, nulls, and of course, naming. These are all sticky issues, and we'll get to them, but for now, I'd like to leave them be for a little while longer, so that we can shake out the key functionality first. What I am asking of the community is: TRY OUT THE CODE. *The single most valuable thing that the community can do to help move this effort forward, and improve the quality of the result, is to try out the code early and often.* (A single practical experience report is worth many armchair observations and suggestions!) What's in the lambda repo right now is still a little rough, but within the next few days it should be cleaned up enough for people to be able to try it out on their code. We would love it if people could try to lambdify some small bit of their code base, and report back on how it worked -- whether it worked great, or not. Also valuable is where you got stuck, such as "I had trouble converting this pattern to use the new libraries."

-- -- Howard.

Previous message: Streams design strawman
Next message: hg: lambda/lambda/langtools: Fix regression in lambda-compatibility check.
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the lambda-dev mailing list