Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute) (original) (raw)
Brian Goetz brian.goetz at oracle.com
Thu Jan 10 16:29:10 PST 2013
- Previous message: Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)
- Next message: Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
The sorts are all serial. So what is happening there is that instead of sorting one big thing, you are sorting lots of small things.
The break-even on parallel groupBy may be high because of merging overhead, especially if we've got our split thresholds tuned too low, in which case we create lots of little maps and spend a lot of time merging them.
On 1/10/2013 7:24 PM, Joe Bowbeer wrote:
Thanks Paul.
To investigate, I pre-read all the lines: List lines = Files.readAllLines(path, Charset.defaultCharset()); and then profiled the three variants below on a dual-core desktop: 1. map = lines.stream().accumulate(Accumulators.<String,_ _String>groupBy(Anagrams::key)); 2. map = lines.parallelStream().accumulate(Accumulators.<String,_ _String>groupBy(Anagrams::key)); 3. map = lines.stream().parallel().accumulate(Accumulators.<String,_ _String>groupBy(Anagrams::key)); Results: 1. stream accumulate = 260ms sort = 80ms toLowercase = 20ms 2. parallelStream accumulate = 1342ms sort = 283ms toLowercase = 108ms 3. stream().parallel() accumulate = 1186ms sort = 629ms toLowercase = 826ms
This is a slow-down (5x more work), though not as dramatic as what happened in the original case: Originally, the expression was one of: 1. map = r.lines().accumulate(Accumulators.<String,_ _String>groupBy(Anagrams::key)); 2. map = r.lines().parallel().accumulate(Accumulators.<String,_ _String>groupBy(Anagrams::key)); Results: 1. lines() accumulate = 396ms sort = 121ms toLowercase = 76ms 2. lines().parallel() accumulate = 20010ms! sort = 801ms toLowercase = ~ms --Joe On Thu, Jan 10, 2013 at 12:55 PM, Paul Sandoz <paul.sandoz at oracle.com_ _<mailto:paul.sandoz at oracle.com>> wrote: On Jan 10, 2013, at 7:28 PM, Joe Bowbeer <joe.bowbeer at gmail.com_ _<mailto:joe.bowbeer at gmail.com>> wrote: > Thanks for the suggestions Paul. > > After updating to jdk8lambda b72, the anagram collector is now expressed as: > > map = r.lines().accumulate(Accumulators.<String,_ _String>groupBy(Anagrams::key)); > > Notes: > > 1. Adding .parallel() after lines() works, but in my tests on a dual-core laptop, it increased the accumulate time by an order of magnitude. > I would need to look at this more closely tomorrow but i suspect it is due to two things: 1) an terribly unbalanced tree is created (using a spliterator from an iterator since there is no size information); and 2) the map merging are both contributing to the slow down. Need to try the concurrent groupBy. Things are moving so fast i am forgetting if b72 has that, oh for the need of continuous builds! If there are 354,984 words then that will make a tree of at least 354,984/1024 leaf nodes with the ~ the same depth using the current spliterator from iterator technique. A simple verification is to load up the words into memory as a list and use that list as the source of words. Paul. > 2. The Accumulators.<> type specifier is currently required, unfortunately. > > The complete source (all 31 lines) is at bitbucket.org/joebowbeer/anagrams/src/ <http://bitbucket.org/joebowbeer/anagrams/src/> > > In my trials, I'm using the long list of single words from http://icon.shef.ac.uk/Moby/mwords.html > > Joe > > On Wed, Jan 9, 2013 at 1:15 AM, Paul Sandoz <paul.sandoz at oracle.com <mailto:paul.sandoz at oracle.com>> wrote: > > On Jan 9, 2013, at 7:47 AM, Joe Bowbeer <joe.bowbeer at gmail.com_ _<mailto:joe.bowbeer at gmail.com>> wrote: > > > In the javadoc for Map.compute in b72, there is an error in the sample > > snippet: > > > > http://download.java.net/lambda/b72/docs/api/java/util/Map.html > > > > Attempts to compute a mapping for the specified key and its current mapped > >> value (or null if there is no current mapping). For example, to either > >> create or append a String msg to a value mapping: > >> map.compute(key, (k, v, msg) -> (v == null) ? msg : v.concat(msg)) > > > > > > Error: BiFunction does not accept three arguments. In particular, msg is > > extraneous. It should be defined in the lexical scope? > > > > > > Btw, I pondered how to efficiently use compute() or merge() to simulate a > > multi-map and I eventually gave up. > > > > I eventually wrote the following, which accumulates lists of anagrams, > > given a list of words: > > > > r.lines().forEach(s -> map.computeIfAbsent(key(s), k -> new > > ArrayList<>()).add(s)); > > > > Have you tried using the latest groupBy functionality? e.g.: > > Map<String, Collection> anagrams = r.lines().parallel().reduce(groupBy(s -> key(s))); > > It would be interesting to compare performance of reduce vs. reduceUnordered and the concurrent groupBy leveraging CHM. > > > > > > Where key() returns the canonical key for a given word, and r is a > > BufferedReader for the dictionary file. > > > > The following line prints the lists of anagrams: > > > > map.values().stream().filter(v -> v.size() > 1).forEach(v -> > > System.out.println(v)); > > > > Little style tip: > > forEach(System.out::println) > > Paul. >
- Previous message: Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)
- Next message: Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the lambda-libs-spec-observers mailing list