Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute) (original) (raw)
Joe Bowbeer joe.bowbeer at gmail.com
Thu Jan 10 16:24:41 PST 2013
- Previous message: Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)
- Next message: Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Thanks Paul.
To investigate, I pre-read all the lines:
List lines = Files.readAllLines(path, Charset.defaultCharset());
and then profiled the three variants below on a dual-core desktop:
- map = lines.stream().accumulate(Accumulators.<String, String>groupBy(Anagrams::key));
- map = lines.parallelStream().accumulate(Accumulators.<String, String>groupBy(Anagrams::key));
- map = lines.stream().parallel().accumulate(Accumulators.<String, String>groupBy(Anagrams::key));
Results:
- stream
accumulate = 260ms sort = 80ms toLowercase = 20ms
- parallelStream
accumulate = 1342ms sort = 283ms toLowercase = 108ms
- stream().parallel()
accumulate = 1186ms sort = 629ms toLowercase = 826ms
This is a slow-down (5x more work), though not as dramatic as what happened in the original case:
Originally, the expression was one of:
- map = r.lines().accumulate(Accumulators.<String, String>groupBy(Anagrams::key));
- map = r.lines().parallel().accumulate(Accumulators.<String, String>groupBy(Anagrams::key));
Results:
- lines()
accumulate = 396ms sort = 121ms toLowercase = 76ms
- lines().parallel()
accumulate = 20010ms! sort = 801ms toLowercase = ~ms
--Joe
On Thu, Jan 10, 2013 at 12:55 PM, Paul Sandoz <paul.sandoz at oracle.com>wrote:
On Jan 10, 2013, at 7:28 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote: > Thanks for the suggestions Paul. > > After updating to jdk8lambda b72, the anagram collector is now expressed as: > > map = r.lines().accumulate(Accumulators.<String,_ _String>groupBy(Anagrams::key)); > > Notes: > > 1. Adding .parallel() after lines() works, but in my tests on a dual-core laptop, it increased the accumulate time by an order of magnitude. > I would need to look at this more closely tomorrow but i suspect it is due to two things: 1) an terribly unbalanced tree is created (using a spliterator from an iterator since there is no size information); and 2) the map merging are both contributing to the slow down. Need to try the concurrent groupBy. Things are moving so fast i am forgetting if b72 has that, oh for the need of continuous builds! If there are 354,984 words then that will make a tree of at least 354,984/1024 leaf nodes with the ~ the same depth using the current spliterator from iterator technique. A simple verification is to load up the words into memory as a list and use that list as the source of words. Paul. > 2. The Accumulators.<> type specifier is currently required, unfortunately. > > The complete source (all 31 lines) is at bitbucket.org/joebowbeer/anagrams/src/ > > In my trials, I'm using the long list of single words from http://icon.shef.ac.uk/Moby/mwords.html > > Joe > > On Wed, Jan 9, 2013 at 1:15 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote: > > On Jan 9, 2013, at 7:47 AM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote: > > > In the javadoc for Map.compute in b72, there is an error in the sample > > snippet: > > > > http://download.java.net/lambda/b72/docs/api/java/util/Map.html > > > > Attempts to compute a mapping for the specified key and its current mapped > >> value (or null if there is no current mapping). For example, to either > >> create or append a String msg to a value mapping: > >> map.compute(key, (k, v, msg) -> (v == null) ? msg : v.concat(msg)) > > > > > > Error: BiFunction does not accept three arguments. In particular, msg is > > extraneous. It should be defined in the lexical scope? > > > > > > Btw, I pondered how to efficiently use compute() or merge() to simulate a > > multi-map and I eventually gave up. > > > > I eventually wrote the following, which accumulates lists of anagrams, > > given a list of words: > > > > r.lines().forEach(s -> map.computeIfAbsent(key(s), k -> new > > ArrayList<>()).add(s)); > > > > Have you tried using the latest groupBy functionality? e.g.: > > Map<String, Collection> anagrams = r.lines().parallel().reduce(groupBy(s -> key(s))); > > It would be interesting to compare performance of reduce vs. reduceUnordered and the concurrent groupBy leveraging CHM. > > > > > > Where key() returns the canonical key for a given word, and r is a > > BufferedReader for the dictionary file. > > > > The following line prints the lists of anagrams: > > > > map.values().stream().filter(v -> v.size() > 1).forEach(v -> > > System.out.println(v)); > > > > Little style tip: > > forEach(System.out::println) > > Paul. >
- Previous message: Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)
- Next message: Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the lambda-libs-spec-observers mailing list