RFR: jsr166 jdk9 integration wave 12
Paul Sandoz Paul.Sandoz at oracle.com
Thu Nov 17 20:03:00 UTC 2016
On 16 Nov 2016, at 16:03, Martin Buchholz <martinrb at google.com> wrote:
> Apologies for having this wave turn into a tsunami.
>
> CopyOnWriteArrayList was rewritten due to finding
> https://bugs.openjdk.java.net/browse/JDK-8169738
>
> ArrayDeque and ArrayBlockingQueue now both use efficient nested loop idiom
> for all traversals.
>
> Most bulk remove methods in all collection classes were rewritten for
> efficiency, using similar code.
>
> Lots of tests and benchmarks added.
OK, I will just go through all of it.
Collection-optimization —
Perhaps consolidate the repeated mini bit set methods as package-private static methods on AbstractCollection? Unclear where the j.u.concurrent ones should go though...
It's purely a subjective thing, but I would be inclined to change the variable name "deathRow" to something more neutral, such as "removedBitSet".
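For anyone following along, here is a rough sketch of the two-pass bit-set removal idiom being discussed; the names (BulkRemoveSketch, removed) are illustrative, not the webrev's:

import java.util.function.Predicate;

class BulkRemoveSketch {
    // Pass 1 marks matching indices in a long[] "mini bit set"; pass 2
    // compacts the survivors in place and nulls the tail slots for GC.
    static int bulkRemove(Object[] es, int size, Predicate<Object> filter) {
        long[] removed = new long[(size + 63) >> 6];   // one bit per element
        int deleted = 0;
        for (int i = 0; i < size; i++)
            if (filter.test(es[i])) {                  // pass 1: mark only
                removed[i >> 6] |= 1L << i;
                deleted++;
            }
        if (deleted > 0) {
            int w = 0;
            for (int i = 0; i < size; i++)             // pass 2: compact survivors
                if ((removed[i >> 6] & (1L << i)) == 0)
                    es[w++] = es[i];
            for (int i = w; i < size; i++)
                es[i] = null;                          // clear stale slots for GC
        }
        return deleted;
    }
}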
The two-layer nested loop is quite clever; it certainly twists the mind when first encountered. It's arguably more readable with two separate (non-nested) loops, but that also requires two explicit invocations of the action, which may perturb the performance characteristics.
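A minimal sketch of the shape of the idiom (not the JDK code itself): the live elements of a circular array occupy at most two contiguous slices, and each inner loop is a plain ascending scan over one slice, which the VM optimizes well.

import java.util.function.Consumer;

class CircularTraversal {
    static void forEachCircular(Object[] es, int head, int tail, Consumer<Object> action) {
        for (int i = head, to = (head <= tail) ? tail : es.length; ; i = 0, to = tail) {
            for (; i < to; i++)
                action.accept(es[i]);   // simple array loop over one slice
            if (to == tail)             // the last (or only) slice is done
                break;
        }
    }
}

For example, with head == 6, tail == 2 and an array of length 8, this visits indices 6, 7, 0, 1.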
ArrayDeque —
 188     public ArrayDeque(int numElements) {
 189         elements = new Object[Math.max(1, numElements + 1)];
 190     }
What if Integer.MAX_VALUE is passed?
 202     public ArrayDeque(Collection<? extends E> c) {
 203         elements = new Object[c.size() + 1];
 204         addAll(c);
 205     }
What if c.size() returns Integer.MAX_VALUE?
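To make the concern concrete: Integer.MAX_VALUE + 1 wraps to Integer.MIN_VALUE, so the first constructor silently allocates an array of length 1, and the second throws NegativeArraySizeException. A hedged sketch of a wrap-safe allocation follows (a hypothetical helper, not the webrev's code, using the usual "max array size" convention as an assumed cap):

class ArrayDequeAllocSketch {
    // Hypothetical cap, mirroring the common "max array size" convention.
    private static final int MAX_CAPACITY = Integer.MAX_VALUE - 8;

    // Clamp before adding the spare slot instead of letting numElements + 1 overflow.
    static Object[] allocate(int numElements) {
        int capacity = (numElements >= MAX_CAPACITY)
                ? MAX_CAPACITY                  // avoid Integer.MIN_VALUE from "+ 1"
                : Math.max(1, numElements + 1); // usual case: one spare slot
        return new Object[capacity];
    }

    public static void main(String[] args) {
        // Demonstrates the wrap being asked about:
        System.out.println(Math.max(1, Integer.MAX_VALUE + 1));  // prints 1
    }
}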
 226      * Adds i and j, mod modulus.
 227      * Precondition and postcondition: 0 <= i < modulus, 0 <= j <= modulus.
 228      */
 229     static final int add(int i, int j, int modulus) {
Is that upper bound correct for j? 0 <= j < modulus ?
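For what it's worth, a minimal sketch of the conditional-subtraction add the comment appears to describe; under either bound the sum stays below 2 * modulus, so a single subtraction suffices (assuming i + j does not overflow int):

// Sketch only; assumes 0 <= i < modulus and 0 <= j <= modulus, no int overflow.
static int add(int i, int j, int modulus) {
    int sum = i + j;             // at most 2 * modulus - 1
    if (sum >= modulus)
        sum -= modulus;          // at most one wrap is possible
    return sum;                  // back in [0, modulus)
}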
317 c.forEach(e -> addLast(e));
this::addLast, up to you which you prefer
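Just to illustrate the stylistic choice (a standalone toy, not the webrev's code); both forms capture the receiver and behave identically:

import java.util.ArrayDeque;
import java.util.List;

class ForEachStyle {
    public static void main(String[] args) {
        ArrayDeque<String> deque = new ArrayDeque<>();
        List<String> c = List.of("a", "b", "c");
        c.forEach(e -> deque.addLast(e));   // explicit lambda
        c.forEach(deque::addLast);          // equivalent bound method reference
        System.out.println(deque);          // [a, b, c, a, b, c]
    }
}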
 843         public boolean tryAdvance(Consumer<? super E> action) {
 844             if (action == null)
 845                 throw new NullPointerException();
 846             int t, i;
 847             if ((t = fence) < 0) t = getFence();
Is that for optimisation purposes, since the same check is also performed in getFence? If so, that seems like overkill.
 848             if (t == (i = cursor))
 849                 return false;
 850             final Object[] es;
 851             action.accept(nonNullElementAt(es = elements, i));
 852             cursor = inc(i, es.length);
I recommend updating cursor before invoking the action (see the sketch below).
 853             return true;
 854         }
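Something like the following is what I have in mind; this is only a sketch of the reordering, reusing the fields and helpers from the quoted snippet (so it is not standalone code). With the cursor advanced first, an action that throws or reentrantly calls tryAdvance cannot be handed the same element twice.

public boolean tryAdvance(Consumer<? super E> action) {
    if (action == null)
        throw new NullPointerException();
    int t, i;
    if ((t = fence) < 0) t = getFence();
    if (t == (i = cursor))
        return false;
    final Object[] es = elements;
    cursor = inc(i, es.length);                  // advance first ...
    action.accept(nonNullElementAt(es, i));      // ... then hand out the element
    return true;
}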
Collection8Test —
429 public void testRemoveAfterForEachRemaining() {
Are tests run by default with testImplementationDetails == true? I am guessing so.
Semaphore —
 633     /**
 634      * Acquires and returns all permits that are immediately available.
 635      * Upon return, zero permits are available.
 636      *
 637      * @return the number of permits acquired
 638      */
 639     public int drainPermits() {
 640         return sync.drainPermits();
 641     }
Arguably, if the number of available permits is positive this acquires them all, otherwise it releases permits (bringing a negative count up to zero). Perhaps:
Acquires and returns all permits that are immediately available, otherwise (if the count is negative) releases all permits. Upon return, zero permits are available.
@return the number of permits acquired, or the number of permits released (a negative value)
Probably requires a CCC, which I can manage.
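A small usage sketch of the behavior that wording describes, assuming drainPermits simply swaps the current permit count (whatever its sign) to zero:

import java.util.concurrent.Semaphore;

class DrainPermitsSketch {
    public static void main(String[] args) {
        Semaphore s = new Semaphore(-3);            // negative initial permits are allowed
        int drained = s.drainPermits();             // -3 under the reading above
        System.out.println(drained);                // -3
        System.out.println(s.availablePermits());   // 0: zero permits remain either way
    }
}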
SplittableRandom —
 533     /**
 534      * Generates a pseudorandom number with the indicated number of
 535      * low-order bits.  Because this class has no subclasses, this
 536      * method cannot be invoked or overridden.
 537      *
 538      * @param bits random bits
 539      * @return the next pseudorandom value from this random number
 540      *         generator's sequence
 541      */
 542     protected int next(int bits) {
 543         return nextInt() >>> (32 - bits);
 544     }
 545
Unless I am missing something really subtle, I don't think this is required, since SplittableRandom does not extend Random (unlike ThreadLocalRandom, which does), and no associated reflective test is required.
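A quick reflective check of that point (illustrative only; the class name here is mine):

import java.lang.reflect.Modifier;
import java.util.Random;
import java.util.SplittableRandom;

class SplittableRandomHierarchyCheck {
    public static void main(String[] args) {
        // SplittableRandom is final and is not a java.util.Random subclass,
        // so a protected next(int) has no override or superclass caller to serve.
        System.out.println(Modifier.isFinal(SplittableRandom.class.getModifiers())); // true
        System.out.println(Random.class.isAssignableFrom(SplittableRandom.class));   // false
    }
}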
ForkJoin —
  69  * tasks that are never joined.  All worker threads are initialized
  70  * with {@link Thread#isDaemon} set {@code true}.
CCC?
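For what it's worth, a tiny check of the documented behavior (illustrative only; a latch is used so the task runs on a worker rather than in the caller):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;

class ForkJoinDaemonCheck {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        ForkJoinPool.commonPool().execute(() -> {
            System.out.println(Thread.currentThread().isDaemon());  // prints true
            done.countDown();
        });
        done.await();   // wait without joining, so the caller does not run the task
    }
}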
Thanks, Paul.
On Wed, Nov 16, 2016 at 3:34 PM, Paul Sandoz <paul.sandoz at oracle.com> wrote:

Hi Martin,

Glad you looked more closely at the perf aspects of ArrayDeque.

I am struggling to determine what has changed since the last revision. Apart from ArrayDeque what files should i be looking at?

Thanks,
Paul.

> On 16 Nov 2016, at 12:14, Martin Buchholz <martinrb at google.com> wrote:
>
> Thanks to Vitaly and Doug for being more concerned about offer/poll
> performance than I was initially.  ArrayDeque is now back to using the head
> + tail + spare slot strategy, which keeps optimal performance for non-bulk
> operations, while everything else gets faster.  We have new tests and new
> benchmarks, as a result of which new bugs were found and fixed, and this
> integration wave has grown rather larger than originally intended.
>
> The JIT-friendly nested loop strategy was successful for making bulk
> operations on circular arrays faster, and this is used for both ArrayDeque
> and ArrayBlockingQueue.
>
> +    /*
> +     * VMs excel at optimizing simple array loops where indices are
> +     * incrementing or decrementing over a valid slice, e.g.
> +     *
> +     * for (int i = start; i < end; i++) ... elements[i]
> +     *
> +     * Because in a circular array, elements are in general stored in
> +     * two disjoint such slices, we help the VM by writing unusual
> +     * nested loops for all traversals over the elements.
> +     */
>
>
> Bulk removal operations on array-based collections retreat to a two-pass
> algorithm, which is slightly slower (but still O(n)), but continues to
> support (questionable) predicates that reentrantly access the collection in
> read mode.
>
> http://cr.openjdk.java.net/~martin/webrevs/openjdk9/jsr166-jdk9-integration/
>
> On Fri, Oct 21, 2016 at 12:32 PM, Martin Buchholz <martinrb at google.com>
> wrote:
>
>>
>>
>> On Tue, Oct 18, 2016 at 4:18 PM, Martin Buchholz <martinrb at google.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Oct 18, 2016 at 4:00 PM, Vitaly Davidovich <vitalyd at gmail.com>
>>> wrote:
>>>
>>>>
>>>>>> * Change in allocation/capacity policy.
>>>>>>
>>>>>> The removal of the power-of-two restriction, and applying a 1.5x
>>>>> growth
>>>>>> factor (same as ArrayList) seems sensible. Does this mean that the
>>>>> ability
>>>>>> to compute the proper array index by using x & (length-1) wasn't
>>>>> worth it?
>>>>>> Or at least not worth the extra tail wastage?
>>>>>>
>>>>>
>>>>> There's no integer division to optimize, as with hash tables.
>>>>
>>>> But it does add some new branches, doesn't it? Potentially branches that
>>>> aren't taken prior to JIT compilation, but taken later = deopt.
>>>>
>>>
>>> If it's a smidgeon slower, I don't care that much; improvement in
>>> flexibility and scalability are more important.
>>>
>>> Of course, I do care about algorithmic improvements to e.g. removeIf
>>>
>>>
>>>> Curious if you ran some benchmarks on ArrayDeque.
>>>>
>>>
>>> Not yet. Who would like to show how slow the code is?
>>>
>>
>> I revived my ancient IteratorMicroBenchmark for the latest webrev, made it
>> a proper jtreg test, added ArrayDeque Jobs, and verified that the new code
>> is measurably faster than the old code, at least for the mundane job of
>> iterating.
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk9/
>> jsr166-jdk9-integration/ArrayList/
>>
>>