[concurrency-interest] RFR [8011215] optimization of CopyOnWriteArrayList.addIfAbsent() (original) (raw)

Doug Lea dl at cs.oswego.edu
Wed Apr 3 10:35:27 UTC 2013


A few quick notes while travelling:

This was designed to perform best in the case of possibly contended updates when the element is absent, by avoiding retraversal, and thus minimizing lock hold times in the common case. (When not common, it can be guarded by a contains check.) However even in this case, it is possible that a retraversal via arraycopy could be faster because it can use optimized cheaper writes (fewer card marks). I'll recheck this on enough machines to make a better assessment of impact withing a few days.

-Doug

CPU will predict but compiler may not optimize loop aggressively with the branch there; e.g. if you unroll, you'll bloat code size by having a branch on each element, and then you can hit icache misses.

Arrays.copyOf is intrinsic so should have perf similar to System.arrayCopy. Sent from my phone On Apr 2, 2013 7:51 PM, "Stanimir Simeonoff" <stanimir at riflexo.com> wrote:

On Wed, Apr 3, 2013 at 1:47 AM, Martin Buchholz <martinrb at google.com>wrote:

On Tue, Apr 2, 2013 at 3:45 PM, Ivan Gerasimov <ivan.gerasimov at oracle.com_ _> wrote: Thank you, Ulf! maybe the old code wins for looong arrays, so there could be a threshold to decide between old and new code: I've modified the benchmark code to test arrays with 90'000 to 100'000 elements. (Previously was testing 1 to 100 elements.) The performance gain turns out to be even more significant. On my machine tests show that with that many elements the new code runs 40% faster. Honestly, I didn't expect that. I thought my code might be a bit slower and hoped that not much slower. Yeah, that's a bit surprising. Perhaps because you're avoiding the branch of testing object for null on each iteration? The branch would be perfectly predicted by the CPU, so it cannot be that. System.arraycopy would be much better for larger arrays as it uses SSE extensions to copy but normally COW structures would not have so many elements. btw, the 2 pass version benefits of low memory footprint equals() (and all in L2 cache). Basically if you don't get cache trashing Arrays.copy would be the better one. Stanimir


Concurrency-interest mailing list Concurrency-interest at cs.oswego.edu http://cs.oswego.edu/mailman/listinfo/concurrency-interest


Concurrency-interest mailing list Concurrency-interest at cs.oswego.edu http://cs.oswego.edu/mailman/listinfo/concurrency-interest


Concurrency-interest mailing list Concurrency-interest at cs.oswego.edu http://cs.oswego.edu/mailman/listinfo/concurrency-interest



More information about the core-libs-dev mailing list