RFR (XS) CR 8014233: java.lang.Thread should have @Contended on TLR fields

Aleksey Shipilev aleksey.shipilev at oracle.com
Tue Jun 18 06:56:30 UTC 2013


Hi David,

It depends on the scenario we are assessing. For the sake of argument, let's say every thread has requested TLR.current() at least once.

Before the merge:
  Thread maps for ThreadLocal =~ 32 bytes x #threads
  TLR instances + padding =~ (128 + 8?) bytes x #threads

After the merge:
  TLR fields in Thread + padding =~ (2x128 + 16) bytes x #threads

So, there is an additional footprint cost per Thread, but it seems negligible compared to what a native thread already allocates for its native structures (e.g. the stack). Note that @Contended pads more generously, anticipating that the hardware prefetchers are also turned on (the VM can get better at this, though).
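To make the estimate above concrete, here is a minimal arithmetic sketch. It only restates the approximate per-thread figures from this message (including the uncertain "8?" term, taken here as 8); the class name, method names, and the thread count are hypothetical and chosen for illustration.

```java
// Rough footprint estimate based on the approximate figures quoted above.
// All names here are hypothetical; the numbers are estimates, not measurements.
final class TlrFootprintEstimate {

    // Before the merge: ~32 bytes of ThreadLocal map overhead per thread,
    // plus a TLR instance with padding, ~(128 + 8) bytes per thread.
    // The "+ 8" is marked as uncertain in the message; it is assumed to be 8 here.
    static long beforeMergeBytes(long threads) {
        return (32 + (128 + 8)) * threads;
    }

    // After the merge: TLR fields in Thread plus @Contended padding,
    // ~(2 x 128 + 16) bytes per thread.
    static long afterMergeBytes(long threads) {
        return (2 * 128 + 16) * threads;
    }

    public static void main(String[] args) {
        long threads = 1_000; // hypothetical thread count
        long before = beforeMergeBytes(threads);
        long after = afterMergeBytes(threads);
        System.out.printf("before: %d bytes, after: %d bytes, delta: %d bytes%n",
                before, after, after - before);
    }
}
```

Under these assumptions the difference is roughly 104 bytes per thread, which is small next to a typical native thread stack.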

Gory details:

**** -XX:-EnableContended: ****

Running 64-bit HotSpot VM. Using compressed references with 3-bit shift. Objects are 8 bytes aligned.

java.lang.Thread
 offset  size  type  description
      0    12        (assumed to be the object header

**** -XX:+EnableContended: ****

Running 64-bit HotSpot VM. Using compressed references with 3-bit shift. Objects are 8 bytes aligned.

java.lang.Thread
 offset  size  type  description
      0    12        (assumed to be the object header

-Aleksey.

On 06/18/2013 06:03 AM, David Holmes wrote:

Hi Aleksey,

What is the overall change in memory use for this set of changes, i.e. what did we use before the TLR merge and what do we use now?

Thanks,
David

On 17/06/2013 7:00 PM, Aleksey Shipilev wrote:

Hi,

This is the respin of the RFE filed a month ago:
  http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-May/016754.html

The webrev is here:
  http://cr.openjdk.java.net/~shade/8014233/webrev.02/

Testing:
 - JPRT build passes
 - Linux x86_64/release passes jdk/java/lang jtreg
 - vm.quick.testlist, vm.quick-gc.testlist on selected platforms
 - microbenchmarks, see below

The rationale follows. After we merged the ThreadLocalRandom state into Thread, we are now missing the padding that prevents false sharing on those heavily-updated fields. While Thread is already large enough to separate the TLR states of two distinct threads, we can still get false sharing with other Thread fields.

There is a benchmark showcasing this:
  http://cr.openjdk.java.net/~shade/8014233/threadbench.zip

There are two test cases: the first one calls nextInt() on its own TLR and then reads the current thread's ID; the other reads another thread's ID instead, thus inducing false sharing against that thread's TLR state. On my 2x2 i5 laptop, running Linux x86_64:
  same:  355 +- 1 ops/usec
  other: 100 +- 5 ops/usec

Note the decrease in throughput because of the false sharing. With the patch:
  same:  359 +- 1 ops/usec
  other: 356 +- 1 ops/usec

Note the performance is back. We want to avoid these spurious decreases in performance, caused either by an unlucky memory layout or by user code (un)intentionally ruining the cache line locality for the updater thread.

Thanks,
-Aleksey.
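The two test cases described above can be sketched roughly as follows. This is not the code from threadbench.zip; it is a simplified, hypothetical reconstruction with made-up class and method names, using plain threads and System.nanoTime() rather than a proper benchmark harness. The JIT may hoist the thread-ID read out of the loop, so the numbers it prints should not be trusted as measurements.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the "same" vs "other" cases; not the original benchmark.
final class TlrSharingSketch {

    // Two threads each call nextInt() on their own TLR and then read a thread ID:
    // their own (readOther == false) or the peer's (readOther == true).
    // Reading the peer's ID touches the peer's Thread object, whose TLR fields
    // the peer is updating concurrently.
    static double opsPerUsec(boolean readOther, int iterations) throws InterruptedException {
        Thread[] threads = new Thread[2];
        long[] sinks = new long[2]; // keeps the loop results observable
        for (int i = 0; i < 2; i++) {
            final int self = i;
            threads[i] = new Thread(() -> {
                Thread target = threads[readOther ? 1 - self : self];
                ThreadLocalRandom r = ThreadLocalRandom.current();
                long sink = 0;
                for (int n = 0; n < iterations; n++) {
                    sink += r.nextInt() + target.getId();
                }
                sinks[self] = sink;
            });
        }
        long start = System.nanoTime();
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        long elapsedNanos = System.nanoTime() - start;
        return 2.0 * iterations / (elapsedNanos / 1000.0);
    }

    public static void main(String[] args) throws InterruptedException {
        int iterations = 50_000_000; // hypothetical workload size
        System.out.printf("same:  %.1f ops/usec%n", opsPerUsec(false, iterations));
        System.out.printf("other: %.1f ops/usec%n", opsPerUsec(true, iterations));
    }
}
```

On a JDK that already pads the TLR fields, the two cases would be expected to perform similarly; any gap on an unpatched build depends on the actual field layout of Thread.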


