RFR: 8003246: Add Supplier to ThreadLocal

Peter Levart peter.levart at gmail.com
Thu Dec 6 20:38:02 UTC 2012


On 12/06/2012 08:08 PM, Remi Forax wrote:

On 12/06/2012 08:01 PM, Peter Levart wrote:

There's a quick trick that guarantees in-lining of get/set/remove:

public static class FastThreadLocal<T> extends ThreadLocal<T> {
    @Override public final T get() { return super.get(); }
    @Override public final void set(T value) { super.set(value); }
    @Override public final void remove() { super.remove(); }
}

....just use static type FastThreadLocal everywhere in code. I tried it and it works.

No, there is no way to have such a guarantee. Here it works either because the only ThreadLocal class you load is FastThreadLocal, or because the VM has profiled the call site and seen that you only use FastThreadLocal for a specific instruction.
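As an illustration of "just use static type FastThreadLocal everywhere", here is a minimal sketch. The class name FastThreadLocalUsage, the COUNTER field and the increment() helper are hypothetical and not part of the original message. The point is only that the field is declared as FastThreadLocal rather than ThreadLocal, so the call to get() resolves to a final method. Whether HotSpot actually inlines it remains up to the VM, as the reply above notes.

```java
public class FastThreadLocalUsage {

    // Same shape as the wrapper quoted above: final overrides that delegate to ThreadLocal.
    static class FastThreadLocal<T> extends ThreadLocal<T> {
        @Override public final T get() { return super.get(); }
        @Override public final void set(T value) { super.set(value); }
        @Override public final void remove() { super.remove(); }
    }

    // Hypothetical per-thread counter. The declared (static) type is FastThreadLocal,
    // not ThreadLocal; initialValue() is not final, so it can still be overridden.
    static final FastThreadLocal<int[]> COUNTER = new FastThreadLocal<int[]>() {
        @Override protected int[] initialValue() { return new int[1]; }
    };

    // The receiver's static type is FastThreadLocal, so get() here refers to the final method.
    static int increment() {
        int[] cell = COUNTER.get();
        return ++cell[0];
    }

    public static void main(String[] args) {
        increment();
        increment();
        System.out.println(increment()); // prints 3 on a fresh main thread
    }
}
```

The design choice is that code which only needs get/set/remove never mentions ThreadLocal as a declared type, so every such call site sees the final methods.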

Nothing is certain but death and taxes, I agree.

But think deeper, Remi!

How do you explain the following test:

public class ThreadLocalTest {

 static class Int { int value; }

 static class TL0 extends ThreadLocal<Int> {}
 static class TL1 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
 static class TL2 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
 static class TL3 extends ThreadLocal<Int> { public Int get() { return super.get(); } }
 static class TL4 extends ThreadLocal<Int> { public Int get() { return super.get(); } }

 static long doTest(ThreadLocal<Int> tl) {
     long t0 = System.nanoTime();
     for (int i = 0; i < 100000000; i++)
         tl.get().value++;
     return System.nanoTime() - t0;
 }

 static long doTest(FastThreadLocal<Int> tl) {
     long t0 = System.nanoTime();
     for (int i = 0; i < 100000000; i++)
         tl.get().value++;
     return System.nanoTime() - t0;
 }

 static long test0(ThreadLocal<Int> tl) {
     if (tl instanceof FastThreadLocal)
         return doTest((FastThreadLocal<Int>)tl);
     else
         return doTest(tl);
 }

 static void test(ThreadLocal<Int> tl) {
     tl.set(new Int());
     System.out.print(tl.getClass().getName() + ":");
     for (int i = 0; i < 8; i++)
         System.out.print(" " + test0(tl));
     System.out.println();
 }

 public static void main(String[] args) {
     TL0 tl0 = new TL0();
     test(tl0);
     test(new TL1());
     test(new TL2());
     test(new TL3());
     test(new TL4());
     test(tl0);
 }

}

Which prints the following (demonstrating almost 2x slowdown of TL0 - last line compared to first):

test.ThreadLocalTest$TL0: 342716421 326105315 300744544 300654890 300726346 300752009 300700781 300735651
test.ThreadLocalTest$TL1: 321424139 312128166 312173383 312125203 312142144 312150949 316760957 313393554
test.ThreadLocalTest$TL2: 525661886 524169413 524184405 524215685 524162050 524400364 524174966 454370228
test.ThreadLocalTest$TL3: 472042229 471071328 464387909 468047355 464795171 464466481 464449567 464365974
test.ThreadLocalTest$TL4: 459651686 454142365 454129481 454180718 454217277 454109611 454119988 456978405
test.ThreadLocalTest$TL0: 582252322 582773455 582612509 582753610 582626360 582852195 582805654 582598285

Now with a simple change of:

 static class TL0 extends FastThreadLocal<Int> {}

...the same test prints:

test.ThreadLocalTest$TL0: 330722181 325823711 301171182 309992192 321868979 308111417 303806979 300612033
test.ThreadLocalTest$TL1: 330263857 326448062 300607081 300575641 307442821 300616794 300548457 303462898
test.ThreadLocalTest$TL2: 319627165 311309477 311465815 311279612 311294427 311315803 311470291 311293823
test.ThreadLocalTest$TL3: 526849874 524209792 524421574 524166747 524396011 524163313 524395641 524165429
test.ThreadLocalTest$TL4: 464963126 455172216 455466304 455245487 455368318 455093735 455125038 455317375
test.ThreadLocalTest$TL0: 300472239 300695398 300480230 303459397 300451419 300679904 300445717 300451166

And that's very repeatable! Try it for yourself (on JDK8 of course).
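For readers who want to try this as a single file, the following is a reduced sketch of the same experiment, not the test exactly as posted. The names ThreadLocalDispatchSketch, Plain, Fast, Over1 to Over3, timeVirtual, timeFinal and dispatch are made up for this sketch, and the iteration count is a parameter so a quick run is possible. It assumes that loading several subclasses which override get() is what disturbs the ThreadLocal-typed call site, while the FastThreadLocal-typed call site always targets a final method. Actual timings depend on the JVM version and flags, so no expected numbers are given.

```java
public class ThreadLocalDispatchSketch {

    static class Int { int value; }

    // Final get(), as in the wrapper discussed in this thread.
    static class FastThreadLocal<T> extends ThreadLocal<T> {
        @Override public final T get() { return super.get(); }
    }

    static class Plain extends ThreadLocal<Int> {}
    static class Fast extends FastThreadLocal<Int> {}

    // Several distinct overriding subclasses, so the virtual call site sees many receiver types.
    static class Over1 extends ThreadLocal<Int> { @Override public Int get() { return super.get(); } }
    static class Over2 extends ThreadLocal<Int> { @Override public Int get() { return super.get(); } }
    static class Over3 extends ThreadLocal<Int> { @Override public Int get() { return super.get(); } }

    // Call site typed as ThreadLocal: get() may be overridden by any loaded subclass.
    static long timeVirtual(ThreadLocal<Int> tl, int iterations) {
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            tl.get().value++;
        return System.nanoTime() - t0;
    }

    // Call site typed as FastThreadLocal: get() is final and cannot be overridden.
    static long timeFinal(FastThreadLocal<Int> tl, int iterations) {
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            tl.get().value++;
        return System.nanoTime() - t0;
    }

    // Route FastThreadLocal instances to the call site with the narrower static type.
    static long dispatch(ThreadLocal<Int> tl, int iterations) {
        if (tl instanceof FastThreadLocal)
            return timeFinal((FastThreadLocal<Int>) tl, iterations);
        return timeVirtual(tl, iterations);
    }

    static void report(ThreadLocal<Int> tl, int iterations) {
        tl.set(new Int());
        System.out.println(tl.getClass().getSimpleName() + ": " + dispatch(tl, iterations) + " ns");
    }

    public static void main(String[] args) {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10_000_000;
        Plain plain = new Plain();
        Fast fast = new Fast();
        report(plain, iterations);
        report(fast, iterations);
        report(new Over1(), iterations);
        report(new Over2(), iterations);
        report(new Over3(), iterations);
        report(plain, iterations); // re-measure after the overriding subclasses were used
        report(fast, iterations);
    }
}
```

Passing a larger iteration count as the first argument (the posted test uses 100000000) and repeating each measurement several times gives the JIT time to settle.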

Regards, Peter

cheers, Rémi

On 12/06/2012 01:03 PM, Doug Lea wrote:

On 12/06/12 06:56, Vitaly Davidovich wrote:

Doug,

When you see the fast to slow ThreadLocal transition due to class loading invalidating inlined get(), do you not then see it get restored back to fast mode since the receiver type in your call sites is still the monomorphic ThreadLocal (and not the unrelated subclasses)? Just trying to understand what Rémi and you are saying.

The possible outcomes are fairly non-deterministic, depending on HotSpot's mood about recompiles, tiered-compile interactions, method size, Amdahl's law interactions, phase of moon, etc. (In j.u.c, we have learned that our users appreciate things being predictably fast enough rather than being unpredictably sometimes even faster but often slower. So when we see such cases, as with ThreadLocal, they get added to the todo list.)

-Doug



More information about the core-libs-dev mailing list