Proxy.isProxyClass scalability
Peter Levart peter.levart at gmail.com
Wed Apr 17 14:18:50 UTC 2013
Hi Mandy,
Here's the updated webrev:
https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.02/index.html
This adds TwoLevelWeakCache to the scene, with the following performance compared to the other alternatives:
Summary (4 Cores x 2 Threads i7 CPU):
                                  (all timings in ns/op)
Test                     Threads    Original  Patch(CL field)  FlattenedWeakCache  TwoLevelWeakCache
=======================  =======  ==========  ===============  ==================  =================
Proxy_getProxyClass            1    2,403.27           163.70              206.88             252.89
                               4    3,039.01           202.77              303.38             327.62
                               8    5,193.58           314.47              442.58             510.63
Proxy_isProxyClassTrue         1       95.02            10.78               41.85              42.03
                               4    2,266.29            10.80               42.32              42.07
                               8    4,782.29            20.53               72.29              69.25
Proxy_isProxyClassFalse        1       95.02             1.36                1.36               1.36
                               4    2,186.59             1.36                1.37               1.40
                               8    4,891.15             2.72                2.94               2.72
Annotation_equals              1      240.10           152.29              193.27             200.45
                               4    1,864.06           153.81              195.60             202.45
                               8    8,639.20           262.09              384.72             338.70
As expected, Proxy.getProxyClass() is a little slower with TwoLevelWeakCache than with FlattenedWeakCache, but still much faster than the original Proxy implementation. The extra lookup in the ConcurrentHashMap and the extra indirection have a price, but we also get something in return: space.
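To make the "extra lookup and extra indirection" concrete, here is a minimal sketch of the two layouts. It is not the code from the webrev: the class and method names are hypothetical, weak references and expunging of stale entries are left out, and a String stands in for the cached proxy Class.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the two layouts; NOT the code from the webrev.
// Weak references and expunging of stale entries are deliberately omitted,
// and a String stands in for the cached proxy Class.
final class CacheLayoutSketch {

    // Composite key for the flattened layout: loader identity + interface names.
    static final class FlatKey {
        final Object loader;
        final List<String> interfaces;

        FlatKey(Object loader, List<String> interfaces) {
            this.loader = loader;
            this.interfaces = interfaces;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof FlatKey)) return false;
            FlatKey other = (FlatKey) o;
            return loader == other.loader && interfaces.equals(other.interfaces);
        }

        @Override public int hashCode() {
            return 31 * System.identityHashCode(loader) + interfaces.hashCode();
        }
    }

    // Flattened: a single map, so one lookup per request.
    final Map<FlatKey, String> flat = new ConcurrentHashMap<>();

    // Two-level: one inner map per loader, so two lookups per request.
    // The loader stand-ins are plain Objects, so they compare by identity.
    final Map<Object, Map<List<String>, String>> twoLevel = new ConcurrentHashMap<>();

    String flatGet(Object loader, List<String> interfaces) {
        return flat.computeIfAbsent(new FlatKey(loader, interfaces),
                                    k -> "proxy" + k.interfaces);
    }

    String twoLevelGet(Object loader, List<String> interfaces) {
        return twoLevel
            .computeIfAbsent(loader, l -> new ConcurrentHashMap<>())  // 1st lookup
            .computeIfAbsent(interfaces, i -> "proxy" + i);           // 2nd lookup
    }
}
```

In the two-level layout each entry no longer has to carry the loader part of the key, which is one plausible source of the per-class space saving, at the cost of one inner map per loader.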
All of this was measured on the latest lambda build (with the new segment-less ConcurrentHashMap). I also added a second ClassLoader to see what happens when it is added to the cache:
Original Proxy, 32 bit addressing
Columns: class loaders | proxy classes | size of caches (bytes) | delta to prev. line (bytes)
0 0 400 400
1 1 768 368
1 2 920 152
1 3 1072 152
1 4 1224 152
1 5 1376 152
1 6 1528 152
1 7 1680 152
1 8 1832 152
1 9 1984 152
1 10 2136 152
2 11 2456 320
2 12 2672 216
2 13 2824 152
2 14 2976 152
2 15 3128 152
2 16 3280 152
2 17 3432 152
2 18 3584 152
2 19 3736 152
2 20 3888 152
Original Proxy, 64 bit addressing
Columns: class loaders | proxy classes | size of caches (bytes) | delta to prev. line (bytes)
0 0 632 632
1 1 1216 584
1 2 1448 232
1 3 1680 232
1 4 1912 232
1 5 2144 232
1 6 2376 232
1 7 2608 232
1 8 2840 232
1 9 3072 232
1 10 3304 232
2 11 3832 528
2 12 4192 360
2 13 4424 232
2 14 4656 232
2 15 4888 232
2 16 5120 232
2 17 5352 232
2 18 5584 232
2 19 5816 232
2 20 6048 232
Patched Proxy (FlattenedWeakCache), 32 bit addressing
Columns: class loaders | proxy classes | size of caches (bytes) | delta to prev. line (bytes)
0 0 240 240
1 1 584 344
1 2 768 184
1 3 952 184
1 4 1136 184
1 5 1320 184
1 6 1504 184
1 7 1688 184
1 8 1872 184
1 9 2056 184
1 10 2240 184
2 11 2424 184
2 12 2736 312
2 13 2920 184
2 14 3104 184
2 15 3288 184
2 16 3472 184
2 17 3656 184
2 18 3840 184
2 19 4024 184
2 20 4208 184
Patched Proxy (FlattenedWeakCache), 64 bit addressing
Columns: class loaders | proxy classes | size of caches (bytes) | delta to prev. line (bytes)
0 0 336 336
1 1 920 584
1 2 1200 280
1 3 1480 280
1 4 1760 280
1 5 2040 280
1 6 2320 280
1 7 2600 280
1 8 2880 280
1 9 3160 280
1 10 3440 280
2 11 3720 280
2 12 4256 536
2 13 4536 280
2 14 4816 280
2 15 5096 280
2 16 5376 280
2 17 5656 280
2 18 5936 280
2 19 6216 280
2 20 6496 280
Patched Proxy (TwoLevelWeakCache), 32 bit addressing
Columns: class loaders | proxy classes | size of caches (bytes) | delta to prev. line (bytes)
0 0 240 240
1 1 752 512
1 2 896 144
1 3 1040 144
1 4 1184 144
1 5 1328 144
1 6 1472 144
1 7 1616 144
1 8 1760 144
1 9 1904 144
1 10 2048 144
2 11 2400 352
2 12 2608 208
2 13 2752 144
2 14 2896 144
2 15 3040 144
2 16 3184 144
2 17 3328 144
2 18 3472 144
2 19 3616 144
2 20 3760 144
Patched Proxy (TwoLevelWeakCache), 64 bit addressing
Columns: class loaders | proxy classes | size of caches (bytes) | delta to prev. line (bytes)
0 0 336 336
1 1 1216 880
1 2 1440 224
1 3 1664 224
1 4 1888 224
1 5 2112 224
1 6 2336 224
1 7 2560 224
1 8 2784 224
1 9 3008 224
1 10 3232 224
2 11 3808 576
2 12 4160 352
2 13 4384 224
2 14 4608 224
2 15 4832 224
2 16 5056 224
2 17 5280 224
2 18 5504 224
2 19 5728 224
2 20 5952 224
So with FlattenedWeakCache we lose approx. 32 bytes (32-bit addresses) or 48 bytes (64-bit addresses) for each proxy class compared to the original code, but with TwoLevelWeakCache we gain 8 bytes (with either 32-bit or 64-bit addresses) for each proxy class cached compared to the original code. So which to favour, space or time?
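The per-class figures follow directly from the steady-state "delta to prev. line" values in the tables above. A small sketch of that arithmetic (the class and constant names are made up for illustration; the numbers are copied from the tables):

```java
// Steady-state "delta to prev. line" values (bytes per additional proxy class),
// copied from the tables in this message. All names here are hypothetical.
final class PerClassOverhead {
    static final int ORIGINAL_32 = 152, ORIGINAL_64 = 232;
    static final int FLATTENED_32 = 184, FLATTENED_64 = 280;
    static final int TWO_LEVEL_32 = 144, TWO_LEVEL_64 = 224;

    // Positive: the variant needs more bytes per proxy class than the original.
    // Negative: the variant needs fewer.
    static int extraBytes(int variantDelta, int originalDelta) {
        return variantDelta - originalDelta;
    }

    public static void main(String[] args) {
        System.out.println("Flattened, 32 bit: " + extraBytes(FLATTENED_32, ORIGINAL_32)); // 32
        System.out.println("Flattened, 64 bit: " + extraBytes(FLATTENED_64, ORIGINAL_64)); // 48
        System.out.println("TwoLevel,  32 bit: " + extraBytes(TWO_LEVEL_32, ORIGINAL_32)); // -8
        System.out.println("TwoLevel,  64 bit: " + extraBytes(TWO_LEVEL_64, ORIGINAL_64)); // -8
    }
}
```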
Other comments in-line...
On 04/17/2013 07:31 AM, Mandy Chung wrote:
On 4/16/2013 7:18 AM, Peter Levart wrote:
Hi Mandy,
I prepared a preview variant of j.l.r.Proxy using WeakCache (turned into an interface and a special FlattenedWeakCache implementation, in anticipation of creating another variant using two levels of ConcurrentHashMaps for backing storage, but with the same API) just to compare performance: https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/proxy-wc/webrev.01/index.html
thanks for getting this prototype done quickly.
As the values (Class objects of proxy classes) must be wrapped in a WeakReference, the same instance of WeakReference can be re-used as a key in another ConcurrentHashMap to implement quick look-up for the Proxy.isProxyClass() method, eliminating the need to use ClassValue, which is quite space-hungry.
I also think maintaining another ConcurrentHashMap is a good replacement for the use of ClassValue to avoid its memory overhead.
Comparing the performance, here's a summary of all 3 variants (original, patched using a field in ClassLoader and this variant): [...] The improvement is still quite satisfactory, although a little slower than the direct-field variant. The scalability is the same as with the direct-field variant.
Agree - the improvement is quite good.
Space consumption of the cache structure, calculated as deep-size of the structure, ignoring interned Strings, Class and ClassLoader objects, using a single non-bootstrap ClassLoader for defining the proxy classes and using 32 bit addressing, is the following: [...] So with the new ConcurrentHashMap the patched Proxy uses about 32 bytes more per proxy class. Is this satisfactory or should we also try a variant with two levels of ConcurrentHashMaps?
The overhead seems okay to trade off for the scalability. Since you have prepared for doing another variant, it'd be good to compare the two prototypes if this doesn't bring too much work :) I would imagine that there might be a slight difference in your measurement when comparing with proxies defined by a single class loader, but the code might be simpler (might not be if you keep the same API but a different implementation).
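The idea quoted above, re-using the WeakReference that wraps a proxy Class as a key in a second ConcurrentHashMap, can be sketched roughly as follows. This is only an illustration under stated assumptions: the names ReverseLookupSketch, IdentityWeakRef, register and isRegistered are hypothetical, the lookup allocates a temporary reference, and stale entries are never expunged (a real cache would need a ReferenceQueue for that).

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the webrev code. No expunging of cleared references.
final class ReverseLookupSketch {

    // A weak reference that compares by the identity of its referent, so it can
    // serve both as a cached value wrapper and as a map key.
    static final class IdentityWeakRef<T> extends WeakReference<T> {
        private final int hash;

        IdentityWeakRef(T referent) {
            super(referent);
            this.hash = System.identityHashCode(referent); // stable after clearing
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IdentityWeakRef)) return false;
            Object mine = get();
            return mine != null && mine == ((IdentityWeakRef<?>) o).get();
        }
    }

    // Reverse map: "is this Class one of ours?"
    private final Map<IdentityWeakRef<Class<?>>, Boolean> known = new ConcurrentHashMap<>();

    // Returns the reference so the caller could also store it as the cache value.
    IdentityWeakRef<Class<?>> register(Class<?> c) {
        IdentityWeakRef<Class<?>> ref = new IdentityWeakRef<Class<?>>(c);
        known.put(ref, Boolean.TRUE);
        return ref;
    }

    boolean isRegistered(Class<?> c) {
        return known.containsKey(new IdentityWeakRef<Class<?>>(c));
    }
}
```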
With TwoLevelWeakCache, there is a "step" of 108 bytes (32-bit addresses) when a new ClassLoader is encountered (a new 2nd-level ConcurrentHashMap is allocated and a new entry is added to the 1st-level CHM). There's no such "step" in FlattenedWeakCache (modulo the steps when the CHMs themselves are resized). So we roughly have 108 bytes wasted for each new ClassLoader encountered with TwoLevelWeakCache vs. FlattenedWeakCache, but we also have 40 bytes saved for each proxy class cached. TwoLevelWeakCache starts to pay off if there are at least 3 proxy classes defined per ClassLoader on average.
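The break-even point is just a ceiling division of the fixed per-loader cost by the per-class saving. A minimal sketch, using the 32-bit figures from the paragraph above (the class and method names are hypothetical):

```java
// Hypothetical helper illustrating the break-even estimate.
final class BreakEvenSketch {

    // Smallest number of proxy classes per loader at which the per-class saving
    // covers the fixed per-loader cost (integer ceiling division).
    static int breakEven(int perLoaderCostBytes, int perClassSavingBytes) {
        return (perLoaderCostBytes + perClassSavingBytes - 1) / perClassSavingBytes;
    }

    public static void main(String[] args) {
        // 108 bytes per new ClassLoader, 40 bytes saved per proxy class (32 bit).
        System.out.println(breakEven(108, 40)); // 3
    }
}
```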
Regardless of which approach to use - you have added a general purpose WeakCache and the implementation class in the sun.misc package. While it's good to have such a class for other JDK classes to use, I am more comfortable keeping it as a private class for the proxy implementation to use. We need existing applications to migrate away from sun.misc and other private APIs to prepare for modularization.
What about package-private in java.lang.reflect? It makes Proxy itself much easier to read. When we decide which way to go, I can remove the interface and only leave a single package-private class...
Nits: can you wrap the lines around 80 columns, including comments? try-catch-finally statements need some formatting fixes. Our convention is to have 'catch' or 'finally' following the closing bracket '}' on the same line. Your editor breaks 'catch' or 'finally' onto the next line.
Fixed.
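For reference, the brace convention from the nit above looks like this on a made-up method (parseOrDefault and the attempts counter are hypothetical and exist only to show the layout):

```java
// Hypothetical example, only here to show the layout of try/catch/finally.
final class BraceStyleSketch {
    static int attempts = 0;

    static int parseOrDefault(String text, int fallback) {
        int result = fallback;
        try {
            result = Integer.parseInt(text);
        } catch (NumberFormatException e) {   // 'catch' on the same line as '}'
            result = fallback;
        } finally {                           // 'finally' on the same line as '}'
            attempts++;
        }
        return result;
    }
}
```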
Regards, Peter
Even without a SecurityManager installed, the performance of native getClassLoader0 was a hog. I don't know why. Isn't there an implicit reference to the defining ClassLoader from every Class object?
That's right - it looks for the caller class only if the security manager is installed. The defining class loader is kept in the VM's Klass object (the VM's representation of a language-level Class instance) and no computation is needed to obtain the defining class loader of a given Class object. I can only think of the Java <-> native transition overhead as one possible factor. Class.getClassLoader0 is not intrinsified. I'll find out (others on this mailing list probably know).
Mandy