[OpenJDK 2D-Dev] sun.java2D.pisces big memory usage (waste ?)

Laurent Bourgès bourges.laurent at gmail.com
Fri Mar 29 13:53:09 UTC 2013


Phil,

I agree it is a complex issue to improve memory usage while maintaining performance at the JDK level: applications can use java2d pisces in very different contexts: a Swing application (client, rendering only from the EDT), a server-side application (headless, multi-threaded), etc.

For the moment, I have spent most of my time understanding the different classes in java2d.pisces and analyzing memory usage / performance using J2DBench (all graphics tests).

In my Swing application, pisces produces a lot of garbage (GC pressure), but on the server side the GC overhead can be even larger if several threads use pisces concurrently.

Pisces uses memory differently:

For the moment, I am trying to avoid memory waste (via pooling or kept references) without any memory constraint (no eviction), but I agree that eviction is an important aspect for server-side applications.

To avoid concurrency issues, I use a ThreadLocal context named RendererContext to keep a few temporary arrays (the float6 array and a BIG rowAARLE instance), but there are also dynamic IntArrayCache and FloatArrayCache classes, each holding several pools divided into buckets (256, 1024, 4096, 16384, 32768) that contain only a few instances.
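To make that layout concrete, here is a minimal sketch of the per-thread structure (the names RendererContext and IntArrayCache come from the message above, but the fields and the getArray / putArray API shown here are my assumption, not the actual patch):

    import java.util.ArrayDeque;

    // One instance per thread (via ThreadLocal), so no synchronization is needed
    // on the cached arrays.
    final class RendererContext {
        static final ThreadLocal<RendererContext> CONTEXT =
                new ThreadLocal<RendererContext>() {
                    @Override protected RendererContext initialValue() {
                        return new RendererContext();
                    }
                };

        // Small fixed scratch arrays kept for the lifetime of the thread.
        final float[] float6 = new float[6];

        // Dynamic caches, one pool per bucket size.
        final IntArrayCache intCache = new IntArrayCache();
    }

    // Bucketed pool of int[]: requests are rounded up to the nearest bucket size,
    // and returned arrays are kept for reuse (only a few instances per bucket).
    final class IntArrayCache {
        private static final int[] BUCKET_SIZES = {256, 1024, 4096, 16384, 32768};
        private static final int MAX_PER_BUCKET = 8; // assumed small, bounded pools

        @SuppressWarnings("unchecked")
        private final ArrayDeque<int[]>[] buckets = new ArrayDeque[BUCKET_SIZES.length];

        IntArrayCache() {
            for (int i = 0; i < buckets.length; i++) {
                buckets[i] = new ArrayDeque<int[]>(MAX_PER_BUCKET);
            }
        }

        int[] getArray(int length) {
            for (int i = 0; i < BUCKET_SIZES.length; i++) {
                if (length <= BUCKET_SIZES[i]) {
                    int[] a = buckets[i].pollLast();
                    return (a != null) ? a : new int[BUCKET_SIZES[i]];
                }
            }
            return new int[length]; // larger than any bucket: allocate, do not cache
        }

        void putArray(int[] a) {
            for (int i = 0; i < BUCKET_SIZES.length; i++) {
                if (a.length == BUCKET_SIZES[i]) {
                    if (buckets[i].size() < MAX_PER_BUCKET) {
                        buckets[i].offerLast(a);
                    }
                    return;
                }
            }
        }
    }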

For best performance, I studied the pisces code so that only the used part of an array is cleared when recycling it, or so that dirty arrays can be reused directly (only rowAARLE[...][1] needs clearing).
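A sketch of that idea, assuming a putDirtyArray helper on the cache sketched above (hypothetical name) that knows how much of the array was actually written:

    // Recycle a "dirty" array: clear only the portion that was written,
    // instead of zeroing the whole (possibly 32K-element) buffer.
    void putDirtyArray(int[] a, int usedLength) {
        java.util.Arrays.fill(a, 0, usedLength, 0); // clear only [0, usedLength)
        putArray(a);                                // back to the bucket pool
    }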

I think Andrea's proposal is interesting: maybe add some system properties to give hints (low memory footprint, use the cache or not, ...).
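As an illustration only (these property names are hypothetical, not existing JDK flags), such hints could be read once per RendererContext:

    // Hypothetical tuning knobs; names and defaults are placeholders.
    static final boolean USE_ARRAY_CACHE =
            Boolean.parseBoolean(System.getProperty("sun.java2d.pisces.useArrayCache", "true"));
    static final int MAX_CACHED_BUCKET_SIZE =
            Integer.getInteger("sun.java2d.pisces.maxCachedBucketSize", 32768);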

2013/3/28 Phil Race <philip.race at oracle.com>

Maintaining a pool of objects might be an appropriate thing for an application, but it's a lot trickier for the platform, as the application's usage pattern or intent is largely unknown. Weak references or soft references might be of use, but weak references usually go away even at the next incremental GC, and soft references tend to not go away at all until you run out of heap.

Agreed; for the moment, a pool eviction policy is not implemented but it is kept in mind. FYI: each RendererContext (per thread) has its own array pools (not shared), and those could have different caching policies: for instance, the AWT / EDT (repaint) thread could use a large cache while other threads do not use array caching at all.

You may well be right that always doubling the array size may be too simplistic, but it would need some analysis of the code and its usage to see how much better we can do.

There are two parts:

Laurent

2013/3/28 Phil Race <philip.race at oracle.com>


> Apparently, Arrays.fill is always faster (size in 10 ... 10 000) !
> I suspect hotspot to optimize its code and use native functions, isn't it ???

I suppose there is some hotspot magic involved to recognise and intrinsify this method, since the source code looks like a plain old for loop.

-phil.
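For context, the body of java.util.Arrays.fill(int[], int, int, int) is indeed just a range check followed by a plain loop; roughly (paraphrased, not the verbatim JDK source):

    public static void fill(int[] a, int fromIndex, int toIndex, int val) {
        rangeCheck(a.length, fromIndex, toIndex); // Arrays' internal bounds-check helper
        for (int i = fromIndex; i < toIndex; i++)
            a[i] = val;
    }

HotSpot can recognize such fill loops and replace them with optimized code, which would explain the numbers below.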

On 3/26/2013 4:00 AM, Laurent Bourgès wrote:

Dear all,

First, I recently joined the OpenJDK contributors, and I plan to fix the java2D pisces code in my spare time.

I have a full time job on Aspro2: http://www.jmmc.fr/aspro; it is an application to prepare astronomical observations at VLTI / CHARA and is widely used in our community (200 users): it provides scientific computations (observability, model images using complex numbers, ...) and zoomable plots thanks to JFreeChart. Aspro2 is known to be very efficient (computation parallelization) and I often do profiling using the NetBeans profiler or VisualVM.

To fix the huge memory usage of java2d.pisces, I started implementing an efficient ArrayCache (int[] and float[]) (thread-local, to avoid concurrency problems):
- arrays with sizes between 10 and 10000 (more small arrays are used than big ones)
- resizing support (Arrays.copyOf) without wasting arrays
- reentrance, i.e. many arrays in use at the same time (java2D pisces stroke / dash creates many segments to render)
- GC / heap friendly, i.e. supports cache eviction and avoids consuming too much memory

I know object pooling is said to be inefficient with recent VMs (GC is better), but I think it is counterproductive to create so many int[] arrays in java2d.pisces and let the GC remove such wasted memory.

Has someone already implemented such an (open source) array cache (core-libs)? Opinions are welcome (but avoid "trolls").

Moreover, sun.java2d.pisces.Helpers.widenArray() performs a lot of array resizing / copying (Arrays.copyOf) that I mostly want to avoid:

    // These use a hardcoded factor of 2 for increasing sizes. Perhaps this
    // should be provided as an argument.
    static float[] widenArray(float[] in, final int cursize, final int numToAdd) {
        if (in.length >= cursize + numToAdd) {
            return in;
        }
        return Arrays.copyOf(in, 2 * (cursize + numToAdd));
    }

    static int[] widenArray(int[] in, final int cursize, final int numToAdd) {
        if (in.length >= cursize + numToAdd) {
            return in;
        }
        return Arrays.copyOf(in, 2 * (cursize + numToAdd));
    }

Thanks to Peter Levart, I use his micro-bench tool (https://github.com/plevart/micro-bench/tree/v2) to benchmark ArrayCache operations, and J2DBench to test java2d performance.

What is the fastest way to clear an array (or part of it), i.e. fill it with 0:
- public static void fill(int[] a, int fromIndex, int toIndex, int val)
- public static native void arraycopy(Object src, int srcPos, Object dest, int destPos, int length)
- unsafe.setMemory(array, Unsafe.ARRAY_INT_BASE_OFFSET, 512 * SIZE_OF_INT, (byte) 0)

Apparently, Arrays.fill is always faster (size in 10 ... 10 000) ! I suspect hotspot to optimize its code and use native functions, isn't it ???

Benchmark results:

>> JVM START: 1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]
All runs: duration 5 000 ms, 4 logical CPUs, JVM 1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22]; Tavg in ns/op, σ in parentheses.

Testing arrays: int[1]...
  ZeroFill:             warm-up 4,47 / 4,40;          measure 1 thread: 4,43;      2 threads: 5,55 (0,16)
  FillArraySystemCopy:  warm-up 6,47 / 6,21;          measure 1 thread: 6,19;      2 threads: 7,80 (0,10)
  FillArrayUnsafe:      warm-up 26,82 / 23,48;        measure 1 thread: 22,42;     2 threads: 28,21 (0,88)

Testing arrays: int[100]...
  ZeroFill:             warm-up 16,49 / 15,97;        measure 1 thread: 16,03;     2 threads: 19,32 (0,46)
  FillArraySystemCopy:  warm-up 14,51 / 14,17;        measure 1 thread: 14,09;     2 threads: 31,15 (4,04)
  FillArrayUnsafe:      warm-up 52,32 / 52,82;        measure 1 thread: 52,19;     2 threads: 70,87 (0,71)

Testing arrays: int[10000]...
  ZeroFill:             warm-up 1 208,64 / 1 238,01;  measure 1 thread: 1 235,81;  2 threads: 1 325,11 (7,01)
  FillArraySystemCopy:  warm-up 1 930,93 / 2 060,80;  measure 1 thread: 2 105,21;  2 threads: 2 160,33 (13,74)
  FillArrayUnsafe:      warm-up 3 099,50 / 3 041,81;  measure 1 thread: 3 068,34;  2 threads: 3 296,13 (34,97)
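For reference, the three strategies compared above correspond roughly to the following (the ZeroFill / FillArraySystemCopy / FillArrayUnsafe names are the benchmark classes from the output; the bodies are a reconstruction from the descriptions, not the actual benchmark code):

    import java.util.Arrays;
    import sun.misc.Unsafe;

    final class ZeroStrategies {
        private static final int MAX_SIZE = 10000;
        // Preallocated source of zeros, assumed at least as large as any array to clear.
        private static final int[] ZEROS = new int[MAX_SIZE];

        // 1) Arrays.fill: plain loop optimized by HotSpot (fastest above).
        static void zeroFill(int[] a) {
            Arrays.fill(a, 0, a.length, 0);
        }

        // 2) System.arraycopy from the preallocated zero array.
        static void zeroBySystemCopy(int[] a) {
            System.arraycopy(ZEROS, 0, a, 0, a.length);
        }

        // 3) Unsafe.setMemory (slowest above); the Unsafe instance must be obtained
        //    via reflection and is passed in here to keep the sketch short.
        static void zeroByUnsafe(Unsafe unsafe, int[] a) {
            unsafe.setMemory(a, Unsafe.ARRAY_INT_BASE_OFFSET,
                             (long) a.length * Unsafe.ARRAY_INT_INDEX_SCALE, (byte) 0);
        }
    }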

PS: java.awt.geom.Path2D also has memory allocation issues:

    void needRoom(boolean needMove, int newCoords) {
        if (needMove && numTypes == 0) {
            throw new IllegalPathStateException("missing initial moveto " +
                                                "in path definition");
        }
        int size = pointTypes.length;
        if (numTypes >= size) {
            int grow = size;
            if (grow > EXPAND_MAX) {
                grow = EXPAND_MAX;
            }
            pointTypes = Arrays.copyOf(pointTypes, size + grow);
        }
        size = floatCoords.length;
        if (numCoords + newCoords > size) {
            int grow = size;
            if (grow > EXPAND_MAX * 2) {
                grow = EXPAND_MAX * 2;
            }
            if (grow < newCoords) {
                grow = newCoords;
            }
            floatCoords = Arrays.copyOf(floatCoords, size + grow);
        }
    }

Best regards,
Laurent


