[OpenJDK 2D-Dev] sun.java2D.pisces big memory usage (waste ?)
Laurent Bourgès bourges.laurent at gmail.com
Thu Apr 4 13:44:48 UTC 2013
I have updated both the patched Pisces code and the benchmarks: http://jmmc.fr/~bourgesl/share/java2d-pisces/
A few results comparing ThreadLocal vs ConcurrentLinkedQueue usage:
OpenJDK 8 PATCH ThreadLocal mode:
Testing file /home/bourgesl/libs/openjdk/mapbench/test/dc_boulder_2013-13-30-06-13-17.ser
1 threads and 20 loops per thread, time: 2671 ms
2 threads and 20 loops per thread, time: 3239 ms
4 threads and 20 loops per thread, time: 6043 ms

OpenJDK 8 PATCH ConcurrentLinkedQueue mode:
Testing file /home/bourgesl/libs/openjdk/mapbench/test/dc_boulder_2013-13-30-06-13-17.ser
1 threads and 20 loops per thread, time: 2779 ms
2 threads and 20 loops per thread, time: 3416 ms
4 threads and 20 loops per thread, time: 6153 ms

Oracle JDK8 Ductus:
Testing file /home/bourgesl/libs/openjdk/mapbench/dc_boulder_2013-13-30-06-13-17.ser
1 threads and 20 loops per thread, time: 1894 ms
2 threads and 20 loops per thread, time: 3905 ms
4 threads and 20 loops per thread, time: 7485 ms

OpenJDK 8 PATCH ThreadLocal mode:
Testing file /home/bourgesl/libs/openjdk/mapbench/test/dc_shp_alllayers_2013-00-30-07-00-47.ser
1 threads and 20 loops per thread, time: 24211 ms
2 threads and 20 loops per thread, time: 30955 ms
4 threads and 20 loops per thread, time: 67715 ms

OpenJDK 8 PATCH ConcurrentLinkedQueue mode:
Testing file /home/bourgesl/libs/openjdk/mapbench/test/dc_shp_alllayers_2013-00-30-07-00-47.ser
1 threads and 20 loops per thread, time: 25984 ms
2 threads and 20 loops per thread, time: 33131 ms
4 threads and 20 loops per thread, time: 75343 ms

Oracle JDK8 Ductus:
Loading drawing commands from file: /home/bourgesl/libs/openjdk/mapbench/dc_shp_alllayers_2013-00-30-07-00-47.ser
Loaded DrawingCommands: DrawingCommands{width=1400, height=800, commands=135213}
1 threads and 20 loops per thread, time: 20911 ms
2 threads and 20 loops per thread, time: 39297 ms
4 threads and 20 loops per thread, time: 103392 ms
ConcurrentLinkedQueue adds a small overhead compared to ThreadLocal, but not much (about 2% to 11% slower in these runs).
Is there an efficient way to test whether the current thread is the EDT? Then I could use a ThreadLocal at least for the EDT. The test must be very fast because getThreadContext() is called once per rendering operation, so it is a potential performance bottleneck.
For example:
Testing file /home/bourgesl/libs/openjdk/mapbench/test/dc_shp_alllayers_2013-00-30-07-00-47.ser
TL: 4 threads and 20 loops per thread, time: 67715 ms
CLQ: 4 threads and 20 loops per thread, time: 75343 ms
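One way to combine both modes, assuming an EDT test cheap enough to run once per rendering operation, is a provider that keeps a ThreadLocal context only for threads selected by a predicate and lets every other thread borrow from the shared queue. In Pisces the predicate could be `java.awt.EventQueue::isDispatchThread` (or `javax.swing.SwingUtilities.isEventDispatchThread()`); whether that call is fast enough is exactly what would need benchmarking. `HybridContextProvider`, `acquire` and `release` are hypothetical names, not Pisces API, and the demo uses a named thread as a stand-in for the EDT so it runs headless.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical hybrid provider: threads selected by the predicate (e.g. the EDT)
 * keep a dedicated ThreadLocal context; all other threads borrow a context from a
 * shared ConcurrentLinkedQueue and must hand it back after the operation.
 */
final class HybridContextProvider<T> {
    private final Supplier<T> factory;
    private final BooleanSupplier useThreadLocal;
    private final ThreadLocal<T> local;
    private final ConcurrentLinkedQueue<T> pool = new ConcurrentLinkedQueue<>();

    HybridContextProvider(Supplier<T> factory, BooleanSupplier useThreadLocal) {
        this.factory = factory;
        this.useThreadLocal = useThreadLocal;
        this.local = ThreadLocal.withInitial(factory);
    }

    /** Returns a context for the current rendering operation. */
    T acquire() {
        if (useThreadLocal.getAsBoolean()) {
            return local.get();                   // no queue access on this path
        }
        T ctx = pool.poll();                      // reuse an idle context if there is one
        return (ctx != null) ? ctx : factory.get();
    }

    /** Hands back a context from acquire(); a no-op on the ThreadLocal path. */
    void release(T ctx) {
        if (!useThreadLocal.getAsBoolean()) {
            pool.offer(ctx);
        }
    }

    int pooledCount() {
        return pool.size();
    }
}

class HybridDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for EventQueue.isDispatchThread() so the demo runs headless.
        BooleanSupplier isFakeEdt = () -> "fake-EDT".equals(Thread.currentThread().getName());
        HybridContextProvider<StringBuilder> provider =
                new HybridContextProvider<>(StringBuilder::new, isFakeEdt);

        StringBuilder a = provider.acquire();     // main thread: queue path
        provider.release(a);
        System.out.println(provider.acquire() == a);  // true: the idle context is reused

        Thread edt = new Thread(() -> {
            StringBuilder c = provider.acquire(); // "EDT": ThreadLocal path
            provider.release(c);                  // no-op
            System.out.println(provider.acquire() == c);  // true: same per-thread context
        }, "fake-EDT");
        edt.start();
        edt.join();
    }
}
```

The predicate is evaluated again in `release`, which is consistent because a context is always released by the thread that acquired it.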
Changes:
- use ThreadLocal or ConcurrentLinkedQueue to get a renderer context (vars / cache)
- use RendererContext members (dirty / clean arrays) first instead of IntArrayCache / FloatArrayCache, for performance reasons (the array caches are now dedicated to large dynamic arrays)
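The "members first, caches for large arrays" change can be sketched as below, assuming "clean" means the array is all zeros when handed out, so the user must zero the range it wrote before giving it back. `ContextArrays`, `INITIAL_SIZE` and the method names are hypothetical, and the sketch assumes only one user of the preallocated field at a time.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical per-context array holder: small requests use a field, large ones a cache. */
final class ContextArrays {
    static final int INITIAL_SIZE = 256;   // assumed initial size, not a Pisces constant

    // "clean" array owned by the context: all zeros whenever it is handed out
    private final int[] cleanInts = new int[INITIAL_SIZE];
    // fallback cache for larger arrays; no locking since one thread uses a context at a time
    private final ArrayDeque<int[]> largeCache = new ArrayDeque<>();

    int[] getCleanIntArray(int length) {
        if (length <= cleanInts.length) {
            return cleanInts;                      // fast path: plain field access
        }
        for (Iterator<int[]> it = largeCache.iterator(); it.hasNext(); ) {
            int[] a = it.next();
            if (a.length >= length) {
                it.remove();
                return a;                          // reuse a previously returned large array
            }
        }
        return new int[length];                    // new arrays are zero-filled by the JVM
    }

    /** Zeroes the range [0, usedLength) the caller wrote, then keeps large arrays. */
    void putCleanIntArray(int[] array, int usedLength) {
        Arrays.fill(array, 0, usedLength, 0);
        if (array != cleanInts) {
            largeCache.push(array);
        }
    }
}

class ContextArraysDemo {
    public static void main(String[] args) {
        ContextArrays ctx = new ContextArrays();
        int[] small = ctx.getCleanIntArray(100);    // the preallocated field
        small[5] = 42;
        ctx.putCleanIntArray(small, 100);           // zeroed again for the next user
        int[] big = ctx.getCleanIntArray(10_000);   // allocated once ...
        ctx.putCleanIntArray(big, 10_000);
        System.out.println(ctx.getCleanIntArray(5_000) == big);  // ... then reused: true
    }
}
```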
TBD:
- recycle Pisces class instances, i.e. keep only one instance per class (Renderer, Stroker ...) to avoid GC overhead entirely (several thousand instances per MapBench test).
Moreover, these are very small, short-lived objects, so they should be allocated in the thread-local allocation buffer (TLAB), but when I use -verbose:gc or jmap -histo they are present and represent megabytes:
[bourgesl at jmmc-laurent ~]$ jmap -histo:live 21628 | grep pisces
   5:      50553     6470784  sun.java2d.pisces.Renderer
   9:      29820     3578400  sun.java2d.pisces.Stroker
  11:      49795     3186880  sun.java2d.pisces.PiscesCache
  12:      49794     1991760  sun.java2d.pisces.PiscesTileGenerator
  13:      49793     1991720  sun.java2d.pisces.Renderer$ScanlineIterator
  14:      29820     1431360  sun.java2d.pisces.PiscesRenderingEngine$NormalizingPathIterator
  52:         40        1280  sun.java2d.pisces.IntArrayCache
  94:         20         640  sun.java2d.pisces.FloatArrayCache
 121:          8         320  [Lsun.java2d.pisces.IntArrayCache;
 127:          4         320  sun.java2d.pisces.RendererContext
 134:          4         256  sun.java2d.pisces.Curve
 154:          4         160  [Lsun.java2d.pisces.FloatArrayCache;
 155:          4         160  sun.java2d.pisces.RendererContext$RendererData
 156:          4         160  sun.java2d.pisces.RendererContext$StrokerData
 157:          4         160  sun.java2d.pisces.Stroker$PolyStack
 208:          3          72  sun.java2d.pisces.PiscesRenderingEngine$NormMode
 256:          1          32  [Lsun.java2d.pisces.PiscesRenderingEngine$NormMode;
 375:          1          16  sun.java2d.pisces.PiscesRenderingEngine
 376:          1          16  sun.java2d.pisces.RendererContext$1
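The recycling item in the TBD list (the histogram above shows 50553 Renderer objects for 6470784 bytes, i.e. 128 bytes each) could follow this pattern, assuming each context owns exactly one instance of each helper class and each helper exposes an init() method that resets its per-operation state. `ReusableRenderer` and `RecyclingContext` are hypothetical stand-ins; the real Renderer carries much more state than this edge list.

```java
import java.util.Arrays;

/** Hypothetical renderer whose state is reset for each operation instead of reallocated. */
final class ReusableRenderer {
    private float[] edges = new float[64];  // grown on demand, kept across operations
    private int numEdges;

    /** Resets per-operation state; called at the start of every rendering operation. */
    ReusableRenderer init() {
        numEdges = 0;
        return this;
    }

    void addLine(float x0, float y0, float x1, float y1) {
        int need = (numEdges + 1) * 4;
        if (need > edges.length) {
            edges = Arrays.copyOf(edges, Math.max(need, edges.length * 2));
        }
        int i = numEdges * 4;
        edges[i] = x0;
        edges[i + 1] = y0;
        edges[i + 2] = x1;
        edges[i + 3] = y1;
        numEdges++;
    }

    int edgeCount() { return numEdges; }
    int capacity() { return edges.length; }
}

/** Hypothetical per-thread context keeping exactly one instance of each helper class. */
final class RecyclingContext {
    private final ReusableRenderer renderer = new ReusableRenderer();

    ReusableRenderer getRenderer() {
        return renderer.init();   // same object every time: no per-operation allocation
    }
}

class RecyclingDemo {
    public static void main(String[] args) {
        RecyclingContext ctx = new RecyclingContext();
        ReusableRenderer first = ctx.getRenderer();
        for (int op = 0; op < 1000; op++) {          // 1000 simulated rendering operations
            ReusableRenderer r = ctx.getRenderer();
            for (int k = 0; k < 100; k++) {
                r.addLine(k, 0, k, 10);
            }
            if (r != first) throw new IllegalStateException("unexpected allocation");
        }
        System.out.println(first.capacity());       // 512: grown once, then reused
    }
}
```

The trade-off is that a grown array stays allocated for the lifetime of the context, which is why the initial sizing mentioned in the TBD items matters.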
Regards, Laurent
On 2013/4/3, Laurent Bourgès <bourges.laurent at gmail.com> wrote:
Thanks for your valuable feedback!
Here is the current status of my patch (alpha version):
http://jmmc.fr/~bourgesl/share/java2d-pisces/
There is still a lot to be done: clean-up, stats, Pisces class instance recycling (Renderer, Stroker ...) and of course correctly sizing the initial arrays (dirty or clean) in the RendererContext (thread-local storage). For performance reasons, I now use RendererContext members first (for example the cache for rowAARLE) before falling back to the ArrayCaches (dynamic arrays).
> Thank you Laurent, those are some nice speedups.

I think it can still be improved: I hope to make it as fast as Ductus or even faster (I have several ideas for aggressive optimizations), but the main improvement consists in reusing memory (as C / C++ code does) to avoid wasted memory / GC overhead in concurrent environments.

> About the thread local storage, that is a sensible choice for highly concurrent systems; at the same time, web containers normally complain about orphaned thread locals created by an application and not cleaned up. Not sure if ones created at the core libs level get special treatment, but in general, I guess it would be nice to have some way to clean them up.

You're right, that's why my patch is not ready! I chose ThreadLocal for simplicity and clarity, but I see several issues:
1/ Web container: the ThreadLocal must be cleaned up when stopping an application to avoid memory leaks (otherwise the application cannot be unloaded due to classloader leaks).
2/ ThreadLocal access is the fastest way to get the RendererContext as it does not require any lock (unsynchronized). As I get the RendererContext once per rendering request, I think the ThreadLocal can be replaced by a thread-safe ConcurrentLinkedQueue, but it may become a performance bottleneck.
3/ Using a ConcurrentLinkedQueue requires an efficient / proper cache eviction policy to free memory (weak or soft references?) or usage statistics (last usage timestamp, usage counts).
Any other idea (core-libs) to have an efficient thread context in a web container?

> I'm not familiar with the API, but is there any way to clean them up when the Graphics2D gets disposed of?

The RenderingEngine is instantiated once by the JVM, and I do not see any way in the RenderingEngine interface to perform callbacks for warmup / cleanup ... nor to access the Graphics RenderingHints (another RFE for tuning purposes).

> A web application has no guarantee to see the same thread ever again during its life, so thread locals have to be cleaned right away.
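Issue 3/ and the question about an efficient context in a web container could be explored with a queue of SoftReferences: no ThreadLocal stays attached to container threads, and idle contexts become eligible for collection under memory pressure. This is a minimal sketch, assuming soft references are an acceptable eviction policy (weak references, timestamps or usage counts are the other options listed above); `SoftContextPool` is a hypothetical name.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

/**
 * Hypothetical context pool without ThreadLocal: idle contexts are held through
 * SoftReferences, so the GC may reclaim them when memory runs low.
 */
final class SoftContextPool<T> {
    private final ConcurrentLinkedQueue<SoftReference<T>> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    SoftContextPool(Supplier<T> factory) {
        this.factory = factory;
    }

    T acquire() {
        SoftReference<T> ref;
        while ((ref = idle.poll()) != null) {
            T ctx = ref.get();
            if (ctx != null) {
                return ctx;           // reuse an idle context
            }
            // referent was cleared by the GC: drop the empty reference and keep looking
        }
        return factory.get();         // nothing usable in the pool: allocate a new context
    }

    void release(T ctx) {
        idle.offer(new SoftReference<>(ctx));
    }

    int idleCount() {
        return idle.size();
    }
}

class SoftPoolDemo {
    public static void main(String[] args) {
        SoftContextPool<int[]> pool = new SoftContextPool<>(() -> new int[4096]);
        int[] ctx = pool.acquire();
        try {
            ctx[0] = 1;               // ... rendering work using the context ...
        } finally {
            pool.release(ctx);        // always hand the context back, even on failure
        }
        System.out.println(pool.idleCount());  // 1
    }
}
```

Each release allocates one small SoftReference, so this variant trades the ThreadLocal cleanup problem for a little garbage per operation; that cost would need to be measured with MapBench like the other modes.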
I admit ThreadLocal can lead to wasted memory, as only a few concurrent threads really use their RendererContext instance while the others may simply answer web requests => let's use a ConcurrentLinkedQueue with a proper cache eviction policy.

> Either that, or see if there is any way to store the array caches in a global structure backed by a concurrent collection to reduce/eliminate contention.

Yes, it is an interesting alternative to benchmark.

Regards,
Laurent
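The global-structure alternative could look like the following sketch: one cache shared by all threads, keyed by power-of-two array sizes, where each bucket is a lock-free ConcurrentLinkedQueue. `GlobalIntArrayCache` and its bucket policy are hypothetical; the sketch has no upper bound and no eviction, so a real version would still need one of the eviction policies discussed above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical global int[] cache shared by all threads, bucketed by power-of-two size. */
final class GlobalIntArrayCache {
    private final ConcurrentHashMap<Integer, ConcurrentLinkedQueue<int[]>> buckets =
            new ConcurrentHashMap<>();

    /** Smallest power of two >= length (length must be at most 2^30). */
    static int bucketSize(int length) {
        return (length <= 1) ? 1 : Integer.highestOneBit(length - 1) << 1;
    }

    /** Returns an array of at least 'length' ints; reused arrays may hold stale data. */
    int[] get(int length) {
        int size = bucketSize(length);
        ConcurrentLinkedQueue<int[]> q = buckets.get(size);
        int[] a = (q != null) ? q.poll() : null;
        return (a != null) ? a : new int[size];
    }

    /** Gives an array back; only arrays whose length is a bucket size are kept. */
    void put(int[] array) {
        if (Integer.bitCount(array.length) == 1) {
            buckets.computeIfAbsent(array.length, k -> new ConcurrentLinkedQueue<>()).offer(array);
        }
    }
}

class GlobalCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        GlobalIntArrayCache cache = new GlobalIntArrayCache();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                int[] a = cache.get(3000);   // served from the 4096 bucket
                a[0] = i;
                cache.put(a);
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(cache.get(3000).length);  // 4096
    }
}
```

Compared with per-thread contexts, this keeps memory proportional to the arrays actually in use, at the cost of a map lookup and a queue operation per request.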