Parallel ClassLoading space optimizations
Peter Levart peter.levart at gmail.com
Mon Feb 4 08:42:57 UTC 2013
Hi David,
I might have something usable, but I just wanted to verify some things beforehand. What I investigated was simply keeping a cache of lock objects in a map, weakly referenced. In your blog:
* https://blogs.oracle.com/dholmes/entry/parallel_classloading_revisited_fully_concurrent
...you describe this as a 3rd alternative:
/3. Reduce the lifetime of lock objects so that entries are removed
from the map when no longer needed (eg remove after loading, *use
weak references* to the lock objects and cleanup the map periodically)./
...but later you preclude this option:
/Similarly we might reason that we can remove a mapping (and the
lock object) because the class is already loaded, but this would
again violate the specification because it can be reasoned that the
following assertion should hold true: /
Object lock1 = loader.getClassLoadingLock(name);
loader.loadClass(name);
Object lock2 = loader.getClassLoadingLock(name);
assert lock1 == lock2;
/Without modifying the specification, or at least doing some
creative wordsmithing on it, options 1 and 3 are precluded. /
When using WeakReferences to cache lock objects, the above assertion would still hold true, wouldn't it? As long as the caller holds lock1 strongly, the weak reference in the map cannot be cleared, so the second lookup must return the very same object.
I cannot think of any reasonable 3rd party ClassLoader code that would behave differently when lock objects are strongly referenced for the entire VM lifetime vs. temporarily weakly referenced and eventually recreated when needed. Only code that does one of the following could observe a difference (a contrived fragment is sketched right after this list):
- calls .toString() or .hashCode() on a lock object and keeps the result somewhere for later use without also keeping the lock object itself
- wraps a lock object in a WeakReference and observes whether the reference gets cleared
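A contrived fragment illustrating both cases (not from the original test; "loader" stands for a hypothetical parallel-capable loader, and since getClassLoadingLock() is protected the fragment would have to live inside a ClassLoader subclass):

Object lock = loader.getClassLoadingLock("some.Class");
int idBefore = System.identityHashCode(lock);          // case 1: keep only a derived value
WeakReference<Object> ref = new WeakReference<>(lock);  // case 2: keep only a weak reference
lock = null;                                            // drop the strong reference to the lock
System.gc();                                            // the cached lock may now be collected
Object lockAgain = loader.getClassLoadingLock("some.Class");
// Only with a weakly-referenced cache can ref come back cleared, or lockAgain
// turn out to be a different object than the one observed before.
System.out.println(ref.get() == null);
System.out.println(System.identityHashCode(lockAgain) == idBefore);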
Is that a reasonable assumption on which to continue in this direction? If the semantics are acceptable, then all the solution has to prove is its (space and time) performance, right?
Here's some preliminary illustration of what can be achieved space-wise. This is a test that attempts to load all the classes from rt.jar. The situation we have now (using -Xms256m -Xmx256m and 32-bit addresses):
...At the beginning of main()
Total memory: 257294336 bytes
Free memory: 251920320 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@3d4eac69: 7936 bytes
Deep size of sun.misc.Launcher$AppClassLoader@55f96302: 30848 bytes
Deep size of both: 38784 bytes (reference)
...Attempted to load: 18558 classes in: 1964.55825 ms
Total memory: 257294336 bytes
Free memory: 227314112 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@3d4eac69: 1162184 bytes
Deep size of sun.misc.Launcher$AppClassLoader@55f96302: 2215216 bytes
Deep size of both: 3377400 bytes (difference to reference: 3338616 bytes)
...Performing gc()
...Loading class: test.TestClassLoader$Last (to trigger expunging)
Total memory: 260440064 bytes
Free memory: 193163368 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@3d4eac69: 1162328 bytes
Deep size of sun.misc.Launcher$AppClassLoader@55f96302: 2215408 bytes
Deep size of both: 3377736 bytes (difference to reference: 3338952 bytes)
vs. having lock objects weakly referenced and doing the expunging work at each request for a lock:
...At the beginning of main()
Total memory: 257294336 bytes
Free memory: 251920320 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@75b84c92: 9584 bytes
Deep size of sun.misc.Launcher$AppClassLoader@42a57993: 33960 bytes
Deep size of both: 43544 bytes (reference)
Lock stats... create: 108 return old: 0 replace: 0 expunge: 0
...Attempted to load: 18558 classes in: 2005.14628 ms
Total memory: 257294336 bytes
Free memory: 187198776 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@75b84c92: 572768 bytes
Deep size of sun.misc.Launcher$AppClassLoader@42a57993: 1122976 bytes
Deep size of both: 1695744 bytes (difference to reference: 1652200 bytes)
Lock stats... create: 37302 return old: 201 replace: 0 expunge: 25893
...Performing gc()
...Loading class: test.TestClassLoader$Last (to trigger expunging)
Total memory: 257294336 bytes
Free memory: 238693336 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@75b84c92: 78944 bytes
Deep size of sun.misc.Launcher$AppClassLoader@42a57993: 168512 bytes
Deep size of both: 247456 bytes (difference to reference: 203912 bytes)
Lock stats... create: 2 return old: 0 replace: 0 expunge: 11517
... as can be seen from this particular use case, there is approx. 20% storage overhead for the locks because of the WeakReference indirection (measured at the beginning of main(), before any expunging kicks in), and a negligible overhead of about 2% in total class-loading time. After that we see that (since this is a single-threaded example) re-use of a lock for a class that is already (being) loaded is rare (I assume only explicit requests like Class.forName trigger that event in this example). At the end, almost all locks have been released, which frees 3MB+ of heap space.
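For reference, the loading loop itself is roughly the following (a simplified sketch, not the actual test source; the class name, the rt.jar path and the error handling are illustrative, and the "deep size" figures come from a separate object-graph measurement that is not reproduced here):

import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class LoadAllRtJarClasses {
    public static void main(String[] args) throws Exception {
        // JDK 8 layout; the rt.jar location is an assumption of this sketch
        String rtJarPath = System.getProperty("java.home") + "/lib/rt.jar";
        ClassLoader appLoader = ClassLoader.getSystemClassLoader();
        int attempted = 0;
        long start = System.nanoTime();
        try (JarFile rtJar = new JarFile(rtJarPath)) {
            Enumeration<JarEntry> entries = rtJar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (!name.endsWith(".class")) continue;
                String className = name.substring(0, name.length() - ".class".length())
                                       .replace('/', '.');
                attempted++;
                try {
                    // initialize = false: we only want loading, not static initializers
                    Class.forName(className, false, appLoader);
                } catch (Throwable ignored) {
                    // some classes fail to load or link outside their intended context
                }
            }
        }
        System.out.printf("...Attempted to load: %d classes in: %.5f ms%n",
                          attempted, (System.nanoTime() - start) / 1_000_000.0);
    }
}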
Here's a piece of code for obtaining locks (coded as a subclass of ConcurrentHashMap for performance reasons):
public Object getOrCreate(K key) {
    // the most common situation is that the key is new, so
    // optimize fast-path accordingly
    Object lock = new Object();
    LockRef<K> ref = new LockRef<>(key, lock, refQueue);
    expungeStaleEntries();
    for (;;) {
        @SuppressWarnings("unchecked")
        LockRef<K> oldRef = (LockRef<K>) super.putIfAbsent(key, ref);
        if (oldRef == null) {
            if (keepStats) createCount.increment();
            return lock;
        } else {
            Object oldLock = oldRef.get();
            if (oldLock != null) {
                if (keepStats) returnOldCount.increment();
                return oldLock;
            } else if (super.replace(key, oldRef, ref)) {
                if (keepStats) replaceCount.increment();
                return lock;
            }
        }
    }
}
private void expungeStaleEntries() {
    LockRef<K> ref;
    while ((ref = (LockRef<K>) refQueue.poll()) != null) {
        super.remove(ref.key, ref);
        if (keepStats) expungeCount.increment();
    }
}
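The LockRef type referenced above is not included in this message; it is essentially just a WeakReference to the lock object that also remembers its key, so that expungeStaleEntries() can remove the stale mapping once the reference has been enqueued. A minimal sketch (the constructor signature follows the usage above, the rest is an assumption):

// assumes: import java.lang.ref.ReferenceQueue; import java.lang.ref.WeakReference;
private static final class LockRef<K> extends WeakReference<Object> {
    final K key; // remembered so the stale mapping can be removed by key

    LockRef(K key, Object lock, ReferenceQueue<Object> refQueue) {
        super(lock, refQueue); // the lock stays only weakly reachable through this reference
        this.key = key;
    }
}

A parallel-capable loader's getClassLoadingLock(name) would then presumably boil down to a getOrCreate(name) call on an instance of the map subclass above.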
Do you think this is something worth pursuing further?
Regards, Peter
On 02/01/2013 05:01 AM, David Holmes wrote:
> Hi Peter,
>
> On 31/01/2013 11:07 PM, Peter Levart wrote:
>> Hi David,
>>
>> Could the parallel classloading be at least space optimized somehow in
>> the JDK8 timeframe if there was a solution ready?
>
> If there is something that does not impact any of the existing specified
> semantics regarding the classloader lock object then it may be possible
> to work it into an 8 update if not 8 itself. But all the suggestions I've
> seen for reducing the memory usage also alter the semantics in some way.
>
> However, a key part of the concurrent classloader proposal was that it
> didn't change the behaviour of any existing classloaders outside the
> core JDK. Anything that changes existing behaviour has a much higher
> compatibility bar to get over.
>
> David
> -----