type-recursive inlining oddity (original) (raw)

Aleksey Shipilev aleksey.shipilev at oracle.com
Mon Sep 10 05:08:16 PDT 2012


Hi there,

We are in the middle of performance analysis for lambdas/bulk operations coming in JDK8, and stumbled across understandable, but weird oddity in our inlining policy.

In short, there are cases when we delegate the method call to the same method of the same actual class. I.e.:

public static class DelegatingIterator1<T> implements Iterator<T> {
    private final Iterator<T> i;

    public DelegatingIterator1(Iterator<T> i) {
        this.i = i;
    }

    @Override
    public boolean hasNext() {
        return i.hasNext();
    }

    @Override
    public T next() {
        return i.next();
    }

    @Override
    public void remove() {
        i.remove();
    }
}

This has somewhat surprising effect when you stack the delegates. Basically, if you have the exact copy of DelegatingIterator1 as DelegatingIterator2, then these three cases;

empty = Collections.emptyIterator(); chain11 = new DelegatingIterator1<>(new DelegatingIterator1<>(empty)); chain12 = new DelegatingIterator1<>(new DelegatingIterator2<>(empty));

empty.hasNext(); // repeatedly call this, and measure

chain11.hasNext(); // repeatedly call this, and measure chain12.hasNext(); // repeatedly call this, and measure

...have drastically different performance.

With proper warmup, cpufreq, etc. etc. on my 2.0 Ghz i5-2520M, Linux x86, JDK 7u4: empty: 1.511 +- 0.003 nsec/op chain11: 4.055 +- 0.060 nsec/op chain12: 2.545 +- 0.135 nsec/op

This difference could easily be explained by this inlining tree:

GeneratedRecursiveBench::chain11 @ 6 @ 8 RecursiveBench::chain11 inline (hot) @ 4 RecursiveBench$DelegatingIterator1::hasNext inline (hot) @ 4 java.util.Collections$EmptyIterator::hasNext inline (hot) @ 4 RecursiveBench$DelegatingIterator1::hasNext inline (hot) @ 4 java.util.Collections$EmptyIterator::hasNext inline (hot) @ 4 RecursiveBench$DelegatingIterator1::hasNext recursively inlining too deep

...as compared with:

GeneratedRecursiveBench::chain12 @ 6 @ 8 RecursiveBench::chain12 inline (hot) @ 4 RecursiveBench$DelegatingIterator1::hasNext inline (hot) @ 4 RecursiveBench$DelegatingIterator2::hasNext inline (hot) @ 4 java.util.Collections$EmptyIterator::hasNext inline (hot)

What had happened in chain11? We compile the v-call in DelegatingIterator1.hasNext(); it appears that the receivers in i.hasNext() are either DelegatingIterator1 or EmptyIterator; so we devirtualize this bimorphic call, inline both; then, we proceed with processing v-calls in new inlined methods, and voila, there is the v-call for i.hasNext() in the same method again, so we devirtualize and inline further, until MaxRecursiveInlineLevel tells us to stop. Twiddling MaxRecursiveInlineLevel has no effect on performance (except for setting in to 0, when chain11 drops even further).

Note that it is not the "true" "control" recursion, when I pass the control to the same instance method in the same instance; it is rather the "type" recursion when I call the same instance method in the same class.

All this actually tells me, Hotspot does not account the position in the call tree when recording profiles. Should it do that, we could actually differentiate the v-calls made in first instance vs. v-calls made in second instance. Is there a known/existing explanation whether it is feasible to do? Or, is there another solution for this problem, except for splitting the types?

-Aleksey.



More information about the hotspot-compiler-dev mailing list