[Python-Dev] Billions of gc's

Tim Peters tim.one@comcast.net
Mon, 29 Apr 2002 23:55:41 -0400


[Aahz]

My take is that programs with a million live objects and no cycles are common enough that gc should be designed to handle that smoothly.

Well, millions of live objects is common but isn't a problem. The glitch we're looking at is a surprising slowdown with millions of live container objects. The latter isn't so common.
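To make the distinction between "objects" and "container objects" concrete, here is a minimal sketch assuming a modern CPython, where gc.is_tracked() reports whether the cyclic collector watches an object. That function postdates this thread, so it is used purely as an illustration; the names atoms and containers are made up for the example.

```python
import gc

# Atomic objects cannot hold references to other objects, so the
# cyclic collector has no reason to track them.
atoms = [0, 1.5, "a string"]

# Lists can hold references (and so can take part in cycles), which
# is why they are tracked and cost time on every collection pass.
containers = [[], [1, 2]]

atom_tracked = [gc.is_tracked(obj) for obj in atoms]
container_tracked = [gc.is_tracked(obj) for obj in containers]

print("atoms tracked:     ", atom_tracked)
print("containers tracked:", container_tracked)
```

Only the tracked objects add to the work a collection has to do, which is why a million ints are harmless while a million tuples or lists are not.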

I don't think that a programmer casually writing such applications (say, processing information from a database) should be expected to understand gc well enough to tune it.

People casually writing applications pushing the limits of their boxes are in for more surprises than just this.

Having read the entire discussion so far, and NOT being any kind of gc expert, I would say that Tim's adaptive solution makes the most sense to me. For years, we told people with cyclic data to figure out how to fix the problem themselves; now that we have gc available, I don't think we should punish everyone else.

We're not trying to punish anyone, but innocent users with lots of containers can lose big despite our wishes: if we don't check them for cycles, they can run out of memory; if we do check them for cycles, it necessarily consumes time.

As a datapoint, here are the times (in seconds) for justzip() on my box after my checkin to precompute the result size (list.append behavior is irrelevant now):

    gc disabled:  0.64
    gc enabled:   7.32
    magic=2(*):   2.63
    magic=3(*):   2.02
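A measurement of this kind can be sketched roughly as follows. The body of justzip() is not shown in this message, so justzip_standin() below is a hypothetical stand-in that simply builds n small tuples, and timed() is a helper invented for the example. No timings are claimed here: current interpreters tune gc differently and will not reproduce the numbers above.

```python
import gc
import time


def justzip_standin(n):
    # Hypothetical stand-in for justzip(): creates n two-element tuples
    # plus the list holding them, i.e. n + 1 new containers.
    a = list(range(n))
    return list(zip(a, a))


def timed(func, n, gc_on):
    """Run func(n) with gc forced on or off; return (seconds, len(result))."""
    was_enabled = gc.isenabled()
    if gc_on:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        result = func(n)
        elapsed = time.perf_counter() - start
    finally:
        # Restore whatever gc state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return elapsed, len(result)


for gc_on in (False, True):
    seconds, count = timed(justzip_standin, 200_000, gc_on)
    label = "gc enabled: " if gc_on else "gc disabled:"
    print(label, round(seconds, 3), "s for", count, "tuples")
```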

(*) This is gcmodule.c fiddled to add this block after "collections1 = 0;" in the first branch of collect_generations():

    if (n == 0)
        threshold2 *= magic;
    else if (threshold2 > 5)
        threshold2 /= magic;

magic=1 is equivalent to the current code. That's all an "adaptive scheme" need amount to, provided the "*=" part were fiddled to prevent threshold2 from becoming insanely large. Boosting magic above 3 didn't do any more good in this test.
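As a rough model of that rule, including the clamp on the "*=" branch, here is a sketch in Python rather than C. It assumes n is the number of unreachable objects the oldest-generation collection found, and the cap parameter is a hypothetical limit that does not appear in the patch above.

```python
def adapt_threshold(threshold2, n, magic, cap=10_000):
    """Return the next threshold2 after an oldest-generation collection.

    n     -- assumed: count of unreachable objects the collection found
    magic -- growth/shrink factor; magic=1 leaves threshold2 unchanged
    cap   -- hypothetical upper bound so threshold2 cannot grow forever
    """
    if n == 0:
        # Nothing was found: collect less often next time, up to the cap.
        return min(threshold2 * magic, cap)
    if threshold2 > 5:
        # Garbage was found: collect more often again (integer division,
        # matching the C "/=" on an int).
        return threshold2 // magic
    return threshold2


# Three fruitless collections, one that finds garbage, one fruitless.
t = 10
history = []
for found in [0, 0, 0, 7, 0]:
    t = adapt_threshold(t, found, magic=3, cap=200)
    history.append(t)

print(history)  # [30, 90, 200, 66, 198]
```

With cap=200 the third fruitless collection is clamped at 200 instead of reaching 270, and a single productive collection pulls the threshold back down by a factor of magic.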

At magic=3 it still takes 3+ times longer than with gc disabled, but that's a whale of a lot better than the current 11+ times longer. Note that with gc disabled, any cycle in any of the 1,000,001 containers this test creates would leak forever -- casual users definitely get something back for the time spent.
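The leak mentioned above is easy to see in a small sketch. It assumes CPython's reference counting plus the gc and weakref modules; the Node class and make_cycle() are names made up for the example.

```python
import gc
import weakref


class Node:
    pass


def make_cycle():
    # Two objects that refer to each other; once this function returns,
    # nothing outside the cycle holds a strong reference to either one.
    a = Node()
    b = Node()
    a.other = b
    b.other = a
    return weakref.ref(a)


was_enabled = gc.isenabled()
gc.disable()
try:
    probe = make_cycle()
    # Reference counting alone cannot free the pair: each keeps the
    # other's count at one, so with gc disabled they just sit there.
    alive_without_gc = probe() is not None

    # An explicit collection still works while automatic gc is off.
    gc.collect()
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print("alive with gc disabled:", alive_without_gc)
print("alive after gc.collect():", alive_after_collect)
```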