[Python-Dev] iterzip()

Tim Peters tim.one@comcast.net
Mon, 29 Apr 2002 22:36:12 -0400


[Neil Schemenauer]

Adding a fourth generation drops the time from 5.13 to 2.11 on my machine. Adding a fifth doesn't seem to make a difference. I used 10 as the threshold for both new generations.

Alas, these thresholds are a little hard to work with. For example:

  ...
else if (collections0 > threshold1) {
    ...
    collections1++;
    /* merge gen0 into gen1 and collect gen1 */
    ...
    collections0 = 0;
}
else {
    generation = 0;
    collections0++;
    ... /* collect gen0 */ ...
}

Let's say threshold1 is 10 (because it is), and we just finished a gen1 collection. Then collections0 is 0. We have to do 11 gen0 collections then before "collections0 > threshold1" succeeds, and that point is actually the 12th time gen0 has filled up since the last time we did a gen1 collection.

Similarly for collections1 vs threshold2.

This makes it hard to multiply them out in an obvious way.
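To make the off-by-two concrete, here is a minimal Python sketch that models only the two innermost branches of the C fragment above. The function name fills_until_gen1 and the idea of counting "fills" (each time gen0 fills up and the collector is entered) are illustrative assumptions, not names from the gc module, and the gen2 branch is deliberately left out.

```python
def fills_until_gen1(threshold1=10):
    """Model of the gen0/gen1 branches only (hypothetical helper).

    Starts right after a gen1 collection, when collections0 == 0, and
    returns (fills, gen0_collections): the number of times gen0 filled
    up, including the fill that finally triggers gen1, and how many
    plain gen0 collections happened before that.
    """
    collections0 = 0
    fills = 0
    gen0_collections = 0
    while True:
        fills += 1
        if collections0 > threshold1:
            # This fill triggers the gen1 collection.
            return fills, gen0_collections
        # Otherwise it is an ordinary gen0 collection.
        collections0 += 1
        gen0_collections += 1


fills, gen0_runs = fills_until_gen1(10)
print(fills, gen0_runs)
```

In this model a threshold of N gives a gen1 collection on every (N + 2)th fill rather than every Nth, which is why the thresholds don't multiply out cleanly.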

Anyway, with 4 generations it takes in the ballpark of 700 * 10 * 10 * 10 = 700,000 excess allocations before a gen3 collection is triggered, so I expect you saw exactly one gen3 collection during the lifetime of the test run (there are about 1,000,000 excess allocations during its run). Also that adding a fifth generation wouldn't matter at all in this test, since you'd still see exactly one gen3 collection, and a gen4 collection would never happen.
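The ballpark can be sketched in a few lines of Python. This is only the naive multiply-out from the paragraph above (700 for gen0, then a factor of 10 per older generation); it ignores the off-by-a-bit threshold behaviour, and the function name and parameters are made up for illustration.

```python
def oldest_generation_collections(excess_allocations, generations,
                                  threshold0=700, factor=10):
    """Naive estimate of how often the oldest generation gets collected.

    Assumes (as in the ballpark above) that gen0 triggers every
    `threshold0` excess allocations and each older generation triggers
    once per `factor` collections of the generation below it.
    """
    allocations_per_collection = threshold0 * factor ** (generations - 1)
    return excess_allocations // allocations_per_collection


excess = 1_000_000  # rough excess allocations during the test run

print(oldest_generation_collections(excess, generations=4))  # gen3
print(oldest_generation_collections(excess, generations=5))  # gen4
```

Under these assumptions a gen3 collection costs about 700,000 excess allocations and a gen4 collection about 7,000,000, so a run with about 1,000,000 excess allocations gets one of the former and none of the latter.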

Now another ballpark: On the only machine that matters in real life (mine), I'm limited to 2**31 bytes of user address space, and an object participating in gc can rarely be smaller than 40 bytes. That means I can't have more than 2**31/40 ~= 55 million gcable objects alive at once, and that also bounds the aggregate excess of allocations over deallocations. That surprised me. It means the "one million tuple" test is already taxing a non-trivial percentage of this box's theoretical capacity. Indeed, I tried boosting it to 10 million, and after glorious endless minutes of listening to the disk grind itself to dust (with gc disabled, even), Win98 rebooted itself.
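The same back-of-the-envelope bound, written out as a small Python sketch. The variable names are illustrative, and the figures (a 2**31-byte user address space, 40 bytes as the practical minimum for a gc-tracked object) are the assumptions from the paragraph above, not measurements.

```python
ADDRESS_SPACE_BYTES = 2 ** 31   # user address space assumed above
MIN_GC_OBJECT_BYTES = 40        # rough lower bound for a gc-tracked object

# Upper bound on gc-tracked objects alive at once.
max_gc_objects = ADDRESS_SPACE_BYTES // MIN_GC_OBJECT_BYTES

# Share of that bound used by a one-million-tuple test.
share = 1_000_000 / max_gc_objects

print(max_gc_objects)
print(f"{share:.1%}")
```

The exact quotient comes out a little under the rounded figure quoted above, but it is the same order of magnitude, and a million live tuples already account for a couple of percent of it.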

So another factor-of-10 generation or two would probably move the gross surprises here out of the realm of practical concern. Except, of course, for the programs where it wouldn't.