[Python-Dev] iterzip()

Jeremy Hylton jeremy@zope.com
Mon, 29 Apr 2002 18:31:57 -0400


"GvR" == Guido van Rossum <guido@python.org> writes:

I was imagining a scheme like this: Count increfs and decrefs. Set two thresholds. A collection occurs when both thresholds are exceeded. Perhaps 100 decrefs and 1000 increfs.

GvR> I expect you can't literally count increfs and decrefs.  These
GvR> are macros that need to be super-fast, and I think we can't
GvR> really afford to increment a counter on each macro invocation.

GvR> The current thresholds are used to count the number of
GvR> allocations.

I was being sloppy. I meant allocations and deallocations.
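For reference, the thresholds being discussed are visible and tunable through the `gc` module. A minimal sketch (the specific default values are an era-dependent detail, not anything prescribed in this thread):

```python
import gc

# The collector fires when the generation-0 counter exceeds threshold0;
# that counter tracks container allocations, not increfs/decrefs.
defaults = gc.get_threshold()
print(defaults)              # e.g. (700, 10, 10) on interpreters of this era

# The thresholds are tunable: raising threshold0 makes generation-0
# collections rarer, at the cost of letting more garbage float.
gc.set_threshold(7000, 10, 10)
assert gc.get_threshold() == (7000, 10, 10)
gc.set_threshold(*defaults)  # put things back
```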

GvR> Adding the number of deallocations to the mix seems dangerous: when
GvR> (nearly) all data is tied up in cycles, there may not be any
GvR> deallocations.

Probably right, although function calls should provide a steady stream of deallocations. Frame, locals, &c. are deallocated on exit. So unless the code is blocked waiting for those cycles to be collected, it ought to eventually make progress.
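This balance is easy to observe in a modern CPython, where the generation-0 counter is net allocations minus deallocations (an assumption about today's collector, not necessarily the 2002 one):

```python
import gc

def churn():
    # `local` is a GC-tracked container allocated on entry and
    # deallocated again when the frame is torn down on return.
    local = [1, 2, 3]
    return len(local)

gc.collect()                  # empty the young generation first
before = gc.get_count()[0]    # pending generation-0 count
for _ in range(1000):
    churn()
after = gc.get_count()[0]
# Each call's allocation is cancelled by the matching deallocation,
# so the counter barely moves and the loop forces no collection.
print(before, after)
```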

GvR> It seems hard to distinguish between these two cases:

GvR> (a) lots of memory is allocated and kept alive for real by
GvR> containers

GvR> (b) lots of memory is allocated and kept alive accidentally by
GvR> cycles

GvR> The zip example is a case of (a), but the same allocation
GvR> behavior could ensue from case (b).  Only running the allocator
GvR> can determine which case we're seeing.  I like Tim's idea of
GvR> adjusting the thresholds based upon the effectiveness of recent
GvR> collections.

I agree that this sounds interesting.

How does this come into play in the benchmark in question? It seems like we should have gotten a lot of quick collections, but it was still quite slow.

GvR> The benchmark has a list with a million elements, and during
GvR> execution more and more tuples are added to it.  I expect that
GvR> somehow all these tuples are visited every 700 allocations,
GvR> which gives quadratic behavior.  I guess the generational part
GvR> of the collector doesn't affect this --

I guess this is a question for Neil. I assumed that the generational part would affect this.

GvR> the only way to reduce the work would be if the big list
GvR> somehow was skipped once it was known to be "old".  But since
GvR> it contains references to "new" objects (the 700 tuples
GvR> allocated last), it probably ends up being visited anyway.

I thought something was labelled old after it survived some number of collections. Why would the age of the objects it contains have any effect?
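Today's `gc` module exposes enough to watch that promotion happen: new containers land in generation 0 and survivors of a generation-0 collection move to generation 1. An illustration with the current API (which may well differ from the 2002 collector's behavior):

```python
import gc

gc.collect()                      # flush all generations
before = gc.get_count()           # (gen0, gen1, gen2) pending counts
objs = [[] for _ in range(100)]   # ~101 new tracked containers
after = gc.get_count()
print(before, after)              # gen-0 count grew by roughly 101

gc.collect(0)                     # run a generation-0 collection...
# ...after which the survivors in `objs` have been promoted into
# generation 1 and the generation-0 counter is reset.
print(gc.get_count())
```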

Jeremy