[Python-Dev] Proposal: Run GC less often
Leif Walsh adlaiff6 at gmail.com
Sat Jun 21 23:33:15 CEST 2008
If you can get me a version of the interpreter with this change made (I wouldn't know what to change), I can run a very allocation/deallocation-heavy application I have lying around, and get you some good benchmarks.
On Sat, Jun 21, 2008 at 1:23 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
Here is my proposal for making the GC run less often. The objective is to avoid the quadratic-time behavior if many objects are created and none of them is garbage.
The youngest generation remains collected after 700 allocations, the middle generation after 10 young collections. Therefore, full GC remains considered every 7000 allocations. The two youngest generations don't cause quadratic behavior, as the number of objects in each generation is bounded. Currently, full GC is invoked every 10 middle collections. Under my proposal, 10 middle collections must have passed, PLUS the number of survivor objects from the middle generation must exceed 10% of the number of objects in the oldest generation. As a consequence, garbage collection becomes less frequent as the number of objects on the heap grows, resulting in an amortized O(N) overhead for garbage collection (under the access pattern stated above). To determine the number of survivor objects, counts are taken during the collection. Objects deallocated through reference counting still remain accounted for as survivors (likewise for determining the size of the oldest generation). I made a simulation assuming an initially-empty heap, and invocations of the collector every M=7000 objects. The table below is in units of M and shows the number of objects allocated, and the number of objects inspected since the start of the simulation, for the old and the new scheme (the asterisk indicates that a collection took place; lines with no collection are dropped): total oldinspected newinspected 10 10* 10* 20 30* 30* 30 60* 60* 40 100* 100* 50 150* 150* 60 210* 210* 70 280* 280* 80 360* 360* 90 450* 450* 100 550* 450 102 550 552* 110 660* 552 115 660 667* 120 780* 667 129 780 796* 130 910* 796 140 1050* 796 ... 940 44650* 7724 942 44650 8666* 950 45600* 8666 960 46560* 8666 970 47530* 8666 980 48510* 8666 990 49500* 8666 1000 50500* 8666 ... 9990 4995000* 95887 As you can see, the old and the new scheme would start of equally until 100M objects have been allocated. The table shows how the old scheme grows quadratically, and the new scheme eventually approaches roughly a factor of ten between the number of objects and the number of times that GC considers an object. Applications with a small number of objects will see no change in behavior, also, applications that create lots of small cycles will continue to see them collected in the youngest or middle generations. Please let me know what you think. Regards, Martin P.S. I don't plan to implement this myself, so if you like the idea, go ahead.
-- Cheers, Leif