msg354020 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2019-10-05 17:31 |
While people are thinking about gc, zleak.py shows a small bug, and a possible opportunity for improvement, in the way gc treats finalizers that resurrect objects.

The bug: the stats keep claiming gc is collecting an enormous number of objects, but in fact it's not collecting any. Objects in the unreachable set shouldn't add to the "collected" count unless they _are_ collected.

Output:

resurrecting
collect 2000002
gen 2 stats {'collections': 2, 'collected': 2000002, 'uncollectable': 0}
resurrecting
collect 4000004
gen 2 stats {'collections': 3, 'collected': 6000006, 'uncollectable': 0}
resurrecting
collect 6000006
gen 2 stats {'collections': 4, 'collected': 12000012, 'uncollectable': 0}
resurrecting
collect 8000008
gen 2 stats {'collections': 5, 'collected': 20000020, 'uncollectable': 0}
resurrecting
collect 10000010
gen 2 stats {'collections': 6, 'collected': 30000030, 'uncollectable': 0}
...

Memory use grows without bound, and collections take ever longer.

The opportunity: if any finalizer resurrects anything, gc gives up. But the process of computing whether anything was resurrected also determines which initially-trash objects are reachable from the risen dead. Offhand I don't see why we couldn't proceed collecting what remains trash. Then zleak.py would reclaim everything instead of nothing.

Sketch: rather than just set a flag, check_garbage() could move now-reachable objects to the old generation (and, for each one moved, decrement the count of collected objects). Then delete_garbage() could proceed on what remains in the unreachable list. |
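For readers without zleak.py at hand, the situation described above can be approximated with a much smaller script. This is only a sketch and is not the attached zleak.py: the names `Phoenix`, `Trash`, `survivors` and `run_once` are made up for illustration. It builds cyclic trash, lets one finalizer resurrect its object, and compares what `gc.collect()` returns with the change in the generation 2 `collected` stat from `gc.get_stats()`.

```python
import gc

# Hypothetical module-level list; anything appended here is reachable again.
survivors = []


class Phoenix:
    """Cyclic trash whose finalizer resurrects the object (illustrative only)."""

    def __init__(self):
        self.me = self  # self-cycle, so only the cyclic gc can reclaim it

    def __del__(self):
        survivors.append(self)  # resurrection: reachable from outside again


class Trash:
    """Plain cyclic trash with no finalizer."""

    def __init__(self):
        self.me = self


def run_once(n_trash=1000):
    """Create one Phoenix plus n_trash Trash cycles, then force a full collection.

    Returns (value returned by gc.collect(), change in gen 2 'collected' stat).
    """
    gc.collect()  # start from a clean slate
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from reclaiming Trash early
    try:
        before = gc.get_stats()[2]["collected"]
        Phoenix()
        for _ in range(n_trash):
            Trash()
        returned = gc.collect()
        after = gc.get_stats()[2]["collected"]
    finally:
        if was_enabled:
            gc.enable()
    return returned, after - before


if __name__ == "__main__":
    returned, delta = run_once()
    print("collect() returned", returned)
    print("gen 2 'collected' grew by", delta)
    print("resurrected objects kept alive:", len(survivors))
```

The exact counts depend on the interpreter version (for example, on whether each instance carries a separate `__dict__` object), so no expected numbers are given here. The point is only that the resurrected `Phoenix` ends up in `survivors`, and that the reported counts should describe objects that were actually reclaimed.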
|
|
msg354041 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2019-10-06 18:33 |
Just noting that check_garbage() currently only determines which trash objects are now directly reachable from outside. To be usable for the intended purpose, it would need to go on to compute which trash objects are reachable from those too.

Maybe a new function

    static void deduce_unreachable(PyGC_Head *base, PyGC_Head *unreachable)

that packaged the steps through move_unreachable(). The main function would use that on `young`, and check_garbage() on `unreachable` to find the _still_ unreachable objects.

I don't care about the expense. Outside of zleak.py, the number of unreachable objects is usually small, so it's usually cheap to make another pass over just them. |
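The two-pass idea above can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the C implementation: objects are plain string names, `refcount` and `referents` are hypothetical dictionaries standing in for real reference counts and `tp_traverse`, and the function only borrows the name `deduce_unreachable` from the proposal. The first pass runs on the young set; the second pass runs on the resulting unreachable set after a finalizer has (in the model) resurrected one object.

```python
def deduce_unreachable(base, refcount, referents):
    """Split `base` into (reachable, unreachable) in a toy object graph.

    base      -- set of object names under consideration
    refcount  -- dict: name -> total number of references to that object
    referents -- dict: name -> list of names the object refers to
    """
    # Copy the reference counts, then subtract every reference that
    # originates inside `base`. What remains counts references from outside.
    gc_refs = {name: refcount[name] for name in base}
    for name in base:
        for target in referents.get(name, ()):
            if target in gc_refs:
                gc_refs[target] -= 1

    # Objects still referenced from outside are roots; everything
    # transitively reachable from a root is reachable too.
    reachable = set()
    stack = [name for name in base if gc_refs[name] > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        for target in referents.get(name, ()):
            if target in base and target not in reachable:
                stack.append(target)
    return reachable, base - reachable


# Two isolated cycles: a <-> b and c <-> d.
referents = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
refcount = {"a": 1, "b": 1, "c": 1, "d": 1}
young = {"a", "b", "c", "d"}

# First pass: nothing is referenced from outside, so all four are trash.
_, unreachable = deduce_unreachable(young, refcount, referents)

# Model a finalizer resurrecting "a": it gains one outside reference.
refcount["a"] += 1

# Second pass over just the trash: "a" is a root and drags "b" along,
# while "c" and "d" remain unreachable and could still be reclaimed.
risen, still_unreachable = deduce_unreachable(unreachable, refcount, referents)
```

In this model the second pass costs time proportional to the size of the unreachable set only, which matches the remark that another pass over just those objects is usually cheap.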
|
|
msg354213 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2019-10-08 14:28 |
PR 16658 aims to repair the stats reported. |
|
|
msg354290 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2019-10-09 17:37 |
New changeset ecbf35f9335b0420cb8adfda6f299d6747a16515 by Tim Peters in branch 'master': bpo-38379: don't claim objects are collected when they aren't (#16658) https://github.com/python/cpython/commit/ecbf35f9335b0420cb8adfda6f299d6747a16515 |
|
|
msg354291 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2019-10-09 18:08 |
I checked the stat fix into master, but GH failed to backport to 3.7 or 3.8 and I'm clueless. More info in the PR. Does someone else here know how to get a backport done? |
|
|
msg354296 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2019-10-09 21:14 |
Tim, I have created backports for 3.8 and 3.7 (PR 16683, PR 16685). In my case, `cherry_picker ecbf35f9335b0420cb8adfda6f299d6747a16515 3.7` and `cherry_picker ecbf35f9335b0420cb8adfda6f299d6747a16515 3.8` work after fixing merge conflicts. Maybe there is something going on in your git repo (maybe you didn't fetch from master first and that is why git log fails?). |
|
|
msg354297 - (view) |
Author: miss-islington (miss-islington) |
Date: 2019-10-09 21:25 |
New changeset 0bd9fac7a866ec886ae8f93f9c24dcda6d436929 by Miss Islington (bot) (Pablo Galindo) in branch '3.8': [3.8] bpo-38379: don't claim objects are collected when they aren't (GH-16658) (GH-16683) https://github.com/python/cpython/commit/0bd9fac7a866ec886ae8f93f9c24dcda6d436929 |
|
|
msg354299 - (view) |
Author: miss-islington (miss-islington) |
Date: 2019-10-09 21:42 |
New changeset a081e931505f190b5ccdff9a781e59b3f13fcc31 by Miss Islington (bot) (Pablo Galindo) in branch '3.7': [3.7] bpo-38379: don't claim objects are collected when they aren't (GH-16658) (GH-16685) https://github.com/python/cpython/commit/a081e931505f190b5ccdff9a781e59b3f13fcc31 |
|
|
msg354589 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2019-10-13 15:49 |
New changeset 466326dcdf038b948d94302c315be407c73e60d1 by Pablo Galindo in branch 'master': bpo-38379: Don't block collection of unreachable objects when some objects resurrect (GH-16687) https://github.com/python/cpython/commit/466326dcdf038b948d94302c315be407c73e60d1 |
|
|
msg354605 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2019-10-13 22:17 |
Everything here has been addressed, so closing this. zleak.py can apparently run forever now without leaking a byte :-) |
|
|
msg375518 - (view) |
Author: Lewis Gaul (LewisGaul) * |
Date: 2020-08-16 16:49 |
I noticed this bug is mentioned in the 3.9 release notes with a note similar to the title of the 4th PR: "garbage collection does not block on resurrected objects". I can't see any mention of a blocking issue here on the issue:

> The bug: the stats keep claiming gc is collecting an enormous number of objects, but in fact it's not collecting any. Objects in the unreachable set shouldn't add to the "collected" count unless they _are_ collected.

Would someone be able to elaborate on the blocking issue that was fixed as part of this BPO? |
|
|
msg375520 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2020-08-16 19:56 |
I suspect you're reading some specific technical meaning into the word "block" that the PR and release note didn't intend by their informal use of the word. But I'm unclear on what technical meaning you have in mind. Before the change, gc "just gave up" after seeing a resurrection, ending the then-current cyclic gc run. In that sense, yes, resurrection "blocked" gc from making progress. It did not, e.g., "block" the interpreter in the sense of deadlock, or of waiting for some lock to be released, or of waiting for a network request to respond, or ... |
|
|
msg375521 - (view) |
Author: Lewis Gaul (LewisGaul) * |
Date: 2020-08-16 20:02 |
You're right, that's how I had interpreted it. Thanks for clarifying. I was wondering if this could be related to an issue I've hit with gc.collect() getting slower and slower in a test suite, but that now seems unlikely, so I won't go into that here. |
|
|