[Python-Dev] PEP 442: Safe object finalization (original) (raw)

Armin Rigo arigo at tunes.org
Sat May 18 15:24:08 CEST 2013


Hi Antoine,

On Sat, May 18, 2013 at 10:59 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

Cyclic isolate (CI) A reference cycle in which no object is referenced from outside the cycle and whose objects are still in a usable, non-broken state: they can access each other from their respective finalizers.

Does this definition include more complicated cases? For example:

A -> B -> A    and   A -> C -> A

Neither cycle is isolated. If there is no reference from outside, then the set of all three objects is isolated, but isn't strictly a cycle. I think the term is "strongly connected component".

1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.

2. The finalizers of all CI objects are called.

You need to be very careful about what each call to a finalizer can do to the object graph. It may already be what you're doing, but the most careful solution is to collect in "1." the complete list of objects with finalizers that are in cycles; then incref them all; then call the finalizer of each of them; then decref them all. Such a solution gives new cases to think about, which are slightly unexpected for CPython's model: for example, if you have a cycle A -> B -> A, let's say the GC calls A.del first; it might cause it to store a reference to B somewhere else, e.g. in some global; but then the GC calls B.del anyway. This is probably fine but should be considered.

3. **The CI is traversed again to determine if it is still isolated.

How is this done? I don't see a clear way to determine it by looking only at the objects in the CI, given that arbitrary modifications of the object graph may have occurred. The solution I can think of doesn't seem robust against minor changes done by the finalizer. Take the example "A -> lst -> B -> A", where the reference from A to B is via a list (e.g. there is an attribute "A.attr = [B]"). If A.del does the seemingly innocent change of replacing the list with a copy of itself, e.g. "A.attr = A.attr[:]", then after the finalizers are called, "lst" is gone and we're left with "A -> lst2 -> B -> A". Checking that this cycle is still isolated requires a possibly large number of checks, as far as I can tell. This can lead to O(n**2) behavior if there are n objects in total and O(n) cycles.

The solution seems to be to simply wait for the next GC execution. Assuming that a finalizer is only called once, this only delays a bit freeing objects with finalizers in cycles (but your PEP still works to call finalizers and eventually collect the objects). Alternatively, this might be done immediately: in the point "3." above we can forget everything we found so far, and redo the tracking on all objects (this time ignoring finalizers that were already called). In fact, it may be necessary anyway: anything found before might be invalid after the finalizers are called, so forgetting it all and redoing the tracking from scratch seems to be the only way.

Type objects get a new tpfinalize slot to which _del_ methods are bound. Generators are also modified to use this slot, rather than tpdel. At the C level, a tpfinalize function is a normal function which will be called with a regular, alive object as its only argument. It should not attempt to revive or collect the object.

Do you mean the opposite in the latest sentence? tp_finalize can do anything...

A bientôt,

Armin.



More information about the Python-Dev mailing list