[Python-Dev] Garbage collecting closures (original) (raw)

Tim Peters tim_one@email.msn.com
Wed, 16 Apr 2003 00:51:27 -0400

Previous message: [Python-Dev] Garbage collecting closures
Next message: [Python-Dev] Garbage collecting closures
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This is a multi-part message in MIME format.

------=_NextPart_000_0006_01C303B2.4FA6CF60 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit

[Guido]

I'm glazing over the details now, but there seems to be a kernel of useful cleanup in here somehow; I hope that someone will be able to contribute a prototype of such code at least!

I'll attach a head start, a general implementation of Tarjan's SCC algorithm that produces a list of SCCs already in a topsort order. I haven't tested this enough, and Tarjan's algorithm is subtle -- user beware.

The trygc() function at the end is an example application that appears to work, busting all the objects gc knows about into SCCs and displaying them. This requires Python CVS (for the new gc.get_referents function). Note that you'll get a very large SCC at the start. This isn't an error! Each module that imports sys ends up in this SCC, due to that the module has the module sys in its module dict, and sys has the module in its sys.modules dict.

From there, modules have their top-level functions in their dict, while the top level functions point back to the module dict via func_globals. Etc. Everything in this giant blob is reachable from everything else.

For the gc application, it would probably be better (run faster and consume less memory) if dfs() simply ignored objects with no successors. Correctness shouldn't be harmed if def started with

succs = successors(v)
if not succs:
    return

except that objects with no successors would no longer be considered singleton SCCs, and the recursive call to dfs() would need to be fiddled to skip trying to update id2lowest[v_id] then (so dfs should be changed to return a bool saying whether it took the early return). This would save the current work of trying to chase pointless things like ints and strings. Still, it's pretty zippy as-is!

------=_NextPart_000_0006_01C303B2.4FA6CF60 Content-Type: text/plain; name="scc.py" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="scc.py"

This implements Tarjan's linear-time algorithm for finding the maximal

strongly connected components. It takes time proportional to the sum

of the number of nodes and arcs.

Two functions must be passed to the constructor:

node2id graph node -> a unique integer

successors graph node -> sequence of immediate successor graph =

nodes #

Call method getsccs() with an iterable producing the root nodes of the =

graph.

The result is a list of SCCs, each of which is a list of graph nodes.

This is a partitioning of all graph nodes reachable from the roots,

where each SCC is a maximal subset such that each node in an SCC is

reachable from all other nodes in the SCC. Note that the derived =

graph

where each SCC is a single "supernode" is necessarily acyclic (else if

SCC1 and SCC2 were in a cycle, each node in SCC1 would be reachable =

from

each node in SCC1 and SCC2, contradicting that SCC1 is a maximal =

subset).

The list of SCCs returned by getsccs() is in a topological sort order =

wrt

this derived DAG.

class SCC(object): def init(self, node2id, successors): self.node2id =3D node2id self.successors =3D successors

def getsccs(self, roots):
    import sys

    node2id, successors =3D self.node2id, self.successors
    get_dfsnum =3D iter(xrange(sys.maxint)).next
    id2dfsnum =3D {}
    id2lowest =3D {}
    stack =3D []
    id2stacki =3D {}
    sccs =3D []

    def dfs(v, v_id):
        id2dfsnum[v_id] =3D id2lowest[v_id] =3D v_dfsnum =3D =

get_dfsnum() id2stacki[v_id] =3D len(stack) stack.append((v, v_id)) for w in successors(v): w_id =3D node2id(w) if w_id not in id2dfsnum: # first time we saw w dfs(w, w_id) id2lowest[v_id] =3D min(id2lowest[v_id], = id2lowest[w_id]) else: w_dfsnum =3D id2dfsnum[w_id] if w_dfsnum < v_dfsnum and w_id in id2stacki: id2lowest[v_id] =3D min(id2lowest[v_id], = w_dfsnum)

        if id2lowest[v_id] =3D=3D v_dfsnum:
            i =3D id2stacki[v_id]
            scc =3D []
            for w, w_id in stack[i:]:
                del id2stacki[w_id]
                scc.append(w)
            del stack[i:]
            sccs.append(scc)

    for v in roots:
        v_id =3D node2id(v)
        if v_id not in id2dfsnum:
            dfs(v, v_id)
    sccs.reverse()
    return sccs

_basic_tests =3D """

succs =3D {1: [2], 2: []} s =3D SCC(int, lambda i: succs[i])

The order in which the roots are listed doesn't matter: we get the = unique topsort regardless.

s.getsccs([1]) [[1], [2]] s.getsccs([1, 2]) [[1], [2]] s.getsccs([2, 1]) [[1], [2]]

But note that 1 isn't reachable from 2, so giving 2 as the only root = won't find 1.

s.getsccs([2]) [[2]]

succs =3D {1: [2], ... 2: [3, 5], ... 3: [2, 4], ... 4: [3], ... 5: [2]} s =3D SCC(int, lambda i: succs[i]) s.getsccs([1]) [[1], [2, 3, 4, 5]] s.getsccs(range(1, 6)) [[1], [2, 3, 4, 5]]

Break the link from 4 back to 2.

succs[4] =3D [] s.getsccs([1]) [[1], [2, 3, 5], [4]] """

test =3D {'basic': _basic_tests}

def _test(): import doctest doctest.testmod()

if name =3D=3D 'main': _test()

def trygc(): import gc gc.collect() s =3D SCC(id, gc.get_referents) for scc in s.getsccs(gc.get_objects()): if len(scc) =3D=3D 1: continue print "SCC w/", len(scc), "objects" for x in scc: print " ", hex(id(x)), type(x), if hasattr(x, "name"): print x.name, print

------=_NextPart_000_0006_01C303B2.4FA6CF60--

Previous message: [Python-Dev] Garbage collecting closures
Next message: [Python-Dev] Garbage collecting closures
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]