[Python-Dev] deque alternative (was: Linked lists)

Tim Peters tim.peters at gmail.com
Mon Dec 26 18:56:27 CET 2005


[restoring context and attributions lost in the original]

[Tim Peters]

Like Phillip Eby, I use 2-tuples for this when I feel the need (usually during a backtracking graph search, to keep track of paths back to the root in a space-efficient way), and am happy with that.

[Josiah Carlson]

Then there's the whole Python list with append() and pop(). It takes a method resolution and function call, but at least in Python 2.3, it is a hair faster...

[Martin Blais]

Josiah, that's beside the point, because it is not space-efficient: the list is actually a dynamic array/vector, and I would expect an efficient array implementation to grow exponentially. One of the reasons we're talking about lists is to avoid exactly this problem...

[James Y Knight]

Okay, now that reason is a pretty crazy one. Consider what you're saying here.

I'm sure he did ;-) Consider a very simple graph, a skinny tree rooted at A that branches once "at the end", represented by a dict of adjacency lists:

kids[A] = [B]
kids[B] = [C]
kids[C] = [D]
...
kids[W] = [X]
kids[X] = [Y, Z]

A natural representation of the path from B back to the root is:

Bpath = B, (A, None)

and from C back to the root:

Cpath = C, Bpath

and from D back to the root:

Dpath = D, Cpath

...

and from X back to the root:

Xpath = X, Wpath

A "flat" list or tuple would certainly be more space-efficient up to this point. But when the graph branches, the 2-tuple representation allows sharing the common path suffix (which may be very long!):

Ypath = Y, Xpath

and Zpath = Z, Xpath
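The sharing Tim describes can be sketched as running code. The chain is shortened here for brevity, and `unwind` is a hypothetical helper name, not something from the thread:

```python
# Paths back to the root as nested 2-tuples: (node, parent_path).
# None marks the end of the path at the root.
Apath = ("A", None)
Bpath = ("B", Apath)
Cpath = ("C", Bpath)
Xpath = ("X", Cpath)          # pretend the chain runs straight from C to X

# At the branch, both new paths reuse Xpath instead of copying it.
Ypath = ("Y", Xpath)
Zpath = ("Z", Xpath)

def unwind(path):
    """Materialize a 2-tuple path as a list, from node back to root."""
    nodes = []
    while path is not None:
        node, path = path
        nodes.append(node)
    return nodes

print(unwind(Ypath))          # ['Y', 'X', 'C', 'B', 'A']
print(Ypath[1] is Zpath[1])   # True: the common suffix is one shared object
```

The `is` check makes the point concrete: the two paths differ only in their first cell; everything from X back to A exists once in memory.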

If there's an N-way branch, using 2-tuples allows recording the N new paths back to the root with incremental memory burden N * some_constant. You can't do this kind of sharing with a flat list or tuple, and the incremental memory burden at an N-way branch zooms to (N + some_other_constant) * (the amount of memory needed to record the path from the branch point back to the root), due to duplicating the back-to-the-root info N times. The potential memory saving from using 2-tuples instead is unbounded.
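The arithmetic in that paragraph can be checked with a small sketch; the depth and branching factor below are made-up numbers, and the "cost" counts tuple slots rather than exact bytes:

```python
D = 100   # depth of the shared path from the branch point back to the root
N = 5     # branching factor at the branch point

# Nested 2-tuples: each of the N branches allocates one new pair whose
# tail is the shared path, so the incremental cost is N * some_constant.
pairs_cost = N * 2            # N new 2-tuples, 2 slots each

# Flat tuples/lists: each branch must duplicate the whole back-to-root
# path, so the incremental cost grows with the path length D.
flat_cost = N * (D + 1)       # N copies of the D-deep path plus 1 new node

print(pairs_cost)             # 10
print(flat_cost)              # 505
```

Since D is unbounded but the per-branch cost of the 2-tuple scheme is not, the ratio between the two can be made as large as you like, which is the "unbounded saving" claimed above.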

...

Plus, in python, the overhead per object in the pyobject memory allocation system, which I don't know how to quantify.

For objects requiring a number of bytes divisible by 8, a few percent at worst. The per-object space overhead in obmalloc consists entirely of internal padding needed to reach a multiple of 8 bytes per allocation unit, and that's 0 when the # of bytes needed is divisible by 8. The only obmalloc space overheads then are due to per-pool and per-arena bookkeeping space, and those are small fractions of total pool and arena sizes.
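The padding rule described here is easy to state in code. This is a sketch of the behavior Tim describes for the 2.x-era obmalloc, with an illustrative `ALIGNMENT` constant rather than anything read out of CPython:

```python
ALIGNMENT = 8   # obmalloc's allocation granularity, as described above

def padding(nbytes):
    """Internal padding added to round a request up to a multiple of 8."""
    return -nbytes % ALIGNMENT

print(padding(24))   # 0: sizes divisible by 8 carry no per-object overhead
print(padding(25))   # 7: the worst case, just past an 8-byte boundary
```

So for allocation sizes that are already multiples of 8, the only remaining space costs are the per-pool and per-arena bookkeeping mentioned above, which are small fractions of the pool and arena sizes.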

...

So, the list will generally be 1/8th of its size overallocated, or 112 elements plus 9 words overhead is 121 words. Better than any cons-linked list could be, and insanely better than a cons-linked list would be in Python.

You seem to be ignoring possibilities for sharing across lists, and such sharing is natural in many graph algorithms.
