[Python-Dev] When do sets shrink? (original) (raw)

Raymond Hettinger raymond.hettinger at verizon.net
Sun Jan 1 04:09:22 CET 2006


[Fernando Perez]

Note that this is not a comment on the current discussion per se, but rather a small request/idea in the docs department: I think it would be a really useful thing to have a summary page/table indicating the complexities of the various operations on all the builtin types, including at least mention of subtleties and semi-gotchas.

The wiki might be the place to cover this sort of thing. Unlike infrequent doc releases, the medium is good at being responsive to whatever someone thought important enough to write an entry for. Also, it is more easily kept up-to-date for variations between versions (Py2.4, Py2.5, etc.) and implementations (CPython, Jython, etc.).

The relevant list of these ideas may be somewhat short:

I think the number one performance gotcha is adopting a COBOL-like code writing mentality and failing to creatively use Python's powerful collection types: list, tuple, dict, set, str and an occasional array, deque, or cStringIO object.

For example, I had never realized that on dicts, for some O(N) operations, N would mean "largest N in the dict's history" instead of "current number of elements".

It might be better to give more generic advice that tends to be true across implementations and versions: "Dense collections like lists tuples iterate faster than sparse structures like dicts and sets. Whenever repeated iteration starts to dominate application run-time, consider converting to a dense representation for faster iteration and better memory/cache utilization." A statement like this remains true whether or not a down-sizing algorithm is present.

Cheers,

f

Hmm, your initial may be infringing on another developer's trademarked signature ;-)

Raymond

Side note: To some degree, ignorance is bliss. Most of my code was written in AWK and I was content to know only one non-algorithmic optimization ("exp" vs /exp/). Time was spent thinking about the problem at hand rather than how to outsmart the interpreter. Knowing too much about the implementation can be a distraction. Besides, when timing does become critical, people seem to suddenly become spontaneously ingenious.



More information about the Python-Dev mailing list