[Python-Dev] Dictionary sparseness (original) (raw)

Walter D�rwald walter@livinglogic.de
Mon, 12 May 2003 11:56:25 +0200


Tim Peters wrote:

[...] For symbol table apps, Bentley & Sedgewick rehabilitated the idea of ternary search trees a few years ago, and I think you'd enjoy reading their papers:

http://www.cs.princeton.edu/~rs/strings/ In particular, they're faster than hashing in the failing-lookup case.

The digital search tries mentioned in the article seem to use the same fundamental approach as state machines, i.e. while traversing the string, remember the string prefix that has already been recognized. Digital search tries traverse the tree and the memory is in the path that has been traversed. State machines traverse a transition table and the memory is the current state. Digital search tries seem to be easy to update, while state machine are not.

Has anybody tried state machines for symbol tables in Python? The size of the transition table might be a problem and any attempt to reduce the size might kill performance in the inner loop. Performancewise stringobject.c/string_hash() is hard to beat (especially when the hash value is already cached).

Bye, Walter D�rwald