[Python-Dev] Dictionary sparseness (original) (raw)
Jack Diederich jack@performancedrivers.com
Mon, 12 May 2003 02:16:57 -0400
- Previous message: [Python-Dev] Dictionary sparseness
- Next message: [Python-Dev] Dictionary sparseness
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sun, May 11, 2003 at 09:47:33PM -0400, Tim Peters wrote:
[Agthorr] > An alternate optimization would be the additional of an immutable > dictionary type to the language, initialized from a mutable dictionary > type. Upon creation, this dictionary would optimize itself, in a > manner similar to "gperf" program which creates (nearly) minimal > zero-collision hash tables.
For symbol table apps, Bentley & Sedgewick rehabilitated the idea of ternary search trees a few years ago, and I think you'd enjoy reading their papers: http://www.cs.princeton.edu/~rs/strings/ In particular, they're faster than hashing in the failing-lookup case.
They nest well too. And you can do some caching if the higher level trees are unchaning (local scope can shortcut into builtins).
I have a pure-python ternary tree and a C w/python wrappers of ternary trees lying around. They were written with symbol tables is mind, I haven't touched em since my presentation proposal on the topic [ternary trees in general, replacing python symbol dict w/ t-trees as the closing example] was declined for the Portland OReilly thingy (bruised ego, sour grapes, et al).
Cut-n-paste from an off-list for this undying thread below. Hettinger's idea of treaps is a good one. A ternary-treap would also be possible.
-jack
[Raymond]
My thought is to use a treap. The binary search side would scan the hash values while the heap part would organize from most frequent to least frequently accessed key. It could even be dynamic and re-arrange the heap according to usage patterns.
[me] treaps would probably be a better fit than ternary trees, espcially for builtins for the reasons you mention. A good default ordering would go a long way.
[me, about ternary trees] They nest nicely, a valid 'next' node can be another ternary tree, so pseudo code for import would be
newmodule = import('mymodule')
assume module_symdict is the module's symbol table
module_symdict['mymodule.'] = newmodule.module_symdict
a lookup for 'mymodule.some_function' would happily run from the current module's tree into the 'mymodule' tree. The '.' seperator would only remain speical from a user's point of view.
If symbols don't share leading characters, ternary trees are just binary trees that require additional bookkeeping. This is probably the case, so ternary trees become less neat [even if they do make for prettier pictures].
- Previous message: [Python-Dev] Dictionary sparseness
- Next message: [Python-Dev] Dictionary sparseness
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]