[Python-ideas] dict.hash - optimized per module (original) (raw)
Antoine Pitrou solipsis at pitrou.net
Sun Oct 17 12:52:40 CEST 2010
- Previous message: [Python-ideas] dict.hash - optimized per module
- Next message: [Python-ideas] dict.hash - optimized per module
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sun, 17 Oct 2010 20:41:34 +1100 Steven D'Aprano <steve at pearwood.info> wrote:
On Sun, 17 Oct 2010 06:27:37 pm Jan Koprowski wrote:
> After watching I made graph, using presented at conference library > dictinfo, for builtin.dict. > When I saw few collisions I think "Why this module doesn't have > their own hashing function implementation which allow to avoid > collision in this set of names?". Python 2.6 has 143 builtin names, and zero collisions:
It depends what you call collisions. Collisions during bucket lookup, or during hash value comparison (that is, after you selected a bucket)?
For the former, here is the calculation assuming an overallocation factor of 4 (which, IIRC, is the one used in the dict implementation):
import builtins d = builtins.dict m = len(d) * 4 for name in d: ... h = hash(name) % m ... if h in hashes: print("Collision for", name) ... hashes.setdefault(h, []).append(name) ... Collision for True Collision for FutureWarning Collision for license Collision for KeyboardInterrupt Collision for UserWarning Collision for RuntimeError Collision for MemoryError Collision for Ellipsis Collision for UnicodeError Collision for Exception Collision for tuple Collision for delattr Collision for setattr Collision for ArithmeticError Collision for property Collision for KeyError Collision for PendingDeprecationWarning Collision for map Collision for AssertionError len(d) 130 len(hashes) 110
> My second think was "Why each > Python module doesn't have their own internal hashing function which > doesn't produce collisions in scope of his names".
The real answer here is that Python needs hash values to be globally valid. Both for semantics (module dicts are regular dicts and should be usable as such), and for efficiency (having an unique hash function means the precalculated hash value can be stored for critical types such as str).
Regards
Antoine.
- Previous message: [Python-ideas] dict.hash - optimized per module
- Next message: [Python-ideas] dict.hash - optimized per module
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]