[Python-ideas] dict.hash - optimized per module (original) (raw)

Steven D'Aprano steve at pearwood.info
Sun Oct 17 11:41:34 CEST 2010

Previous message: [Python-ideas] dict.hash - optimized per module
Next message: [Python-ideas] dict.hash - optimized per module
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sun, 17 Oct 2010 06:27:37 pm Jan Koprowski wrote:

After watching I made graph, using presented at conference library dictinfo, for builtin.dict. When I saw few collisions I think "Why this module doesn't have their own hashing function implementation which allow to avoid collision in this set of names?".

Python 2.6 has 143 builtin names, and zero collisions:

hashes = {} import builtin for name in builtin.dict: ... h = hash(name) ... if h in hashes: print "Collision for", name ... L = hashes.setdefault(h, []) ... L.append(name) ... len(hashes) 143 filter(lambda x: len(x) > 1, hashes.values()) [] next(hashes.iteritems()) (29257728, ['bytearray'])

My second think was "Why each Python module doesn't have their own internal hashing function which doesn't produce collisions in scope of his names".

Firstly, the occasional collision doesn't matter much.

Secondly, your idea would mean that every module would need it's own custom-made hash function. Writing good hash functions is hard. The Python hash function is very, very good. Expecting developers to produce dozens of hash functions equally as good is totally impractical.

-- Steven D'Aprano

Previous message: [Python-ideas] dict.hash - optimized per module
Next message: [Python-ideas] dict.hash - optimized per module
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-ideas mailing list