[Python-Dev] Hashing proposal: change only string-only dicts (original) (raw)
Antoine Pitrou solipsis at pitrou.net
Tue Jan 17 22:26:11 CET 2012
- Previous message: [Python-Dev] Hashing proposal: change only string-only dicts
- Next message: [Python-Dev] Hashing proposal: change only string-only dicts
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 17 Jan 2012 21:59:28 +0100 "Martin v. Löwis" <martin at v.loewis.de> wrote:
I'd like to propose a different approach to seeding the string hashes: only do so for dictionaries involving only strings, and leave the tphash slot of strings unchanged.
I think Python 3 would be better with a clean fix (all hashes randomized). Now for Python 2... The problem with this idea is that it only addresses str dicts. Unicode dicts, and any other dicts, are left vulnerable. Unicode dicts are quite likely in Web frameworks/applications and other places which have well-thought text semantics.
That said, here's a suggestion to squeeze those bits:
1. add an additional field to all string objects, to cache the second hash value. a) variant: in 3.3, drop the extra field, and declare that hashes may change across runs
In 2.7, a string object has the following fields:
long ob_shash;
int ob_sstate;
Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits could cache a "hash perturbation" computed from the string and the random bits:
- hash() would use ob_shash
- dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
This way, you cache almost all computations, adding only a computation and a couple logical ops when looking up a string in a dict.
Regards
Antoine.
- Previous message: [Python-Dev] Hashing proposal: change only string-only dicts
- Next message: [Python-Dev] Hashing proposal: change only string-only dicts
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]