[Python-Dev] Hash randomization for which types?
Maciej Fijalkowski fijall at gmail.com
Wed Feb 17 02:34:29 EST 2016
- Previous message (by thread): [Python-Dev] Hash randomization for which types?
- Next message (by thread): [Python-Dev] Hash randomization for which types?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Note that hashing in Python 2.7 and in 3.x prior to 3.4 is simply broken, and the randomization does not do nearly enough; see https://bugs.python.org/issue14621
On Wed, Feb 17, 2016 at 4:45 AM, Shell Xu <shell909090 at gmail.com> wrote:
I think you are right. Here is the source code in Python 2.7.11:

    long
    PyObject_Hash(PyObject *v)
    {
        PyTypeObject *tp = v->ob_type;
        if (tp->tp_hash != NULL)
            return (*tp->tp_hash)(v);
        /* To keep to the general practice that inheriting
         * solely from object in C code should work without
         * an explicit call to PyType_Ready, we implicitly call
         * PyType_Ready here and then check the tp_hash slot again
         */
        if (tp->tp_dict == NULL) {
            if (PyType_Ready(tp) < 0)
                return -1;
            if (tp->tp_hash != NULL)
                return (*tp->tp_hash)(v);
        }
        if (tp->tp_compare == NULL && RICHCOMPARE(tp) == NULL) {
            return _Py_HashPointer(v); /* Use address as hash value */
        }
        /* If there's a cmp but no hash defined, the object can't be hashed */
        return PyObject_HashNotImplemented(v);
    }

If the object's type has a hash function, it will be used. If not, _Py_HashPointer will be used, and that does not use _Py_HashSecret. I also checked the references to _Py_HashSecret: only bufferobject, unicodeobject and stringobject use it.

On Wed, Feb 17, 2016 at 9:54 AM, Steven D'Aprano <steve at pearwood.info> wrote:
On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote:
> On 2/16/2016 1:48 AM, Christoph Groth wrote:
> >Hello,
> >
> >Recent Python versions randomize the hashes of str, bytes and datetime
> >objects. I suppose that the choice of these three types is the result
> >of a compromise. Has this been discussed somewhere publicly?
>
> Search archives of this list... it was discussed at length.

There's a lot of discussion on the mailing list. I think that this is the very start of it, in Dec 2011:

https://mail.python.org/pipermail/python-dev/2011-December/115116.html

and continuing into 2012, for example:

https://mail.python.org/pipermail/python-dev/2012-January/115577.html
https://mail.python.org/pipermail/python-dev/2012-January/115690.html

and a LOT more, spread over many different threads and subject lines.

You should also read the issue on the bug tracker:

http://bugs.python.org/issue13703
My recollection is that it was decided that only strings and bytes need to have their hashes randomized, because only strings and bytes can be used directly from user-input without first having a conversion step with likely input range validation. In addition, changing the hash for ints would break too much code for too little benefit: unlike strings, where hash collision attacks on web apps are proven and easy, hash collision attacks based on ints are more difficult and rare. See also the comment here: http://bugs.python.org/issue13703#msg151847
> >I'm not a web programmer, but don't web applications also use
> >dictionaries that are indexed by, say, tuples of integers?
>
> Sure, and that is the biggest part of the reason they were randomized.

But they aren't, as far as I can see:

    [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
    1071302475
    [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
    1071302475

Web apps can use dicts indexed by anything that they like, but unless there is an actual attack, what does it matter? Guido makes a good point about security here:

https://mail.python.org/pipermail/python-dev/2013-October/129181.html

> I think hashes of all types have been randomized, not just the list
> you mentioned.

I'm pretty sure that's not actually the case. Using 3.6 from the repo (admittedly not fully up to date though), I can see hash randomization working for strings:

    [steve at ando 3.6]$ ./python -c "print(hash('abc'))"
    11601873
    [steve at ando 3.6]$ ./python -c "print(hash('abc'))"
    -2009889747

but not for ints:

    [steve at ando 3.6]$ ./python -c "print(hash(42))"
    42
    [steve at ando 3.6]$ ./python -c "print(hash(42))"
    42

which agrees with my recollection that only strings and bytes would be randomized.

--
Steve
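The manual check above can be repeated for several types at once. The following is only a rough sketch, assuming a CPython 3 interpreter where the PYTHONHASHSEED environment variable selects the hash seed; the helper names hash_in_fresh_interpreter and is_randomized are made up for illustration and are not part of any library. It starts a fresh interpreter per seed and reports whether the hash of an expression changes.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(expr, seed):
    """Return hash(<expr>) as computed by a new interpreter.

    The child process is started with PYTHONHASHSEED set to `seed`,
    so two different seeds stand in for two separate program runs.
    (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%s))" % expr], env=env
    )
    return int(out.decode().strip())


def is_randomized(expr, seeds=(1, 2)):
    """True if the hash of `expr` differs between the given seeds."""
    values = {hash_in_fresh_interpreter(expr, s) for s in seeds}
    return len(values) > 1


if __name__ == "__main__":
    # Same expressions as in the shell sessions above, plus bytes.
    for expr in ("'abc'", "b'abc'", "42", "(23, 42, 99, 100)"):
        state = "randomized" if is_randomized(expr) else "stable"
        print("%-20s %s" % (expr, state))
```

Seeds 1 and 2 are arbitrary; seed 0 is avoided on purpose because it disables randomization altogether.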
Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/shell909090%40gmail.com
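--
"There are gaps between the joints, and the blade has no thickness; put what has no thickness into what has gaps, and there is more than enough room for the blade to move about."
blog: http://shell909090.org/blog/
twitter: @shell909090
about.me: http://about.me/shell909090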