[Python-Dev] Hash randomization for which types?

Christoph Groth christoph at grothesque.org
Wed Feb 17 09:51:50 EST 2016


Steven D'Aprano wrote:

On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote:

On 2/16/2016 1:48 AM, Christoph Groth wrote:

Recent Python versions randomize the hashes of str, bytes and datetime objects. I suppose that the choice of these three types is the result of a compromise. Has this been discussed somewhere publicly?

Search archives of this list... it was discussed at length. There's a lot of discussion on the mailing list. I think that this is the very start of it, in Dec 2011: (...)

I tried searching myself for an hour or so, but though I found many discussions, I didn't see any discussion about whether hashes of other types should be randomized as well. The relevant PEP also doesn't touch this issue.

My recollection is that it was decided that only strings and bytes need to have their hashes randomized, because only strings and bytes can be used directly from user-input without first having a conversion step with likely input range validation. In addition, changing the hash for ints would break too much code for too little benefit: unlike strings, where hash collision attacks on web apps are proven and easy, hash collision attacks based on ints are more difficult and rare.
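The difference described above can be observed directly. The following is only a minimal sketch, assuming CPython 3 and the documented PYTHONHASHSEED environment variable; the helper name hash_in_fresh_interpreter and the seed values 1, 2, 3 are made up for illustration. It evaluates hash() in fresh interpreter processes, because the randomization seed is fixed once at interpreter startup.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(expr, seed):
    """Return hash(<expr>) as computed by a new Python process.

    `expr` is Python source for a literal, e.g. "'spam'" or "12345".
    `seed` is passed via PYTHONHASHSEED (hypothetical values, chosen
    only so that repeated runs of this sketch are reproducible).
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash({}))".format(expr)],
        env=env,
        universal_newlines=True,
    )
    return int(out.strip())


seeds = (1, 2, 3)

# str and bytes hashes depend on the seed ...
str_hashes = {hash_in_fresh_interpreter("'spam'", s) for s in seeds}
bytes_hashes = {hash_in_fresh_interpreter("b'spam'", s) for s in seeds}

# ... while the hash of an int does not.
int_hashes = {hash_in_fresh_interpreter("12345", s) for s in seeds}

print("distinct str hashes:  ", len(str_hashes))
print("distinct bytes hashes:", len(bytes_hashes))
print("distinct int hashes:  ", len(int_hashes))
```

Without PYTHONHASHSEED set, a random seed is chosen at each interpreter start, so the str and bytes values would differ from run to run as well. The int result stays the same regardless of the seed.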

See also the comment here: http://bugs.python.org/issue13703#msg151847

Perfect, that's exactly what I was looking for. I am reassured that this has been thought through. Thanks a lot!

Christoph


