[Python-Dev] Status of the fix for the hash collision vulnerability (original) (raw)
Guido van Rossum guido at python.org
Sun Jan 15 18:44:08 CET 2012
- Previous message: [Python-Dev] Status of the fix for the hash collision vulnerability
- Next message: [Python-Dev] Status of the fix for the hash collision vulnerability
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
Guido van Rossum, 15.01.2012 17:10: > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: >> Terry Reedy, 14.01.2012 06:43: >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote: >>> >>>> It is perfectly okay to break existing users who had anything depending >>>> on ordering of internal hash tables. Their code was already broken. >>> >>> Given that the doc says "Return the hash value of the object", I do not >>> think we should be so hard-nosed. The above clearly implies that there is >>> such a thing as the Python hash value for an object. And indeed, that >> has >>> been true across many versions. If we had written "Return a hash value >> for >>> the object, which can vary from run to run", the case would be different. >> >> Just a side note, but I don't think hash() is the right place to document >> this. > > You mean we shouldn't document that the hash() of a string will vary per > run?
No, I mean that the hash() builtin function is not the right place to document the behaviour of a string hash. That should go into the string object documentation. Although, arguably, it may be worth mentioning in the docs of hash() that, in general, hash values of builtin types are bound to the lifetime of the interpreter instance (or entire runtime?) and may change after restarts. I think that's a reasonable restriction to document that prominently, even if it will only apply to str for the time being.
Actually it will apply to a lot more than str, because the hash of (immutable) compound objects is often derived from the hash of the constituents, e.g. hash of a tuple.
>> Hashing is a protocol in Python, just like indexing or iteration. >> Nothing keeps an object from changing its hash value due to modification, > > Eh? There's a huge body of cultural awareness that only immutable objects > should define a hash, implying that the hash remains constant during the > object's lifetime. > >> and that would even be valid in the face of the usual dict lookup >> invariants if changes are only applied while the object is not referenced >> by any dict. > > And how would you know it isn't?
Well, if it's an object with a mutable hash then it's up to the application defining that object to make sure it's used in a sensible way. Immutability just makes your life easier. I can imagine that an object gets removed from a dict (say, a cache), modified and then reinserted, and I think it's valid to allow the modification to have an impact on the hash in this case, in order to accommodate for any changes to equality comparisons due to the modification.
That could be considered valid only in a very abstract, theoretical, non-constructive way, since there is no protocol to detect removal from a dict (and you cannot assume an object is used in only one dict at a time).
That being said, it seems that the Python docs actually consider constant hashes a requirement rather than a virtue.
http://docs.python.org/glossary.html#term-hashable """ An object is hashable if it has a hash value which never changes during its lifetime (it needs a hash() method), and can be compared to other objects (it needs an eq() or cmp() method). Hashable objects which compare equal must have the same hash value. """ It also seems to me that the wording "has a hash value which never changes during its lifetime" makes it pretty clear that the lifetime of the hash value is not guaranteed to supersede the lifetime of the object (although that's a rather muddy definition - memory lifetime? or pickle-unpickle as well?).
Across pickle-unpickle it's not considered the same object. Pickling at best preserves values.
However, this entry in the glossary only seems to have appeared with Py2.6, > likely as a result of the abc changes. So it won't help in defending a > change to the hash function. >>> >> So the guarantees do not depend on the function hash() and may > >> be even weaker than your above statement. > > > > There are no actual guarantees for hash(), but lots of rules for > > well-behaved hashes. >> Absolutely. >
--Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20120115/398f4ec3/attachment-0001.html>
- Previous message: [Python-Dev] Status of the fix for the hash collision vulnerability
- Next message: [Python-Dev] Status of the fix for the hash collision vulnerability
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]