> On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
>> Terry Reedy, 14.01.2012 06:43:
>>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>>>
>>>> It is perfectly okay to break existing users who had anything depending
>>>> on ordering of internal hash tables. Their code was already broken.
>>>
>>> Given that the doc says "Return the hash value of the object", I do not
>>> think we should be so hard-nosed. The above clearly implies that there is
>>> such a thing as *the* Python hash value for an object. And indeed, that
>> has
>>> been true across many versions. If we had written "Return a hash value
>> for
>>> the object, which can vary from run to run", the case would be different.
>>
>> Just a side note, but I don't think hash() is the right place to document
>> this.
>
> You mean we shouldn't document that the hash() of a string will vary per
> run?

No, I mean that the hash() builtin function is not the right place to
document the behaviour of a string hash. That should go into the string
object documentation.

Although, arguably, it may be worth mentioning in the docs of hash() that,
in general, hash values of builtin types are bound to the lifetime of the
interpreter instance (or entire runtime?) and may change after restarts. I
think that's a reasonable restriction to document that prominently, even if
it will only apply to str for the time being.

Actually it will apply to a lot more than str, because the hash of (immutable) compound objects is often derived from the hash of the constituents, e.g. hash of a tuple.
">

(original) (raw)

On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel <stefan\_ml@behnel.de> wrote:

Guido van Rossum, 15.01.2012 17:10:

> On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
>> Terry Reedy, 14.01.2012 06:43:
>>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>>>
>>>> It is perfectly okay to break existing users who had anything depending
>>>> on ordering of internal hash tables. Their code was already broken.
>>>
>>> Given that the doc says "Return the hash value of the object", I do not
>>> think we should be so hard-nosed. The above clearly implies that there is
>>> such a thing as \*the\* Python hash value for an object. And indeed, that
>> has
>>> been true across many versions. If we had written "Return a hash value
>> for
>>> the object, which can vary from run to run", the case would be different.
>>
>> Just a side note, but I don't think hash() is the right place to document
>> this.
>
> You mean we shouldn't document that the hash() of a string will vary per
> run?

No, I mean that the hash() builtin function is not the right place to
document the behaviour of a string hash. That should go into the string
object documentation.

Although, arguably, it may be worth mentioning in the docs of hash() that,
in general, hash values of builtin types are bound to the lifetime of the
interpreter instance (or entire runtime?) and may change after restarts. I
think that's a reasonable restriction to document that prominently, even if
it will only apply to str for the time being.

Actually it will apply to a lot more than str, because the hash of (immutable) compound objects is often derived from the hash of the constituents, e.g. hash of a tuple.


>> Hashing is a protocol in Python, just like indexing or iteration.

>> Nothing keeps an object from changing its hash value due to modification,

>

> Eh? There's a huge body of cultural awareness that only immutable objects

> should define a hash, implying that the hash remains constant during the

> object's lifetime.

>

>> and that would even be valid in the face of the usual dict lookup

>> invariants if changes are only applied while the object is not referenced

>> by any dict.

>

> And how would you know it isn't?


Well, if it's an object with a mutable hash then it's up to the application
defining that object to make sure it's used in a sensible way. Immutability
just makes your life easier. I can imagine that an object gets removed from
a dict (say, a cache), modified and then reinserted, and I think it's valid
to allow the modification to have an impact on the hash in this case, in
order to accommodate for any changes to equality comparisons due to the
modification.

That could be considered valid only in a very abstract, theoretical, non-constructive way, since there is no protocol to detect removal from a dict (and you cannot assume an object is used in only one dict at a time).


That being said, it seems that the Python docs actually consider constant

hashes a requirement rather than a virtue.



http://docs.python.org/glossary.html#term-hashable



"""

An object is hashable if it has a hash value which never changes during its

lifetime (it needs a __hash__() method), and can be compared to other

objects (it needs an __eq__() or __cmp__() method). Hashable objects which

compare equal must have the same hash value.

"""



It also seems to me that the wording "has a hash value which never changes

during its lifetime" makes it pretty clear that the lifetime of the hash

value is not guaranteed to supersede the lifetime of the object (although

that's a rather muddy definition - memory lifetime? or pickle-unpickle as

well?).

Across pickle-unpickle it's not considered the same object. Pickling at best preserves values.

However, this entry in the glossary only seems to have appeared with Py2.6,

likely as a result of the abc changes. So it won't help in defending a

change to the hash function.



>> So the guarantees do not depend on the function hash() and may
>> be even weaker than your above statement.
>
> There are no actual guarantees for hash(), but lots of rules for
> well-behaved hashes.

Absolutely.

--
--Guido van Rossum (python.org/\~guido)