msg241320 - (view) |
Author: Ethan Furman (ethan.furman) *  |
Date: 2015-04-17 08:09 |
https://docs.python.org/3/reference/expressions.html#comparisons: ---------------------------------------------------------------- The operators 'in' and 'not in' test for membership. 'x in s' evaluates to true if x is a member of s, and false otherwise. 'x not in s' returns the negation of 'x in s'. All built-in sequences and set types support this as well as dictionary, for which 'in' tests whether the dictionary has a given key. For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. StackOverflow question for context: http://stackoverflow.com/q/29692140/208880 Summary: if a user creates a broken object such that __hash__ returns a random number with every invocation, then that object will get lost in a dict or set; but the above statement about 'equivalent to' claims that such an object will still be found. On the other hand, https://docs.python.org/3/glossary.html#term-hashable says that a constant return value is required for an object to be hashable (of course, Python can't tell if future calls to __hash__ will return the same value). Perhaps a link to the #term-hashable would be appropriate? |
|
|
msg241326 - (view) |
Author: Jonathan Sharpe (jonrsharpe) * |
Date: 2015-04-17 10:08 |
I don't think it's as simple as linking to the hashable definition. The "equivalent expression" is simply wrong for dict/set/frozenset, as those types check hash equality, not identity. |
|
|
msg241337 - (view) |
Author: Ethan Furman (ethan.furman) *  |
Date: 2015-04-17 15:04 |
It's not quite that simple -- those containers use the hash to find the objects that will be first checked by identity, and then equality* -- otherwise they would have to do a complete scan from first key to last, and that would kill performance. ... Okay, I see your point -- even for sane objects, a systematic check of every key is not undertaken for the hash-type containers. Perhaps something like: For container types such as list, tuple, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. For container types such as set, frozenset, and dict, 'x in y' is equivalent to 'any(x is e or x == e for e in z)' where 'z' is a collection of objects in 'y' that have the same hash. |
|
|
msg241339 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2015-04-17 15:37 |
Could we add "For dictionaries and sets this equivalence expression is modified by the addition of 'if hash(x) == hash(e)'? |
|
|
msg241341 - (view) |
Author: Ethan Furman (ethan.furman) *  |
Date: 2015-04-17 16:11 |
I think I like that better than my suggestion. :) |
|
|
msg241661 - (view) |
Author: Ethan Furman (ethan.furman) *  |
Date: 2015-04-20 15:29 |
So something like: For container types such as list, tuple, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. For container types such as set, frozenset, and dict, this equivalence expression is modified by the addition of 'if hash(x) == hash(e)'. ? |
|
|
msg241667 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2015-04-20 16:09 |
Yes, that's what I had in mind. |
|
|
msg241694 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2015-04-21 02:30 |
There is a separate report for taking care of the identity check for contains: https://bugs.python.org/issue23986 I think notes about crazy hashes shouldn't spill all over our docs. At best, it warrants a FAQ entry about how hash tables work. The risk here is that in an effort to be more precise, it is easy impair the usability of the docs. The wording in question has been around for a very long time and has overall done a good job of explaining the intent of the in-operator to all but the most pedantic reader, "The operators 'in' and 'not in' test for membership. 'x in s' evaluates to true if x is a member of s, and false otherwise." If you really want to be precise, the *only* thing that can be broadly stated about the in-operator is that it calls __contains__ on an object that that object can implement whatever logic it wants (hash table lookup, linear search, google search, random result, etc). But then, this is no different than most magic methods in that regard. |
|
|
msg350211 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-08-22 16:35 |
It looks like the identity-implies-equality part of this has already been taken care of. The __hash__ invariant also appears to now have extensive coverage, some in the glossary and much more in the data model section of the reference. Cute puzzles like a __hash__ returning a random value can be left for stackoverflow. |
|
|