Issue 23987: docs about containers membership testing wrong for broken objects (original) (raw)

Created on 2015-04-17 08:09 by ethan.furman, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (9)
msg241320 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-04-17 08:09
https://docs.python.org/3/reference/expressions.html#comparisons: ---------------------------------------------------------------- The operators 'in' and 'not in' test for membership. 'x in s' evaluates to true if x is a member of s, and false otherwise. 'x not in s' returns the negation of 'x in s'. All built-in sequences and set types support this as well as dictionary, for which 'in' tests whether the dictionary has a given key. For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. StackOverflow question for context: http://stackoverflow.com/q/29692140/208880 Summary: if a user creates a broken object such that __hash__ returns a random number with every invocation, then that object will get lost in a dict or set; but the above statement about 'equivalent to' claims that such an object will still be found. On the other hand, https://docs.python.org/3/glossary.html#term-hashable says that a constant return value is required for an object to be hashable (of course, Python can't tell if future calls to __hash__ will return the same value). Perhaps a link to the #term-hashable would be appropriate?
msg241326 - (view) Author: Jonathan Sharpe (jonrsharpe) * Date: 2015-04-17 10:08
I don't think it's as simple as linking to the hashable definition. The "equivalent expression" is simply wrong for dict/set/frozenset, as those types check hash equality, not identity.
msg241337 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-04-17 15:04
It's not quite that simple -- those containers use the hash to find the objects that will be first checked by identity, and then equality* -- otherwise they would have to do a complete scan from first key to last, and that would kill performance. ... Okay, I see your point -- even for sane objects, a systematic check of every key is not undertaken for the hash-type containers. Perhaps something like: For container types such as list, tuple, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. For container types such as set, frozenset, and dict, 'x in y' is equivalent to 'any(x is e or x == e for e in z)' where 'z' is a collection of objects in 'y' that have the same hash.
msg241339 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-04-17 15:37
Could we add "For dictionaries and sets this equivalence expression is modified by the addition of 'if hash(x) == hash(e)'?
msg241341 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-04-17 16:11
I think I like that better than my suggestion. :)
msg241661 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-04-20 15:29
So something like: For container types such as list, tuple, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. For container types such as set, frozenset, and dict, this equivalence expression is modified by the addition of 'if hash(x) == hash(e)'. ?
msg241667 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-04-20 16:09
Yes, that's what I had in mind.
msg241694 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-04-21 02:30
There is a separate report for taking care of the identity check for contains: https://bugs.python.org/issue23986 I think notes about crazy hashes shouldn't spill all over our docs. At best, it warrants a FAQ entry about how hash tables work. The risk here is that in an effort to be more precise, it is easy impair the usability of the docs. The wording in question has been around for a very long time and has overall done a good job of explaining the intent of the in-operator to all but the most pedantic reader, "The operators 'in' and 'not in' test for membership. 'x in s' evaluates to true if x is a member of s, and false otherwise." If you really want to be precise, the *only* thing that can be broadly stated about the in-operator is that it calls __contains__ on an object that that object can implement whatever logic it wants (hash table lookup, linear search, google search, random result, etc). But then, this is no different than most magic methods in that regard.
msg350211 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-22 16:35
It looks like the identity-implies-equality part of this has already been taken care of. The __hash__ invariant also appears to now have extensive coverage, some in the glossary and much more in the data model section of the reference. Cute puzzles like a __hash__ returning a random value can be left for stackoverflow.
History
Date User Action Args
2022-04-11 14:58:15 admin set github: 68175
2020-11-18 10:45:06 peternowee set nosy: + peternowee
2019-08-22 16:35:30 rhettinger set status: open -> closedresolution: out of datemessages: + stage: resolved
2015-07-21 07🔞17 ethan.furman set nosy: - ethan.furman
2015-04-21 02:30:09 rhettinger set assignee: docs@python -> rhettingermessages: + nosy: + rhettinger
2015-04-20 16:09:49 r.david.murray set messages: +
2015-04-20 15:29:23 ethan.furman set messages: +
2015-04-17 16:11:27 ethan.furman set messages: +
2015-04-17 15:37:58 r.david.murray set nosy: + r.david.murraymessages: +
2015-04-17 15:04:40 ethan.furman set messages: +
2015-04-17 10:08:50 jonrsharpe set messages: +
2015-04-17 10:04:17 jonrsharpe set nosy: + jonrsharpe, docs@pythoncomponents: + Documentationassignee: docs@python
2015-04-17 08:09:25 ethan.furman create