Issue 18558: Iterable glossary entry needs clarification
Created on 2013-07-25 23:57 by Zero, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (19)
Author: Stephen Paul Chappell (Zero)
Date: 2013-07-25 23:57
The following interactive session shows that iterables are not detected properly by the collections.abc.Iterable class.
>>> class IsIterable:
    def __init__(self, data):
        self.data = data
    def __getitem__(self, key):
        return self.data[key]
>>> is_iterable = IsIterable(range(5))
>>> for value in is_iterable:
    value
0
1
2
3
4
>>> from collections.abc import Iterable
>>> isinstance(is_iterable, Iterable)
False
Author: R. David Murray (r.david.murray) *
Date: 2013-07-26 01:25
The definition of an Iterable is a class that defines an __iter__ method. Your class does not, so the behavior you cite is correct.
The glossary entry for 'iterable' could use a little clarification. A class that defines __getitem__ is an iterable if and only if it returns results when passed integers. Since the documentation for Iterable references that glossary entry, it should probably also be explicit that defining __getitem__ does not (because of the foregoing limitation) cause isinstance(x, Iterable) to be True. For a class that does not define __iter__, you must explicitly register it with Iterable.
To see why this must be so, consider this:
>>> y = IsIterable({'a': 'b', 'c': 'd'})
>>> [x for x in y]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 5, in __getitem__
KeyError: 0
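The contrast between the two IsIterable instances can be reproduced in one script. This is a minimal sketch that reuses the IsIterable class from the first message; the names seq_backed and map_backed are illustrative only.

```python
from collections.abc import Iterable


class IsIterable:
    """Supports only the old __getitem__ iteration protocol (no __iter__)."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


# Backed by a sequence: iter() calls __getitem__(0), (1), ... until IndexError.
seq_backed = IsIterable(range(5))
print(list(seq_backed))                  # [0, 1, 2, 3, 4]
print(isinstance(seq_backed, Iterable))  # False: no __iter__, not registered

# Backed by a mapping: the very first lookup, __getitem__(0), fails.
map_backed = IsIterable({'a': 'b', 'c': 'd'})
try:
    list(map_backed)
except KeyError as exc:
    print("KeyError:", exc)              # KeyError: 0
```

Both instances have the same class and the same methods, which is why a static check on the class alone cannot tell them apart.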
Author: Terry J. Reedy (terry.reedy) *
Date: 2013-07-26 23:28
Stephen, your class, or rather instances thereof when initialized with a sequence, follow the old iteration protocol. You might call them iterators in the generic sense, though I cannot remember whether we used 'iterator' much before the introduction of the new and now dominant iteration protocol. I am sure 'iterable' was introduced with the new protocol for objects with .__iter__ methods that return iterators, which in this context means objects with a .__next__ method, and excludes .__getitem__ objects.
It would have been less confusing if we had disabled the old protocol in 3.0, but aside from the predictable confusion, it seemed better to keep it.
Author: Stephen Paul Chappell (Zero)
Date: 2013-08-01 19:05
If my program needed to know if an object is iterable, it would be tempting to define and call the following function instead of using collections.abc.Iterable:
def iterable(obj):
    try:
        iter(obj)
    except TypeError:
        return False
    return True
Something tells me that is not what the author of collections.abc.Iterable intended.
Author: R. David Murray (r.david.murray) *
Date: 2013-08-01 20:26
That would give you a false positive, though. It would return True for the 'y' in my example, which is not iterable. So Iterable's behavior here is an example of the Python design rule "refuse the temptation to guess".
As Terry said, new classes should implement an __iter__ method. The __getitem__ iteration support is for backward compatibility.
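The false positive described above is easy to demonstrate: iter() only checks that __getitem__ exists and defers every lookup until iteration starts. A sketch, reusing the iterable() function from the previous message and the dict-backed y from the earlier example:

```python
class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


def iterable(obj):
    """The check proposed above: does iter() accept obj?"""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


y = IsIterable({'a': 'b', 'c': 'd'})
print(iterable(y))   # True: iter() only checks that __getitem__ exists
print(iterable(42))  # False: int has neither __iter__ nor __getitem__
try:
    next(iter(y))    # the failure only shows up on the first real lookup
except KeyError as exc:
    print("KeyError:", exc)
```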
Author: Stephen Paul Chappell (Zero)
Date: 2013-08-01 21:46
Maybe this would have been more appropriate as a question on StackOverflow:
What is the proper way of asking if an object is iterable if it does not support the iterator protocol but does support the old __getitem__ protocol? One might argue that it is better to ask for forgiveness rather than permission, but that does not really answer the question.
My impression of collections.abc.Iterable is that programmers can use it to ask if an object is iterable. Some argue that it is better to ask for forgiveness rather than permission and would suggest pretending that an object is iterable until it is proven otherwise. However, what do we use collections.abc.Iterable for, then?
The true question is really, “What is the proper way of asking if an object is iterable if it does not support the iterator protocol but does support the old __getitem__ protocol?” More generically, how can you ask an object if it supports ANY iteration protocol? The query probably should have been posted on StackOverflow and not here.
This may not be a problem with collections.abc.Iterable, and thus the issue should be closed. However, the previous question remains, and it is apparent that it cannot be answered with the abstract class as it currently is. Maybe the solution is to just ask for forgiveness where appropriate.
Author: R. David Murray (r.david.murray) *
Date: 2013-08-01 23:29
“What is the proper way of asking if an object is iterable if it does not support the iterator protocol but does support the old __getitem__ protocol?”
The only answer to that question is to try to iterate over it, and see if you get a KeyError on "0". Since this results in obtaining the first element if it is iterable, and in the general case you cannot "reset" an iterable, there is no way to look before you leap. You have to catch the error after it occurs.
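One way to "catch the error after it occurs" without losing the element that had to be fetched is to chain it back in front of the remaining iterator. This is a hedged sketch: try_iterate is a hypothetical helper, not a standard library function, and treating KeyError and TypeError as "not iterable" is an assumption that can misfire if a genuinely iterable object raises one of them on its first element.

```python
from itertools import chain


class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


def try_iterate(obj):
    """Hypothetical EAFP helper: attempt iteration instead of asking first.

    Returns (True, iterator) on success; the iterator still yields the
    element that had to be fetched to find out. Returns (False, None) if
    iter() or the first lookup fails.
    """
    try:
        it = iter(obj)
        first = next(it)
    except StopIteration:
        return True, iter(())   # iterable, just empty
    except (TypeError, KeyError):
        return False, None      # e.g. a mapping-backed __getitem__
    # Put the consumed first element back in front of the rest.
    return True, chain([first], it)


ok, it = try_iterate(IsIterable(range(3)))
print(ok, list(it))  # True [0, 1, 2]
```

Callers must use the returned iterator rather than the original object, because for one-shot sources such as generators the first element has already been consumed.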
This question and answer probably do belong on Stack Overflow or python-list, but the glossary entry still needs improvement, since the Iterable docs reference it :)
Author: Vedran Čačić (veky) *
Date: 2017-07-17 08:19
I think this is backwards. "Refusing the temptation to guess" in this case can mean returning True for is_iterable. After all, we can always have something like
class Deceptive:
    def __iter__(self):
        raise TypeError("I'm not really iterable")
and it's not the business of __instancecheck__ to actually iterate (either via __iter__ or __getitem__). Its task is to check whether the class has a corresponding attribute (not set to None, per the new convention of explicitly disabling protocols).
It could be different if the "old __getitem__ iteration" was deprecated, or at least scheduled to be deprecated, but as far as I can tell, it isn't. (It really should be documented if it were so.)
At least, the documentation of https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable should be more precise in saying (instead of just "See also the definition of iterable.") something like "Note that the definition of iterable in the glossary is more general than what this method checks, by design / omission / backward compatibility / apathy / whatever."
(Ok, the last part might be too much. But it's essential to point out the things are different, and whether it's meant to stay that way.)
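Both points about the ABC check can be seen side by side: isinstance() only looks for the attribute and never calls it, and an attribute explicitly set to None opts the class out. A short sketch; the Disabled class is a hypothetical illustration of the None convention.

```python
from collections.abc import Iterable


class Deceptive:
    def __iter__(self):
        raise TypeError("I'm not really iterable")


class Disabled:
    __iter__ = None  # explicitly opts out of the iteration protocol


# The ABC check only looks for the attribute; it never calls it.
print(isinstance(Deceptive(), Iterable))  # True
print(isinstance(Disabled(), Iterable))   # False: None means "not supported"

for obj in (Deceptive(), Disabled()):
    try:
        iter(obj)
    except TypeError as exc:
        print(type(obj).__name__, "->", exc)
```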
Author: R. David Murray (r.david.murray) *
Date: 2017-07-17 12:51
No, refusing to guess in this case is to believe the class's declaration that it is an iterable if (and only if) it defines __iter__, which is the modern definition of iterable. If that doesn't work when the object is iterated, that's a bug in the class claiming to be an iterable when it isn't.
The confusion here is the existence of the older iteration protocol. As you say, the documentation can use some improvement. Eventually someone will submit a proposal in the form of a PR and we can hammer out the exact wording.
Author: Vedran Čačić (veky) *
Date: 2017-07-17 13:05
Of course. The Deceptive class was just reductio ad absurdum. I'm all for believing the class based on the attributes it exposes. We agree there.
Where we don't agree is what attributes constitute the iteration protocol. You, the source code and the documentation of collections.abc.Iterable say one thing (__iter__), while I, the current version of Python (at least CPython, but I think other implementations do the same) and the glossary say another thing (__iter__ or __getitem__).
[It's not the only protocol consisting of two attributes... e.g. the bool protocol also consists of two attributes, __bool__ and __len__ (though it is not optional, so we don't have collections.abc.Boolable).]
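The two-attribute truth-value protocol mentioned in the aside works like this: bool() uses __bool__ if present, falls back to __len__ otherwise, and treats the object as true if neither exists. A minimal sketch with illustrative class names:

```python
class OnlyLen:
    def __len__(self):
        return 0


class OnlyBool:
    def __bool__(self):
        return False


class Neither:
    pass


print(bool(OnlyLen()))   # False: falls back to __len__() == 0
print(bool(OnlyBool()))  # False: uses __bool__ directly
print(bool(Neither()))   # True: objects are truthy by default
```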
You seem to say that only the glossary needs fixing. But then we'll be in an even more weird position, where we must say some objects can be iterated, but are not iterables. I'm pretty sure you don't want that. The whole point of "Xable" words (e.g. "callable", as opposed to "function") is that it encompasses everything that can be Xed, not only the first thing that comes to mind (e.g. classes can also be called).
Or are you saying that after the glossary is fixed, we should then fix Python by (at least deprecating, if not) forbidding __getitem__ iteration? I'm not sure that this is the consensus. Are you?
Author: Raymond Hettinger (rhettinger) *
Date: 2017-07-17 14:30
The word "iterable" just means "can be looped over". There are many ways to implement this capability (two-arg form of iter(), the __iter__ method, generators, __getitem__ with integer indexing, etc).
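Several of these routes can be compared directly: every object below can be looped over, but only those that end up with an __iter__ method are recognized by collections.abc.Iterable. The class names are illustrative, not from the thread.

```python
from collections.abc import Iterable


class WithIter:
    """Modern protocol: __iter__ returns an iterator (here a generator)."""

    def __iter__(self):
        yield from (1, 2, 3)


class WithGetitem:
    """Old protocol: integer indexing until IndexError."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index + 1


def gen():
    yield from (1, 2, 3)


for obj in ([1, 2, 3], WithIter(), gen(), WithGetitem()):
    # All four loop fine; only WithGetitem is reported as not Iterable.
    print(type(obj).__name__, list(obj), isinstance(obj, Iterable))
```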
collections.abc.Iterable is more limited and that is okay. There is nothing that compels us to break an API that has been around and successful for 26+ years. That clearly wasn't Guido's intention when he added collections.abc.Iterable, which is just a building block for more complex ABCs.
I recommend closing this. We're not going to kill a useful API and break tons of code because of an overly pedantic reading of what is allowed to be iterable.
However we can make a minor amendment to the glossary entry to mention that there are multiple ways of becoming iterable.
Stephen, the try/except is a reasonable way to recognize an iterable. The ABCs are intended to recognize only things that provide a particular implementation or that are registered. They are not more encompassing or normative than that.
Author: Vedran Čačić (veky) *
Date: 2017-07-17 14:40
Raymond, I think you didn't understand the issue. The glossary already has the amendment you mention (at least for __getitem__; I'm not sure any of the other examples you mention are counterexamples to that interpretation: callable_iterators and generators do have an __iter__ attribute, and they are correctly detected as instances of collections.abc.Iterable).
I wanted to push in the opposite direction, to fully bless __getitem__ as a way to declare iterability, so it could be recognized by Iterable's __instancecheck__. It seems to me that whoever wrote that __instancecheck__ didn't intend to exclude __getitem__ iteration.
Or at least, if we cannot do that because of backward compatibility:-(, to explicitly document that Iterable ABC does not fully encompass what we mean by "being iterable".
Author: Raymond Hettinger (rhettinger) *
Date: 2017-07-17 15:05
Or at least, if we cannot do that because of backward compatibility:-(, to explicitly document that Iterable ABC does not fully encompass what we mean by "being iterable".
That would be a reasonable amendment to collections.abc.Iterable docs.
I don't think it is either desirable or possible for collections.abc.Iterable to recognize iterables with __getitem__. We cannot know in advance whether __getitem__ is a mapping or a sequence. IIRC, that particular problem was the motivation for creating the ABCs. Without a user registering a class as Iterable or without inheriting from Iterable, there is really no way to know.
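Inheritance is one way for a __getitem__-based class to declare sequence semantics explicitly. As a sketch (SeqWrapper is a hypothetical name), subclassing collections.abc.Sequence and implementing __getitem__ and __len__ supplies an __iter__ mixin, so the Iterable ABC recognizes instances without any guessing:

```python
from collections.abc import Iterable, Sequence


class SeqWrapper(Sequence):
    """Hypothetical example: declares sequence semantics by inheritance."""

    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, index):
        return self._data[index]

    def __len__(self):
        return len(self._data)


s = SeqWrapper("abc")
print(isinstance(s, Iterable))          # True: Sequence supplies an __iter__ mixin
print(list(s))                          # ['a', 'b', 'c']
print(list(reversed(s)), s.index("b"))  # other mixin methods come for free too
```

Calling Iterable.register() on a class is the other explicit route; unlike inheritance, it adds no methods and performs no checks.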
Author: Vedran Čačić (veky) *
Date: 2017-07-17 15:47
Yes, the mapping/sequence distinction was (at least declaratively) the reason the ABCs were introduced, but that isn't an obstacle here: whether a mapping or a sequence, it is iterable, right?
In case anybody is interested, here's how I came to this problem: at a programming competition, I set a problem where contestants had to write some function, and I declared that "the function must work for arbitrary iterable (with some properties that currently don't matter)".
Then a big discussion ensued, with a big group of people thinking that classes with __getitem__ but no __iter__ don't qualify (giving collections.abc.Iterable as an argument), and another big group of people thinking they do (giving EAFP as an argument: "look, I tried iterating, and succeeded").
Of course, it's an incredibly technical detail, but I don't like such gray areas. To me, things with __getitem__ are clearly iterable; the glossary says so :-). Iterable's __instancecheck__ is simply buggy ("incomplete", if you want). There might be valid reasons for keeping it buggy, but they should be documented.
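To make the disagreement concrete, here is a sketch of what a broader check, matching the glossary's "__iter__ or __getitem__" wording, might look like. broadly_iterable is hypothetical and not part of collections.abc; as the replies point out, it reports the mapping-backed instance as iterable even though iterating it fails.

```python
from collections.abc import Iterable


class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


def broadly_iterable(obj):
    """Hypothetical check mirroring the glossary's wider wording:
    an __iter__ method (as Iterable sees it) or any __getitem__ method."""
    if isinstance(obj, Iterable):
        return True
    # Look the method up on the type, as the interpreter does for special methods.
    return getattr(type(obj), "__getitem__", None) is not None


print(broadly_iterable(IsIterable(range(3))))    # True, and iteration works
print(broadly_iterable(IsIterable({'a': 'b'})))  # True, but iteration raises KeyError
print(broadly_iterable(42))                      # False
```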
Author: R. David Murray (r.david.murray) *
Date: 2017-07-17 17:26
"things with __getitem__ are clearly iterable"
This is false. IMO it should be fixed in the glossary. It should say "or a __getitem__ method implementing sequence semantics". That plus the addition to the Iterable docs will close this issue.
Author: Terry J. Reedy (terry.reedy) *
Date: 2017-07-17 19:43
The problem with the Iterable ABC is that 'iterable' and 'iterator' are dynamically defined, with a possibly infinite time required to possibly destructively check either definition. In general, an algorithmic static check can only guess whether an object is iterable, though humans analyzing enough code can potentially get it right. Therefore, using isinstance(ob, Iterable) is not 100% reliable, and in my opinion should not be used as the definition of lower-case 'iterable'.
Definition: Object ob is iterable if 'iter(ob)' returns an iterator. For the reasons given above, iter may return a non-iterator, but it will return an iterator if ob implements either the old or new iterator protocol. If ob has .__iter__, iter returns ob.__iter__(). If ob has .__getitem__, iter returns iterator(ob), where iterator is a hidden internal class that embodies the old iterator protocol by defining a .__next__ method that calls .__getitem__. In both cases, iter does the best it can by assuming that the methods are correctly written as per one of the two protocols.
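The hidden internal iterator class described above can be approximated in pure Python. SeqIter is a hypothetical name, and this sketch only handles IndexError as the end signal; it is meant to illustrate the old protocol, not to reproduce CPython's implementation exactly.

```python
class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


class SeqIter:
    """Hypothetical pure-Python approximation of the hidden iterator type
    that iter() returns for objects with __getitem__ but no __iter__."""

    def __init__(self, seq):
        self._seq = seq
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._seq is None:       # already exhausted: stay exhausted
            raise StopIteration
        try:
            value = self._seq[self._index]
        except IndexError:          # the old protocol's end-of-sequence signal
            self._seq = None
            raise StopIteration from None
        self._index += 1
        return value


obj = IsIterable("abc")
print(list(SeqIter(obj)))  # ['a', 'b', 'c']
print(list(iter(obj)))     # ['a', 'b', 'c'], via the built-in equivalent
```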
Loose definition: Object 'it' is iterable if it can be looped over. Python definition: Object 'it' is iterable if repeated 'next(it)' calls either return an object or raise StopIteration. This means that
try:
    while True:
        next(it)
except StopIteration:
    pass
runs, possibly forever, without raising.
As Raymond noted, an iterator can be created in multiple ways: IteratorClass(), iter(ob), iter(func, sentinel), generator_func().
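The four ways of creating an iterator listed above can be checked against each other. In this sketch Countdown and countdown_gen are hypothetical names; each construction yields 3, 2, 1 and satisfies the iterator protocol (iter(it) is it, and __next__ exists).

```python
import itertools
from collections.abc import Iterator


class Countdown:
    """Hypothetical iterator class: defines both __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def countdown_gen(start):
    while start > 0:
        yield start
        start -= 1


counter = itertools.count(3, -1)  # 3, 2, 1, 0, -1, ...

iterators = [
    Countdown(3),                    # IteratorClass()
    iter([3, 2, 1]),                 # iter(ob)
    iter(lambda: next(counter), 0),  # iter(func, sentinel): stops when func returns 0
    countdown_gen(3),                # generator_func()
]

for it in iterators:
    # Each is an iterator: iter(it) returns it itself, and it has __next__.
    assert isinstance(it, Iterator) and iter(it) is it
    print(list(it))  # [3, 2, 1] in every case
```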
Iterable versus iter with respect to classes with getitem:
Iter was added in 2.2. Built-in iterables were only gradually converted from the old to the new protocol, by adding a new .__iter__. So even ignoring user classes, iter had to respect .__getitem__. Even today, though only a small fraction of classes with .__getitem__ are iterable, people do not generally call iter() on random objects.
Iterable (added 2.6) is documented as the "ABC for classes that provide the __iter__() method." In other words, isinstance(ob, Iterable) replaces hasattr(ob, '__iter__'). Except that the former is more than that. The magic word 'register' does not appear in the collections.abc doc, and I think that this is the omission to be remedied.
"ABC for classes that provide the __iter__() method, or that provide a __getitem__ method that implements the old iterator protocol and register themselves as Iterable."
An example could be given using a patched version of IsIterable.
If one adds two lines of code
from collections.abc import Iterable
...
Iterable.register(IsIterable)
then isinstance(IsIterable(3), Iterable) is True, except that this is a lie in the other direction.
Traceback (most recent call last):
  File "F:\Python\mypy\tem.py", line 17, in <module>
    for i in it2:
  File "F:\Python\mypy\tem.py", line 7, in __getitem__
    return self.data[key]
TypeError: 'int' object is not subscriptable
Either IsIterable.__init__ must check that data itself has .__getitem__, or IsIterable.__getitem__ must capture exceptions and raise IndexError instead.
    def __getitem__(self, key):
        try:
            return self.data[key]
        except Exception:
            raise IndexError
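The registration and the patched __getitem__ can be tried together. In this sketch SaferIsIterable is a hypothetical subclass standing in for the patched class; because it inherits from the registered IsIterable, the ABC recognizes it as well. Translating every exception into IndexError makes a bad wrapper look like an empty iterable, which can also hide genuine bugs.

```python
from collections.abc import Iterable


class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


class SaferIsIterable(IsIterable):
    """Hypothetical patched variant: any lookup failure becomes IndexError,
    which the old protocol treats as the end of iteration."""

    def __getitem__(self, key):
        try:
            return self.data[key]
        except Exception:
            raise IndexError(key) from None


Iterable.register(IsIterable)  # subclasses of a registered class are covered too

print(isinstance(IsIterable(3), Iterable))  # True, because of register()
try:
    list(IsIterable(3))                     # the "lie in the other direction"
except TypeError as exc:
    print("TypeError:", exc)
print(list(SaferIsIterable(3)))     # []: the failure now ends iteration
print(list(SaferIsIterable("ab")))  # ['a', 'b']
```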
Author: Raymond Hettinger (rhettinger) *
Date: 2017-09-24 19:09
I'll follow David Murray's suggestion here. The glossary definition of iterable is already very good; it just needs to clarify that the __getitem__() method needs to implement sequence semantics. Anything further is beyond the scope of a glossary entry.
Also, I'll amend the docs on collections.abc.Iterable() to be more specific about what it does and doesn't recognize.
FWIW, the topic is also discussed in other places:
- https://docs.python.org/3/library/functions.html#iter
- https://docs.python.org/3/reference/datamodel.html#object.__getitem__
- https://docs.python.org/3/reference/datamodel.html#object.__iter__
Author: Raymond Hettinger (rhettinger) *
Date: 2017-09-25 07:52
New changeset 0bf287b6e0a42877b06cbea5d0fe6474d8061caa by Raymond Hettinger in branch 'master': bpo-18558: Clarify glossary entry for "Iterable" (#3732) https://github.com/python/cpython/commit/0bf287b6e0a42877b06cbea5d0fe6474d8061caa
Author: Raymond Hettinger (rhettinger) *
Date: 2017-09-25 07:57
New changeset 01438ed4c22ca150da1cc5c38d83a59b0b6a62a7 by Raymond Hettinger (Miss Islington (bot)) in branch '3.6': [3.6] bpo-18558: Clarify glossary entry for "Iterable" (GH-3732) (#3741) https://github.com/python/cpython/commit/01438ed4c22ca150da1cc5c38d83a59b0b6a62a7
History
Date | User | Action | Args
2022-04-11 14:57:48 | admin | set | github: 62758
2019-10-31 14:22:25 | Zero | set | nosy: - Zero
2017-09-25 08:00:37 | rhettinger | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2017-09-25 07:57:26 | rhettinger | set | messages: +
2017-09-25 07:52:22 | python-dev | set | pull_requests: + pull_request3728
2017-09-25 07:52:12 | rhettinger | set | messages: +
2017-09-24 19:15:40 | rhettinger | set | keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request3718
2017-09-24 19:09:49 | rhettinger | set | messages: +
2017-07-17 19:43:02 | terry.reedy | set | messages: +
2017-07-17 17:26:34 | r.david.murray | set | messages: +; versions: + Python 3.6
2017-07-17 15:47:49 | veky | set | messages: +
2017-07-17 15:05:53 | rhettinger | set | messages: +
2017-07-17 14:40:33 | veky | set | messages: +
2017-07-17 14:30:24 | rhettinger | set | versions: + Python 3.7, - Python 3.3, Python 3.4; nosy: + rhettinger; messages: +; assignee: docs@python -> rhettinger
2017-07-17 13:05:27 | veky | set | messages: +
2017-07-17 12:51:48 | r.david.murray | set | messages: +
2017-07-17 08:19:55 | veky | set | nosy: + veky; messages: +
2013-08-01 23:29:18 | r.david.murray | set | messages: +
2013-08-01 21:46:11 | Zero | set | messages: +
2013-08-01 20:26:32 | r.david.murray | set | messages: +
2013-08-01 19:05:23 | Zero | set | messages: +
2013-07-26 23:28:20 | terry.reedy | set | nosy: + terry.reedy; messages: +
2013-07-26 01:25:40 | r.david.murray | set | assignee: docs@python; components: + Documentation, - Library (Lib); title: Iterables not detected correctly -> Iterable glossary entry needs clarification; nosy: + docs@python, r.david.murray; versions: + Python 3.4; messages: +; stage: needs patch
2013-07-25 23:57:31 | Zero | create |