How Iterables actually work in Python
One of the more impressive features of the Python language is its “for-each” looping construct, which has awed me ever since I started out with Python. For the uninitiated, here is a simple for loop that prints the first 10 natural numbers:
for num in range(1, 11):
    print(num)
We can also loop over built-in types such as lists, tuples, dictionaries and strings in a similar way:
numbers = [1, 2, 3, 4, 5]
record = ('Kshitij', 21, 'Loves Python')
details = {'name': 'Kshitij', 'age': 21}

for num in numbers:
    print(num) # 1 2 3 4 5

for data in record:
    print(data) # Kshitij 21 Loves Python

# Dictionaries preserve insertion order since Python 3.7
for key, value in details.items():
    print(key, value) # name Kshitij age 21
As you implement a few data structures of your own in Python using class, you will want to loop over the data stored in their instances. This is where the Iterator Protocol comes into the picture.
Sample Implementation
Let us suppose we are tasked with implementing a standard 52-card deck. A sample implementation might look something like this:
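A minimal sketch of what such a class could look like is given here. It is an illustration under stated assumptions rather than the author's code: the names Card, rank, RANKS and SUITS are hypothetical, and the only details taken from the text are that Deck keeps its data in a list attribute called cards, that cards have a suit, and that a deck can be printed.

```python
from collections import namedtuple

# Hypothetical card type: a named tuple with a rank and a suit.
Card = namedtuple("Card", ["rank", "suit"])


class Deck:
    """A standard 52-card deck (illustrative sketch, no iteration support yet)."""

    RANKS = [str(n) for n in range(2, 11)] + list("JQKA")
    SUITS = ["Spades", "Hearts", "Diamonds", "Clubs"]

    def __init__(self):
        # The cards live in a plain list attribute named `cards`.
        self.cards = [Card(rank, suit) for suit in self.SUITS for rank in self.RANKS]

    def __repr__(self):
        return f"Deck of {len(self.cards)} cards"
```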
This works fine for creating new instances of Deck and printing them. However, a major pain point of this implementation is that we cannot iterate over a Deck object.
>>> from cards import Deck
>>> new_deck = Deck() # New deck instantiated
>>> print(new_deck)
... # Works great
>>> for card in new_deck:
...     print(card)
TypeError: 'Deck' object is not iterable
A curious user might explore the new_deck instance and conclude that its cards attribute holds the data required for iteration and that it is, in fact, a list. With this knowledge, they can hack the above loop as follows:
>>> for card in new_deck.cards:
...     print(card)
Card(...)
....
This code works. However, the end user must know internal details of the implementation to perform the iteration. Our code thus loses the advantages of data abstraction, and the implementation leaves much to be desired.
There must be a better way!
Spurred on by Raymond Hettinger’s enthusiasm, I searched for ways to make my implementation work with Python’s for loop.
And soon I found the answer — The Iterator Protocol.
The Iterator Protocol
In order to learn what the Protocol is and how to implement it in Python, we need to understand some basic terms.
Iterable
- It is any object that you can loop over with a for loop.
- Iterables are not always indexable, they don’t always have lengths, and they’re not always finite.
- An iterable can be passed to the built-in iter() function to get an iterator for it.
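The three properties above can be illustrated with a short sketch. The variable names are arbitrary; set, generator expressions, itertools.count and itertools.islice are standard Python.

```python
from itertools import count, islice

# A set is iterable but not indexable: colours[0] would raise TypeError.
colours = {"red", "green", "blue"}
seen = [colour for colour in colours]

# A generator is iterable but has no length: len(squares) would raise TypeError.
squares = (n * n for n in range(5))
first_squares = list(squares)  # [0, 1, 4, 9, 16]

# itertools.count() never ends; islice() takes a finite prefix of it.
first_three = list(islice(count(1), 3))  # [1, 2, 3]

# iter() hands back an iterator for any iterable.
iterator = iter([10, 20, 30])
```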
Iterator
- Iterators have exactly one job: return the “next” item in our iterable.
- Iterators can be passed to the built-in next function to get the next item from them. If there is no next item (because we reached the end), a StopIteration exception will be raised.
- Iterators return themselves when passed to the iter() built-in.
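A small sketch of these rules, using a plain list as the underlying iterable (the variable names are illustrative):

```python
numbers = [1, 2, 3]
iterator = iter(numbers)

first = next(iterator)   # 1
second = next(iterator)  # 2
third = next(iterator)   # 3

# A fourth call has nothing left to return, so StopIteration is raised.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing an iterator to iter() gives back the very same object.
same_object = iter(iterator) is iterator  # True
```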
The Protocol
Step 01: How does the iter() built-in work?
Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x). The iter built-in function:
- Checks whether the object implements the __iter__ method and, if so, calls it to obtain an iterator.
- If the __iter__ method is not implemented but the __getitem__ method is, Python creates an iterator that attempts to fetch items in order, starting from index 0.
- If that fails, Python raises a TypeError exception saying <classname> object is not iterable.
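The second and third rules can be seen with two throwaway classes. Countdown and Opaque are hypothetical names made up for this sketch; the fallback iterator stops when __getitem__ raises IndexError.

```python
class Countdown:
    """Hypothetical class with only __getitem__, no __iter__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # Raising IndexError tells the fallback iterator to stop.
        if index >= self.start:
            raise IndexError(index)
        return self.start - index


class Opaque:
    """Hypothetical class with neither __iter__ nor __getitem__."""


# iter() falls back to calling __getitem__ with 0, 1, 2, ...
values = list(Countdown(3))  # [3, 2, 1]

# With neither method available, iter() raises TypeError.
try:
    iter(Opaque())
    message = None
except TypeError as error:
    message = str(error)
```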
Step 02: How to implement the Protocol?
I will present two approaches to implementing the Iterator Protocol:
Approach 1: Traditional Way
1. Create a new class representing the iterator (say DeckIterator).
2. Implement the following two methods in DeckIterator:
   - __next__: returns the next item in the iterable, and raises StopIteration once there are no items left.
   - __iter__: returns itself, i.e. self.
3. Define an __iter__ method in the class over whose instances you want to iterate, i.e. class Deck. The method should return an instance of DeckIterator.
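The three steps could be sketched as follows. This is one possible implementation rather than the author's: the Card named tuple, the _cards and _index attributes, and the way the deck is built are assumptions made to keep the example self-contained.

```python
from collections import namedtuple

Card = namedtuple("Card", ["rank", "suit"])  # hypothetical card type


class DeckIterator:
    """Iterator that walks a Deck's cards one at a time."""

    def __init__(self, deck):
        self._cards = deck.cards
        self._index = 0

    def __next__(self):
        # Signal the end of iteration once every card has been returned.
        if self._index >= len(self._cards):
            raise StopIteration
        card = self._cards[self._index]
        self._index += 1
        return card

    def __iter__(self):
        return self


class Deck:
    def __init__(self):
        ranks = [str(n) for n in range(2, 11)] + list("JQKA")
        suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
        self.cards = [Card(rank, suit) for suit in suits for rank in ranks]

    def __iter__(self):
        # A fresh iterator per call, so separate loops do not interfere.
        return DeckIterator(self)


deck = Deck()
first_card = next(iter(deck))
```

Keeping the position in a separate DeckIterator object, instead of on Deck itself, is what allows two loops over the same deck to run independently.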
Approach 2: The Pragmatic Way
- Implement the __iter__ method in the Deck class as a generator function.
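A sketch of the generator-based variant, again assuming a hypothetical Card named tuple and a cards list built inside Deck:

```python
from collections import namedtuple

Card = namedtuple("Card", ["rank", "suit"])  # hypothetical card type


class Deck:
    def __init__(self):
        ranks = [str(n) for n in range(2, 11)] + list("JQKA")
        suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
        self.cards = [Card(rank, suit) for suit in suits for rank in ranks]

    def __iter__(self):
        # The yield turns __iter__ into a generator function: each call
        # returns a new generator object, which is itself an iterator.
        for card in self.cards:
            yield card


deck = Deck()
first_card, *rest, last_card = deck
spades = [card for card in deck if card.suit == "Spades"]
```

No separate iterator class is needed, because the generator object supplies __next__ and __iter__ on its own. Writing yield from self.cards, or returning iter(self.cards), would behave the same way in this sketch.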
Features
Here are the features that our object seems to magically support as soon as we implement the protocol:
- Iteration via for loop
- Unpacking similar to tuples
- Can be used in list comprehensions
- Can be used with built-in functions (such as min, max) which consume an iterable.
>>> new_deck = Deck()
>>> # 1. Looping through a for loop
>>> for card in new_deck:
...     print(card) # Works great!
>>> # 2. Unpacking similarly to tuples
>>> first_card, *rest, last_card = new_deck
>>> # 3. List Comprehensions
>>> spades = [card for card in new_deck if card.suit == 'Spades']
>>> # 4. Built-in functions
>>> max_card, min_card = max(new_deck), min(new_deck)
Lessons Learnt:
- Iteration in Python isn’t a matter of type but of protocol, i.e. instances of any class that implements this protocol can be iterated over.
- Python groks iteration.
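The first lesson can be checked with the abstract base classes in collections.abc, which recognise a class by the methods it defines rather than by its ancestry. Playlist is a hypothetical class invented for this sketch.

```python
from collections.abc import Iterable, Iterator


class Playlist:
    """Hypothetical class that inherits from nothing but object."""

    def __init__(self, *songs):
        self._songs = songs

    def __iter__(self):
        return iter(self._songs)


playlist = Playlist("intro", "outro")

# Defining __iter__ is enough to count as an Iterable.
is_iterable = isinstance(playlist, Iterable)  # True
# Without __next__, the object is not itself an Iterator.
is_iterator = isinstance(playlist, Iterator)  # False

titles = [song.title() for song in playlist]  # ['Intro', 'Outro']
```

Note that this check only looks for __iter__; an object that relies on the __getitem__ fallback is still accepted by iter() but is not reported as an Iterable.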
I hope that knowing the Iterator Protocol will help you out when writing Python. In order to raise awareness about this seemingly underappreciated feature of Python, I have proposed a talk at PyCon India 2017 on this topic.