Tips on Python collections - Alex Marandon

Published on 11 June 2011, updated on 11 June 2011

Here is a short series of simple tips for working with Python collections. It’s nothing fancy, but reading other people’s code has shown me that not everybody is aware of these techniques.

This is an adaptation of a post I originally wrote in French. There is also a Chinese version.

Checking if a list is empty

It’s not necessary to call the len function to check whether a list is empty, because an empty list is false in a boolean context. So instead of doing:

if len(mylist):
    # Do something with my list
    pass
else:
    # The list is empty
    pass

You can simply do:

if mylist:
    # Do something with my list
    pass
else:
    # The list is empty
    pass
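The same rule applies beyond lists: in a boolean context, empty built-in containers such as tuples, dicts, sets and strings are all false, and non-empty ones are true. Here is a small sketch using a hypothetical helper, describe, to illustrate this. Note that None is also false, so `if mylist:` cannot tell a missing list apart from an empty one. Use an explicit `is None` test when that distinction matters.

```python
def describe(collection):
    # Hypothetical helper: relies on truthiness instead of len()
    if collection:
        return "non-empty"
    else:
        return "empty"

# [0] is a list containing one element, so it is true even though 0 is false
results = [describe(c) for c in ([], (), {}, set(), "", [0], {"a": 1})]
print(results)

# None is false too, so check for it explicitly when it means "no list at all"
mylist = None
if mylist is None:
    print("no list at all")
elif not mylist:
    print("empty list")
```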

Getting indexes of elements while iterating over a list

Sometimes you need to iterate over a list and at the same time get the index of each element. Instead of doing:

i = 0
for element in mylist:
    # Do something with i and element
    i += 1

You can simply do:

for i, element in enumerate(mylist):
    # Do something with i and element
    pass
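enumerate also accepts an optional start argument. This is handy when you want human-friendly numbering that begins at 1 rather than 0. Here is a minimal sketch, with made-up list contents:

```python
fruits = ["apple", "banana", "cherry"]  # sample data for illustration

# Default: indexes start at 0
pairs = list(enumerate(fruits))

# start=1 shifts the counter only; the elements are unchanged
for number, fruit in enumerate(fruits, start=1):
    print("%d. %s" % (number, fruit))
```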

Sorting a list

It’s quite common to sort a list based on one of the characteristics of its elements. Here, for example, we create a list of persons:

class Person(object):
    def __init__(self, age):
        self.age = age

persons = [Person(age) for age in (14, 78, 42)]

We then want to sort the list by age. Here is how we could do it:

def get_sort_key(element):
    return element.age

for element in sorted(persons, key=get_sort_key):
    print("Age: %d" % element.age)

We define a function that returns the attribute we want to use as the sorting criterion, and we pass this function as the key argument to the sorted function. This kind of sorting is so common that the Python standard library includes ready-made functions for it.

from operator import attrgetter

for element in sorted(persons, key=attrgetter('age')):
    print("Age: %d" % element.age)

attrgetter is a higher-order function that returns a function similar to our get_sort_key function. We saved a few keystrokes (a lambda would have done that too), but more importantly I feel it makes the code easier to read: when you see attrgetter, you know immediately that it will get an attribute. The operator module also provides itemgetter and methodcaller, and I’m sure you can guess what they do.
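For comparison, here is a sketch of the lambda version mentioned above. It also shows itemgetter, which fetches an item by index or key, and methodcaller, which calls a named method on each element. The records and words are made-up sample data.

```python
from operator import attrgetter, itemgetter, methodcaller

class Person(object):
    def __init__(self, age):
        self.age = age

persons = [Person(age) for age in (14, 78, 42)]

# A lambda gives the same result as attrgetter('age') here
by_lambda = [p.age for p in sorted(persons, key=lambda p: p.age)]
by_attrgetter = [p.age for p in sorted(persons, key=attrgetter('age'))]
print(by_lambda)  # [14, 42, 78]

# itemgetter works on anything indexable, such as dicts or tuples
records = [{'name': 'Bob', 'age': 42}, {'name': 'Alice', 'age': 14}]
by_item = [r['name'] for r in sorted(records, key=itemgetter('age'))]
print(by_item)  # ['Alice', 'Bob']

# methodcaller('lower') calls .lower() on each element: a case-insensitive sort
words = ['banana', 'Cherry', 'apple']
print(sorted(words))  # ['Cherry', 'apple', 'banana'] (uppercase sorts first)
by_method = sorted(words, key=methodcaller('lower'))
print(by_method)  # ['apple', 'banana', 'Cherry']
```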

Grouping elements in a dictionary

The last tip I’ll give you today is about dictionaries. Grouping the elements of a list by some criterion is quite a common task. To do that, we build a dictionary of lists keyed by that criterion. Let’s say we have a list of persons:

class Person(object):
    def __init__(self, age):
        self.age = age

persons = [Person(age) for age in (78, 14, 78, 42, 14)]

Now we want to group these persons by age. One approach could be:

persons_by_age = {}

for person in persons:
    age = person.age
    if age in persons_by_age:
        persons_by_age[age].append(person)
    else:
        persons_by_age[age] = [person]

assert len(persons_by_age[78]) == 2

On each iteration, we use the in operator to test whether the key exists. If it’s not present, we create a new list for this key containing the current element.
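A plain dict also has a setdefault method that combines the membership test and the insertion into a single call. It returns the value for the key, first inserting the given default if the key is missing. Here is a sketch of the same grouping written that way. This is an alternative technique, not the one used in the rest of this post.

```python
class Person(object):
    def __init__(self, age):
        self.age = age

persons = [Person(age) for age in (78, 14, 78, 42, 14)]

persons_by_age = {}
for person in persons:
    # setdefault returns the existing list, or inserts [] and returns that
    persons_by_age.setdefault(person.age, []).append(person)

assert len(persons_by_age[78]) == 2
```

One small drawback is that the default [] is built on every iteration, even when the key already exists.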

The collections module offers a defaultdict that slightly simplifies this process.

from collections import defaultdict

persons_by_age = defaultdict(list)

for person in persons:
    persons_by_age[person.age].append(person)

When you create a defaultdict, you pass it a callable that it will use to create a value whenever a missing key is accessed. Here we pass it list, so it creates an empty list for each new key.
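Any callable that takes no arguments works as the factory. For example, int() returns 0, which makes defaultdict(int) a convenient counter. Keep in mind that merely reading a missing key with the [] operator inserts it, whereas `in` and `get` do not. Here is a small sketch reusing the ages from the example above:

```python
from collections import defaultdict

ages = [78, 14, 78, 42, 14]

count_by_age = defaultdict(int)  # int() -> 0 for each new key
for age in ages:
    count_by_age[age] += 1

print(sorted(count_by_age.items()))  # [(14, 2), (42, 1), (78, 2)]

# 'in' and .get() do not create missing keys
assert 99 not in count_by_age
assert count_by_age.get(99) is None
assert 99 not in count_by_age

# ...but a lookup with [] does: 99 is now a key mapped to 0
print(count_by_age[99])  # 0
assert 99 in count_by_age
```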

That’s it for now. I hope some of these tips will be useful to you. If you have more tips on collections, please feel free to share in the comments. Thanks!