[Python-Dev] OrderedDict.values() behavior for modified instance (original) (raw)

Nikolaus Rath Nikolaus at rath.org
Sat Oct 26 05:22:53 CEST 2013


Hello,

The documentation says the following about modifying a dict while iterating through its view:

| Iterating views while adding or deleting entries in the dictionary may | raise a RuntimeError or fail to iterate over all entries. (http://docs.python.org/3/library/stdtypes.html#dict-views)

The OrderedDict documentation doesn't have anything to say on the subject. In practice, its implementation actually mostly correponds to naive expectations:

from collections import OrderedDict d = OrderedDict() for i in range(5): ... d[i] = i ... i = iter(d.values()) next(i) 0 del d[0] next(i) 1 del d[2] next(i) 3 d.movetoend(1) next(i) 4 next(i) 1

etc.

However, there is one case that results in a rather confusing error:

a = OrderedDict() a[0] = 0 a[1] = 1 a[2] = 2 i = iter(a.values()) next(i) 0 del a[0] del a[1] next(i) Traceback (most recent call last): File "", line 1, in File "/usr/lib/python3.3/collections/abc.py", line 495, in iter yield self._mapping[key] KeyError: 1

What happens here is that a[0] is returned from the linked list, but it still contains links to a[1]. When a[1] is deleted, the links of its predecessor and successor are updated, but a[0] still points to a[1]. The OrderedList then hands a non-existing key to the values() implementation in collections.abc.

The result is not only mightily confusing, but it is also not covered by the documentation (KeyError isn't a RuntimeError).

I would very much like to see this fixed, but I'm not sure if there's a good way to do that.

I see the following options:

(a) When deleting an element from an OrderedList, update the pointers in the corresponding linked list element to the end of the list. Then removing the currently "active" element during the iteration would automatically end the iteration.

For that, we'd just have to add two lines to __delitem__:

def __delitem__(self, key, dict_delitem=dict.__delitem__):
    dict_delitem(self, key)
    link = self.__map.pop(key)
    link_prev = link.prev
    link_next = link.next
    link_prev.next = link_next
    link_next.prev = link_prev

    link.prev = root # new 
    link.next = root # new


The new behavior is explicitly covered by the documentation
(changing the dict may result in incomplete iteration), but it's a
change to what happened before.

(b) When iterating through an OrderedDict, check that the next element is actually still in the hash, i.e. change

def __iter__(self):
    root = self.__root
    curr = root.next
    while curr is not root:
        yield curr.key
        curr = curr.next
to

def __iter__(self):
    root = self.__root
    curr = root.next
    while curr is not root and curr.key in self:
        yield curr.key
        curr = curr.next

that would terminate the iteration only in the special case, but
incur an extra dict lookup at every iteration.

Alternatively, one could try very hard to not stop the iteration:

    while curr is not root:
        yield curr.key
        while curr is not root:
            curr = curr.next
            if curr.key in self:
                break

(c) Catch the KeyError in values(), and re-raise the proper exception in class ValuesView:

def __iter__(self):
    for key in self._mapping:
        try:
            yield self._mapping[key]
        except KeyError:
            raise RuntimeError("underlying Mapping instance modified during iteration")


I consider this a bit ugly, because it may mask other problems in a
custom Mapping implementation (that may raise a KeyError because of
a bug in the Mapping implementation itself).

(d) Extend the OrderedDict documentation to explicity document this case.

This has the drawback that it would probably be nicer to just have
the behavior be consistent and correspond to intuitive expectations.

Would any of those fixes be acceptable? Or is there an even better option?

Best, Nikolaus

-- Encrypted emails preferred. PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

         »Time flies like an arrow, fruit flies like a Banana.«


More information about the Python-Dev mailing list