[Python-3000] PEP 3119 - Introducing Abstract Base Classes (original) (raw)

Guido van Rossum guido at python.org
Thu Apr 26 20:50:59 CEST 2007


After a fair amount of pre-discussion, I'm ready for the first official review of this PEP. The PEP is online at http://www.python.org/dev/peps/pep-3119/ . Here's a summary of open issues on which I could use more help (more details in the full text of the PEP below):

Of course, feel free to discuss any issues not marked as "open" as well.

Full text of the PEP:

PEP: 3119 Title: Introducing Abstract Base Classes Version: Revision:54986Revision: 54986 Revision:54986 Last-Modified: Date:2007−04−2611:24:07−0700(Thu,26Apr2007)Date: 2007-04-26 11:24:07 -0700 (Thu, 26 Apr 2007) Date:2007042611:24:070700(Thu,26Apr2007) Author: Guido van Rossum <guido at python.org>, Talin <talin at acm.org> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 18-Apr-2007 Post-History: 26-Apr-2007

Abstract

This is a proposal to add Abstract Base Class (ABC) support to Python 3000. It proposes:

Much of the thinking that went into the proposal is not about the specific mechanism of ABCs, as contrasted with Interfaces or Generic Functions (GFs), but about clarifying philosophical issues like "what makes a set", "what makes a mapping" and "what makes a sequence".

Acknowledgements

Talin wrote the Rationale below [1]_ as well as most of the section on ABCs vs. Interfaces. For that alone he deserves co-authorship. The rest of the PEP uses "I" referring to the first author.

Rationale

In the domain of object-oriented programming, the usage patterns for interacting with an object can be divided into two basic categories, which are 'invocation' and 'inspection'.

Invocation means interacting with an object by invoking its methods. Usually this is combined with polymorphism, so that invoking a given method may run different code depending on the type of an object.

Inspection means the ability for external code (outside of the object's methods) to examine the type or properties of that object, and make decisions on how to treat that object based on that information.

Both usage patterns serve the same general end, which is to be able to support the processing of diverse and potentially novel objects in a uniform way, but at the same time allowing processing decisions to be customized for each different type of object.

In classical OOP theory, invocation is the preferred usage pattern, and inspection is actively discouraged, being considered a relic of an earlier, procedural programming style. However, in practice this view is simply too dogmatic and inflexible, and leads to a kind of design rigidity that is very much at odds with the dynamic nature of a language like Python.

In particular, there is often a need to process objects in a way that wasn't anticipated by the creator of the object class. It is not always the best solution to build in to every object methods that satisfy the needs of every possible user of that object. Moreover, there are many powerful dispatch philosophies that are in direct contrast to the classic OOP requirement of behavior being strictly encapsulated within an object, examples being rule or pattern-match driven logic.

On the the other hand, one of the criticisms of inspection by classic OOP theorists is the lack of formalisms and the ad hoc nature of what is being inspected. In a language such as Python, in which almost any aspect of an object can be reflected and directly accessed by external code, there are many different ways to test whether an object conforms to a particular protocol or not. For example, if asking 'is this object a mutable sequence container?', one can look for a base class of 'list', or one can look for a method named 'getitem'. But note that although these tests may seem obvious, neither of them are correct, as one generates false negatives, and the other false positives.

The generally agreed-upon remedy is to standardize the tests, and group them into a formal arrangement. This is most easily done by associating with each class a set of standard testable properties, either via the inheritance mechanism or some other means. Each test carries with it a set of promises: it contains a promise about the general behavior of the class, and a promise as to what other class methods will be available.

This PEP proposes a particular strategy for organizing these tests known as Abstract Base Classes, or ABC. ABCs are simply Python classes that are added into an object's inheritance tree to signal certain features of that object to an external inspector. Tests are done using isinstance(), and the presence of a particular ABC means that the test has passed.

In addition, the ABCs define a minimal set of methods that establish the characteristic behavior of the type. Code that discriminates objects based on their ABC type can trust that those methods will always be present. Each of these methods are accompanied by an generalized abstract semantic definition that is described in the documentation for the ABC. These standard semantic definitions are not enforced, but are strongly recommended.

Like all other things in Python, these promises are in the nature of a gentlemen's agreement, which in this case means that while the language does enforce some of the promises made in the ABC, it is up to the implementer of the concrete class to insure that the remaining ones are kept.

Specification

The specification follows the categories listed in the abstract:

ABC Support Framework

We define a new built-in decorator, @abstractmethod, to be used to declare abstract methods. A class containing at least one method declared with this decorator that hasn't been overridden yet cannot be instantiated. Such a methods may be called from the overriding method in the subclass (using super or direct invocation). For example::

class A:
    @abstractmethod
    def foo(self): pass

A()  # raises TypeError

class B(A):
    pass

B()  # raises TypeError

class C(A):
    def foo(self): print(42)

C()  # works

Note: The @abstractmethod decorator should only be used inside a class body. Dynamically adding abstract methods to a class, or attempting to modify the abstraction status of a method or class once it is created, are not supported.

Implementation: The @abstractmethod decorator sets the function attribute __isabstractmethod__ to the value True. The type.__new__ method computes the type attribute __abstractmethods__ as the set of all method names that have an __isabstractmethod__ attribute whose value is true. It does this by combining the __abstractmethods__ attributes of the base classes, adding the names of all methods in the new class dict that have a true __isabstractmethod__ attribute, and removing the names of all methods in the new class dict that don't have a true __isabstractmethod__ attribute. If the resulting __abstractmethods__ set is non-empty, the class is considered abstract, and attempts to instantiate it will raise TypeError. (CPython can uses an internal flag Py_TPFLAGS_ABSTRACT to speed up this check [6]_.)

Discussion: Unlike C++ or Java, abstract methods as defined here may have an implementation. This implementation can be called via the super mechanism from the class that overrides it. This could be useful as an end-point for a super-call in framework using a cooperative multiple-inheritance [7], [8].

ABCs for Containers and Iterators

The collections module will define ABCs necessary and sufficient to work with sets, mappings, sequences, and some helper types such as iterators and dictionary views.

The ABCs provide implementations of their abstract methods that are technically valid but fairly useless; e.g. __hash__ returns 0, and __iter__ returns an empty iterator. In general, the abstract methods represent the behavior of an empty container of the indicated type.

Some ABCs also provide concrete (i.e. non-abstract) methods; for example, the Iterator class has an __iter__ method returning itself, fulfilling an important invariant of iterators (which in Python 2 has to be implemented anew by each iterator class).

No ABCs override __init__, __new__, __str__ or __repr__. Defining a standard constructor signature would unnecessarily constrain custom container types, for example Patricia trees or gdbm files. Defining a specific string representation for a collection is similarly left up to individual implementations.

Ordering ABCs '''''''''''''

These ABCs are closer to object in the ABC hierarchy.

PartiallyOrdered This ABC defines the 4 inequality operations <, <=, >=, >. (Note that == and != are defined by object.) Classes deriving from this ABC should implement a partial order as defined in mathematics. [9]_

TotallyOrdered This ABC derives from PartiallyOrdered. It adds no new operations but implies a promise of stronger invariants. Classes deriving from this ABC should implement a total order as defined in mathematics. [10]_

Open issues: Where should these live? The collections module doesn't seem right, but making them built-ins seems a slippery slope too.

One Trick Ponies ''''''''''''''''

These abstract classes represent single methods like __iter__ or __len__.

Hashable The base class for classes defining __hash__. The __hash__ method should return an Integer (see "Numbers" below). The abstract __hash__ method always returns 0, which is a valid (albeit inefficient) implementation. Invariant: If classes C1 and C2 both derive from Hashable, the condition o1 == o2 must imply hash(o1) == hash(o2) for all instances o1 of C1 and all instances o2 of C2. IOW, two objects shouldn't compare equal but have different hash values.

Another constraint is that hashable objects, once created, should
never change their value (as compared by ``==``) or their hash
value.  If a class cannot guarantee this, it should not derive
from ``Hashable``; if it cannot guarantee this for certain
instances only, ``__hash__`` for those instances should raise a
``TypeError`` exception.

**Note:** being an instance of this class does not imply that an
object is immutable; e.g. a tuple containing a list as a member is
not immutable; its ``__hash__`` method raises ``TypeError``.

Iterable The base class for classes defining __iter__. The __iter__ method should always return an instance of Iterator (see below). The abstract __iter__ method returns an empty iterator.

Iterator The base class for classes defining __next__. This derives from Iterable. The abstract __next__ method raises StopIteration. The concrete __iter__ method returns self.

Sized The base class for classes defining __len__. The __len__ method should return an Integer (see "Numbers" below) >= 0. The abstract __len__ method returns 0. Invariant: If a class C derives from Sized as well as from Iterable, the invariant sum(1 for x in o) == len(o) should hold for any instance o of C.

Container The base class for classes defining __contains__. The __contains__ method should return a bool. The abstract __contains__ method returns False. Invariant: If a class C derives from Container as well as from Iterable, then (x in o for x in o) should be a generator yielding only True values for any instance o of C.

**Note:** strictly speaking, there are three variants of this method's
semantics.  The first one is for sets and mappings, which is fast:
O(1) or O(log N).  The second one is for membership checking on
sequences, which is slow: O(N).  The third one is for subsequence
checking on (character or byte) strings, which is also slow: O(N).
Would it make sense to distinguish these?  The signature of the
third variant is different, since it takes a sequence (typically
of the same type as the method's target) intead of an element.
For now, I'm using the same type for all three.  This means that
is is possible for ``x in o`` to be True even though ``x`` is
never yielded by ``iter(o)``.  A suggested name for the third form
is ``Searchable``.

Sets ''''

These abstract classes represent various stages of "set-ness". The most fundamental set operation is the membership test, written as x in s and implemented by s.__contains__(x). This is already taken care of by the `Container`` class defined above. Therefore, we define a set as a sized, iterable container for which certain invariants from mathematical set theory hold.

The built-in type set derives from MutableSet. The built-in type frozenset derives from HashableSet.

You might wonder why we require a set to be sized -- surely certain infinite sets can be represented just fine in Python. For example, the set of even integers could be defined like this::

class EvenIntegers(Container):
    def __contains__(self, x):
        return x % 2 == 0

However, such sets have rather limited practical value, and deciding whether one such set is a subset of another would be difficult in general without using a symbolic algebra package. So I consider this out of the scope of a pragmatic proposal like this.

Set This is a sized, iterable, partially ordered container, i.e. a subclass of Sized, Iterable, Container and PartiallyOrdered. Not every subset of those three classes is a set though! Sets have the additional invariant that each element occurs only once (as can be determined by iteration), and in addition sets define concrete operators that implement the inequality operations as subclass/superclass tests. In general, the invariants for finite sets in mathematics hold. [11]_

Sets with different implementations can be compared safely,
(usually) efficiently and correctly using the mathematical
definitions of the subclass/superclass operations for finite sets.
The ordering operations have concrete implementations; subclasses
may override these for speed but should maintain the semantics.
Because ``Set`` derives from ``Sized``, ``__eq__`` may take a
shortcut and returns ``False`` immediately if two sets of unequal
length are compared.  Similarly, ``__le__`` may return ``False``
immediately if the first set has more members than the second set.
Note that set inclusion implements only a partial ordering;
e.g. ``{1, 2}`` and ``{1, 3}`` are not ordered (all three of
``<``, ``==`` and ``>`` return ``False`` for these arguments).
Sets cannot be ordered relative to mappings or sequences, but they
can be compared to those for equality (and then they always
compare unequal).

**Note:** the ``issubset`` and ``issuperset`` methods found on the
set type in Python 2 are not supported, as these are mostly just
aliases for ``__le__`` and ``__ge__``.

**Open issues:** should we define comparison of instances of
different concrete set types this way?

ComposableSet This is a subclass of Set that defines abstract operators to compute union, intersection, symmetric and asymmetric difference, respectively __or__, __and__, __xor__ and __sub__. These operators should return instances of ComposableSet. The abstract implementations return no meaningful values but raise NotImplementedError; this is because any generic implementation would have to create new instances but the ABCs don't (and shouldn't, IMO) provide an API for creating new instances. The implementations of these operators should ensure that the results match the mathematical definition of set composition. [11]_

**Open issues:** Should ``__or__`` and friends be abstract or
concrete methods?  Making them abstract means that every
ComposableSet implementation must reimplement all of them.  But
making them concrete begs the question of the actual return type:
since the ABC doesn't (and IMO shouldn't) define the constructor
signature for subclasses, the concrete implementations in the ABC
don't have an API to construct a new instance given an iterable.
Perhaps the right choice is to have a static concrete factory
function ``fromiterable`` which takes an iterable and returns
a ``ComposableSet`` instance.  Subclasses can override this and
benefit from the default implementations of ``__or__`` etc.; or
they can override ``__or__`` if they want to.

HashableSet This is a subclass of both ComposableSet and Hashable. It implements a concrete __hash__ method that subclasses should not override; or if they do, the subclass should compute the same hash value. This is so that sets with different implementations still hash to the same value, so they can be used interchangeably as dictionary keys. (A similar constraint exists on the hash values for different types of numbers and strings.)

**Open issues:** Spell out the hash algorithm.  Should there be
another ABC that derives from Set and Hashable, but not from
Composable?

MutableSet This is a subclass of ComposableSet implementing additional operations to add and remove elements. The supported methods have the semantics known from the set type in Python 2 (except for discard, which is modeled after Java):

``.add(x)``
    Abstract method returning a ``bool`` that adds the element
    ``x`` if it isn't already in the set.  It should return
    ``True`` if ``x`` was added, ``False`` if it was already
    there. The abstract implementation raises
    ``NotImplementedError``.

``.discard(x)``
    Abstract method returning a ``bool`` that removes the element
    ``x`` if present.  It should return ``True`` if the element
    was present and ``False`` if it wasn't.  The abstract
    implementation raises ``NotImplementedError``.

``.pop()``
    Concrete method that removes an arbitrary item.  If the set is
    empty, it raises ``KeyError``.  The default implementation
    removes the first item returned by the set's iterator.

``.toggle(x)``
    Concrete method returning a ``bool`` that adds x to the set if
    it wasn't there, but removes it if it was there.  It should
    return ``True`` if ``x`` was added, ``False`` if it was
    removed.

``.clear()``
    Concrete method that empties the set.  The default
    implementation repeatedly calls ``self.pop()`` until
    ``KeyError`` is caught.  (**Note:** this is likely much slower
    than simply creating a new set, even if an implementation
    overrides it with a faster approach; but in some cases object
    identity is important.)

This also supports the in-place mutating operations ``|=``,
``&=``, ``^=``, ``-=``.  These are concrete methods whose right
operand can be an arbitrary ``Iterable``, except for ``&=``, whose
right operand must be a ``Container``.  This ABC does not support
the named methods present on the built-in concrete ``set`` type
that perform (almost) the same operations.

Mappings ''''''''

These abstract classes represent various stages of mapping-ness. The Mapping class represents the most common read-only mapping API. However, code accepting a mapping is encouraged to check for the BasicMapping ABC when iteration is not used. This allows for certain "black-box" implementations that can look up values by key but don't provide a convenient iteration API. A hypothetical example would be an interface to a hierarchical filesystem, where keys are pathnames relative to some root directory. Iterating over all pathnames would presumably take forever, as would counting the number of valid pathnames.

The built-in type dict derives from MutableMapping.

BasicMapping A subclass of Container defining the following methods:

``.__getitem__(key)``
    Abstract method that returns the value corresponding to
    ``key``, or raises ``KeyError``.  The implementation always
    raises ``KeyError``.

``.get(key, default=None)``
    Concrete method returning ``self[key]`` if this does not raise
    ``KeyError``, and the ``default`` value if it does.

``.__contains__()``
    Concrete method returning ``True`` if ``self[key]`` does not
    raise ``KeyError``, and ``False`` if it does.

Mapping A subclass of BasicMapping, Iterable and Sized. The keys of a mapping naturally form a set. The (key, value) pairs are also referred to as items. The items also form a set. Methods:

``__len__``
    Abstract method returning the length of the key set.

``__iter__``
    Abstract method returning each key in the key set exactly once.

``__eq__``
    Concrete method for comparing mappings.  Two mappings, even
    with different implementations, can be compared for equality,
    and are considered equal if and only iff their item sets are
    equal.  **Open issues:** should we define comparison of
    instances of different concrete mapping types this way?

``keys``
    Concrete method returning the key set as a ``Set``.  The
    default concrete implementation returns a "view" on the key
    set (meaning if the underlying mapping is modified, the view's
    value changes correspondingly); subclasses are not required to
    return a view but they should return a ``Set``.

``items``
    Concrete method returning the items as a ``Set``.  The default
    concrete implementation returns a "view" on the item set;
    subclasses are not required to return a view but they should
    return a ``Set``.

``values``
    Concrete method returning the values as a sized, iterable
    container (not a set!).  The default concrete implementation
    returns a "view" on the values of the mapping; subclasses are
    not required to return a view but they should return a sized,
    iterable container.

The following invariant should hold for any mapping ``m``::

    set(m.items()) == set(zip(m.keys(), m.values()))

i.e. iterating over the keys and the values in parallel should
return *corresponding* keys and values.  **Open issues:** Should
this always be required?  How about the stronger invariant using
``list()`` instead of ``set()``?

HashableMapping A subclass of Mapping and Hashable. The values should be instances of Hashable. The concrete __hash__ method should be equal to hash(m.items()).

MutableMapping A subclass of Mapping that also implements some standard mutating methods. Abstract methods include __setitem__, __delitem__. Concrete methods include pop, popitem, clear, update. Note: setdefault is not included. Open issues: Write out the specs for the methods.

Sequences '''''''''

These abstract classes represent various stages of sequence-ness.

The built-in list and bytes types derive from MutableSequence. The built-in tuple and str types derive from HashableSequence.

Sequence A subclass of Iterable, Sized, Container. It defines a new abstract method __getitem__ that has a somewhat complicated signature: when called with an integer, it returns an element of the sequence or raises IndexError; when called with a slice object, it returns another Sequence. The concrete __iter__ method iterates over the elements using __getitem__ with integer arguments 0, 1, and so on, until IndexError is raised. The length should be equal to the number of values returned by the iterator.

**Open issues:** Other candidate methods, which can all have
default concrete implementations that only depend on ``__len__``
and ``__getitem__`` with an integer argument: __reversed__, index,
count, __add__, __mul__, __eq__, __lt__, __le__.

HashableSequence A subclass of Sequence and Hashable. The concrete __hash__ method should implements the hashing algorithms used by tuples in Python 2.

MutableSequence A subclass of Sequence adding some standard mutating methods. Abstract mutating methods: __setitem__ (for integer indices as well as slices), __delitem__ (ditto), insert, append, reverse. Concrete mutating methods: extend, pop, remove. Concrete mutating operators: +=, *= (these mutate the object in place). Note: this does not define sort() -- that is only required to exist on genuine list instances.

Open issues: If all the elements of a sequence are totally ordered, the sequence itself can be totally ordered with respect to other sequences containing corresponding items of the same type. Should we reflect this by making Sequence derive from TotallyOrdered? Or Partiallyordered? Also, we could easily define comparison of sequences of different types, so that e.g. (1, 2, 3) == [1, 2, 3] and (1, 2) < [1, 2, 3]. Should we? (It might imply ["a", "b"] == "ab" and [1, 2] == b"\1\2".)

Strings

Python 3000 has two built-in string types: byte strings (bytes), deriving from MutableSequence, and (Unicode) character strings (str), deriving from HashableSequence. They also derive from TotallyOrdered. If we were to introduce Searchable, they would also derive from that.

Open issues: define the base interfaces for these so alternative implementations and subclasses know what they are in for. This may be the subject of a new PEP or PEPs (PEP 358 should be co-opted for the bytes type).

Numbers

ABCs for numerical types are defined in PEP 3141.

Guidelines for Writing ABCs

Some suggestions for writing ABCs:

ABCs vs. Alternatives

In this section I will attempt to compare and contrast ABCs to other approaches that have been proposed.

ABCs vs. Duck Typing

Does the introduction of ABCs mean the end of Duck Typing? I don't think so. Python will not require that a class derives from BasicMapping or Sequence when it defines a __getitem__ method, nor will the x[y] syntax require that x is an instance of either ABC. You will still be able to assign any "file-like" object to sys.stdout, as long as it has a write method.

Of course, there will be some carrots to encourage users to derive from the appropriate base classes; these vary from default implementations for certain functionality to an improved ability to distinguish between mappings and sequences. But there are no sticks. If hasattr(x, __len__) works for you, great! ABCs are intended to solve problems that don't have a good solution at all in Python 2, such as distinguishing between mappings and sequences.

ABCs vs. Generic Functions

ABCs are compatible with Generic Functions (GFs). For example, my own Generic Functions implementation [4]_ uses the classes (types) of the arguments as the dispatch key, allowing derived classes to override base classes. Since (from Python's perspective) ABCs are quite ordinary classes, using an ABC in the default implementation for a GF can be quite appropriate. For example, if I have an overloaded prettyprint function, it would make total sense to define pretty-printing of sets like this::

@prettyprint.register(Set)
def pp_set(s):
    return "{" + ... + "}"  # Details left as an exercise

and implementations for specific subclasses of Set could be added easily.

I believe ABCs also won't present any problems for RuleDispatch, Phillip Eby's GF implementation in PEAK [5]_.

Of course, GF proponents might claim that GFs (and concrete, or implementation, classes) are all you need. But even they will not deny the usefulness of inheritance; and one can easily consider the ABCs proposed in this PEP as optional implementation base classes; there is no requirement that all user-defined mappings derive from BasicMapping.

ABCs vs. Interfaces

ABCs are not intrinsically incompatible with Interfaces, but there is considerable overlap. For now, I'll leave it to proponents of Interfaces to explain why Interfaces are better. I expect that much of the work that went into e.g. defining the various shades of "mapping-ness" and the nomenclature could easily be adapted for a proposal to use Interfaces instead of ABCs.

"Interfaces" in this context refers to a set of proposals for additional metadata elements attached to a class which are not part of the regular class hierarchy, but do allow for certain types of inheritance testing.

Such metadata would be designed, at least in some proposals, so as to be easily mutable by an application, allowing application writers to override the normal classification of an object.

The drawback to this idea of attaching mutable metadata to a class is that classes are shared state, and mutating them may lead to conflicts of intent. Additionally, the need to override the classification of an object can be done more cleanly using generic functions: In the simplest case, one can define a "category membership" generic function that simply returns False in the base implementation, and then provide overrides that return True for any classes of interest.

References

.. [1] An Introduction to ABC's, by Talin (http://mail.python.org/pipermail/python-3000/2007-April/006614.html)

.. [2] Incomplete implementation prototype, by GvR (http://svn.python.org/view/sandbox/trunk/abc/)

.. [3] Possible Python 3K Class Tree?, wiki page created by Bill Janssen (http://wiki.python.org/moin/AbstractBaseClasses)

.. [4] Generic Functions implementation, by GvR (http://svn.python.org/view/sandbox/trunk/overload/)

.. [5] Charming Python: Scaling a new PEAK, by David Mertz (http://www-128.ibm.com/developerworks/library/l-cppeak2/)

.. [6] Implementation of @abstractmethod (http://python.org/sf/1706989)

.. [7] Unifying types and classes in Python 2.2, by GvR (http://www.python.org/download/releases/2.2.3/descrintro/)

.. [8] Putting Metaclasses to Work: A New Dimension in Object-Oriented Programming, by Ira R. Forman and Scott H. Danforth (http://www.amazon.com/gp/product/0201433052)

.. [9] Partial order, in Wikipedia (http://en.wikipedia.org/wiki/Partial_order)

.. [10] Total order, in Wikipedia (http://en.wikipedia.org/wiki/Total_order)

.. [11] Finite set, in Wikipedia (http://en.wikipedia.org/wiki/Finite_set)

Copyright

This document has been placed in the public domain.

.. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

-- --Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-3000 mailing list