[Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc.

Björn Lindqvist bjourne at gmail.com
Wed May 9 21:54:46 CEST 2007


On 5/1/07, Phillip J. Eby <pje at telecommunity.com> wrote:

Comments and questions appreciated, as it'll help drive better explanations of both the design and rationales. I'm usually not that good at guessing what other people will want to know (or are likely to misunderstand) until I get actual questions.

I haven't read it all yet. But my first comment is "This PEP is HUGE!" 922 lines. Is there any way you could shorten it or split it up into more manageable chunks? My second comment is that there are too few examples in the PEP.

The API will be implemented in pure Python with no C, but may have some dependency on CPython-specific features such as sys._getframe and the func_code attribute of functions. It is expected that e.g. Jython and IronPython will have other ways of implementing similar functionality (perhaps using Java or C#).
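For readers unfamiliar with those two hooks, a minimal illustration (Python 2 spellings, matching the era of this thread; the helper name is invented):

import sys

def caller_globals():
    # sys._getframe is CPython-specific; frame 1 is the caller's frame.
    return sys._getframe(1).f_globals

def f(x):
    return x

# A function's compiled code object is exposed as func_code in Python 2
# (renamed __code__ in Python 3):
print f.func_code.co_varnames  # ('x',)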

Rationale and Goals
===================

Python has always provided a variety of built-in and standard-library generic functions, such as len(), iter(), pprint.pprint(), and most of the functions in the operator module. However, it currently:

1. does not have a simple or straightforward way for developers to create new generic functions,

I think there is a very straightforward way. For example, a generic function for token handling could be written like this:

# ANY, BRANCH, CATEGORY are token-type constants (e.g. from sre_constants);
# handle_branch and handle_category are handlers defined like handle_any.
def handle_any(val):
    pass

def handle_tok(tok, val):
    handlers = {
        ANY:      handle_any,
        BRANCH:   handle_branch,
        CATEGORY: handle_category,
    }
    try:
        return handlers[tok](val)
    except KeyError:
        fmt = "Unsupported token type: %s"
        raise ValueError(fmt % tok)

This is an idiom I have used hundreds of times. The handle_tok function is generic because it dispatches to the correct handler based on the type of tok.

2. does not have a standard way for methods to be added to existing generic functions (i.e., some are added using registration functions, others require defining __special__ methods, possibly by monkeypatching), and

When does "external" code want to add to a generic function? In the above example, you add to the generic function by inserting a new key-value pair into the handlers dict. If needed, it wouldn't be very hard to make the handle_tok function extensible. Just make the handlers dict a module-level global, as sketched below.
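For example, an extensible variant could look like this (register is just an illustrative name, not something from the original example):

# Module-level registry; other modules can add entries to extend it.
handlers = {
    ANY:      handle_any,
    BRANCH:   handle_branch,
    CATEGORY: handle_category,
}

def register(tok, handler):
    handlers[tok] = handler

def handle_tok(tok, val):
    try:
        return handlers[tok](val)
    except KeyError:
        raise ValueError("Unsupported token type: %s" % tok)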

3. does not allow dispatching on multiple argument types (except in a limited form for arithmetic operators, where "right-hand" (__r*__) methods can be used to do two-argument dispatch).

Why would you want that?
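For reference, the limited "right-hand" dispatch the PEP alludes to works like this (a minimal sketch; Meters is an invented example class):

class Meters(object):
    def __init__(self, n):
        self.n = n
    def __radd__(self, other):
        # Invoked when the left operand's __add__ returns NotImplemented,
        # e.g. for 5 + Meters(2), so the result depends on both types.
        return Meters(other + self.n)

assert (5 + Meters(2)).n == 7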

The @overload decorator allows you to define alternate implementations of a function, specialized by argument type(s). A function with the same name must already exist in the local namespace. The existing function is modified in-place by the decorator to add the new implementation, and the modified function is returned by the decorator. Thus, the following code::

from overloading import overload
from collections import Iterable

def flatten(ob):
    """Flatten an object to its component iterables"""
    yield ob

@overload
def flatten(ob: Iterable):
    for o in ob:
        for ob in flatten(o):
            yield ob

@overload
def flatten(ob: basestring):
    yield ob

creates a single flatten() function whose implementation roughly equates to::

def flatten(ob):
    if isinstance(ob, basestring) or not isinstance(ob, Iterable):
        yield ob
    else:
        for o in ob:
            for ob in flatten(o):
                yield ob

except that the flatten() function defined by overloading remains open to extension by adding more overloads, while the hardcoded version cannot be extended.

I very much prefer the latter version. The reason is that the "locality of reference" is much worse in the overloaded version, and that I have found overloaded code very hard to read and understand in practice.

Let's say you find some code that looks like this:

def do_stuff(ob):
    yield ob

@overload
def do_stuff(ob: ClassA):
    for o in ob:
        for ob in do_stuff(o):
            yield ob

@overload
def do_stuff(ob: ClassB):
    yield ob

Or this:

def do_stuff(ob):
    if isinstance(ob, ClassB) or not isinstance(ob, ClassA):
        yield ob
    else:
        for o in ob:
            for ob in do_stuff(o):
                yield ob

With the overloaded code, you have to read EVERY definition of "do_stuff" to understand what the code does. Not just every definition in the same module, but every definition in the whole program because someone might have extended the do_stuff generic function.

What if they have defined a do_stuff that dispatches on ClassC, a subclass of ClassA? Good luck figuring out what the code does.
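The same action-at-a-distance can be demonstrated with functools.singledispatch, a single-argument cousin of this proposal that landed in the stdlib much later (Python 3.4): a registration made in a far-away module silently changes what existing calls do.

from functools import singledispatch

class ClassA(object): pass
class ClassC(ClassA): pass

@singledispatch
def do_stuff(ob):
    return "default"

@do_stuff.register(ClassA)
def _(ob):
    return "ClassA handler"

# ... meanwhile, in some other module entirely:
@do_stuff.register(ClassC)
def _(ob):
    return "ClassC handler"

print(do_stuff(ClassC()))  # "ClassC handler", not "ClassA handler"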

With the non-overloaded version you also have the ability to insert debug print statements to figure out what happens.

For example, if someone wants to use flatten() with a string-like type that doesn't subclass basestring, they would be out of luck with the second implementation. With the overloaded implementation, however, they can either write this::

@overload
def flatten(ob: MyString):
    yield ob

or this (to avoid copying the implementation)::

from overloading import RuleSet
RuleSet(flatten).copy_rules((basestring,), (MyString,))

That may be great for flexibility, but I contend that it is awful for reality. In reality, it would be much simpler and more readable to just rewrite the flatten function:

def flatten(ob):
    flat = (isinstance(ob, (basestring, MyString)) or
            not isinstance(ob, Iterable))
    if flat:
        yield ob
    else:
        for o in ob:
            for ob in flatten(o):
                yield ob

Or change MyString so that it derives from basestring.
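In Python 2 the latter fix is a one-liner, since isinstance(x, basestring) accepts any str or unicode subclass:

class MyString(str):
    pass

assert isinstance(MyString("x"), basestring)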

Most of the functionality described in this PEP is already implemented in the in-development version of the PEAK-Rules framework. In particular, the basic overloading and method combination framework (minus the @overload decorator) already exists there. The implementation of all of these features in peak.rules.core is 656 lines of Python at this writing.

I think PEAK is a great framework and that generic functions are great for those who like them. But I'm not convinced that writing multiple dispatch functions the way PEAK prescribes is better than any of the currently used idioms.

I first encountered them when I tried to fix a bug in the jsonify.py module in TurboGears (now relocated to the TurboJSON package). It took me about 30 minutes to figure out how it worked (including manual reading). Had PEAK-style generic functions not been used, it would have taken me 2 minutes tops.

So IMHO, generic functions certainly are useful for some things, but not useful enough. Using them as a replacement for ordinary multiple dispatch techniques is a bad idea.

-- Best regards, Björn


