[Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

R. David Murray rdmurray at bitdance.com
Thu Sep 16 17:30:12 CEST 2010


On Thu, 16 Sep 2010 09:52:48 -0400, Barry Warsaw <barry at python.org> wrote:

> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
> >There are some APIs that should be able to handle bytes or strings,
> >but the current use of string literals in their implementation means
> >that bytes don't work. This turns out to be a PITA for some networking
> >related code which really wants to be working with raw bytes (e.g.
> >URLs coming off the wire).
>
> Note that email has exactly the same problem. A general solution -- even if embodied in well documented best-practices and convention -- would really help make the stdlib work consistently, and I bet third party libraries too.

Allowing bytes-in -> bytes-out where possible would definitely be a help (and Guido has endorsed this, IIUC), but some care has to be taken to understand the API contract of the method in question before blindly applying it. Are you "merely" allowing bytes to be processed as ASCII strings, or does processing the bytes correctly imply that you are converting from an ASCII encoding of text in order to process it? In Python2, the latter might not generate unicode yet still produce a correct result most of the time, but a big point of Python3 is to eliminate that "most of the time", so we need to be careful not to reintroduce it. This was all covered in the thread Nick refers to; I just want to emphasize that one needs to look at the API contract carefully before making it polymorphic (in Guido's sense of the term).
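To make the "most of the time" concern concrete, here is a small sketch. The choice of upper() is only an illustration and does not come from the thread. On bytes, upper() touches ASCII letters only, so it agrees with the text-level operation for pure ASCII input and silently diverges once the data holds non-ASCII text:

```python
# Illustrative only: upper() stands in for any operation whose contract
# is really about text rather than about ASCII-delimited bytes.
raw = 'caf\u00e9'.encode('utf-8')        # b'caf\xc3\xa9'

# Treating the bytes as "ASCII strings": only a-z are changed.
ascii_only = raw.upper()

# Honouring the text contract: decode, operate on str, re-encode.
via_text = raw.decode('utf-8').upper().encode('utf-8')

print(ascii_only)   # the e-acute bytes are left untouched
print(via_text)     # the e-acute is upper-cased as well

# For pure ASCII input the two routes agree, which is why the
# shortcut looks correct until non-ASCII data shows up.
ascii_agrees = b'cafe'.upper() == 'cafe'.upper().encode('ascii')
```

An API of the first kind can reasonably be made bytes-in, bytes-out; one of the second kind needs to know the encoding, so simply accepting bytes would bring the old ambiguity back.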

If the way to do this is well documented best practices, we first have to figure out what those best practices are. To do that we have to write some real-world code. I'm trying one approach in email6: Bytes and String subclasses, where the subclasses have an attribute named 'literals' derived from a utility module that does this:

literals = dict(
    empty = '',
    colon = ':',
    newline = '\n',
    space = ' ',
    tab = '\t',
    fws = ' \t',
    headersep = ': ',
    )

class _string_literals:
    pass
class _bytes_literals:
    pass

for name, value in literals.items():
    setattr(_string_literals, name, value)
    setattr(_bytes_literals, name, bytes(value, 'ASCII'))
del literals, name, value

And the subclasses do:

class BytesHeader(BaseHeader):
    lit = email.utils._bytes_literals

class StringHeader(BaseHeader):
    lit = email.utils._string_literals

And then BaseHeader uses self.lit.colon, etc, when manipulating strings. It also has to use slice notation rather than indexing when looking at individual characters, which is a PITA but not terrible.
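As a rough, self-contained sketch of how such a base class might look (this is not the actual email6 code; the methods split_name and starts_with_fws are hypothetical, and the literal classes are written out by hand instead of being generated):

```python
# Hypothetical sketch of the pattern, not the real email6 implementation.
class _string_literals:
    colon = ':'
    fws = ' \t'

class _bytes_literals:
    colon = b':'
    fws = b' \t'

class BaseHeader:
    lit = None  # supplied by the subclasses below

    def __init__(self, source):
        self.source = source

    def split_name(self):
        """Split 'name: value' at the first colon (hypothetical helper)."""
        name, _, value = self.source.partition(self.lit.colon)
        return name, value.lstrip(self.lit.fws)

    def starts_with_fws(self):
        """True if the source begins with a space or tab."""
        # Slice rather than index: b' x'[0] is the int 32, while
        # b' x'[0:1] is b' ', the same shape as ' x'[0:1] == ' '.
        first = self.source[0:1]
        return len(first) == 1 and first in self.lit.fws

class BytesHeader(BaseHeader):
    lit = _bytes_literals

class StringHeader(BaseHeader):
    lit = _string_literals

str_parts = StringHeader('Subject: hello').split_name()
bytes_parts = BytesHeader(b'Subject: hello').split_name()
```

The same method bodies serve both types because every literal comes from self.lit and every single-character test goes through a one-element slice.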

I'm not saying this is the best approach, since this is all experimental code at the moment, but it is an approach....

-- R. David Murray www.bitdance.com


