[Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

Barry Warsaw barry at python.org
Fri Sep 17 01:34:35 CEST 2010


On Sep 16, 2010, at 06:11 PM, Glyph Lefkowitz wrote:

> That may be a handy way to deal with some grotty internal implementation details, but having a 'decode()' method is broken. The thing I care about, as a consumer of this API, is that there is a clearly defined "Message" interface, which gives me a uniform-looking place where I can ask for either characters (if I'm displaying them to the user) or bytes (if I'm putting them on the wire). I don't particularly care where those bytes came from. I don't care what decoding tricks were necessary to produce the characters.

But first you have to get to that Message interface. This is why the current email package separates parsing and generating from the representation model. You could conceivably have a parser that rot13's all the payload, or just parses the headers and leaves the payload as a blob of bytes. But the parser tries to be lenient in what it accepts, so that one bad header doesn't cause it to just punt on everything that follows. Instead, it parses what it can and registers a defect on that header, which the application can then reason about, because it has a Message object. If it were to just throw up its hands (i.e. raise an exception), you'd basically be left with a blob of useless crap that will just get /dev/null'd.
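To illustrate the lenient behaviour described above, here is a minimal sketch using the email package as it ships in current Python 3 (details may differ from the 2010 code under discussion). The sample message is made up: it promises a multipart boundary that never appears in the body. In this version the defect is recorded on the Message object's `defects` list rather than on an individual header.

```python
import email

# Hypothetical malformed input: the Content-Type header names a boundary
# ("XXX") that never shows up in the body.
raw = (
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XXX"\n'
    "\n"
    "This body never contains the promised boundary line.\n"
)

# The parser does not raise; it returns a Message and records what went wrong.
msg = email.message_from_string(raw)

subject = msg["Subject"]                               # good header still usable
defect_names = [type(d).__name__ for d in msg.defects]  # what the parser noticed

print(subject)
print(defect_names)
```

The application still gets a Message object and can decide for itself what to do about the listed defects.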

> Now, it may be worthwhile to have specific normalization / debrokenifying methods which deal with specific types of corrupt data from the wire; encoding-guessing, replacement-character insertion or whatever else are fine things to try. It may also be helpful to keep around a list of errors in the message, for inspection. But as we know, there are lots of ways that MIME data can go bad other than encoding, so that's just one variety of error that we might want to keep around.

Right. The middle ground IMO is what the current parser does. It recognizes the problem, registers a defect, and tries to recover, but it doesn't fix the corrupt data. So for example, if you had a valid RFC 2047 encoded Subject but a broken X-Foo header, you'd at least still end up with a Message object. The value of the good headers would be things from which you can get the unicode value, the raw bytes value, parse its parameters, munge it, etc. while the bad header might be something you can only get the raw bytes from.
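A rough sketch of that scenario, again assuming the current Python 3 email package with its default policy rather than the 2010 code. The message, the Subject value, and the junk bytes in X-Foo are invented for illustration. The RFC 2047 Subject can be turned into text, while the undecodable X-Foo bytes are not repaired and come back out unchanged when the message is serialized.

```python
import email
from email.header import decode_header, make_header

# Hypothetical input: a valid RFC 2047 encoded Subject and an X-Foo header
# carrying raw non-ASCII bytes with no declared encoding.
raw = (
    b"Subject: =?utf-8?q?Caf=C3=A9?=\n"
    b"X-Foo: \xff\xfe broken\n"
    b"\n"
    b"body\n"
)

msg = email.message_from_bytes(raw)

# Good header: decode the encoded word into a unicode string.
subject_text = str(make_header(decode_header(msg["Subject"])))

# Bad header: the parser kept the bytes as-is, so they survive a round trip.
wire = msg.as_bytes()

print(subject_text)
print(b"\xff\xfe broken" in wire)
```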

-Barry