[Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

Ian Bicking ianb at colorstudy.com
Fri Sep 17 21:44:54 CEST 2010


On Fri, Sep 17, 2010 at 3:25 PM, Michael Foord <fuzzyman at voidspace.org.uk> wrote:

On 16/09/2010 23:05, Antoine Pitrou wrote:

On Thu, 16 Sep 2010 16:51:58 -0400 "R. David Murray"<rdmurray at bitdance.com> wrote:

What do we store in the model? We could say that the model is always text. But then we lose information about the original bytes message, and we can't reproduce it. For various reasons (mailman being a big one), this is not acceptable. So we could say that the model is always bytes. But we want access to (for example) the header values as text, so header lookup should take string keys and return string values[2].

Why can't you have both in a single class? If you create the class using a bytes source (a raw message sent by SMTP, for example), the class automatically parses and decodes it to unicode strings; if you create the class using a unicode source (the text body of the e-mail message and the list of recipients, for example), the class automatically creates the bytes representation.

I think something like this would be great for WSGI. Rather than focus on whether bytes or text should be used, use a higher level object that provides a bytes view, and (where possible/appropriate) a unicode view too.

This is what WebOb does; e.g., there is only a bytes version of a POST body, and a view on that body that does decoding and encoding. If you don't touch something, it is never decoded or encoded. I only vaguely understand the specifics here, and I suspect the specifics matter, but this seems applicable in this case too. If you have an incoming email with a smattering of bytes, inline (RFC 2047) encoding, other encoding declarations, and then orthogonal systems like quoted-printable, you don't want to touch that stuff if you don't need to. Handling unicode objects implies you are normalizing the content, and that might have subtle impacts you don't know about, or don't want to know about, or the content may just not fit into the unicode model (like a string with two character sets).
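To make the "one canonical bytes value, one decoding view" idea concrete, here is a minimal sketch. It is not WebOb's actual API: the class name `BytesBody`, the `text` property, and the `decode_count` counter are all hypothetical, and the counter exists only to show when decoding happens.

```python
class BytesBody:
    """Hypothetical bytes-backed body with a single text view (not WebOb's API)."""

    def __init__(self, raw, charset="utf-8"):
        self.raw = raw            # canonical data: always bytes
        self.charset = charset
        self.decode_count = 0     # illustration only: counts decode calls

    @property
    def text(self):
        # Decode on demand; nothing is cached, so the bytes stay the
        # only source of truth.
        self.decode_count += 1
        return self.raw.decode(self.charset)

    @text.setter
    def text(self, value):
        # Writing through the view re-encodes into the canonical bytes.
        self.raw = value.encode(self.charset)


body = BytesBody(b"name=caf\xe9", charset="latin-1")
untouched = body.raw              # passed through byte-for-byte, never decoded
decodes_before_read = body.decode_count
decoded = body.text               # decoding happens only at this point
body.text = "name=th\u00e9"       # re-encoded with the declared charset
```

Code that only forwards `body.raw` never triggers a decode, so content that does not fit the unicode model can pass through unchanged.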

Note that WebOb does not have two views; it has only one view -- unicode viewing bytes. I'm not sure I could keep two views straight. I think Antoine is describing two possible canonical data types (unicode or bytes) and two views. That sounds hard.
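For contrast, a rough sketch of the "two canonical types, two views" reading might look like the following. `DualMessage` and its methods are hypothetical names, and real email parsing is left out entirely; the point is only that each accessor has to branch on which type the object was created from.

```python
class DualMessage:
    """Hypothetical: the canonical form is whichever type the caller supplied."""

    def __init__(self, source, charset="utf-8"):
        if not isinstance(source, (bytes, str)):
            raise TypeError("source must be bytes or str")
        self._source = source
        self.charset = charset

    @property
    def canonical_type(self):
        # Callers cannot know this without asking.
        return type(self._source)

    def as_bytes(self):
        # Bytes view: a pass-through or an encode, depending on the source.
        if isinstance(self._source, bytes):
            return self._source
        return self._source.encode(self.charset)

    def as_text(self):
        # Text view: a pass-through or a decode, depending on the source.
        if isinstance(self._source, str):
            return self._source
        return self._source.decode(self.charset)


from_wire = DualMessage(b"Subject: hi\r\n\r\nbody")
from_app = DualMessage("Subject: hi\r\n\r\nbody")
```

The two objects agree here only because the content is plain ASCII. Once the bytes contain something the charset cannot round-trip, the two construction paths stop being interchangeable, which is the bookkeeping that makes this design harder to keep straight.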

-- Ian Bicking | http://blog.ianbicking.org
