[Python-Dev] email package status in 3.X

Barry Warsaw barry at python.org
Mon Jun 21 17:43:07 CEST 2010


On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:

> Something that may make sense to ease the porting process is for some of these "on the boundary" I/O related string manipulation functions (such as os.path.join) to grow "encoding" keyword-only arguments. The recommended approach would be to provide all strings, but bytes could also be accepted if an encoding was specified. (If you want to mix encodings - tough, do the decoding yourself).
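For illustration only, a rough sketch of what such a keyword-only "encoding" argument could look like. The name join_path is hypothetical; os.path.join itself has no such parameter, so this wraps it rather than showing any real API.

```python
import os.path


def join_path(*parts, encoding=None):
    """Hypothetical wrapper; 'encoding' is keyword-only because it follows *parts."""
    decoded = []
    for part in parts:
        if isinstance(part, bytes):
            # bytes are accepted only when the caller says how to decode them
            if encoding is None:
                raise TypeError("bytes path component given without an encoding")
            part = part.decode(encoding)
        decoded.append(part)
    # Everything is str by now, so the real os.path.join does the work
    return os.path.join(*decoded)


print(join_path("home", b"barry", encoding="utf-8"))
```

All bytes components share the single encoding passed in; mixing encodings would still mean decoding by hand first.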

This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz for it.

Would it make sense to have "encoding-carrying" bytes and str types? Basically, I'm thinking of types (maybe even the current ones) that carry around a .encoding attribute so that they can be automatically encoded and decoded where necessary. This at least would simplify APIs that need to do the conversion.

By default, the .encoding attribute would be some marker to indicate "I have no idea, do it explicitly" and if you combine ebytes or estrs that have incompatible encodings, you'd either throw an exception or reset the .encoding to IAmConfuzzled. But say you had an email header like:

=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=

And code like the following (made less crappy):

-----snip snip-----
class ebytes(bytes):
    encoding = 'ascii'

    def __str__(self):
        s = estr(self.decode(self.encoding))
        s.encoding = self.encoding
        return s


class estr(str):
    encoding = 'ascii'


s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
b = bytes(s, 'euc-jp')

eb = ebytes(b)
eb.encoding = 'euc-jp'
es = str(eb)
print(repr(eb), es, es.encoding)
-----snip snip-----

Running this you get:

b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド！ euc-jp
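As a side note on where those bytes come from: a minimal sketch, using the standard library's email.header.decode_header, of pulling the payload and charset out of the header shown earlier. The pair it returns is the same information an ebytes would carry around in one object; the variable names here are just for illustration.

```python
from email.header import decode_header

header = "=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?="

# decode_header returns a list of (payload, charset) pairs;
# this header holds a single encoded word, so there is one pair.
(raw, charset), = decode_header(header)

# The caller still has to remember to pair the bytes with the charset.
text = raw.decode(charset)
print(repr(raw), charset, text)
```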

Would it be feasible? Dunno. Would it help ease the bytes/str confusion? Dunno. But I think it would help make APIs easier to design and use because it would cut down on the encoding-keyword function signature infection.
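To make the "combine incompatible encodings" rule a bit more concrete, here is one possible sketch of the throw-an-exception variant. Everything in it is an assumption for the sake of illustration: the constructor signature, the use of None as the "I have no idea" marker, and the choice of ValueError.

```python
class ebytes(bytes):
    """Sketch only: bytes that remember their encoding (None means unknown)."""

    def __new__(cls, value=b"", encoding=None):
        self = super().__new__(cls, value)
        self.encoding = encoding
        return self

    def __add__(self, other):
        # Plain bytes have no .encoding, so they count as unknown (None)
        other_encoding = getattr(other, "encoding", None)
        if self.encoding != other_encoding:
            raise ValueError(
                "incompatible encodings: %r and %r"
                % (self.encoding, other_encoding)
            )
        return ebytes(bytes(self) + bytes(other), self.encoding)


a = ebytes(b"\xa5\xcf", "euc-jp")
b = ebytes(b"\xa5\xed", "euc-jp")
c = a + b
print(repr(c), c.encoding)
```

The other option mentioned above, resetting .encoding to an "unknown" marker instead of raising, would only change the body of the if branch.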

-Barry


