[Python-Dev] Patch making the current email package (mostly) support bytes

Nick Coghlan ncoghlan at gmail.com
Sun Oct 3 04:43:19 CEST 2010


On Sun, Oct 3, 2010 at 9:00 AM, R. David Murray <rdmurray at bitdance.com> wrote:

I do not propose that this is a good API, since it has the classic problem that if there are coding bugs in the email module, strings with surrogates in them may "escape", and we end up with programs that work most of the time... except when they fail with mysterious errors because of unusual bytes input data.  On the other hand, you always know when you have bytes data in an unknown encoding (because it is surrogate escaped), so it is ever so much better than the Python2 situation.
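For readers unfamiliar with the mechanism being discussed, here is a minimal sketch of the surrogateescape error handler (PEP 383) that the "surrogate escaped" wording refers to. The sample header is invented for illustration and is not taken from the patch.

```python
# Invented sample: a header line containing one non-ASCII byte (0xE9).
raw = b"Subject: caf\xe9"

# Decoding with surrogateescape never fails: each undecodable byte 0xNN
# is mapped to the lone surrogate U+DCNN instead of raising an error.
text = raw.decode("ascii", errors="surrogateescape")
last_char = text[-1]  # the lone surrogate standing in for byte 0xE9

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("ascii", errors="surrogateescape")

# If such a string "escapes" and is later encoded strictly, it fails,
# which is the kind of data-dependent error described above.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

The failure only appears when the input actually contains non-ASCII bytes, which is why such bugs can stay hidden for a long time.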

It's a similar concept to one Antoine and I (and some others) have been considering in the tracker for making urllib.parse able to handle ASCII-compatible bytes-encodings. I've already implemented a version of that patch which has parallel bytes and str versions of all the ASCII constants, and the result is pretty ugly. My next goal is to implement a version that uses the same trick you have here for email and see how the code complexity compares.
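To make the comparison concrete, here is a rough sketch of what the "same trick" might look like for a URL helper. The names split_scheme and _split_scheme_str are hypothetical and are not part of urllib.parse or of any patch on the tracker; the point is only that the algorithm is written once, for str, and bytes input is smuggled through it.

```python
def _split_scheme_str(url):
    """Hypothetical str-only helper: split 'scheme:rest' into two parts."""
    scheme, sep, rest = url.partition(":")
    if not sep:
        # No colon at all: treat the whole input as the non-scheme part.
        return "", url
    return scheme.lower(), rest


def split_scheme(url):
    """Accept str or ASCII-compatible bytes; return parts of the same type."""
    if isinstance(url, bytes):
        # Decode to a pseudo string; non-ASCII bytes become lone surrogates.
        text = url.decode("ascii", errors="surrogateescape")
        parts = _split_scheme_str(text)
        # Re-encode every result so no pseudo string leaves this function.
        return tuple(p.encode("ascii", errors="surrogateescape") for p in parts)
    return _split_scheme_str(url)


str_result = split_scheme("HTTP://example.com/")
bytes_result = split_scheme(b"HTTP://example.com/caf\xe9")
```

With this shape the ASCII constants (":" here) exist only in their str form, at the cost of a decode and encode at the boundary.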

We do need to tread carefully to make sure the pseudo strings don't escape, but the other approach requires similar care all the way through the internal algorithms to make sure they aren't assuming bytes or str instances anywhere.
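One possible way to catch escaping pseudo strings early is a check at the public API boundary. This is only a sketch under my own assumptions: has_escaped_bytes and ensure_clean are hypothetical helper names, not something either patch is known to contain.

```python
def has_escaped_bytes(text):
    """Return True if text contains surrogateescape'd bytes (U+DC80..U+DCFF)."""
    return any("\udc80" <= ch <= "\udcff" for ch in text)


def ensure_clean(text):
    """Hypothetical boundary check: refuse to hand out a pseudo string."""
    if has_escaped_bytes(text):
        raise ValueError("string contains surrogate-escaped bytes")
    return text


clean = b"plain ascii".decode("ascii", errors="surrogateescape")
dirty = b"caf\xe9".decode("ascii", errors="surrogateescape")

try:
    ensure_clean(dirty)
    leaked = True
except ValueError:
    leaked = False
```

A check like this turns a mysterious failure far from the source into an immediate error at the point where the string would have escaped.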

Cheers, Nick.

-- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


