[Python-Dev] email package status in 3.X

Antoine Pitrou solipsis at pitrou.net
Sun Jun 20 19:55:47 CEST 2010


On Sun, 20 Jun 2010 14:26:28 +0200 Giampaolo Rodolà <g.rodola at gmail.com> wrote:

I attempted to port pyftpdlib to Python 3 several times, and the biggest showstopper has always been the bytes / string difference introduced by Python 3, which forces you to know and use Unicode every time you deal with some text; 2to3 is completely useless here.

I don't really understand what the difficulties are. A character is a character; converting from bytes to characters requires knowing the encoding, which your protocol should specify somewhere (of course, I suppose FTP is old and crummy enough that it may not specify anything).

An "encoding" is nothing more than a transformation. When you get gzipped data, you must decompress it before doing anything useful with it. Similarly, when you get (say) UTF-8 data, you must decode it before doing anything useful with it.
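
To make the analogy concrete, here is a minimal sketch using only the standard library. The sample string and variable names are illustrative assumptions, not anything taken from pyftpdlib. It applies two transformations in a row (UTF-8 encoding, then gzip compression) and undoes them in reverse order; it also shows that decoding with the wrong encoding may succeed silently and simply yield the wrong text, which is why the encoding has to be known.

```python
import gzip

# Illustrative sample text containing one non-ASCII character.
text = "Rodolà"

# Text -> bytes: encoding is a transformation (UTF-8 here).
encoded = text.encode("utf-8")

# Bytes -> compressed bytes: gzip is another transformation.
compressed = gzip.compress(encoded)

# Undo both transformations, in the opposite order.
decompressed = gzip.decompress(compressed)
decoded = decompressed.decode("utf-8")

# Decoding with the wrong encoding raises no error in this case;
# it just produces different (wrong) text.
misdecoded = decompressed.decode("latin-1")

print(decoded == text)     # the round trip restores the original
print(misdecoded == text)  # the wrong encoding does not
```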

I can only imagine how difficult it can be to do such a conversion in a project like Twisted or Django, where I/O plays a fundamental role.

Twisted actually seems to enforce the bytes / unicode separation quite well already, so I don't think they should have many problems on that front. Modern Web frameworks seem to be in the same boat (they already give the Web developer unicode strings to play with, and handle the encoding/decoding at the IO boundary transparently).
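
As a rough sketch of what handling the encoding/decoding at the I/O boundary can look like: the function names `handle_command` and `serve_one`, the toy ECHO command, and the UTF-8 default below are all hypothetical, and this is not the API of Twisted, Django, or pyftpdlib. The application logic only ever sees `str`; bytes exist only at the edge.

```python
def handle_command(line: str) -> str:
    """Hypothetical application logic: works with str only, never bytes."""
    verb, _, arg = line.strip().partition(" ")
    if verb.upper() == "ECHO":
        return "200 " + arg
    return "500 Unknown command"


def serve_one(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical I/O boundary: decode on the way in, encode on the way out.

    The UTF-8 default is an assumption; a real protocol should say
    which encoding applies.
    """
    reply = handle_command(raw.decode(encoding))
    return (reply + "\r\n").encode(encoding)


# Usage: bytes arrive from the wire and bytes go back out.
print(serve_one("ECHO Rodolà\r\n".encode("utf-8")))
```

Keeping the decode and encode calls in one place means the rest of the code never needs to know which encoding was used on the wire.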

The choice of forcing the user to use Unicode and "think in Unicode" was a very brave one, and I'm sure it's for the better, but not everyone wants to deal with that because Unicode is hard to swallow.

Could Google fund a project named "Unicode Swallow"?

Regards

Antoine.


