[Python-Dev] email package status in 3.X

Laurens Van Houtven lvh at laurensvh.be
Mon Jun 21 00:01:08 CEST 2010


On Sun, Jun 20, 2010 at 11:30 PM, Terry Reedy <tjreedy at udel.edu> wrote:

On 6/20/2010 8:26 AM, Giampaolo Rodolà wrote:

I attempted to port pyftpdlib to Python 3 several times, and the biggest showstopper has always been the bytes / string difference introduced by Python 3, which forces you to know and use Unicode every time you deal with some text. 2to3 is completely useless here.

I believe the advice in the wiki porting page is to use unicode() and bytes() but never str(), in a version that runs in 2.6. Then 2to3 should do fine. For 2.5 and earlier, add 'bytes = str' somewhere.
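To make the porting advice above concrete, here is a minimal sketch of a compatibility shim. The names `text_type`, `to_text`, and `to_bytes` are hypothetical and are not taken from the wiki page or from pyftpdlib; the UTF-8 default is likewise an assumption.

```python
# Sketch only: one way to keep text and bytes explicit in code that
# has to run on both Python 2 and Python 3.

try:
    bytes
except NameError:
    # Python 2.5 and earlier have no 'bytes' builtin.
    bytes = str

try:
    text_type = unicode  # Python 2: the text type is 'unicode'
except NameError:
    text_type = str      # Python 3: the text type is 'str'


def to_text(data, encoding="utf-8"):
    """Return text, decoding only if 'data' is not text already."""
    if isinstance(data, text_type):
        return data
    return data.decode(encoding)


def to_bytes(data, encoding="utf-8"):
    """Return bytes, encoding only if 'data' is text."""
    if isinstance(data, text_type):
        return data.encode(encoding)
    return data
```

The point of the shim is that the rest of the code never calls str() on data whose type is ambiguous; it always names the type it wants.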

Really? I thought you were supposed to call encode/decode methods on the appropriate thing, depending on whether it comes from a byte source or a character source. The problems arise when you're doing things like paths, which I believe are bytes on *nix and proper Unicode on Windows (which basically just means they enforce an encoding, UTF-16 if I'm not mistaken). I don't actually use Windows, so I might be completely wrong here.
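A rough illustration of decoding at the byte boundary and encoding on the way out, written for Python 3. The helpers `read_command` and `format_reply` are hypothetical (they are not pyftpdlib's API), and the UTF-8 default is an assumption. The path part assumes Python 3.2 or later, where `os.fsdecode` and `os.fsencode` are available.

```python
import os


def read_command(raw_line, encoding="utf-8"):
    """Decode a line received from a byte source (e.g. a socket)."""
    return raw_line.decode(encoding).rstrip("\r\n")


def format_reply(code, message, encoding="utf-8"):
    """Build the reply as text, then encode it once before sending."""
    return ("%d %s\r\n" % (code, message)).encode(encoding)


# Paths: let the os module apply the filesystem encoding instead of
# guessing one. On POSIX systems undecodable bytes survive the round
# trip because of the 'surrogateescape' error handler.
path_text = os.fsdecode(b"caf\xc3\xa9.txt")
path_bytes = os.fsencode(path_text)
```

Everything between `read_command` and `format_reply` can then work purely with text.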

2to3 still gets patches, I believe, when someone exhibits code that could and ought to be converted but is not.

I suspect that if you posted 'Problems porting pyftpdlib to Python3', you would get some help. If it involved inadequacies in the current tools and guides, it would be on-topic here. Or try python-list.

The choice of forcing the user to use Unicode and "think in Unicode" was a very brave one, and I'm sure it's for the better, but not everyone wants to deal with that because Unicode is hard to swallow.

I felt that way until my daughter decided to switch from Spanish to Japanese for her foreign language. Once I quit fighting it, it became much easier to swallow and learn. As it turns out, thinking in Unicode is a pretty straightforward generalization of thinking in ASCII. There are some annoying glitches due to the need to accommodate legacy systems. The plethora of legacy encodings for various subsets, besides ASCII, is also a nuisance.

I think doing unicode/str properly in 2.x is very important, and #python stresses it quite often. I think Py3k's strictness is a good idea, because people very often write something that appears to work for a long time, and then someone tries it using funny bytes and everything blows apart. Convincing people their software is wrong when "everything worked five minutes ago" is really hard :-)
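A small Python 3 sketch of the failure mode described above. The function `greet` and its ASCII assumption are made up for illustration; they are not taken from any project mentioned in this thread.

```python
def greet(raw_name):
    # Hidden assumption: every name that arrives is pure ASCII.
    return "Hello, " + raw_name.decode("ascii")


# Works for as long as the input happens to be ASCII.
greeting = greet(b"Terry")

# The first non-ASCII input exposes the assumption.
failure = None
try:
    greet("J\u00fcrgen".encode("utf-8"))
except UnicodeDecodeError as exc:
    failure = type(exc).__name__

# Python 3 refuses to mix text and bytes at all, so this kind of
# mistake fails on the first run instead of months later.
mixed_rejected = False
try:
    "Hello, " + b"Terry"
except TypeError:
    mixed_rejected = True
```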

You'd be surprised how long it can take before some of these problems are found. A couple of weeks ago in #python we had exactly this problem when we were helping Blender folks. There was a bug report from a German Blender user, and it turned out that Blender ignores Unicode in some critical spot, making importing between people who disagree on charsets impossible. And Blender isn't exactly a project that's two weeks old and filled with idiots :) The downside is that fixing them then becomes a nontrivial task.

The central problem is probably that a lot of people don't understand Unicode. Recently I learned that even Tanenbaum got it wrong in his latest revision of the computer networks book! (Although that might just be my Dutch translation of it being bad.)

Terry Jan Reedy

Laurens


