[Python-Dev] unifying str and unicode

Antoine Pitrou solipsis at pitrou.net
Mon Oct 3 19:39:57 CEST 2005


Hi,

Josiah:

> How can you be sure that something that is /semantically textual/ will
> always remain "pure ASCII" ? That's contradictory, unless your software
> never goes out of the anglo-saxon world (and even...).

Non-unicode text input widgets.

You didn't understand my statement. I didn't mean :

Of course the answer to the latter is: you can't.

Fredrik:

Under the default encoding (and quite a few other encodings), that's true for plain ascii strings and Unicode strings.

If I have a unicode string containing legal characters greater than 0x7F, and I pass it to a function which converts it to str, the conversion fails.

If I have an 8-bit string containing legal non-ascii characters (for example the name of a file as returned by the filesystem, over which I of course have no prior control), and I give it to a function which does an implicit conversion to unicode, the conversion fails.
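Both failure modes can be reproduced in a few lines. What follows is only a minimal sketch, written in Python 3 syntax, where bytes stands in for the 8-bit str discussed here and the ascii codec is passed explicitly to imitate the implicit default encoding. The helper names (to_8bit, to_text, try_convert) are hypothetical and exist only for this illustration.

```python
def to_8bit(text, encoding="ascii"):
    """Encode text to an 8-bit string, as an implicit unicode -> str conversion would."""
    return text.encode(encoding)


def to_text(data, encoding="ascii"):
    """Decode an 8-bit string to text, as an implicit str -> unicode conversion would."""
    return data.decode(encoding)


def try_convert(func, value):
    """Return (result, None) on success, or (None, error class name) on failure."""
    try:
        return func(value), None
    except UnicodeError as exc:
        return None, type(exc).__name__


# Assumption: "\u00e9t\u00e9" is the French word used later in this message,
# and b"\xe9t\xe9" is the same word as encoded by an iso-8859-15 terminal.
encoded, encode_error = try_convert(to_8bit, "\u00e9t\u00e9")
decoded, decode_error = try_convert(to_text, b"\xe9t\xe9")

print(encode_error)  # name of the exception raised by the text -> 8-bit direction
print(decode_error)  # name of the exception raised by the 8-bit -> text direction
```

Pure ASCII data goes through both helpers unchanged, which is why the problem tends to stay hidden until non-ASCII input shows up.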

Here is an example so that you really understand. I am under a French locale (iso-8859-15); let's just try to enter a French word and see what happens when converting to unicode:

-> As a string constant:

>>> s = "été"
>>> s
'\xe9t\xe9'
>>> u = unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

-> By asking for input:

>>> s = raw_input()
été
>>> s
'\xe9t\xe9'
>>> unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

It should work, but it fails miserably.
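For contrast, the same bytes convert cleanly once the charset is named explicitly instead of being left to the default. This is a sketch in Python 3 syntax, under the assumption that the input really was produced by an iso-8859-15 terminal as in the sessions above; decode_input is a hypothetical helper, not something from the standard library.

```python
def decode_input(data, encoding="iso-8859-15"):
    """Decode 8-bit input using an explicitly stated charset.

    Assumption: the caller knows which charset produced the bytes.
    """
    return data.decode(encoding)


raw = b"\xe9t\xe9"        # the bytes shown in the interactive sessions
word = decode_input(raw)  # succeeds because 0xe9 is a valid iso-8859-15 byte

print(len(word))          # number of characters, not bytes
```

The catch is that the programmer has to know the right charset and remember to pass it at every boundary where 8-bit data enters the program.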

In the current situation, if the programmer doesn't carefully plan for these cases by manually managing conversions (which of course he can do

(even the standard Python library is bitten: witness the weird getcwd() / getcwdu() pair...)

I find it surprising that you claim there is no difficulty when everything points to the contrary. See for example how often confused developers ask for help on mailing lists...

Regards

Antoine.


