[Python-3000] Unicode and OS strings

Greg Ewing greg.ewing at canterbury.ac.nz
Fri Sep 14 07:08:04 CEST 2007


Stephen J. Turnbull wrote:

> You can't win that, because Unicode is the only encoding that attempts to guarantee even the possibility of round-tripping.

Rubbish -- I can do print [ord(c) for c in my_unicode_string] and get perfect round-trippability if I want.
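Greg's one-liner uses the Python 2 print statement. Here is a minimal Python 3 sketch of the same idea: each character is listed by its code point with ord(), and the string is rebuilt with chr(). This reproduces any str exactly without committing to a byte encoding. The sample string is an arbitrary illustration, not taken from the thread.

```python
# An arbitrary sample mixing ASCII, an accented letter, the euro sign
# and a character outside the Basic Multilingual Plane.
s = "na\u00efve \u20ac \U0001F600"

# One integer code point per character: a lossless representation of the
# str that does not depend on choosing any byte encoding.
codes = [ord(c) for c in s]
print(codes)

# Rebuilding from the code points gives back an identical string.
restored = "".join(chr(n) for n in codes)
print(restored == s)  # True
```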

You can ask people to use pre-existing officially-sanctioned encodings for their unicode data, but you can't force them to.

> The main problem with this scheme that I know of is that if you have a Python string that contains such a code point, you'll need to somehow include the information about the original encoding when pickling and the like.

That's exactly the sort of thing I'm talking about. It would be surprising if pickling worked reliably for all strings except ones that happened to come in as a command line argument.
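The scheme under discussion maps bytes that cannot be decoded onto reserved code points. Python 3's surrogateescape error handler (PEP 383) is one concrete realization of that general idea, so the sketch below uses it to illustrate the point. It is not the specific proposal debated in this thread. The byte string is an arbitrary example. The decoded string turns back into the original bytes only when it is re-encoded with the same codec and handler, and nothing in the string records which codec that was.

```python
import pickle

# Bytes as they might arrive in a command-line argument. They are valid
# Latin-1, but a lone 0xE9 byte at the end is not valid UTF-8.
raw = b"caf\xe9"

# Decoding with the surrogateescape handler maps the undecodable byte 0xE9
# to the lone surrogate code point U+DCE9 instead of raising an error.
s = raw.decode("utf-8", "surrogateescape")
print(repr(s))  # 'caf\udce9'

# Re-encoding with the same codec and handler recovers the original bytes.
assert s.encode("utf-8", "surrogateescape") == raw

# The str survives a pickle round-trip unchanged ...
t = pickle.loads(pickle.dumps(s))
assert t == s

# ... but it carries no record that UTF-8 decoding produced the escape.
# Strict encoding fails, both with Latin-1 and with UTF-8 itself.
for codec in ("latin-1", "utf-8"):
    try:
        t.encode(codec, "strict")
    except UnicodeEncodeError:
        print(f"{codec} (strict): cannot encode U+DCE9")
```

In current CPython the pickle round-trip itself succeeds. What a consumer of the unpickled string must still know from somewhere else is which encoding the escape code points stood for.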

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiem!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz     +--------------------------------------+


