[Python-3000] Unicode and OS strings
Guido van Rossum guido at python.org
Tue Sep 18 23:29:41 CEST 2007
- Previous message: [Python-3000] Unicode and OS strings
- Next message: [Python-3000] Unicode and OS strings
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 9/18/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Guido has stated that the internal representation used by Python strings is a sequence of Unicode code units, not characters. I don't think that's reached the status of "pronouncement" yet, but you will probably need a PEP to get the guarantees you want.
I think of this as cast in stone; we can't reasonably guarantee more if we want to be compatible with the UTF-16 (*) Unicode representations used on Windows and in Java. How much more pronouncement do you want?
(*) I'm not at all sure that it's called that -- you guys keep asking trick questions based on terminology that's only clear to people who have read the Unicode standard several times forwards and backwards. I mean the representation that uses 16-bit values, where characters >= 2**16 are represented as two 16-bit "surrogate" values. (I hope I at least have the 'surrogate' thing right this time.)
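To make the footnote concrete, here is a minimal sketch of the 16-bit representation it describes. The helper names `utf16_code_units` and `surrogate_pair` are made up for illustration. The sketch inspects the encoded UTF-16 form rather than the interpreter's internal string storage, since what `len()` reports for such a string depends on the Python version and build.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of text's UTF-16 encoding (hypothetical helper)."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def surrogate_pair(code_point):
    """Split a code point >= 2**16 into (high, low) surrogates by hand."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the 16-bit range")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# A character below 2**16 takes a single 16-bit value.
print([hex(u) for u in utf16_code_units("\u00e9")])

# U+1D11E (MUSICAL SYMBOL G CLEF) is >= 2**16, so it takes two values.
print([hex(u) for u in utf16_code_units("\U0001D11E")])
print([hex(u) for u in surrogate_pair(0x1D11E)])
```

The last two lines should agree: the codec and the hand computation both yield the same high and low surrogate for U+1D11E.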
--
--Guido van Rossum (home page: http://www.python.org/~guido/)