[Python-Dev] Multilingual programming article on the Red Hat Developer blog
Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Fri Sep 12 05:28:54 CEST 2014
Jeff Allen writes:
> A welcome article. One correction should be made, I believe: the area of code point space used for the smuggling of bytes under PEP-383 is not a "Unicode Private Use Area", but a portion of the trailing surrogate range.
Nice catch. Note that the surrogate range was originally part of the Private Use Area, but it was carved out with the adoption of UTF-16 in about 1993. In practice, I doubt that there are any current implementations claiming compatibility with Unicode 1.0 (IIRC, UTF-16 was made mandatory in Unicode 1.1).
> This is a code violation, which I imagine is why "surrogateescape" is an error handler, not a codec.
Yes.
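For readers who have not seen the mechanism in action, here is a minimal Python 3 sketch of the behaviour under discussion. The sample bytes are an assumption chosen for illustration. It shows an undecodable byte landing in the trailing surrogate range U+DC80..U+DCFF, and shows that the result only goes back out through the same error handler, not through the strict codec.

```python
# Latin-1 encoded "café"; the final byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9"

# The error handler maps each undecodable byte 0xNN to the lone
# trailing surrogate U+DC00 + 0xNN instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")
smuggled = ord(text[-1])
in_trailing_surrogates = 0xDC80 <= smuggled <= 0xDCFF

# Encoding with the same handler restores the original byte exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")

# The strict codec refuses the lone surrogate, since such a string
# is not well-formed Unicode text.
try:
    text.encode("utf-8")
    strict_rejects = False
except UnicodeEncodeError:
    strict_rejects = True

print(hex(smuggled), in_trailing_surrogates, round_tripped == raw, strict_rejects)
```
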
> I believe the private use area was considered and rejected for PEP-383. In an implementation of the type unicode based on UTF-16 (Jython), lone surrogates preclude a naive use of the platform string library. This is on my mind at the moment as I'm working on several bugs in Jython's unicode type, and can see why it has been too difficult.
I've always thought that the "right" way to handle the private use area for "platforms" like Python and Emacs, which may need to use it for their own purposes (such as "undecodable bytes") but want to respect its use by applications, is to create an auxiliary table mapping the private use area to objects describing the characters represented by the private use code points. These objects would have attributes such as external representation for text I/O, glyph (for GUI display), repr (for TTY display), various Unicode properties, etc.
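To make the idea concrete, here is a rough sketch of what such an auxiliary table might look like. Nothing like this exists in Python or Emacs as far as this message states; the names `PrivateUseInfo` and `PrivateUseTable`, the choice of attributes, and the sequential allocation from the BMP Private Use Area are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# BMP Private Use Area bounds.
PUA_START, PUA_END = 0xE000, 0xF8FF


@dataclass
class PrivateUseInfo:
    """Hypothetical description of one private-use character."""
    external: bytes                 # external representation for text I/O
    display: str                    # repr for TTY display
    properties: dict = field(default_factory=dict)  # Unicode-like properties


class PrivateUseTable:
    """Hypothetical table mapping private-use code points to descriptions."""

    def __init__(self):
        self._by_char = {}
        self._by_external = {}
        self._next = PUA_START

    def register(self, info):
        """Return the character assigned to info, allocating one if needed."""
        if info.external in self._by_external:
            return self._by_external[info.external]
        if self._next > PUA_END:
            raise OverflowError("private use area exhausted")
        ch = chr(self._next)
        self._next += 1
        self._by_char[ch] = info
        self._by_external[info.external] = ch
        return ch

    def lookup(self, ch):
        """Return the description for ch, or None if it is not registered."""
        return self._by_char.get(ch)

    def encode(self, text, encoding="utf-8"):
        """Encode text, writing registered characters in their external form."""
        out = bytearray()
        for ch in text:
            info = self._by_char.get(ch)
            out += info.external if info is not None else ch.encode(encoding)
        return bytes(out)


table = PrivateUseTable()
byte_e9 = table.register(PrivateUseInfo(b"\xe9", "<undecodable byte E9>"))
encoded = table.encode("caf" + byte_e9)
```

In this sketch, unregistered private-use characters pass through the ordinary codec untouched, which is one way a platform could leave an application's own use of the area alone. A real design would also have to decide how to avoid colliding with code points the application has already claimed.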