[Python-3000] Unicode and OS strings
Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Mon Sep 17 21:12:00 CEST 2007
- Previous message: [Python-3000] Unicode and OS strings
- Next message: [Python-3000] Unicode and OS strings
On Sat, 15 Sep 2007, at 09:13 +0900, Stephen J. Turnbull wrote:
> > Well, for any scheme which attempts to modify UTF-8 by accepting
> > arbitrary byte strings is used, something must be interpreted
> > differently than in real UTF-8.
>
> Wrong. In my scheme everything ends up in the PUA, on which real
> UTF-8 imposes no interpretation by definition.
This is wrong: UTF-8 is specified for the PUA. The PUA is not special from the point of view of UTF-8. UTF-8 is defined for all Unicode scalar values, i.e. all code points in the ranges U+0000..U+D7FF and U+E000..U+10FFFF, that is, all code points excluding surrogates. This includes the PUA.
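A minimal sketch of the point above, assuming a current Python 3 interpreter (its strict UTF-8 codec postdates this message). The helper name `utf8_roundtrips` is hypothetical and only for illustration. It checks that a PUA code point goes through UTF-8 like any other scalar value, while a surrogate is refused:

```python
def utf8_roundtrips(code_point: int) -> bool:
    """Return True if the code point survives a strict UTF-8 encode/decode cycle."""
    ch = chr(code_point)
    try:
        return ch.encode("utf-8").decode("utf-8") == ch
    except UnicodeEncodeError:
        # Surrogates (U+D800..U+DFFF) are not scalar values,
        # so the strict codec refuses to encode them.
        return False


pua_start = 0xE000                      # first code point of the BMP Private Use Area
print(utf8_roundtrips(pua_start))       # PUA is handled like any other scalar value
print(chr(pua_start).encode("utf-8"))   # b'\xee\x80\x80', an ordinary 3-byte sequence
print(utf8_roundtrips(0xD800))          # a surrogate is rejected
```

Nothing in the encoder distinguishes U+E000 from its neighbours outside the PUA; only the surrogate range is excluded.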
> I haven't gone back to check yet, but it's possible that a "real UTF-8
> conforming process" is required to stop processing and issue an error
> or something like that in the cases we're trying to handle.
"C10. When a process interprets a code unit sequence which purports to be in a Unicode character encoding form, it shall treat ill-formed code unit sequences as an error condition and shall not interpret such sequences as characters."
-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/