[Python-Dev] len(chr(i)) = 2? (original) (raw)
Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Tue Nov 23 17:16:55 CET 2010
- Previous message: [Python-Dev] len(chr(i)) = 2?
- Next message: [Python-Dev] len(chr(i)) = 2?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
"Martin v. Löwis" writes:
I disagree: Quoting from Unicode 5.0, section 5.4:
The individual components of implementations may have different
levels of support for surrogates, as long as those components are
assembled and communicate correctly.
"Assembly" is the problem. If chr() or a slice creates a lone surrogate and surrogateescape passes it back out, Python as a whole is non-conforming.
Technically, you can hide behind "none of slicing, chr(), or surrogateescape promises to conform", and maybe that would fly to a standards lawyer; I'd have to see the precise statement.
Here's a more convincing example. A user specifies "utf8" as her locale charset. Then she specifies a string containing a non-BMP character as the "description" of a file, and internal code munges this via slicing into a file name conforming to some specification (eg, length limit + uniquifier if needed). Then if the non-BMP character is in the "right" place, she will get either a broken file name, which will either get written to disk or raise an exception, depending on whether the munging program has enabled surrogateescape or not.
I claim both of those results are non-conforming to the specification of UTF-16, and therefore Python Unicode processing as a whole must be considered non-conforming.
It's still pretty damn good. But I've elaborated that point elsewhere.
The rationale for supporting these characters in chr() goes back much further than the surrogateescape handler - as Python unicode strings are sequences of code points, it would be impractical if you couldn't create some of them, or even would have to consult the UCD before determining whether they can be created.
The Zen is irrelevant to determining conformance to Unicode, which has its own Zen.
- Previous message: [Python-Dev] len(chr(i)) = 2?
- Next message: [Python-Dev] len(chr(i)) = 2?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]