[Python-3000] PEP 3131 accepted
Jim Jewett jimjjewett at gmail.com
Wed May 23 23:25:00 CEST 2007
- Previous message: [Python-3000] PEP 3131 normalization forms
- Next message: [Python-3000] PEP 3131 accepted
On 5/23/07, Guido van Rossum <guido at python.org> wrote:
> On 5/23/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > Certain cut-and-paste errors (such as cutting from a Word document
> > that uses "smart quotes") will change from syntax errors to silently
> > creating new identifiers.
>
> Really? Are those quote characters considered letters by the Unicode standard?
I'm not certain which specific character MS Word uses for smart quotes. My best guess is that it is actually "PRIVATE USE 1", which is supposed to be ignored (don't prevent it; just pretend it isn't there).
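A quick way to check the guess, sketched with the standard `unicodedata` module. It assumes (this is not stated above) that the pasted text carries the Windows-1252 bytes 0x91-0x94, and that "PRIVATE USE 1" refers to the C1 control U+0091 that those bytes become when misread as Latin-1:

```python
import unicodedata

# Assumption: a Word paste carries the Windows-1252 smart-quote bytes.
raw = bytes([0x91, 0x92, 0x93, 0x94])

# Decoded correctly, these are the curly quotes U+2018, U+2019, U+201C, U+201D.
cp1252_quotes = raw.decode("cp1252")
# Misread as Latin-1, the same bytes become C1 control characters.
latin1_controls = raw.decode("latin-1")

quote_categories = [unicodedata.category(ch) for ch in cp1252_quotes]
control_categories = [unicodedata.category(ch) for ch in latin1_controls]

# Collect any candidate that str.isidentifier() would accept mid-identifier.
accepted = [
    ch
    for ch in cp1252_quotes + latin1_controls
    if ("a" + ch + "b").isidentifier()
]

print(quote_categories, control_categories, accepted)
```

Under either reading, the characters fall in punctuation or control categories rather than letter categories, so a check based on the XID properties would not treat them as identifier characters. Whether a tokenizer rejects or silently skips them is a separate policy decision.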
My fears were heightened by http://www.unicode.org/reports/tr31/tr31-8.html. They discuss NFKC canonicalization (though another tech report recommends NFKD). If you use NFKC, they say to modify it, because U+0374 ( ʹ ) GREEK NUMERAL SIGN should not be allowed, but it folds to U+02B9 ( ʹ ) MODIFIER LETTER PRIME, which they claim should be allowed.
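The folding itself is easy to observe. The following is a minimal sketch using `unicodedata.normalize`. The results depend on the Unicode database shipped with the interpreter, and the variable names are only illustrative:

```python
import unicodedata

GREEK_NUMERAL_SIGN = "\u0374"
MODIFIER_LETTER_PRIME = "\u02b9"

# Apply each normalization form to U+0374 and record the result.
folded = {
    form: unicodedata.normalize(form, GREEK_NUMERAL_SIGN)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# After folding, the character is a modifier letter and is accepted
# as an identifier continuation character.
prime_category = unicodedata.category(MODIFIER_LETTER_PRIME)
prime_ok_in_identifier = ("x" + MODIFIER_LETTER_PRIME).isidentifier()

for form, result in folded.items():
    print(form, "U+%04X" % ord(result))
print(prime_category, prime_ok_in_identifier)
```

Because the mapping from U+0374 to U+02B9 is a canonical one, it happens under all four forms, not only NFKC. So a rule that bans U+0374 has to be applied before normalization, or the distinction is lost.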
Within the codepoints < 256, if we ban rather than ignore, the only remaining problems are likely to be (1) that we must add _ as an allowed ID start, and (2) we must decide whether or not to allow the recommended

    00AA ; ID_Start # L& FEMININE ORDINAL INDICATOR
    00B5 ; ID_Start # L& MICRO SIGN
    00BA ; ID_Start # L& MASCULINE ORDINAL INDICATOR
(also in XID_START, and in the CONTINUE sets)
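For reference, here is a small sketch of how these three code points (and the underscore) behave under `str.isidentifier()` and NFKC. It reflects what a current Python 3 interpreter does, which is not necessarily what was under discussion at the time:

```python
import unicodedata

# The three Latin-1 code points in question, plus the underscore.
candidates = {
    "FEMININE ORDINAL INDICATOR": "\u00aa",
    "MICRO SIGN": "\u00b5",
    "MASCULINE ORDINAL INDICATOR": "\u00ba",
    "LOW LINE": "_",
}

# Is each one acceptable as a one-character identifier (i.e. as an ID start)?
starts_identifier = {name: ch.isidentifier() for name, ch in candidates.items()}

# What does each one turn into under NFKC?
nfkc_form = {name: unicodedata.normalize("NFKC", ch) for name, ch in candidates.items()}

for name in candidates:
    print(name, starts_identifier[name], repr(nfkc_form[name]))
```

Note that NFKC maps the two ordinal indicators to plain `a` and `o`, and the micro sign to GREEK SMALL LETTER MU (U+03BC). Allowing them as ID_Start under NFKC therefore means visually distinct spellings can name the same identifier.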
-jJ