[Python-3000] PEP: Supporting Non-ASCII Identifiers

Eric V. Smith eric+python-dev at trueblade.com
Wed May 2 14:26:42 CEST 2007


Martin v. Löwis wrote: ...

Specification of Language Changes
=================================

The syntax of identifiers in Python will be based on the Unicode standard annex UAX-31 [1], with elaboration and changes as defined below.

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.5. This specification only introduces additional characters from outside the ASCII range. For other characters, the classification uses the version of the Unicode Character Database as included in the unicodedata module.

The identifier syntax is <IDStart> <IDContinue>*.

IDStart is defined as all characters having one of the general categories uppercase letters (Lu), lowercase letters (Ll), titlecase letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers (Nl), plus the underscore (XXX what are "stability extensions" listed in UAX 31).

IDContinue is defined as all characters in IDStart, plus nonspacing marks (Mn), spacing combining marks (Mc), decimal number (Nd), and connector punctuations (Pc).

All identifiers are converted into the normal form NFC while parsing; comparison of identifiers is based on NFC.
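To make the quoted rules concrete, here is a rough sketch of an identifier check built only on the general categories and the NFC step named in the draft text above. The function names (is_id_start, is_id_continue, normalize_identifier, is_identifier) are made up for illustration and are not part of the PEP or of any existing module; the sketch also ignores keywords and the open "stability extensions" question.

```python
import unicodedata

# General categories named in the quoted draft (hypothetical constant names).
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """True if ch may begin an identifier under the quoted rules."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch):
    """True if ch may appear after the first character of an identifier."""
    return (is_id_start(ch)
            or unicodedata.category(ch) in ID_CONTINUE_EXTRA_CATEGORIES)


def normalize_identifier(name):
    """Convert a candidate identifier to NFC, as the draft requires."""
    return unicodedata.normalize("NFC", name)


def is_identifier(name):
    """Check a whole string: one IDStart character, then IDContinue characters."""
    name = normalize_identifier(name)
    if not name:
        return False
    return is_id_start(name[0]) and all(is_id_continue(c) for c in name[1:])
```

Because comparison is defined on the NFC form, a name typed with a combining diaeresis and the same name typed with a precomposed character would be treated as one identifier after normalize_identifier().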

Martin:

I don't understand Unicode nearly well enough to really comment on this, but could you add a comment that the PEP 3101 code might need to be adjusted to deal with Unicode identifiers?

I don't actually think your PEP would make any difference to how we're parsing, because we don't have an "is this a valid character for an identifier" function. But I'd like to get a note somewhere in the PEP saying that all code that parses for identifiers might be impacted. The PEP 3101 code is one place where we have such a parser. We'd at least need to implement tests for Unicode identifiers.

Which reminds me that we need better tests for the existing PEP 3101 code, especially for strings with surrogate pairs. I'll look at beefing that up.
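As a rough idea of the kind of test I mean, the sketch below runs str.format() with a non-ASCII field name and with literal text outside the Basic Multilingual Plane (the case that becomes a surrogate pair on narrow builds). It assumes Python 3 string semantics; the helper names and the sample field name are invented for this example and are not taken from the PEP 3101 code.

```python
def format_with_unicode_field(value):
    """Format using a keyword field whose name contains a non-ASCII letter."""
    return "{größe} cm".format(größe=value)


def format_around_non_bmp(value):
    """Format a field surrounded by a character outside the BMP (U+1D11E)."""
    clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
    return (clef + "{0}" + clef).format(value)


def check_format_cases():
    """Run both cases and compare against the expected strings."""
    assert format_with_unicode_field(3) == "3 cm"
    assert format_around_non_bmp("x") == "\U0001D11Ex\U0001D11E"
    return True
```

Real tests would want to cover the same cases with format specifiers and nested fields as well.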

Thanks.

Eric.


