[Python-Dev] Allowing non-ASCII identifiers

"Martin v. Löwis" martin at v.loewis.de
Wed Jan 14 15:08:58 EST 2004


I'd like to work on adding support for non-ASCII characters in identifiers, using the following principles:

  1. At run-time, identifiers are represented as Unicode objects unless they are pure ASCII. IOW, they are converted from the source encoding to Unicode objects in the process of parsing.

  2. As a consequence of 1), all places where identifiers appear need to support Unicode objects (e.g. dict, getattr, etc.)

  3. Legal non-ASCII identifiers are the same as legal non-ASCII identifiers in Java, except that Python may use a different version of the Unicode character database. Python would share with Java the property that future versions allow more characters in identifiers than older versions.

    If you are too lazy to look up the Java definition, here is a rough overview: An identifier is "JavaLetter JavaLetterOrDigit*"

    JavaLetter is a character of the classes Lu, Ll, Lt, Lm, or Lo, or a currency symbol (for Python: excluding $), or a connecting punctuation character (which is unfortunately underspecified - will research the implementation).

    JavaLetterOrDigit is a JavaLetter, or a digit, a numeric letter, a combining mark, a non-spacing mark, or an ignorable control character.
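
The rough overview in 3) can be sketched with the unicodedata module. This is only an illustration, not the proposed implementation: the function names are hypothetical, the mapping of each term to a Unicode general category (Sc for currency symbols, Pc for connecting punctuation, Nd for digits, Nl for numeric letters, Mc for combining marks, Mn for non-spacing marks) is an assumption, and "ignorable control character" is approximated by the Cf category alone. The sketch is written for a Python where strings are already Unicode.

```python
import unicodedata

# Assumed mapping of the overview's terms to Unicode general categories.
LETTERS = {"Lu", "Ll", "Lt", "Lm", "Lo"}
START_EXTRA = {"Sc", "Pc"}            # currency symbol, connecting punctuation
PART_EXTRA = {"Nd", "Nl", "Mc", "Mn"}  # digit, numeric letter, combining and non-spacing marks
IGNORABLE = {"Cf"}                    # rough stand-in for "ignorable control character"


def is_java_letter(ch):
    """Hypothetical check for the 'JavaLetter' class, excluding '$' for Python."""
    if ch == "$":
        return False
    return unicodedata.category(ch) in LETTERS | START_EXTRA


def is_java_letter_or_digit(ch):
    """Hypothetical check for the 'JavaLetterOrDigit' class."""
    if is_java_letter(ch):
        return True
    return unicodedata.category(ch) in PART_EXTRA | IGNORABLE


def is_proposed_identifier(name):
    """True if name matches 'JavaLetter JavaLetterOrDigit*' under the assumptions above."""
    if not name:
        return False
    return is_java_letter(name[0]) and all(
        is_java_letter_or_digit(ch) for ch in name[1:]
    )
```

Under these assumptions, a name such as "Löwis" would be accepted, while a name starting with a digit or containing "$" would be rejected. The result depends on the version of the Unicode character database shipped with the interpreter, which is the versioning property mentioned in 3).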

Does this need a PEP?

Regards, Martin


