[Python-Dev] Is this safe enough? Re: [Python-checkins] cpython: _Py_Identifier are always ASCII strings

"Martin v. Löwis" martin at v.loewis.de
Tue Feb 7 09:23:34 CET 2012


> _Py_IDENTIFIER(xxx) defines a variable called PyId_xxx, so xxx can only be ASCII: the C language doesn't accept non-ASCII identifiers.

That's not exactly true. In C89, source code is in the "source character set", which is implementation-defined, except that it must contain the "basic character set". I believe that it allows for implementation-defined characters in identifiers. In C99, this is extended to include "universal character names" (\u escapes). They may appear in identifiers as long as the characters named are listed in annex D.59 (which I cannot locate).
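As a small illustration of the C99 point, the sketch below spells an identifier with a universal character name. The names `caf\u00e9` and `read_ucn_identifier` are made up for this example. It assumes a compiler that accepts universal character names in identifiers (recent GCC and Clang releases do in their C99 and later modes; older compilers may reject it or need an extra option).

```c
/* An identifier written with a universal character name (UCN).
   U+00E9 (e with acute accent) lies in the range 00D8-00F6, which is
   among the ranges the C standard permits in identifiers. */
static int caf\u00e9 = 42;

/* Hypothetical helper: returns the value stored in the UCN-named variable. */
int read_ucn_identifier(void)
{
    return caf\u00e9;
}
```

Whether the same identifier may also be typed directly as a non-ASCII character in the source file is up to the implementation, since the source character set is implementation-defined.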

In C 2011, annexes D.1 and D.2 specify the characters that you can use in an identifier:

D.1 Ranges of characters allowed

  1. 00A8, 00AA, 00AD, 00AF, 00B2−00B5, 00B7−00BA, 00BC−00BE, 00C0−00D6, 00D8−00F6, 00F8−00FF
  2. 0100−167F, 1681−180D, 180F−1FFF
  3. 200B−200D, 202A−202E, 203F−2040, 2054, 2060−206F
  4. 2070−218F, 2460−24FF, 2776−2793, 2C00−2DFF, 2E80−2FFF
  5. 3004−3007, 3021−302F, 3031−303F
  6. 3040−D7FF
  7. F900−FD3D, FD40−FDCF, FDF0−FE44, FE47−FFFD
  8. 10000−1FFFD, 20000−2FFFD, 30000−3FFFD, 40000−4FFFD, 50000−5FFFD, 60000−6FFFD, 70000−7FFFD, 80000−8FFFD, 90000−9FFFD, A0000−AFFFD, B0000−BFFFD, C0000−CFFFD, D0000−DFFFD, E0000−EFFFD

D.2 Ranges of characters disallowed initially

  1. 0300−036F, 1DC0−1DFF, 20D0−20FF, FE20−FE2F
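The two lists above can be turned into a lookup, sketched here under some assumptions: the function and type names (`ucn_allowed_in_identifier`, `ucn_allowed_initially`, `cp_range`) are invented for this example, and the tables are transcribed from the ranges quoted in this message rather than checked against the standard's text. The sketch covers only the extended characters of D.1 and D.2; ASCII letters, digits and the underscore are handled by the ordinary identifier grammar and are deliberately not included.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive code point range. */
struct cp_range { uint32_t lo, hi; };

/* D.1, paragraphs 1-7 as quoted above (Basic Multilingual Plane). */
static const struct cp_range d1_bmp[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x1681, 0x180D}, {0x180F, 0x1FFF},
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040}, {0x2054, 0x2054},
    {0x2060, 0x206F},
    {0x2070, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793}, {0x2C00, 0x2DFF},
    {0x2E80, 0x2FFF},
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0x303F},
    {0x3040, 0xD7FF},
    {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
};

/* D.2 as quoted above: allowed in identifiers, but not as first character. */
static const struct cp_range d2_not_initial[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F},
};

static bool in_ranges(uint32_t cp, const struct cp_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cp >= r[i].lo && cp <= r[i].hi)
            return true;
    }
    return false;
}

/* True if cp is one of the extended characters listed in D.1. */
bool ucn_allowed_in_identifier(uint32_t cp)
{
    /* D.1 paragraph 8: planes 1 through 14, excluding the last two
       code points (xFFFE and xFFFF) of each plane. */
    if (cp >= 0x10000u && cp <= 0xEFFFDu)
        return (cp & 0xFFFFu) <= 0xFFFDu;
    return in_ranges(cp, d1_bmp, sizeof d1_bmp / sizeof d1_bmp[0]);
}

/* True if cp may also appear as the first character of an identifier. */
bool ucn_allowed_initially(uint32_t cp)
{
    return ucn_allowed_in_identifier(cp) &&
           !in_ranges(cp, d2_not_initial,
                      sizeof d2_not_initial / sizeof d2_not_initial[0]);
}
```

With these tables, U+00E9 is accepted in any position, U+0301 (a combining accent from the D.2 list) is accepted only after the first character, and U+00D7 (the multiplication sign) is rejected because it falls in the gap between 00C0−00D6 and 00D8−00F6.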

Regards, Martin