[Python-3000] Support for PEP 3131

Stephen J. Turnbull stephen at xemacs.org
Fri May 25 13:13:03 CEST 2007


Jim Jewett writes:

Definitely; I don't care whether it is a different argument to import or a flag or an environment variable or a command-line option, or ... I just want the decision to accept non-ASCII characters to be explicit.

Ka-Ping's tricky.py shows that reliance on magic directives a la PEP 263 loses. I agree with Martin that in practice most such hacks will get caught in the ordinary process of editing, applying patches, sending email, and the like, but if the compiler is going to do the checking on behalf of the user, it should not rely on anything the files say.

Ideally, it would even be explicit per extra character allowed, though there should obviously be shortcuts to accept entire scripts.

How about a regexp character class as starting point?

So how about

(1) By default, python allows only ASCII.

+1

But neither Martin nor Guido likes it, so I'm continuing to think about it. Martin's objection that people will try it and assume that it's unimplemented smells like FUD to me, though.

(2) Additional characters are permitted if they appear in a table named on the command line.

+1

These additional characters should be restricted to code points larger than ASCII (so you can't easily turn "!" into an ID char)

+1

You can specify any character you want, but if it's ASCII, or not in the classes PEP 3131 ends up using to define the maximal set, it gets deleted from the extension table (ASCII has its own table, conceptually). This permits whole scripts, blocks, or ranges to be included.
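A minimal sketch of that filtering, not a proposal for the real implementation. The function name build_extension_table and the range-list input format are hypothetical, and the set of allowed general categories is only an assumption about what PEP 3131 will settle on:

```python
import unicodedata

# Assumption: general categories permitted somewhere in an identifier.
# The maximal set PEP 3131 finally adopts may differ.
ALLOWED_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Mn", "Mc", "Nd", "Pc"}

def build_extension_table(ranges):
    """Expand inclusive (first, last) code point ranges into a table.

    Returns (kept, dropped): characters accepted into the extension
    table, and characters deleted because they are ASCII or fall
    outside the allowed classes.
    """
    kept, dropped = set(), set()
    for first, last in ranges:
        for cp in range(first, last + 1):
            ch = chr(cp)
            if cp < 128 or unicodedata.category(ch) not in ALLOWED_CATEGORIES:
                dropped.add(ch)   # ASCII has its own table; bad class is refused
            else:
                kept.add(ch)
    return kept, dropped

# "!" (ASCII), Greek small letters, and three zero-width format characters.
kept, dropped = build_extension_table(
    [(0x21, 0x21), (0x3B1, 0x3C9), (0x200B, 0x200D)]
)
```

Here the Greek letters survive, while "!" and the zero-width characters land in dropped, which is where an optional warning could be issued.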

Optionally warn on such deletions when the table is loaded (though that would be better done by a separate tool), but preferably throw a SyntaxError when parsing the identifier:

"""This character is in the table of extension characters for
identifiers, but is of class Cf, which is forbidden in identifiers."""
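Roughly, the parse-time check could look like the sketch below. The name check_identifier is hypothetical, the table is assumed to be a plain set of characters, and the allowed categories are again a guess at what PEP 3131 will use; ASCII characters are skipped here because they would be governed by the existing rules:

```python
import unicodedata

# Assumption: general categories permitted somewhere in an identifier.
ALLOWED_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Mn", "Mc", "Nd", "Pc"}

def check_identifier(name, extension_table):
    """Raise SyntaxError if a non-ASCII character in name is not acceptable."""
    for ch in name:
        if ord(ch) < 128:
            continue  # ASCII is handled by its own (existing) rules
        if ch not in extension_table:
            raise SyntaxError(
                f"character {ch!r} is not in the table of extension "
                "characters for identifiers"
            )
        category = unicodedata.category(ch)
        if category not in ALLOWED_CATEGORIES:
            raise SyntaxError(
                "This character is in the table of extension characters for "
                f"identifiers, but is of class {category}, which is forbidden "
                "in identifiers."
            )

# A table that (wrongly) lists ZERO WIDTH JOINER alongside Greek alpha.
table = {"\u03b1", "\u200d"}
check_identifier("\u03b1_1", table)          # accepted silently
try:
    check_identifier("a\u200db", table)      # in the table, but class Cf
except SyntaxError as exc:
    message = str(exc)
```

Checking the class at parse time rather than trusting the table means a careless or malicious table still cannot smuggle format characters into identifiers.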

If you want to include punctuation or

-1

Why waste the effort of the Unicode technical committees?

undefined characters, so be it.

-1

Assuming undefined == reserved for future standardization, that violates the Unicode standard.

-1 on private space characters

You could argue that a private space character could be valid within a module, or an application of cooperating modules, but I don't think it's worth trying to deal with it. "I'm from Kansas, show me" (a use case).


