[Python-3000] Support for PEP 3131
Jim Jewett jimjjewett at gmail.com
Wed May 23 18:26:55 CEST 2007
- Previous message: [Python-3000] Support for PEP 3131
- Next message: [Python-3000] Support for PEP 3131
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 5/23/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Jim Jewett writes:
> > It simplifies checking for identifiers that don't stick to ASCII,
> Only if you assume that people will actually perceive the 10-character string "L\u00F6wis" as an identifier, regardless of the fact that any programmable editor can be trained to display the 5-character string "Löwis" in a very small amount of code. Conversely, any programmable editor can easily be trained to take the internal representation "Löwis" and display it as "L\u00F6wis", giving all the benefits of the representation you propose. But who would ever enable it?
I would.
I would like an alert (and possibly an import exception) on any code whose executable portion is not entirely in ASCII.
Comments aren't a problem, unless they somehow erase or hide other characters or line breaks. Strings aren't a problem unless I evaluate them. Code ... I want to know if there is some non-ASCII.
Even Latin-1 isn't much of a problem, except for single-quotes. I do want to know if 'abc' is a string or an identifier made with the "prime" letter.
This might be an innocent cut-and-paste error (and how else would most people enter non-native characters?), but it is still a problem -- and Python would often silently create a new variable instead of warning me.
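The alert Jim asks for can be approximated today with the stdlib tokenize module. This sketch (a hypothetical helper, not anything proposed in the PEP) flags non-ASCII identifiers while ignoring comments and string literals:

```python
# Hypothetical checker: report identifiers containing non-ASCII characters,
# skipping comments and strings (which Jim says are not the problem).
import io
import tokenize

def non_ascii_identifiers(source):
    """Return (line, name) pairs for identifiers that are not pure ASCII."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only NAME tokens are executable identifiers; COMMENT and
        # STRING tokens are deliberately ignored here.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.string))
    return hits

code = "L\u00F6wis = 1\n# comment with caf\u00E9\ns = 'caf\u00E9'\n"
print(non_ascii_identifiers(code))  # -> [(1, 'Löwis')]
```

A real tool would hook this into import machinery, but the token-level scan is the essential part: the é in the comment and the string go unreported, while the identifier is flagged.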
> The only issues PEP 3131 should be concerned with defining are those that cause problems with canonicalization, and the range of characters and languages allowed in the standard library.
Fair enough -- but the problem is that this isn't a solved issue yet; the Unicode group itself makes several contradictory recommendations.
I can come up with rules that are probably just about right, but I will make mistakes (just as the Unicode Consortium itself did, which is why they have both ID and XID, and why both have stability characters). Even having read their reports, my initial rules would still have banned mixed-script identifiers, which would have prevented your edict example.
So I'll agree that defining the charsets and combinations and canonicalization is the right scope; I just feel that best practice isn't yet clear enough.
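The canonicalization tension Jim describes is visible directly in the stdlib's unicodedata module; a small illustration (these examples are mine, not from the PEP or the thread):

```python
# NFKC -- the normalization PEP 3131 applies to identifiers -- folds some
# visually distinct characters together but leaves cross-script
# confusables alone.
import unicodedata

# The ligature U+FB01 folds into plain "fi" under NFKC, so "ﬁle" and
# "file" would canonicalize to the same identifier.
assert unicodedata.normalize('NFKC', '\ufb01le') == 'file'

# A Cyrillic "а" (U+0430) is a valid identifier start, but NFKC does NOT
# fold it into Latin "a" -- the mixed-script problem survives.
cyrillic_abc = '\u0430bc'
assert cyrillic_abc.isidentifier()
assert unicodedata.normalize('NFKC', cyrillic_abc) != 'abc'
print("both assertions hold: NFKC folds compatibility forms, not scripts")
```

This is why charset and combination rules are a separate question from canonicalization: normalization alone cannot catch a Cyrillic/Latin lookalike pair.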
> I propose it would be useful to provide a standard mechanism for auditing the input stream. There would be one implementation for the stdlib that complains[1] about non-ASCII characters and possibly non-English words, and IMO that should be the default (for the reasons Ka-Ping gives for opposing the whole PEP). A second one should provide a very conservative Unicode set, with provision for amendment as experience shows restriction to be desirable or extension to be safe. A third, allowing any character that can be canonicalized into the form that PEP 3131 allows internally, is left as an exercise for the reader wild 'n' crazy enough to want to use it.
This might deal with my concerns. It is a bit more complicated than the current plans.
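A toy sketch of the first of those auditors -- the ASCII-only complainer -- might look like the following (the function name and interface are invented for illustration; no such hook exists in Python):

```python
# Hypothetical "strict" auditor: complain about every non-ASCII character
# anywhere in the source text, reporting file, line, column, and codepoint.
def audit_ascii_only(source, filename="<string>"):
    """Return a complaint string for each non-ASCII character in source."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for col, ch in enumerate(line, 1):
            if ord(ch) > 127:
                problems.append(
                    f"{filename}:{lineno}:{col}: non-ASCII character "
                    f"U+{ord(ch):04X} ({ch!r})"
                )
    return problems

for msg in audit_ascii_only("x = 'caf\u00E9'\n"):
    print(msg)  # flags the é on line 1
```

The more permissive auditors Stephen sketches would differ only in the predicate: membership in a conservative Unicode table, or canonicalizability into the PEP 3131 internal form, instead of `ord(ch) > 127`.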
-jJ