[Python-3000] Support for PEP 3131

Stephen J. Turnbull stephen at xemacs.org
Thu May 24 13:55:01 CEST 2007


Jim Jewett writes:

I would like an alert (and possibly an import exception) on any code whose executable portion is not entirely in ASCII.

Are you talking about language definition or implementation? I like the idea of such checks, as long as they are not mandatory in the language and can be turned off easily at run time in the default configuration. I'd also really like a generalization (described below).
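To make the kind of check under discussion concrete, here is a minimal sketch of an ASCII-only alert. It assumes "executable portion" means identifiers only, so comments and string literals are ignored; that reading, and the helper name `non_ascii_identifiers`, are my assumptions and not anything the PEP specifies.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (name, line, column) for every identifier containing non-ASCII.

    Hypothetical helper: only NAME tokens are inspected, so comments and
    string literals never trigger an alert.
    """
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.string, tok.start[0], tok.start[1]))
    return hits


sample = "caf\u00e9 = 1\nprint(caf\u00e9)  # comment with \u00e9\n"
for name, line, col in non_ascii_identifiers(sample):
    print("non-ASCII identifier %r at line %d, column %d" % (name, line, col))
```

Whether a hit produces a warning or an import exception would then be a matter of configuration rather than of language definition.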

The only issues PEP 3131 should be concerned with defining are those that cause problems with canonicalization, and the range of characters and languages allowed in the standard library.

Fair enough -- but the problem is that this isn't a solved issue yet;

IMHO the stdlib is a solved issue. The PEP says "in the standard library, we use ASCII only, except in tests and the like," and "we use English unless there is no reasonable equivalent in English." That's right.

AFAIK canonicalization is also a solved issue (although exactly what "NFC" means might change with Unicode errata and of course with future addition of combining characters or precombined characters).
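For readers who have not run into the canonicalization problem, a small sketch of what NFC does to two spellings of the same name. The function name `canonical` is hypothetical, and the result depends on the Unicode data shipped with the interpreter, which is the caveat about errata above.

```python
import unicodedata


def canonical(name, form="NFC"):
    """Return the identifier in the given normalization form (NFC assumed here)."""
    return unicodedata.normalize(form, name)


decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "caf\u00e9"   # single precomposed LATIN SMALL LETTER E WITH ACUTE

# The two spellings differ as raw strings but agree after canonicalization.
print(decomposed == precomposed)
print(canonical(decomposed) == canonical(precomposed))
```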

The notion of "identifier constituent" is a bit thorny. As I understand it, Cf characters in general don't belong, but UAX#31 has some weird references to ZWJ and ZWNJ that I don't understand. I say "leave them out until somebody named 'Bhattacharya' says 'Hey! I need that!'" In general, when in doubt, leave it out.

And prohibit it. I think it's a very bad idea to give identifier authors any control over their presentation to readers. If an editor has a broken or nonexistent bidi implementation, for example, its user is probably used to that. With sufficient breakage in a presentation algorithm, I suppose that the same identifier could be presented differently in different contexts, and that different identifiers could be presented identically. But that's not Python's problem. This can easily happen in ASCII, too. (Consider an editor that truncates display silently at column 80.)
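A rough sketch of what "leave out and prohibit" could look like for the Cf (format) category, which includes ZWJ, ZWNJ and the bidi controls. The helper name `format_characters` is hypothetical; a real implementation would presumably live in the tokenizer.

```python
import unicodedata


def format_characters(name):
    """Return (index, character name) for each Cf character in an identifier."""
    return [
        (i, unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(name)
        if unicodedata.category(ch) == "Cf"
    ]


candidate = "ab\u200dcd"  # contains ZERO WIDTH JOINER, invisible in most editors
for index, char_name in format_characters(candidate):
    print("rejecting identifier: %s at index %d" % (char_name, index))
```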

Even having read their reports, my initial rules would still have banned mixed-script, which would have prevented your edict- example.

Urk. I see your point (Ka-Ping's Cyrillic example makes it glaringly clear why that's the conservative way to go). I don't have to like it, but I could live with it. (Especially since "edict-" is a poor man's namespace. That device isn't needed in Python.)
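A crude sketch of a mixed-script ban, for illustration only. Python's `unicodedata` module does not expose the Unicode Script property, so this approximates it with the first word of each letter's character name; that heuristic, and the sample identifiers, are my assumptions and stand in for a proper script table.

```python
import unicodedata


def scripts_used(identifier):
    """Approximate the set of scripts used by the letters of an identifier.

    Assumption: the first word of the Unicode character name ("LATIN",
    "CYRILLIC", "CJK", ...) is a good enough stand-in for the Script property.
    Digits and underscores are ignored.
    """
    scripts = set()
    for ch in identifier:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier):
    return len(scripts_used(identifier)) > 1


lookalike = "sc\u043epe"          # Latin letters with CYRILLIC SMALL LETTER O
prefixed = "edict_\u8f9e\u66f8"   # hypothetical Latin prefix plus kanji
for name in ("scope", lookalike, prefixed):
    print(ascii(name), sorted(scripts_used(name)), is_mixed_script(name))
```

Under such a rule the lookalike is caught, but so is any Latin-prefixed name in another script, which is the cost conceded above.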

I propose it would be useful to provide a standard mechanism for auditing the input stream. There would be one implementation for the stdlib .... A second .... A third, ....

This might deal with my concerns. It is a bit more complicated than the current plans.

Well, what I really want is a loadable table. My motivation is that I want organizations to be able to "enforce" a policy that is less restrictive than "ASCII-only" but more restrictive than "almost anything goes". My students don't need Sanskrit; Guido's tax accountant doesn't need kanji, and neither needs Arabic. I think that they should be able to get the same strict "alert or even import exception" (that you want on non-ASCII) for characters outside their larger, but still quite restricted, sets.
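To show roughly what I mean by a loadable table, here is a sketch under stated assumptions: the table format (hex code point ranges, one per line, `#` for comments), the function names, and the choice of `ImportError` for the strict mode are all hypothetical. Only identifiers are audited.

```python
import io
import tokenize
import warnings


def load_table(text):
    """Parse lines such as '0041-005A' or '005F' into (low, high) ranges."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        low, _, high = line.partition("-")
        ranges.append((int(low, 16), int(high or low, 16)))
    return ranges


def audit(source, table, strict=False):
    """Warn about (or, if strict, refuse) identifiers outside the table."""
    violations = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        if any(not any(lo <= ord(ch) <= hi for lo, hi in table) for ch in tok.string):
            violations.append((tok.string, tok.start[0]))
    if violations:
        message = "identifiers outside policy table: %r" % (violations,)
        if strict:
            raise ImportError(message)
        warnings.warn(message)
    return violations


# Hypothetical policy: ASCII identifier characters plus Latin-1 letters.
POLICY = load_table("""
0030-0039   # digits
0041-005A   # A-Z
005F        # underscore
0061-007A   # a-z
00C0-00FF   # Latin-1 letters (and two symbols, for brevity)
""")
```

An organization would ship its own table; the ASCII-only check is then just the special case of a table with no entries above 007A.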


