[Python-3000] Support for PEP 3131
Josiah Carlson jcarlson at uci.edu
Fri May 25 06:36:12 CEST 2007
"Guido van Rossum" <guido at python.org> wrote:
> On 5/24/07, Ka-Ping Yee <python at zesty.ca> wrote:
> > To pit this as "ascii lovers vs. non-ascii lovers" is a pretty large
> > oversimplification. You could name them "people who want to be able
> > to know what the code says" and "people who don't mind not being able
> > to know what the code says". Or you could name them "people who want
> > Python's lexical syntax to be something they fully understand" and
> > "people who don't mind the extra complexity". Or "people who don't
> > want Python's lexical syntax to be tied to a changing external
> > standard" and "people who don't mind the extra variability."
> >
> > However you characterize them, keep in mind that those in the former
> > group are asking for default behaviour that 100% of Python users
> > already use and understand. There's no cost to keeping identifiers
> > ASCII-only because that's what Python already does.
> >
> > I think that's a pretty strong reason for making the new, more complex
> > behaviour optional.
>
> If there's a security argument to be made for restricting the alphabet
> used by code contributions (even by co-workers at the same company), I
> don't see why ASCII-only projects should have it easier than projects
> in other cultures.
For the sake of argument, pretend that we went with a command line option to enable certain character sets. In my opinion, there should be a default character set that is allowed. The only character set that makes sense as a default, ignoring previously-existing environment variables (which don't necessarily help us), is ascii.
Why? Primarily because ascii identifiers are what are allowed today, and have been allowed for 15 years. But there is this secondary data point that Stephen Turnbull brought up; 95% of users (of Emacs) never touch non-ascii code. Poor extrapolation of statistics aside, to make the default be something that does not help 95% of users seems a bit... overenthusiastic. Where else in Python have we made the default behavior only desired or useful to 5% of our users?
With that said, and with what Stephen and others have said about unicode in Java, I don't believe there will be terribly significant cross-pollination of non-ascii identifier source. Of the source that does become popular and has non-ascii identifiers, I don't believe that it would take much time before there are normalized versions of the source, either published by the original authors or created by users. (Having a tool to do unicode -> ascii transliteration of identifiers would make this a non-issue.)
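As a rough sketch of what such a transliteration tool might do to a single identifier, assuming that decomposing accented Latin letters is good enough (it is not for scripts such as Cyrillic or CJK, which would need a real mapping table), something like the following could work. The name asciify_identifier and the _uXXXX escape fallback are made up for illustration:

```python
import unicodedata


def asciify_identifier(name):
    """Return an ASCII-only spelling of an identifier (illustrative only)."""
    out = []
    # NFKD splits accented letters into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFKD", name):
        if ord(ch) < 128:
            out.append(ch)
        elif unicodedata.combining(ch):
            # Drop the combining mark; the base letter was already kept.
            continue
        else:
            # No ASCII equivalent known here: fall back to a hex escape
            # (a made-up convention, not anything Python defines).
            out.append("_u%04x" % ord(ch))
    return "".join(out)
```

Two different identifiers can collapse to the same ASCII spelling this way, so a real tool would also have to detect and resolve collisions within a scope.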
Though others don't like it, I think that having a command line option to enable other character sets is a reasonable burden to place on the 5% of users that will experience non-ascii identifiers. For those who work with it on a regular basis, having an environment variable should be sufficient (with command line arguments to add additional allowable character sets). For those who wish to import code at runtime and/or have arbitrary identifiers, having an interface for adding or removing allowable character sets for code imported during runtime should work reasonably well (both for people who want to allow arbitrary identifiers, and those who want to restrict identifiers after the runtime system is up).
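To make the runtime interface a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: the PYTHON_IDENT_CHARSETS environment variable, the function names, and the two sample character sets are assumptions for the sake of illustration, not anything that exists in Python.

```python
import os

# Hypothetical names throughout: neither the environment variable nor these
# functions exist in Python; they only illustrate the proposed interface.
_ASCII = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
)
_allowed = set(_ASCII)

# Named character sets a user could enable; just two small samples here.
CHARSETS = {
    "latin1_letters": {chr(c) for c in range(0xC0, 0x100) if chr(c).isalpha()},
    "greek": {chr(c) for c in range(0x370, 0x400) if chr(c).isalpha()},
}


def allow_charset(name):
    """Permit identifiers that use characters from the named set."""
    _allowed.update(CHARSETS[name])


def disallow_charset(name):
    """Forbid the named set again; ASCII always stays allowed."""
    _allowed.difference_update(CHARSETS[name] - _ASCII)


def is_allowed(identifier):
    """True if every character of the identifier is currently permitted."""
    return all(ch in _allowed for ch in identifier)


# Seed the allowed set from the (hypothetical) environment variable,
# e.g. PYTHON_IDENT_CHARSETS=greek,latin1_letters
for _name in os.environ.get("PYTHON_IDENT_CHARSETS", "").split(","):
    if _name.strip() in CHARSETS:
        allow_charset(_name.strip())
```

A program that wants to lock things down after startup would call disallow_charset() for whatever it enabled earlier; one that wants to load code with Greek identifiers would call allow_charset("greek") before importing it. The sketch assumes the named sets do not overlap; overlapping sets would need per-character reference counts.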
The speed issues that Guillaume has brought up are a non-issue. Every identifier in a pyc file is already interned when the file is loaded, so the extra time to verify those identifiers at load time is insignificant; especially when in Python one can do...
    for identifier in identifiers:
        for character in identifier:
            if character not in allowable_characters:
                raise ImportError("...")
And considering we can do millions of dictionary/set lookups each second on a modern machine, I can't imagine that identifier verification time will be a significant burden.
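For anyone who wants to check that claim on their own machine, here is a self-contained version of the loop above with a crude timing harness. The ASCII-only allowable_characters set and the 10000 synthetic identifiers are assumptions chosen for illustration; real pyc files will differ.

```python
import string
import timeit

# Assumed default: ASCII letters, digits and underscore.
allowable_characters = set(string.ascii_letters + string.digits + "_")


def verify_identifiers(identifiers, allowed=allowable_characters):
    """Raise ImportError on the first character outside the allowed set."""
    for identifier in identifiers:
        for character in identifier:
            if character not in allowed:
                raise ImportError(
                    "disallowed character %r in identifier %r"
                    % (character, identifier)
                )


# Made-up workload standing in for the identifiers of a large module.
identifiers = ["identifier_%d" % i for i in range(10000)]

if __name__ == "__main__":
    seconds = timeit.timeit(lambda: verify_identifiers(identifiers), number=10)
    chars = 10 * sum(len(name) for name in identifiers)
    print("%.0f characters checked per second" % (chars / seconds))
```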
- Josiah