[Python-3000] Support for PEP 3131 (original) (raw)

Jim Jewett jimjjewett at gmail.com
Fri May 25 01:12:27 CEST 2007


On 5/24/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:

Ka-Ping Yee schrieb: > On Thu, 24 May 2007, Jim Jewett wrote: >> So how about

>> (1) By default, python allows only ASCII. >> (2) Additional characters are permitted if they appear in a table >> named on the command line.

>> These additional characters should be restricted to code >> points larger than ASCII (so you can't easily turn "!" into >> an ID char), but beyond that, anything goes. If you want to >> include punctuation or undefined characters, so be it.

> +1! This is a fine solution. It is better than the "python -U" > option I proposed

-2. Any solution found must also accommodate users which are unaware of the security issue, and just want to use their native language for identifiers. So requiring them to change their environment or pass additional command line parameters is unacceptable.

There is no hope of explaining security; therefore, the defaults should be relatively safe. If the default is "anything goes", that isn't safe. If the default is "ASCII", that is safe, but possibly inconvenient. It depends on how hard it is to make the switch.

Is your concern just that it should be possible to do once (perhaps at install), rather than on each run? That would probably be OK too, so long as the default install was ASCII-only, so that someone had to make a decision about what to allow.

I assume that large communities will standardize on a tailored table, but a first-pass slightly-too-inclusive table is easy enough to create.

Here are the Thaana lines from (unicode consortium file) Scripts.txt

0780..07A5 ; Thaana # Lo [38] THAANA LETTER HAA..THAANA LETTER WAAVU 07A6..07B0 ; Thaana # Mn [11] THAANA ABAFILI..THAANA SUKUN 07B1 ; Thaana # Lo THAANA LETTER NAA

Though if it were me, I would probably simplify that to

0780..07B1 ; Thaana

Similarly, Devanagari has 15 lines in Scripts.txt, but you could simplify it to

0901..0939 ; Devanagari 093C..094D ; Devanagari 0950..0954 ; Devanagari 0958..0963 ; Devanagari 0966..096F ; Devanagari 097B..097F ; Devanagari

or even 0901..097F ; Devanagari and some undefined characters

if you (as a Devanagari speaker) were confident that none of your characters would be confused with ASCII. (In practice, you might well want to exclude the Devangari numbers for looking too similar to ASCII digits with different values, but ... that is a judgment call for Devanagari speakers to make, so long as they make it explicitly.)

-jJ



More information about the Python-3000 mailing list