[Python-3000] pep 3131 again

Jason Orendorff jason.orendorff at gmail.com
Thu May 17 19:55:57 CEST 2007


Martin, this message suggests an addition to PEP 3131.

On 5/16/07, tomer filiba <tomerfiliba at gmail.com> wrote:

=== RTL/LTR ===
the only practical way to use RTL languages in code is to have an RTL programming language, where "if" is spelled "אם", "for" as "עבור", "in" as "בתוך", and so on, and the entire program is RTL. having code like --

for קקי in פיפי(1,2,3)

is only unreadable by all means (since the parenthesis are LTR, while the name is RTL, etc.)

In theory, the Right Thing to do for this is to support Unicode bidi format control characters. Check this out:

for קקי in פיפי‎(1,2,3): blort(קקי)

I just added U+200E, "LEFT-TO-RIGHT MARK", after each misbehaving RTL identifier, as recommended here: http://unicode.org/reports/tr9/#Usage
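For the curious, here is a rough sketch of how that edit could be done mechanically. It is only an illustration, not a tested editor plugin: the function names are made up for this example, and it simplifies "misbehaving RTL identifier" to "any NAME token whose last character has bidi class R or AL". Because it works on tokens from Python's own tokenize module, string literals and comments are left alone as a side effect.

```python
import io
import tokenize
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def is_strong_rtl(ch):
    # Bidi classes "R" (e.g. Hebrew) and "AL" (Arabic letters) are strong RTL.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def add_lrm_after_rtl_names(source):
    """Return source with U+200E inserted after each identifier ending in RTL.

    Hypothetical helper: only NAME tokens are touched, so strings and
    comments pass through unchanged.
    """
    lines = source.splitlines(keepends=True)
    positions = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and is_strong_rtl(tok.string[-1]):
            positions.append(tok.end)  # (row, col), row is 1-based
    # Insert from the end backwards so earlier offsets stay valid.
    for row, col in sorted(positions, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + LRM + line[col:]
    return "".join(lines)


if __name__ == "__main__":
    # The loop from this thread, written with escapes to keep the listing LTR.
    src = ("for \u05e7\u05e7\u05d9 in \u05e4\u05d9\u05e4\u05d9(1,2,3): "
           "blort(\u05e7\u05e7\u05d9)\n")
    print(ascii(add_lrm_after_rtl_names(src)))
```

Note that the output is not something Python will accept today, since the marks are not allowed outside strings and comments.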

Note: some mail/news agents strip out format characters. (‮.gnikrow era sretcarahc lortnoc idib ,siht daer nac uoy fI‬‎) (‮If you can read this, control characters were stripped/ignored.‬‎)

Now... it's clearly absurd to be pasting invisible magic characters into source code, but that part is automatable. Just hack your editor to add U+200E after each run of strong-RTL characters, except in strings and comments. The real problems are:

  1. Many editors don't have bidi support. This might improve with time. Or not.

  2. Python forbids these characters. Martin, JavaScript treats these specially, and I think Python probably should, too:

The ECMAScript 3 standard for JavaScript requires the tokenizer to throw away all Unicode format-control characters (general category Cf).

ECMAScript 4 will likely tweak this (an incompatible change) to retain those characters only in strings and regexps. I like that better.
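To make the ECMAScript 3 behaviour concrete, here is a minimal sketch of that preprocessing step in Python. The function name is invented for this example, and it only models "drop every Cf character before tokenizing"; it says nothing about how a real JavaScript engine implements it, and it does not attempt the string/regexp exception proposed for ECMAScript 4.

```python
import unicodedata


def strip_format_controls(source):
    """Drop all characters in Unicode general category Cf (format controls).

    Hypothetical helper modelling the ES3-style rule: marks such as U+200E
    are removed everywhere, including inside string literals.
    """
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    marked = "\u05e4\u05d9\u05e4\u05d9\u200e(1,2,3)"
    print(ascii(strip_format_controls(marked)))
```

With a rule like this in front of the tokenizer, the marked-up loop earlier in this message would tokenize exactly like the unmarked one.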

Cheers, -j


