[Python-3000] Support for PEP 3131 (original) (raw)
Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Tue May 29 05:57:23 CEST 2007
- Previous message: [Python-3000] Support for PEP 3131
- Next message: [Python-3000] Support for PEP 3131
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Greg Ewing writes:
Stephen J. Turnbull wrote:
If an English speaker would pronounce the spelling of an English word "A B C", and an Arabic speaker an Arabic word as "1 2 3", then as an identifier the combination English then Arabic is spelled "A B C _ 1 2 3".
But would an Arabic speaker pronounce the identifier as a whole as "A B C 1 2 3" or "1 2 3 A B C"? That's where I find it all gets very confusing.
Then "unask the question." Bidi is not context-free; that question is not properly formulated. Pragmatically, in a well-formed Python program in the overwhelming majority of cases you or she will be in a LTR context, so it will be read "A B C _ 1 2 3".
The ambiguity you have in mind probably is best expressed "what happens with a single line program?" Eg, one that appears on the display like this:
ABC_321
True, a native Arabic speaker would surely (absent any context except her early upbringing) read that "1 2 3 _ A B C". And I admit, I'd read it "A B C _ 1 2 3". That looks like an ambiguity requiring use of a direction indicator, but it's not. According to PEP 263 all Python programs implicitly start in ASCII (otherwise the optional coding cookie cannot be parsed, and presumably not the optional shebang, either).
So since the Python programmer (whether natively English-speaking or Arabic-speaking) starts in state "LTR", she reads the "A" first, not the "1", and there are no problems. Of course, you want to be able to express the identifier that would be spelled out (and represented in memory!) as "1 2 3 _ A B C", and you can:
321_ABC
Since I'm not an Arabic-speaker at all, I can only say I suspect that Arabic speakers will learn to do this context initialization very quickly, and to read comments marked at the end of the line, rather than the beginning. Ie, to an Arabic speaker an Arabic header comment will feel like this:
This is the Foomatic program. #
It makes passes at compilers. #
It is licentiously speaking a GPL program. #
A smart editor should be able to format that:
This is the Foomatic program. # It makes passes at compilers. # It is licentiously speaking a GPL program. #
It feels weird, but it's not that bad, to me anyway.
Once again, speakers of bidi languages are in a world of pain anyway; it's reasonable to suppose that this doesn't really make things worse. There are ambiguities here, and while a naive native speaker might resolve them differently in ad hoc cases from the above, I doubt they'd be lucky enough to come up with a consistent interpretation. On the other hand, humans are designed to learn the arbitrary rules of languages, as children, at least. Adults who are fortunate enough to retain enough of that ability to learn to program probably will have little trouble with this particular arbitrary rule. At least, that's my guess, indirectly supported by the Unicode rules for identifiers which suggests that it is reasonable to prohibit direction indicators in identifiers.
- Previous message: [Python-3000] Support for PEP 3131
- Next message: [Python-3000] Support for PEP 3131
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]