[Python-3000] Support for PEP 3131

Adam Olsen rhamph at gmail.com
Fri May 25 20:16:46 CEST 2007


On 5/25/07, Jim Jewett <jimjjewett at gmail.com> wrote:

On 5/25/07, Adam Olsen <rhamph at gmail.com> wrote:
> If we allowed an underscore as a mixed-script separator
> (allowing "def get原料(self):"), does this let us get away
> with otherwise banning mixed-scripts?

I wondered that, until seeing that it wouldn't really solve the problem anyhow. It is possible to write entire words (such as "allow" or "scope") in multiple scripts. (Unicode calls these "whole script confusables".) You can't stop that without banning one of the scripts entirely, which would disenfranchise users of some languages. So I think the least-bad solution is to say "OK, we won't allow these potentially confusable characters unless you were expecting them." And once we have a way to say "I'm expecting Cyrillic", we might as well let the user specify exactly what they're expecting, and make their own decisions on what is likely to be needed vs likely to be confused.

Indeed, whole-script confusables do create significant holes, but I think the best solution is still to ban mixed scripts and accept that it's only a "75% solution". Using an "I'm expecting Cyrillic" flag makes it harder for those who need Cyrillic AND still leaves them vulnerable to the same problem we're trying to protect ourselves from.
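To make the trade-off concrete, here is a rough sketch of a mixed-script ban with underscore as the separator. It is not a proposed implementation. Python's unicodedata module does not expose the Unicode Script property, so the hypothetical helper script_of() approximates it from the first word of each character's Unicode name; that approximation and both function names are assumptions for illustration only.

```python
import unicodedata

def script_of(ch):
    """Approximate a character's script from the first word of its
    Unicode name (e.g. "LATIN", "CYRILLIC", "CJK"). This is a crude
    stand-in for the real Script property, used only for illustration."""
    if ch == "_" or ch.isdigit():
        return None  # treated as script-neutral
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def single_script_segments(identifier):
    """Return True if every underscore-separated segment of the
    identifier draws its characters from at most one script."""
    for segment in identifier.split("_"):
        scripts = {script_of(ch) for ch in segment} - {None}
        if len(scripts) > 1:
            return False
    return True

# Latin "scope" with U+043E CYRILLIC SMALL LETTER O in place of "o".
mixed = "sc\u043epe"
# "scope" spelled entirely with Cyrillic look-alike letters.
whole_script = "\u0455\u0441\u043e\u0440\u0435"
```

Under these assumptions, single_script_segments(mixed) is rejected while single_script_segments(whole_script) is accepted, which is the hole the "75% solution" leaves open.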

A more extreme solution would be to introduce a symbol type that converts whole-script confusables to a canonical form (as well as mixed-script confusables, if we don't ban them). For practicality, it would have to coerce any unicode it was compared with for equality, and it probably would not support sorting.
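A minimal sketch of what such a symbol type might look like follows. The class name Symbol, the helper _canonical(), and the six-entry _CONFUSABLES table are all hypothetical; a real table would have to come from Unicode's published confusables data. The NFKC step is borrowed from PEP 3131's identifier normalization and is likewise an assumption here.

```python
import unicodedata

# Tiny hand-written table for illustration only: a few Cyrillic letters
# mapped to the Latin letters they resemble.
_CONFUSABLES = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
})

def _canonical(text):
    """Normalize, then fold known confusables to one canonical form."""
    return unicodedata.normalize("NFKC", text).translate(_CONFUSABLES)

class Symbol:
    """Identifier-like value that compares by canonical form."""
    __slots__ = ("text", "_canon")

    def __init__(self, text):
        self.text = text
        self._canon = _canonical(text)

    def __eq__(self, other):
        if isinstance(other, Symbol):
            return self._canon == other._canon
        if isinstance(other, str):
            # Coerce plain strings before comparing.
            return self._canon == _canonical(other)
        return NotImplemented

    def __hash__(self):
        return hash(self._canon)

    def __repr__(self):
        return f"Symbol({self.text!r})"

    # No __lt__ and friends: ordering is deliberately unsupported.
```

With this sketch, Symbol("scope") compares equal to both the all-Cyrillic spelling and a plain string of it, and the two symbols collapse to one entry in a set. One visible cost is that a Symbol and an uncoerced confusable str can compare equal yet hash differently, so mixing the two as dictionary keys would not be reliable.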

-- Adam Olsen, aka Rhamphoryncus


