[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Josiah Carlson jcarlson at uci.edu
Wed Oct 26 01:59:51 CEST 2005


"Martin v. Löwis" <martin at v.loewis.de> wrote:

> Josiah Carlson wrote:
> > And how users could say, "name error? But I typed in window.draw(PEN) as
> > I was told to, and it didn't work!"
>
> Ah, so the "serious issues" you are talking about are not security issues, but usability issues.

Indeed, it was a misunderstanding, as the email stated:

    I did not mean to imply that I was concerned about the security implications of inserting arbitrary identifiers in Python (I was mentioning the web browser case for an example of how such characters have been confusing previously), I am concerned about confusion involved with using: [glyphs which are identical]

> I don't think extending the range of acceptable characters will cause any additional confusion. Users are already getting "surprising" NameErrors/AttributeErrors in the following cases:
> - they just misspell the identifier, and then, when the error message is printed, fail to recognize the difference, as they read over the typo just like they read over it when mistyping it in the first place.

In this case it's not just a misreading; the characters look identical! When is an 'E' not an 'E'? When it is a Greek Epsilon or a Cyrillic Ie. Saying what characters will or will not be used as identifiers, when those characters are keys on a keyboard of a specific type, is pretty presumptuous.
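To make the 'E' case concrete, here is a minimal sketch, assuming a Python 3 interpreter and its standard `unicodedata` module. The `latin_pen` and `mixed_pen` names are made up for illustration. It prints the code point and Unicode name of each of the three letters, then compares two strings that many fonts would render the same way.

```python
import unicodedata

# Three capital letters that many fonts draw with the same glyph:
# Latin E, Greek Epsilon, Cyrillic Ie.
lookalikes = ["\u0045", "\u0395", "\u0415"]

for ch in lookalikes:
    # Show the code point and the official Unicode character name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Hypothetical identifiers: one all-Latin, one with a Greek Epsilon inside.
latin_pen = "PEN"
mixed_pen = "P\u0395N"

# The comparison is done on code points, not on how the glyphs are drawn.
print(latin_pen == mixed_pen)
```

The three characters are distinct as far as the interpreter is concerned, whatever the font does with them.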

> - they run into confusions with different things having the same names in different contexts. For example, they wonder why they get TypeError for passing the wrong number of arguments to a function, when the call matches exactly what the source code in front of them tells them - only that they were calling a different function which just happened to have the same name.

Right, and users should be reading the documentation for the functions and methods they are calling.

> In the light of these common mistakes, your example with an identifier named PEN, where the "P" might be a Cyrillic letter or the "E" a Greek one, is just made up: For window.draw, people will readily understand that they are supposed to use Latin letters. More generally, they will know what script to use just from looking at the identifier.

Sure, that example was made up, but there are words which have been stolen from various languages by English, and you are discounting the case of single-letter temporary variables. Saying what will and won't happen over the course of using Unicode identifiers is quite the prediction.
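The single-letter case can be sketched as follows, assuming a Python 3 interpreter that accepts non-ASCII identifiers. The `namespace` dictionary and the variable `E` are hypothetical stand-ins for a user's temporary variable. Looking up the Latin name succeeds, while the Greek look-alike raises a NameError.

```python
# Hypothetical namespace holding one temporary variable named with a Latin E.
namespace = {"E": 2.718}

# The Latin capital E resolves normally.
print(eval("E", namespace))

error = None
try:
    # Greek capital Epsilon: a different identifier, same glyph in many fonts.
    eval("\u0395", namespace)
except NameError as exc:
    error = exc
    print(type(exc).__name__)
```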

> > Identically drawn glyphs are a problem, and pretending that they aren't
> > a problem, doesn't make it so. Right now, all possible name glyphs are
> > visually distinct

> Not at all: Just compare Fool and Foo1 (and perhaps FooI). In the font in which I'm typing this, these are slightly different - but there are fonts in which the difference is really difficult to recognize.

Indeed, they are similar, but _different_ in my font as well. The trick is that the glyphs are not different in the case of certain Greek or Cyrillic letters. They don't just /look/ similar, they /are identical/.
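Since the eye cannot separate identical glyphs, any check would have to be done by a tool. What follows is a rough sketch only, assuming Python 3. The helper names `scripts_used` and `looks_mixed` are hypothetical, and taking the first word of each character's Unicode name is a crude heuristic, not the real Unicode Script property.

```python
import unicodedata

def scripts_used(identifier):
    """Return the script-like name prefixes of the letters in identifier.

    Heuristic (an assumption of this sketch): the first word of a letter's
    Unicode name, e.g. "LATIN", "GREEK" or "CYRILLIC".
    """
    return {unicodedata.name(ch).split()[0] for ch in identifier if ch.isalpha()}

def looks_mixed(identifier):
    """True if the letters come from more than one script-like prefix."""
    return len(scripts_used(identifier)) > 1

# All-Latin name, a name with a Greek Epsilon, and a Latin name with a digit.
for name in ["PEN", "P\u0395N", "Foo1"]:
    print(ascii(name), sorted(scripts_used(name)), looks_mixed(name))
```

Digits and underscores are skipped by the `isalpha()` filter, so Foo1 versus Fool is not something this check addresses; it only flags names that mix letters from different scripts.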


