[Python-Dev] Allowing non-ASCII identifiers

"Martin v. Löwis" martin at v.loewis.de
Mon Feb 9 17:03:11 EST 2004


François Pinard wrote:

1. At run-time, identifiers are represented as Unicode objects unless they are pure ASCII. IOW, they are converted from the source encoding to Unicode objects in the process of parsing.

This is already the case, isn't it?

Currently, all identifiers are byte strings, at run-time, representing ASCII characters. IOW, you currently won't observe Unicode strings as identifiers.
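For comparison only, here is a minimal sketch of how identifiers can be inspected at run-time. It assumes a Python 3 interpreter, where non-ASCII identifiers were later standardized by PEP 3131; it does not reproduce the behaviour of the interpreter discussed in this message. The `namespace` and `names` variables are illustrative.

```python
# Assumes Python 3 (PEP 3131); an interpreter from 2004 would reject this source.
namespace = {}
exec("élève = 3", namespace)

# exec() also adds "__builtins__", so keep only the names defined by the snippet.
names = [name for name in namespace if not name.startswith("__")]

for name in names:
    # Each identifier is an ordinary (Unicode) str object used as a dict key.
    print(repr(name), type(name).__name__, name.isascii())
```

Under that assumption, the identifier shows up as a `str` key in the namespace dictionary, which is the kind of object every consumer of identifiers (dict, getattr, and so on) then has to accept.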

2. As a consequence of 1), all places where identifiers appear need to support Unicode objects (e.g. dict, getattr, etc.).

I do not much know the internals, yet I suspect one more thing to consider is whether Unicode strings looking like non-ASCII identifiers should be interned or not, the same as currently done for ASCII.

Indeed; I had not thought about this.
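To make the interning question concrete, here is a small sketch of what interning means for a non-ASCII name. It assumes Python 3 and its `sys.intern()` function, and says nothing about how the compiler itself would intern identifiers; the variable names are illustrative.

```python
import sys

# Build two equal strings at run-time so they start out as separate objects
# (whether they are distinct before interning is an implementation detail).
a = "".join(["\u00e9l", "\u00e8ve"])   # "élève"
b = "".join(["\u00e9l\u00e8", "ve"])   # "élève"

interned_a = sys.intern(a)
interned_b = sys.intern(b)

# After interning, equal strings are represented by one shared object,
# so a dictionary lookup can succeed on a pointer comparison alone.
print(a == b, interned_a is interned_b)
```

The point of interning identifiers is exactly this shortcut: name lookups can compare object identity first and fall back to a character-by-character comparison only when that fails.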

    # -*- coding: Latin-1 -*-
    élève = 3
    print élève

[...]

So, the Python compiler is sensitive to the active locale.

Yes, that's a bug. It will use byte strings as identifiers (without running your example, I'd expect that dir() shows they are UTF-8)
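As a rough aid for reading such dir() output, the sketch below shows how the same name looks as a byte string in each encoding. It assumes Python 3 and only illustrates the two byte sequences; it does not reproduce the bug itself.

```python
name = "\u00e9l\u00e8ve"  # "élève"

latin1_bytes = name.encode("latin-1")  # one byte per accented letter
utf8_bytes = name.encode("utf-8")      # two bytes per accented letter

print(latin1_bytes)  # b'\xe9l\xe8ve'
print(utf8_bytes)    # b'\xc3\xa9l\xc3\xa8ve'
```

A byte-string identifier containing `\xc3\xa9` would therefore indicate UTF-8, while a lone `\xe9` would indicate Latin-1.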

This is kind of a happy bug! May we count on it being supported in the interim? :-) :-)

I would think so: this bug has been present for quite some time, and nobody complained :-)

Martin


