[Python-Dev] Python in Unicode context

"Martin v. Löwis" martin at v.loewis.de
Tue Aug 3 19:24:11 CEST 2004


François Pinard wrote:

One thing is that a Python module should have some way to know the encoding used in its source file, maybe some kind of 'module.__coding__', next to 'module.__file__', saving the coding effectively used while compilation was going on.

That would be possible to implement. Feel free to create a patch.
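
As a rough illustration of what such an attribute would expose, the declared encoding can be recovered from the source bytes themselves. This is only a sketch, assuming a modern Python 3 interpreter (where the default source encoding is UTF-8, not ASCII as in 2004); `declared_source_encoding` and `same_codec` are hypothetical helper names, and this is not the `module.__coding__` attribute proposed above.

```python
import codecs
import io
import tokenize


def declared_source_encoding(source_bytes):
    """Hypothetical helper: return the encoding named by the coding cookie
    (or BOM) of a source file, falling back to the interpreter default."""
    readline = io.BytesIO(source_bytes).readline
    # detect_encoding() inspects at most the first two lines of the source.
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


def same_codec(name_a, name_b):
    """Compare two encoding names after normalising them through the codec registry."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


if __name__ == "__main__":
    latin9_source = b"# -*- coding: iso-8859-15 -*-\nx = 1\n"
    print(declared_source_encoding(latin9_source))
    print(declared_source_encoding(b"x = 1\n"))
```

For a module already imported from a file, the same helper could be fed the bytes read from `module.__file__`, under the assumption that the file has not changed since it was compiled.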

I wonder if some other cookie, next to the 'coding:' cookie, could not be used to declare that all strings in this module only should be interpreted as Unicode by default, but without the need of resorting to the 'u' prefix all over.

This could be a starting point of another syntax debate. For example,

from __future__ import string_literals_are_unicode

would be possible to implement. Had PEP 244 been adopted, I would have proposed

directive unicode_strings

Other syntax forms would also be possible. Again, if you know a syntax which you like, propose a patch. Be prepared to also write a PEP defending that syntax.

P.S. - Should I say and confess, one thing I do not like much about Unicode is how proponents often perceive it, like a religion, and all the fanaticism going with it. Unicode should be seen and implemented as a choice, more than a life commitment :-). Right now, my feeling is that Python asks a bit too much of a programmer, in terms of commitment, if we only consider the editing work required on sources to use it, or not.

Not sure what you are referring to here. You do have the choice of source encodings, and, in fact, "Unicode" is not a valid source encoding. "UTF-8" is, and from a Python point of view, there is absolutely no difference between that and, say, "ISO-8859-15": you can choose whatever source encoding you like, and Python does not favour any of them (strictly speaking, it favours ASCII, then ISO-8859-1, then the rest).

Choice of source encoding is different from the choice of string literals. You can use Unicode strings, or byte strings, or mix them. It really is your choice.
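
The distinction can be sketched in a few lines. This assumes a modern Python 3 interpreter, where a plain literal is a Unicode string and a `b'...'` literal is a byte string (in the Python 2 of this thread, those roles were played by `u'...'` and plain literals respectively); the function name `encode_both_ways` is made up for illustration.

```python
def encode_both_ways(text):
    """Encode one Unicode string under two different encodings.

    The encodings stand in for two possible source-file encodings: the
    bytes on disk differ, but the Unicode string they denote is the same.
    """
    utf8_bytes = text.encode("utf-8")
    latin9_bytes = text.encode("iso-8859-15")
    return utf8_bytes, latin9_bytes


if __name__ == "__main__":
    name = "L\u00f6wis"  # Unicode string; \u00f6 is o with diaeresis
    utf8_bytes, latin9_bytes = encode_both_ways(name)

    # Different byte sequences (two bytes vs. one for the non-ASCII letter).
    print(len(utf8_bytes), len(latin9_bytes))

    # Decoding each with its own encoding recovers the same Unicode string.
    print(utf8_bytes.decode("utf-8") == latin9_bytes.decode("iso-8859-15"))
```

The encoding only governs how bytes are turned into characters; whether a literal ends up as a Unicode string or a byte string is decided separately, by the kind of literal written.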

Regards, Martin


