[Python-Dev] PEP 460 reboot

Stephen J. Turnbull stephen at xemacs.org
Tue Jan 14 10:54:58 CET 2014


Guido van Rossum writes:

> Of course, nobody in their right mind would use a format string containing UTF-16 or EBCDIC.

How about Shift JIS and Big 5 (traditionally "mandated by Microsoft" in their respective regions, with Shift JIS still overwhelmingly popular) and GB* ("GB18030 is not just a good idea, It's The Law")? Are the Japanese and Chinese crazy by definition? This is where I get the willies -- not that you think anybody is crazy by definition, but because I personally have to live with people who use crazy encodings for interoperability reasons. In fact, about half the text I process daily for work is in those encodings.

Anyway, the thought makes me shiver. GB2312 text may be encoded as EUC-CN, in which case it is ASCII-compatible, so no problem. I'm not sure if that's the encoding typically denoted by "GB2312" in email, though, and in any case it's irrelevant, as most emails claiming "charset=GB2312" I receive nowadays include characters from the extension repertoires of GBK or GB18030. Shift JIS, Big 5, and GBK manage to avoid non-ASCII-compatible use of all characters significant in Python %-formatting (yay!), but .format is right out because the bytes for { and } are used as trail bytes of multibyte characters. GB18030 in principle uses far more of the code space, including all of the syntactically significant punctuation, but in practice I don't know how many of those characters are actually assigned, let alone used.
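The trail-byte claim for Shift JIS, Big 5, and GBK can be checked empirically. The following is only a sketch: it assumes CPython's bundled `shift_jis`, `big5`, and `gbk` codecs are representative of the encodings as used in the wild, and `trail_bytes` is a hypothetical helper name, not anything from the PEP.

```python
def trail_bytes(encoding):
    """Collect every byte value that follows the lead byte of a
    multibyte character when encoding BMP code points with `encoding`."""
    seen = set()
    for cp in range(0x80, 0x10000):
        try:
            raw = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            continue  # code point not representable in this codec
        if len(raw) > 1:
            seen.update(raw[1:])
    return seen


# Bytes that are syntactically significant to %-formatting and str.format.
SIGNIFICANT = b"%{}"

for name in ("shift_jis", "big5", "gbk"):
    clash = sorted(chr(b) for b in trail_bytes(name) if b in SIGNIFICANT)
    print(name, clash)
```

Only the introducer characters `%`, `{`, and `}` are tested here; conversion letters such as `s` or `d` fall inside the trail-byte range of these encodings but are harmless unless a real `%` precedes them.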

> And that is precisely my point. When you're using a format string, all of the format string (not just the part between { and }) had better use ASCII or an ASCII superset. And this (rightly) constrains the output to an ASCII superset as well.

Except that if you interpolate something like Shift JIS, much of the ASCII really isn't ASCII. That's a general issue, of course, if you do something that requires iterated format strings, but it's far more likely to appear to work most of the time with those encodings.
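A minimal sketch of the iterated-format hazard, assuming Python 3.5 or later (where bytes support %-interpolation) and CPython's `shift_jis` codec. The template, the Latin-1 decoding step, and the helper `second_pass_fails` are illustrative assumptions only; they stand in for any later stage that treats the output as ASCII-compatible text.

```python
def second_pass_fails(text):
    """Return True if `text` cannot be used as a str.format template."""
    try:
        text.format()
    except ValueError:
        return True
    return False


# U+30DC KATAKANA LETTER BO: its Shift JIS trail byte is 0x7B, ASCII '{'.
payload = "\u30dc".encode("shift_jis")
assert payload[1:] == b"{"

# First pass: the template itself is pure ASCII, so interpolation is fine.
first_pass = b"name=%s;" % payload

# Second pass: a later stage treats the result as an ASCII-compatible
# template and trips over a brace that is really half of a character.
as_text = first_pass.decode("latin-1")
print(second_pass_fails(as_text))
print(second_pass_fails("name=abc;"))
```

Most Shift JIS characters do not have `{` or `}` as a trail byte, so a pipeline like this would pass casual testing and fail only on particular input.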

Of course you can say "if it hurts, don't do that", but ....


