[Python-3000] PEP 3131 accepted

Stephen J. Turnbull stephen at xemacs.org
Fri May 25 12:45:39 CEST 2007


"Martin v. Löwis" writes:

> > If people can agree on a method for specifying, 'ascii only', 'ascii + character sets X, Y, Z', and it actually becomes an accepted part of the proposal, gets implemented, etc., I will grumble to myself at home, but I will stop trying to raise a stink here.
>
> I think you can stop now - this is supported as a side effect of PEP 263, and implemented for years.

-1

That seems not to be the case. PEP 263 allows you to specify a coding system, not a character set. Whether that will restrict the character set depends on how the coding system is implemented. For example, ISO-2022-JP is implicitly a (near) UCS since it does not forbid designations, so you don't know (XEmacs implements it as a UCS, I'm not sure what GNU does), while ISO-2022-JP-2 is explicitly a UCS because it explicitly permits designations. And how about C1 code points in ISO 2022-conformant 8-bit coding systems (including all ISO 8859 systems)? Do they pass, or not? Any restriction is simply a side effect of the codec throwing an exception because it doesn't recognize the input. So this requires that users know how the relevant codec is implemented.
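A rough illustration of the "side effect" point, assuming the codecs shipped with a current CPython 3 (their tables may differ from other implementations of the same coding systems). The helper name `encodable` is made up for this sketch. It only asks whether a character can be represented in a given coding system at all, which is what decides whether it could appear in a source file declared with that coding.

```python
def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        # The only "restriction" is the codec refusing the character.
        return False
    return True


samples = {
    "Russian": "привет",
    "Ukrainian i": "\u0456",     # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "C1 control": "\u0085",      # NEL, a C1 code point
    "Japanese": "日本語",
}

for label, text in samples.items():
    for codec in ("iso8859_5", "koi8_r", "utf-8"):
        print(f"{label:12} {codec:10} {encodable(text, codec)}")
```

With these particular codecs, two "Cyrillic" coding systems already disagree about a Cyrillic letter, and the ISO 8859 codec lets the C1 code point through. Neither outcome is visible from the coding declaration itself.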

Second, this also removes your ability to use literal strings and comments outside that coding system. (Of course Unicode escapes will still be available, but hardly acceptable for string literals, and completely out of the question for comments.)

Third, it also has the defect of requiring you to use a legacy coding system, does it not? I.e., if I want to restrict to ASCII + Cyrillic, I can use ISO-8859-5 or KOI8-R but not UTF-8.

Finally, it does not make it easy to create unions or subsets. One has to write a codec to do that.
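For contrast, a minimal sketch of what a character-set check decoupled from the coding system might look like. Nothing here is part of PEP 263 or PEP 3131: the `CHARSETS` table, the set names, and both functions are hypothetical, and the ranges are simply the Unicode "Cyrillic" and "Greek and Coptic" blocks. It works on decoded text, so it applies equally to UTF-8 source, and unions are just lists of names.

```python
# Hypothetical named character sets, as inclusive code point ranges.
CHARSETS = {
    "ascii": [(0x0000, 0x007F)],
    "cyrillic": [(0x0400, 0x04FF)],  # Unicode "Cyrillic" block
    "greek": [(0x0370, 0x03FF)],     # Unicode "Greek and Coptic" block
}


def allowed(ch, names):
    """True if ch falls in the union of the named character sets."""
    cp = ord(ch)
    return any(lo <= cp <= hi for name in names for lo, hi in CHARSETS[name])


def violations(text, names):
    """Return (index, character) pairs for characters outside the union."""
    return [(i, ch) for i, ch in enumerate(text) if not allowed(ch, names)]


print(violations("имя = 1", ["ascii", "cyrillic"]))
print(violations("имя = α", ["ascii", "cyrillic"]))
print(violations("имя = α", ["ascii", "cyrillic", "greek"]))
```

Such a check could be run over identifiers only, leaving string literals and comments unrestricted.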


