[Python-Dev] PEP 263 considered faulty (for some Japanese)

Martin v. Loewis martin@v.loewis.de
13 Mar 2002 08:54:18 +0100


Tom Emerson <tree@basistech.com> writes:

> The UTF-8 BOM is an aBOMination that should not be allowed to live. The only editor that I know of that inserts the sequence is Microsoft's WordPad (or TextPad, I don't use either). I hope XEmacs isn't going to do this.

I used to think the same way, but now I have changed sides. I still agree that the notion of UCS byte orders is an abomination, and even that using UCS in on-disk files is a stupid thing to do.

Reliable detection of encodings is a good thing, though, as the Web has demonstrated. Encoding declarations are good (this is the idea behind PEP 263). Just consider the UTF-8 BOM not as a byte-order mark (what byte order, anyway?), but as an encoding declaration, or signature. With that view, I can happily accept it as useful, and I wish more editors would at least comprehend it (in the sense of displaying it with zero width), and perhaps even generate it.
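
To illustrate the "signature" view, here is a minimal sketch of how a reader of source files might treat a leading UTF-8 BOM as an encoding declaration. The function name detect_utf8_signature is hypothetical and not taken from PEP 263 or any library; only codecs.BOM_UTF8 and the "utf-8-sig" codec are standard Python (in current versions).

```python
import codecs


def detect_utf8_signature(data: bytes):
    """Hypothetical helper: treat a leading UTF-8 BOM as an encoding signature.

    Returns (encoding, payload). The encoding is "utf-8" when the three
    signature bytes EF BB BF are present, otherwise None, meaning the
    caller has to fall back on some other declaration or default.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Strip the signature so it does not show up in the decoded text.
        return "utf-8", data[len(codecs.BOM_UTF8):]
    return None, data


# A file as an editor that generates the signature might write it.
sample = codecs.BOM_UTF8 + "print('héllo')\n".encode("utf-8")

encoding, payload = detect_utf8_signature(sample)
text = payload.decode(encoding)
```

The standard "utf-8-sig" codec does the same stripping in one step: sample.decode("utf-8-sig") yields the text without the signature, and it also accepts input that has no signature at all.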

Regards, Martin