[Python-Dev] PEP 263 considered faulty (for some Japanese)
Stephen J. Turnbull stephen@xemacs.org
13 Mar 2002 18:11:42 +0900
"Martin" == Martin v Loewis <martin@v.loewis.de> writes:
Martin> Reliable detection of encodings is a good thing, though,
I would think that UTF-8 can be quite reliably detected without the "BOM".
I suppose you could easily construct short ambiguous sequences for ISO-8859-[678] (ones that are meaningful in the corresponding natural language), but it seems that even a couple dozen characters would make the odds astronomical that syntactically valid UTF-8 found "in the wild" is anything other than intended UTF-8 Unicode (assuming you're expecting a text file, such as Python source). Is that wrong? Have you any examples? I'd be interested to see them; we (XEmacs) have some ideas about "statistical" autodetection of encodings, and they'd be useful test cases.
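To make the claim concrete, here is a minimal sketch of BOM-less detection by strict decoding. It is not the XEmacs approach, only an illustration; the function name `classify` and the three labels it returns are made up for this example. It treats pure ASCII as a separate case, since ASCII bytes say nothing about which 8-bit encoding was intended.

```python
def classify(data: bytes) -> str:
    """Classify a byte string as 'ascii', 'utf-8', or 'not-utf-8'.

    Hypothetical helper for illustration only: it checks syntactic
    validity and makes no attempt at statistical detection.
    """
    if all(b < 0x80 for b in data):
        # Pure ASCII is valid in UTF-8 and in ISO-8859-x alike.
        return "ascii"
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"


# Real text in a legacy encoding usually breaks UTF-8 syntax at once.
greek = "Ελληνικά".encode("iso-8859-7")
sjis = "日本語".encode("shift_jis")
print(classify(greek), classify(sjis))

# A short ambiguous sequence: the bytes C3 A9 are legal ISO-8859-7
# and also a well-formed UTF-8 encoding of U+00E9.
print(classify(b"\xc3\xa9"))
```

The last case is the kind of short sequence that can be constructed deliberately; the question above is how often anything like it survives across a few dozen non-ASCII characters of real text.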
Martin> as the Web has demonstrated.
But the Web in general provides (mandatory) protocols for identifying content-type, yet I regularly see HTML files with incorrect http-equiv meta elements, and XHTML with no encoding declaration containing Shift JIS. Microsoft software for Japanese apparently ignores Content-Type headers and the like in favor of autodetection (probably because the same MS software regularly relies on users to set things like charset parameters in MIME Content-Type).
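The mismatch between declared and actual encoding can be checked mechanically. The following is a rough sketch, assuming the declaration arrives as a Content-Type header value; the helper names `declared_charset` and `body_matches_declaration` are invented for this example. A successful decode does not prove the declaration right (ISO-8859-1 accepts any byte sequence), so only a `False` result is conclusive.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the lowercased charset parameter, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def body_matches_declaration(content_type: str, body: bytes):
    """Return None if no charset is declared, False if the body cannot
    be decoded with the declared charset, and True otherwise."""
    charset = declared_charset(content_type)
    if charset is None:
        return None
    try:
        body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers charset names Python does not know.
        return False
    return True


# Shift JIS content served with a wrong declaration.
body = "日本語".encode("shift_jis")
print(body_matches_declaration("text/html; charset=utf-8", body))
print(body_matches_declaration("text/html; charset=Shift_JIS", body))
```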
I can't tell my boss that his mail is ill-formed (well, not to any effect). So I'd really love to be able to watch his face when Python 2.3 tells him his program is not legally encoded.
But I guess that's not a convincing enough reason for Guido to mandate UTF-8.
--
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Don't ask how you can "do" free software business;
ask what your business can "do for" free software.