[Python-Dev] Bytes path support
"Martin v. Löwis" martin at v.loewis.de
Tue Aug 26 13:14:23 CEST 2014
On 24.08.14 03:11, Greg Ewing wrote:
Isaac Morland wrote:
In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF (byte order mark) is used:
http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration
Not sure about XML.

According to Appendix F here: http://www.w3.org/TR/xml/#sec-guessing an XML parser needs to be prepared to try all the encodings it supports until it finds one that works well enough to decode the XML declaration, then it can find out the exact encoding used.
That's not what this section says. Instead, it says that you need to auto-detect UCS-4, UTF-16, or UTF-8 from the BOM or, when there is no BOM, guess one of them (or EBCDIC) from how '<?' is encoded. This is enough to actually parse the encoding declaration. Other non-ASCII-compatible encodings can only be used if declared in a higher-level protocol (such as HTTP).
The parser is not expected to try out all encodings it supports.
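The detection described above amounts to a fixed lookup on the first four bytes, followed by reading the declaration. What follows is a minimal, non-normative Python sketch, assuming the signature table in XML 1.0 Appendix F. The names `guess_xml_family` and `declared_encoding` are hypothetical, and using `cp037` as a stand-in for the EBCDIC family is an assumption. It is meant to show that the parser picks among a handful of families rather than trying every codec it supports.

```python
import re
from typing import Optional

# Byte order marks (XML 1.0 Appendix F). UCS-4 patterns come first so that
# FF FE 00 00 is not mistaken for a UTF-16LE BOM.
_BOM_SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\x00\x00\xff\xfe", "ucs-4-2143"),  # unusual octet order
    (b"\xfe\xff\x00\x00", "ucs-4-3412"),  # unusual octet order
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xef\xbb\xbf", "utf-8"),
]

# No BOM: infer the family from how '<?' (or '<?xm') is encoded.
_NO_BOM_SIGNATURES = [
    (b"\x00\x00\x00\x3c", "utf-32-be"),
    (b"\x3c\x00\x00\x00", "utf-32-le"),
    (b"\x00\x00\x3c\x00", "ucs-4-2143"),
    (b"\x00\x3c\x00\x00", "ucs-4-3412"),
    (b"\x00\x3c\x00\x3f", "utf-16-be"),
    (b"\x3c\x00\x3f\x00", "utf-16-le"),
    (b"\x3c\x3f\x78\x6d", "ascii-compatible"),
    (b"\x4c\x6f\xa7\x94", "ebcdic"),
]

# Python codec used only to read the declaration for each family.
_DECODE_WITH = {
    "utf-32-be": "utf-32-be",
    "utf-32-le": "utf-32-le",
    "utf-16-be": "utf-16-be",
    "utf-16-le": "utf-16-le",
    "utf-8": "utf-8",
    # The declaration itself is pure ASCII, so latin-1 reads it safely.
    "ascii-compatible": "latin-1",
    # Assumption: cp037 as a stand-in for the EBCDIC family.
    "ebcdic": "cp037",
}

_DECL_RE = re.compile(
    r"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_xml_family(data: bytes) -> str:
    """Classify the first four bytes; 'other' means no match."""
    head = data[:4]
    for signature, family in _BOM_SIGNATURES + _NO_BOM_SIGNATURES:
        if head.startswith(signature):
            return family
    # Per Appendix F: UTF-8 without a declaration, or a broken document.
    return "other"


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if readable."""
    codec = _DECODE_WITH.get(guess_xml_family(data))
    if codec is None:
        return None
    text = data[:400].decode(codec, errors="replace").lstrip("\ufeff")
    match = _DECL_RE.match(text)
    return match.group(1) if match else None
```

For example, `declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')` first classifies the input as ASCII-compatible and then reads `ISO-8859-1` from the declaration. No trial decoding with other codecs is involved.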
Regards, Martin