[Python-Dev] XML codec?
Walter Dörwald walter at livinglogic.de
Fri Nov 9 11:51:38 CET 2007
Martin v. Löwis wrote:
> ci = codecs.lookup("xml-auto-detect")
> p = expat.ParserCreate()
> e = "utf-32"
> s = (u"" % e).encode(e)
> s = ci.encode(ci.decode(s)[0], encoding="utf-8")[0]
> p.Parse(s, True)
>
> So how come the document being parsed is recognized as UTF-8?
Because you can force the encoder to use a specified encoding. If you do this and the unicode string starts with an XML declaration, the encoder will put the specified encoding into the declaration:
import codecs
e = codecs.getencoder("xml-auto-detect")
print e(u"", encoding="utf-8")[0]
This prints:
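For readers without the proposed codec at hand, the forced-encoding behaviour described above can be approximated in pure Python. This is only a rough sketch written for Python 3, not the codec's actual code: the helper name encode_xml and the regular expression are assumptions made for illustration, and only a declaration at the very start of the string is considered.

```python
import re

# Matches a leading XML declaration and captures the quote character (group 1)
# and the declared encoding name (group 2).
_DECL = re.compile(
    r"""<\?xml\s+version\s*=\s*(?:"[^"]*"|'[^']*')\s+encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def encode_xml(text, encoding):
    """Hypothetical helper: encode text, rewriting a declared encoding to match."""
    m = _DECL.match(text)
    if m:
        # Splice the forced encoding name in place of the declared one.
        text = text[:m.start(2)] + encoding + text[m.end(2):]
    return text.encode(encoding)


doc = "<?xml version='1.0' encoding='iso-8859-1'?><foo/>"
print(encode_xml(doc, "utf-8"))
```

With the sample document above, the declaration in the encoded bytes names utf-8 instead of iso-8859-1, so a parser reading the result sees a declaration that agrees with the actual byte encoding.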
>> OK, so should I put the C code into a xml module?
> I don't see the need for C code at all.
Doing the bit fiddling for Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the right thing to do.
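To give an idea of what that bit fiddling involves, here is a minimal pure-Python sketch of the same kind of detection (Python 3). It is not the C function named above: the name detect_xml_encoding, the default argument and the returned codec names are assumptions for illustration, and the checks follow the general approach of looking at a byte order mark first, then at the byte pattern of the leading "<?", and finally at the declaration itself.

```python
import codecs
import re


def detect_xml_encoding(data, default="utf-8"):
    """Hypothetical pure-Python sketch; not the C function mentioned above."""
    # Byte order marks first. UTF-32 must be tested before UTF-16,
    # because the UTF-32-LE BOM starts with the UTF-16-LE BOM.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ):
        if data.startswith(bom):
            return name
    # No BOM: the layout of the first "<" and "?" reveals the code unit width.
    for prefix, name in (
        (b"<\x00\x00\x00", "utf-32-le"),
        (b"\x00\x00\x00<", "utf-32-be"),
        (b"<\x00?\x00", "utf-16-le"),
        (b"\x00<\x00?", "utf-16-be"),
    ):
        if data.startswith(prefix):
            return name
    # ASCII-compatible input: read the encoding from the declaration, if any.
    m = re.match(
        rb"""<\?xml[^>]*?\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1""",
        data,
    )
    return m.group(2).decode("ascii").lower() if m else default
```

The returned name can be passed straight to bytes.decode(); for input with a BOM, the "utf-32", "utf-16" and "utf-8-sig" codecs consume the mark themselves.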
Servus, Walter