[Python-Dev] XML codec? (original) (raw)

Walter Dörwald walter at livinglogic.de
Fri Nov 9 11:51:38 CET 2007


Martin v. Löwis wrote:

ci = codecs.lookup("xml-auto-detect") p = expat.ParserCreate() e = "utf-32" s = (u"" % e).encode(e) s = ci.encode(ci.decode(s)[0], encoding="utf-8")[0] p.Parse(s, True) So how come the document being parsed is recognized as UTF-8?

Because you can force the encoder to use a specified encoding. If you do this and the unicode string starts with an XML declaration, the encoder will put the specified encoding into the declaration:

import codecs

e = codecs.getencoder("xml-auto-detect") print e(u"", encoding="utf-8")[0]

This prints:

OK, so should I put the C code into a xml module? I don't see the need for C code at all.

Doing the bit fiddling for Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the right thing to do.

Servus, Walter



More information about the Python-Dev mailing list