Issue 17089: Expat parser parses strings only when XML encoding is UTF-8
xmlparser.Parse() works with string data only if XML encoding is utf-8 (or ascii). Examples:
>>> import xml.parsers.expat
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("\xb5")
1
>>> content
['µ']
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("\xb5")
1
>>> content
['µ']
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("\xb5")
Traceback (most recent call last):
  File "", line 1, in
xml.parsers.expat.ExpatError: encoding specified in XML declaration is incorrect: line 1, column 30
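Until string input is handled correctly, one possible workaround is to encode the string to the encoding named in its XML declaration and pass bytes to Parse(). The sketch below assumes the caller knows that encoding. The helper name parse_text and the sample document (element "a", ISO-8859-1 declaration) are illustrative and not taken from this report.

```python
import xml.parsers.expat


def parse_text(data):
    """Parse a complete document and return its character data as one string."""
    parser = xml.parsers.expat.ParserCreate()
    chunks = []
    # Expat may deliver character data in several pieces, so collect them all.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return "".join(chunks)


# Hypothetical sample: a document that declares a non-UTF-8 encoding.
doc = "<?xml version='1.0' encoding='iso-8859-1'?><a>\xb5</a>"

# Encode to the declared encoding so the bytes match the XML declaration.
text = parse_text(doc.encode("iso-8859-1"))
print(text)
```

Passing bytes lets Expat decode the input itself according to the declaration, so the declared and actual encodings agree.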
This affects all other modules which work with XML: xml.sax, xml.dom.minidom, xml.dom.pulldom, xml.etree.ElementTree.
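The same bytes-based approach should apply to the higher-level modules, since they sit on top of Expat. This is a minimal sketch for two of them, again using a hypothetical ISO-8859-1 sample document that is not part of this report.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample: a document that declares a non-UTF-8 encoding.
doc = "<?xml version='1.0' encoding='iso-8859-1'?><a>\xb5</a>"

# Encode to the declared encoding before handing the data to the parser.
raw = doc.encode("iso-8859-1")

# ElementTree: the text of the root element.
et_text = ET.fromstring(raw).text

# minidom: the data of the text node under the document element.
dom = xml.dom.minidom.parseString(raw)
dom_text = dom.documentElement.firstChild.data

print(et_text, dom_text)
```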
Here is a patch which fixes parsing string data with non-UTF-8 XML.