Issue 3075: make minidom.toxml() encoding argument useful
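Right now, the encoding argument added to xml.dom.minidom.DOMObject.toxml() in Python 2.3 seems fairly useless. In practice it has to be UTF-8, because any character the chosen encoding cannot represent makes the writer raise an error. But a one-line change to the implementation of toprettyxml would make it useful: instead of the encoding error handler being "strict", make it "xmlcharrefreplace", so that such characters are written as numeric character references. So change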
writer = codecs.lookup(encoding)[3](writer)
to
writer = codecs.lookup(encoding)[3](writer, "xmlcharrefreplace")
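To illustrate the difference between the two error handlers, here is a minimal sketch written for Python 3. It uses codecs.getwriter(), which returns the same stream writer class as codecs.lookup(encoding)[3]. The helper names encode_strict and encode_with_charrefs are made up for this example and are not part of minidom.

```python
import codecs
import io


def encode_strict(text, encoding):
    """Encode text through a stream writer using the default "strict" handler."""
    buffer = io.BytesIO()
    writer = codecs.getwriter(encoding)(buffer)  # errors defaults to "strict"
    writer.write(text)
    return buffer.getvalue()


def encode_with_charrefs(text, encoding):
    """Encode text, writing unencodable characters as XML character references."""
    buffer = io.BytesIO()
    writer = codecs.getwriter(encoding)(buffer, "xmlcharrefreplace")
    writer.write(text)
    return buffer.getvalue()


sample = "caf\u00e9 \u20ac"

# With "xmlcharrefreplace", characters outside ASCII become &#NNN; references.
print(encode_with_charrefs(sample, "ascii"))

# With "strict", the same input raises UnicodeEncodeError.
try:
    encode_strict(sample, "ascii")
except UnicodeEncodeError as exc:
    print("strict failed:", exc.reason)
```

The character references are valid in XML text and attribute values, so a parser reading the output recovers the original characters whatever encoding was requested.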
This is how toprettyxml looks in 3.1/3.2, which seems to meet the OP's need. I'll close this in a few days' time unless someone objects.
def toprettyxml(self, indent="\t", newl="\n", encoding=None):
    # indent = the indentation string to prepend, per level
    # newl = the newline string to append
    use_encoding = "utf-8" if encoding is None else encoding
    writer = codecs.getwriter(use_encoding)(io.BytesIO())
    if self.nodeType == Node.DOCUMENT_NODE:
        # Can pass encoding only to document, to put it into XML header
        self.writexml(writer, "", indent, newl, encoding)
    else:
        self.writexml(writer, "", indent, newl)
    if encoding is None:
        return writer.stream.getvalue().decode(use_encoding)
    else:
        return writer.stream.getvalue()
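As a rough usage sketch of what the code above implies for callers on Python 3: without an encoding the method returns a str, and with an encoding it returns bytes and puts the encoding name into the XML declaration. The document built here is only an illustration.

```python
from xml.dom import minidom

# Build a tiny document containing one non-ASCII character.
doc = minidom.Document()
root = doc.createElement("root")
root.appendChild(doc.createTextNode("caf\u00e9"))
doc.appendChild(root)

# No encoding argument: the result is text (str).
as_text = doc.toprettyxml(indent="  ")

# Explicit encoding: the result is bytes, and the encoding is named in the header.
as_bytes = doc.toprettyxml(indent="  ", encoding="utf-8")

print(type(as_text).__name__)
print(type(as_bytes).__name__)
print(as_bytes.decode("utf-8"))
```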