Issue 1434090: DOM tree inconsistency in expat XML parser
I posted this to the xml-sig mailing list but received no feedback, so I am trying here. The expat-based DOM builder (xml.dom.expatbuilder, which xml.dom.minidom uses by default) builds inconsistent DOM trees if an XML file contains nodes (e.g. a comment) before the DOCTYPE declaration. The DocumentType node is correctly appended to the document node's children, but neither its previousSibling pointer nor the preceding node's nextSibling pointer is set. For example, when parsing an XML file that looks like this:
Hello world
into a DOM tree, dom.childNodes contains three nodes (a comment, the document type, and an element), but dom.firstChild.nextSibling is None.
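One way to detect this kind of inconsistency is to walk every node's childNodes and compare each child's previousSibling and nextSibling with its position in the list. The sketch below assumes a current Python with xml.dom.minidom; the checker name sibling_links_consistent and the SAMPLE document are hypothetical (the exact file from the report is not reproduced here). On an interpreter affected by this bug, the comment's nextSibling would be None and the check should return False.

```python
from xml.dom import minidom


def sibling_links_consistent(parent):
    """Hypothetical checker: return True if every child's previousSibling and
    nextSibling agree with its position in parent.childNodes (recursively)."""
    children = list(parent.childNodes)
    for i, child in enumerate(children):
        expected_prev = children[i - 1] if i > 0 else None
        expected_next = children[i + 1] if i + 1 < len(children) else None
        # Compare by identity: the pointers must reference the actual neighbours.
        if child.previousSibling is not expected_prev:
            return False
        if child.nextSibling is not expected_next:
            return False
        if not sibling_links_consistent(child):
            return False
    return True


# Hypothetical sample with the shape described in the report:
# a comment, then the DOCTYPE declaration, then the document element.
SAMPLE = b"<!-- a comment --><!DOCTYPE root><root>Hello world</root>"

if __name__ == "__main__":
    dom = minidom.parseString(SAMPLE)
    print("child node types:", [node.nodeType for node in dom.childNodes])
    print("firstChild.nextSibling:", dom.firstChild.nextSibling)
    print("links consistent:", sibling_links_consistent(dom))
```

The checker only reads the public DOM attributes, so it can be run unchanged against trees built by any DOM builder.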
The fix for this bug is trivial: start_doctype_decl_handler should not manipulate the document's childNodes list directly, but should use the _append_child function instead. A patch against Python 2.4.2 is attached.
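The difference between pushing a node onto childNodes directly and appending it with sibling bookkeeping can be sketched in isolation. This is an illustration under stated assumptions, not the attached patch: append_with_links is a hypothetical helper written in the spirit of minidom's private _append_child (it links the new node to the previous last child and sets parentNode), and build_document is a hypothetical function that assembles the comment, DOCTYPE, element sequence from the report by hand.

```python
from xml.dom import minidom


def append_with_links(parent, node):
    """Hypothetical helper: append node to parent.childNodes and keep the
    sibling and parent pointers consistent (not minidom's actual code)."""
    children = parent.childNodes
    if children:
        last = children[-1]
        node.previousSibling = last
        last.nextSibling = node
    children.append(node)
    node.parentNode = parent
    return node


def build_document(link_doctype):
    """Hypothetical builder for comment -> DOCTYPE -> element.

    With link_doctype=False the DocumentType is pushed onto childNodes
    directly, mimicking the behaviour described in the report."""
    doc = minidom.Document()
    append_with_links(doc, doc.createComment(" a comment "))
    doctype = minidom.getDOMImplementation().createDocumentType("root", None, None)
    doctype.ownerDocument = doc
    if link_doctype:
        append_with_links(doc, doctype)
    else:
        doc.childNodes.append(doctype)  # sibling pointers are left unset
    append_with_links(doc, doc.createElement("root"))
    return doc


if __name__ == "__main__":
    for linked in (False, True):
        doc = build_document(linked)
        print("linked" if linked else "direct", doc.firstChild.nextSibling)
```

Since _append_child is private to minidom, the helper re-implements its bookkeeping here rather than importing it.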