Issue 1433694: normalize function in minidom unlinks empty child nodes (original) (raw)

Created on 2006-02-17 16:52 by romankliotzkin, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
testcase.zip romankliotzkin,2006-02-17 16:52 Test case - script and XML
MINIDOM-PATCH maltehelmert,2008-02-01 22:35 patch for xml.dom.minidom.py
Messages (5)
msg27551 - (view) Author: RomanKliotzkin (romankliotzkin) Date: 2006-02-17 16:52
Hi, When calling the "normalize" method of DOM objects in the minidom library, the method will attempt to collate text nodes together. When it encounters an empty text node, it will do the following: (line 197 in minidom.py) else: # empty text node; discard child.unlink() unlink() sets all the members of the child node to None, BUT the parent node still has a link to the child node. Therefore, if one is later iterating through the parent node and calls removeChild on the unlinked node, a NotFoundErr exception will be thrown by the library. To trigger, use the test case script along with the test case XML document included in the zip file. testcase.py testcase.xml My suggestion is to call self.removeChild(child), then call child.unlink(). That way, there is no ambiguity about an unlinked child being referenced by the parent.
msg61980 - (view) Author: Malte Helmert (maltehelmert) Date: 2008-02-01 22:22
I can reproduce the bug on trunk (r60511). At first I thought the behaviour might be caused by the testcase removing items from the children list while iterating over it, but this is not the case; the exception is raised upon the first removal already. Here is a shorter testcase: ======================================== from xml.dom import minidom def testme(xmltext): node = minidom.parseString(xmltext).documentElement for child in node.childNodes: if child.nodeValue == "t": child.nodeValue = "" node.normalize() child = node.firstChild while child: next = child.nextSibling if child.nodeType == child.TEXT_NODE and child.nodeValue == "": node.removeChild(child) child = next return node print testme("tt").toxml() print testme("t").toxml() ======================================== The second call to testme fails with an xml.dom.NotFoundErr, but it should succeed and print "". While this appears to be a genuine bug, I don't agree with the proposed fix. I'm not sure if calling removeChild (which mutates self.childNodes) from within normalize (which iterates over self.childNodes at that time) is safe, and even if it is, it would make normalize O(N^2) (not taking into account recursion) where it should be O(N).
msg61981 - (view) Author: Malte Helmert (maltehelmert) Date: 2008-02-01 22:35
OK, I think I found the root cause. Node.normalize regenerates the list of children L with their previousSibling/nextSibling references from scratch; however, it fails to set the nextSibling reference for the very last element of L to None at the end. This is necessary iff the last node in the original child list is removed. The attached patch should solve the problem. In cases like this, should I also add a regression test?
msg61982 - (view) Author: Malte Helmert (maltehelmert) Date: 2008-02-01 22:46
Here is a minimal testcase to more clearly expose the root of the problem, in case a regression test is needed. Without the patch, the assertion fails. ========================================================== from xml.dom import minidom node = minidom.parseString("t").documentElement node.childNodes[1].nodeValue = "" node.normalize() assert node.childNodes[-1].nextSibling == None ==========================================================
msg62778 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2008-02-23 17:22
Patch applied to 2.6 in rev. 60995, and to 2.5-maint in rev. 60998. Thanks for reporting the problem, and for finding the bugfix!
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 42916
2008-02-23 17:22:42 akuchling set status: open -> closedresolution: fixedmessages: + nosy: + akuchling
2008-02-01 22:46:13 maltehelmert set messages: +
2008-02-01 22:37:31 maltehelmert set versions: + Python 2.6, - Python 2.4
2008-02-01 22:35:13 maltehelmert set files: + MINIDOM-PATCHmessages: +
2008-02-01 22:22:16 maltehelmert set nosy: + maltehelmertmessages: +
2008-01-12 00:56:48 akuchling set keywords: + easy
2006-02-17 16:52:32 romankliotzkin create