Issue 14762: ElementTree memory leak

Created on 2012-05-09 09:39 by Giuseppe.Attardi, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg160266 - Author: Giuseppe Attardi (Giuseppe.Attardi) Date: 2012-05-09 09:39
I confirm the presence of a serious memory leak in ElementTree, using the iterparse() function. Memory grows disproportionately to dozens of GB when parsing a large XML file. For further information, see discussion in: http://www.gossamer-threads.com/lists/python/bugs/912164?do=post_view_threaded#912164 but notice that the comments attributing the problem to the OS are quite off the mark. To replicate the problem, try this on a Wikipedia dump:

    iterparse = ElementTree.iterparse(file)
    id = None
    for event, elem in iterparse:
        if elem.tag.endswith("title"):
            title = elem.text
        elif elem.tag.endswith("id") and not id:
            id = elem.text
        elif elem.tag.endswith("text"):
            print id, title, elem.text[:20]

msg160275 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-05-09 11:39
Can you specify how you import ET? I.e. from the pure Python or the C accelerator? Also, do you realize that the element iterparse returns should be discarded with 'clear'? [see tutorial here: http://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree/]
msg160286 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-05-09 12:47
Can this be reproduced in 3.2/3.3?
msg160288 - Author: Giuseppe Attardi (Giuseppe.Attardi) Date: 2012-05-09 13:35
You are right, I should discard the elements. Thank you.
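
For readers landing on this issue, the fix discussed above (discarding each element with clear() once it has been processed) might look roughly like the following. This is only a sketch in Python 3, not code from the thread: the function name iter_pages, the SAMPLE document, and the assumption that every record is wrapped in an element whose tag ends with "page" are all illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-in for a Wikipedia dump (assumed layout).
SAMPLE = (
    b"<mediawiki>"
    b"<page><title>A</title><id>1</id>"
    b"<revision><id>9</id><text>hello world</text></revision></page>"
    b"<page><title>B</title><id>2</id>"
    b"<revision><text>second</text></revision></page>"
    b"</mediawiki>"
)


def iter_pages(source):
    """Yield (id, title, text prefix) per page, clearing each page when done."""
    page_id = None
    title = None
    # With no 'events' argument, iterparse reports only "end" events,
    # so each element is complete when it is seen here.
    for event, elem in ET.iterparse(source):
        if elem.tag.endswith("title"):
            title = elem.text
        elif elem.tag.endswith("id") and not page_id:
            page_id = elem.text
        elif elem.tag.endswith("text"):
            yield page_id, title, (elem.text or "")[:20]
        elif elem.tag.endswith("page"):
            # Assumption: a "page" element wraps one record. Clearing it
            # drops its children, text and attributes so they can be freed.
            elem.clear()
            page_id = None
            title = None


for record in iter_pages(io.BytesIO(SAMPLE)):
    print(*record)
```

Note that the cleared page elements themselves stay attached to the root, so a very long input still accumulates empty elements; whether that matters depends on the file size.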
History
Date User Action Args
2022-04-11 14:57:30 admin set github: 58967
2012-05-09 13:36:30 Giuseppe.Attardi set status: open -> closed; resolution: not a bug
2012-05-09 13:35:29 Giuseppe.Attardi set messages: +
2012-05-09 12:47:18 jcea set nosy: + jcea; messages: +
2012-05-09 11:39:01 eli.bendersky set messages: +
2012-05-09 11:18:42 pitrou set nosy: + eli.bendersky, flox
2012-05-09 09:39:44 Giuseppe.Attardi create