Issue 14762: ElementTree memory leak
Created on 2012-05-09 09:39 by Giuseppe.Attardi, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (4) | ||
---|---|---|
msg160266 - (view) | Author: Giuseppe Attardi (Giuseppe.Attardi) | Date: 2012-05-09 09:39 |
I confirm the presence of a serious memory leak in ElementTree when using the iterparse() function. Memory grows disproportionately, to dozens of GB, when parsing a large XML file. For further information, see the discussion at: http://www.gossamer-threads.com/lists/python/bugs/912164?do=post_view_threaded#912164 but note that the comments attributing the problem to the OS are quite off the mark. To replicate the problem, try this on a Wikipedia dump:

    iterparse = ElementTree.iterparse(file)
    id = None
    for event, elem in iterparse:
        if elem.tag.endswith("title"):
            title = elem.text
        elif elem.tag.endswith("id") and not id:
            id = elem.text
        elif elem.tag.endswith("text"):
            print id, title, elem.text[:20]

| ||
msg160275 - (view) | Author: Eli Bendersky (eli.bendersky) | Date: 2012-05-09 11:39 |
Can you specify how you import ET? I.e. from the pure Python or the C accelerator? Also, do you realize that the element iterparse returns should be discarded with 'clear'? [see tutorial here: http://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree/] | ||
msg160286 - (view) | Author: Jesús Cea Avión (jcea) | Date: 2012-05-09 12:47 |
Can this be reproduced in 3.2/3.3? | ||
msg160288 - (view) | Author: Giuseppe Attardi (Giuseppe.Attardi) | Date: 2012-05-09 13:35 |
You are right, I should discard the elements. Thank you. |
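
The resolution reached in this thread, discarding each element once it has been processed, can be sketched roughly as follows. This is only an illustration written for Python 3, not code from the issue: the function name `iter_pages` is hypothetical, and it assumes the Wikipedia dump layout in which every article sits in a `<page>` element containing `<title>`, `<id>`, and `<text>`. Besides the `clear()` call suggested above, it also keeps a reference to the root element and clears that, so that emptied `<page>` elements do not accumulate under the root.

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(source):
    """Yield (page_id, title, text_prefix) per <page>, freeing memory as we go.

    Assumes a Wikipedia-dump-like layout; the name and the layout are
    illustrative assumptions, not part of the original report.
    """
    # Request "start" events as well, so the first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)

    page_id = title = text = None
    for event, elem in context:
        if event != "end":
            continue
        tag = elem.tag.rsplit("}", 1)[-1]  # strip a "{namespace}" prefix, if any
        if tag == "title":
            title = elem.text
        elif tag == "id" and page_id is None:
            page_id = elem.text  # first <id> inside the page is the page id
        elif tag == "text":
            text = (elem.text or "")[:20]
        elif tag == "page":
            record = (page_id, title, text)
            page_id = title = text = None
            # Drop everything parsed so far; without this the whole tree
            # stays in memory and grows with the size of the file.
            root.clear()
            yield record


if __name__ == "__main__":
    sample = (
        b"<mediawiki><page><title>A</title><id>1</id>"
        b"<revision><id>10</id><text>hello</text></revision></page></mediawiki>"
    )
    for record in iter_pages(io.BytesIO(sample)):
        print(record)
```

On a real dump, `source` would be a file name or an open file object instead of the in-memory sample. Resetting `page_id` for every page also avoids reporting the first id found for all subsequent pages, which the snippet in the original report would do.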
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:30 | admin | set | github: 58967 |
2012-05-09 13:36:30 | Giuseppe.Attardi | set | status: open -> closed; resolution: not a bug |
2012-05-09 13:35:29 | Giuseppe.Attardi | set | messages: + |
2012-05-09 12:47:18 | jcea | set | nosy: + jcea; messages: + |
2012-05-09 11:39:01 | eli.bendersky | set | messages: + |
2012-05-09 11:18:42 | pitrou | set | nosy: + eli.bendersky, flox |
2012-05-09 09:39:44 | Giuseppe.Attardi | create |