Issue 35502: Memory leak in xml.etree.ElementTree.iterparse (original) (raw)

Created on 2018-12-14 19:59 by jess.j, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
run2.py vstinner,2018-12-14 20:11
Pull Requests
URL Status Linked Edit
PR 11170 merged serhiy.storchaka,2018-12-14 21:59
PR 11169 closed vstinner,2018-12-14 22:10
PR 11217 merged miss-islington,2018-12-18 20:29
Messages (10)
msg331861 - (view) Author: Jess Johnson (jess.j) Date: 2018-12-14 19:59
When given xml that that would raise a ParseError, but parsing is stopped before the ParseError is raised, xml.etree.ElementTree.iterparse leaks memory. Example: import gc from io import StringIO import xml.etree.ElementTree as etree import objgraph def parse_xml(): xml = """ """ parser = etree.iterparse(StringIO(initial_value=xml)) for _, elem in parser: if elem.tag == 'LEVEL1': break def run(): parse_xml() gc.collect() uncollected_elems = objgraph.by_type('Element') print(uncollected_elems) objgraph.show_backrefs(uncollected_elems, max_depth=15) if __name__ == "__main__": run() Output: [<Element 'LEVEL1' at 0x10df712c8>] Also see this gist which has an image showing the objects that are retained in memory: https://gist.github.com/grokcode/f89d5c5f1831c6bc373be6494f843de3
msg331863 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 20:11
I wrote attached run.py which confirms a leak using tracemalloc: $ python3 run.py 1 calls: 15.3B / call (total: 15.3 kB) 100 calls: 15.3B / call (total: 1527.7 kB) 1000 calls: 15.3B / call (total: 15265.0 kB)
msg331864 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-14 20:11
Oops, there was a typo, you should read kB: 1 calls: 15.3 kB / call (total: 15.3 kB) 100 calls: 15.3 kB / call (total: 1527.7 kB) 1000 calls: 15.3 kB / call (total: 15265.0 kB)
msg331875 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-14 22:06
The problem was with detecting a reference cycle containing a TreeBuilder.
msg332051 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-18 11:20
Oops, my PR 11169 used the wrong issue number: bpo-35257 instead of bpo-35502. Anyway, I closed it, the change is too complex. -- IMHO the root issue is the handling of the SyntaxError exception in XMLPullParser.feed(). I wrote a fix, but I don't have the bandwidth to write an unit test checking that the reference cycle is broken. commit 9f3354d36a89d7898bdb631e5119cc37e9a74840 (fix_etree_leak) Author: Victor Stinner <vstinner@redhat.com> Date: Fri Dec 14 22:03:16 2018 +0100 bpo-35257: Fix memory leak in XMLPullParser.feed() Fix memory leak in XMLPullParser.feed() of xml.etree: on syntax error, clear the traceback to break a reference cycle. diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index c1cf483cf5..f17c52541b 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1266,6 +1266,8 @@ class XMLPullParser: try: self._parser.feed(data) except SyntaxError as exc: + # bpo-35502: Break reference cycle + #exc.__traceback__ = None self._events_queue.append(exc) def _close_and_return_root(self): I don't see any behavior difference in XMLPullParser.read_events() which raise again the exception: events = self._events_queue while events: event = events.popleft() if isinstance(event, Exception): raise event else: yield event -- PR 11170 is also a nice enhancement (fix treebuilder_gc_traverse()), but maybe we should also prevent creating reference cycles in the first place?
msg332052 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-18 11:53
It is not easy to avoid reference cycles if use a generator function. And generator function is much faster than an implementation as a class with the __next__ method. We need to access the iterator object from the code of the generator function, and this creates a cycle.
msg332078 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-18 20:29
New changeset d2a75c67830d7c9f59e4e9b60f36974234c829ef by Serhiy Storchaka in branch 'master': bpo-35502: Fix reference leaks in ElementTree.TreeBuilder. (GH-11170) https://github.com/python/cpython/commit/d2a75c67830d7c9f59e4e9b60f36974234c829ef
msg332082 - (view) Author: miss-islington (miss-islington) Date: 2018-12-18 21:40
New changeset 60c919b58bd3cf8730947a00ddc6a527d6922ff1 by Miss Islington (bot) in branch '3.7': bpo-35502: Fix reference leaks in ElementTree.TreeBuilder. (GH-11170) https://github.com/python/cpython/commit/60c919b58bd3cf8730947a00ddc6a527d6922ff1
msg340991 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-27 15:39
This ticket looks like it's done for 3.7/8. Can it be closed? I guess 3.6 isn't relevant anymore, right?
msg341005 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-27 18:00
The 3.6 branch no longer accept bugfixes.
History
Date User Action Args
2022-04-11 14:59:09 admin set github: 79683
2019-04-27 18:02:59 scoder set status: open -> closedstage: patch review -> resolvedresolution: fixedversions: - Python 3.6
2019-04-27 18:00:05 vstinner set messages: +
2019-04-27 15:39:05 scoder set messages: +
2018-12-18 21:40:27 miss-islington set nosy: + miss-islingtonmessages: +
2018-12-18 20:29:31 miss-islington set pull_requests: + <pull%5Frequest10453>
2018-12-18 20:29:16 serhiy.storchaka set messages: +
2018-12-18 11:53:29 serhiy.storchaka set messages: +
2018-12-18 11:20:42 vstinner set messages: +
2018-12-15 00:15:44 jess.j set versions: + Python 3.6
2018-12-14 22:10:01 vstinner set pull_requests: + <pull%5Frequest10408>
2018-12-14 22:06:17 serhiy.storchaka set nosy: + scoder, eli.benderskymessages: + versions: + Python 3.8
2018-12-14 21:59:01 serhiy.storchaka set keywords: + patchstage: patch reviewpull_requests: + <pull%5Frequest10407>
2018-12-14 21:28:39 serhiy.storchaka set assignee: serhiy.storchakanosy: + serhiy.storchaka
2018-12-14 20:11:56 vstinner set files: + run2.pymessages: +
2018-12-14 20:11:24 vstinner set files: - run.py
2018-12-14 20:11:04 vstinner set files: + run.pynosy: + vstinnermessages: +
2018-12-14 19:59:00 jess.j create