msg331861 - (view) |
Author: Jess Johnson (jess.j) |
Date: 2018-12-14 19:59 |
When given XML that would raise a ParseError, but parsing is stopped before the ParseError is raised, xml.etree.ElementTree.iterparse leaks memory.

Example:

    import gc
    from io import StringIO
    import xml.etree.ElementTree as etree

    import objgraph


    def parse_xml():
        xml = """ """
        parser = etree.iterparse(StringIO(initial_value=xml))
        for _, elem in parser:
            if elem.tag == 'LEVEL1':
                break


    def run():
        parse_xml()
        gc.collect()
        uncollected_elems = objgraph.by_type('Element')
        print(uncollected_elems)
        objgraph.show_backrefs(uncollected_elems, max_depth=15)


    if __name__ == "__main__":
        run()

Output:

    [<Element 'LEVEL1' at 0x10df712c8>]

Also see this gist, which has an image showing the objects that are retained in memory: https://gist.github.com/grokcode/f89d5c5f1831c6bc373be6494f843de3
|
|
msg331863 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-12-14 20:11 |
I wrote attached run.py which confirms a leak using tracemalloc:

    $ python3 run.py
    1 calls: 15.3B / call (total: 15.3 kB)
    100 calls: 15.3B / call (total: 1527.7 kB)
    1000 calls: 15.3B / call (total: 15265.0 kB)
|
|
msg331864 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-12-14 20:11 |
Oops, there was a typo, you should read kB:

    1 calls: 15.3 kB / call (total: 15.3 kB)
    100 calls: 15.3 kB / call (total: 1527.7 kB)
    1000 calls: 15.3 kB / call (total: 15265.0 kB)
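For readers without the attachment, a measurement of this kind can be sketched as follows. This is not the attached run.py: the malformed XML string and the helper names `parse_until_level1` and `leaked_bytes` are assumptions made for illustration. On an interpreter that already includes the fix, the reported growth should be close to zero.

```python
import gc
import tracemalloc
import xml.etree.ElementTree as etree
from io import StringIO

# Hypothetical input (not from the report): the mismatched closing tag would
# raise ParseError, but only after the LEVEL1 element has been reported.
MALFORMED_XML = "<ROOT><LEVEL1>text</LEVEL1></WRONG>"


def parse_until_level1():
    """Stop iterating before the queued ParseError is raised."""
    for _, elem in etree.iterparse(StringIO(MALFORMED_XML)):
        if elem.tag == "LEVEL1":
            return elem.tag
    return None


def leaked_bytes(ncalls):
    """Return the growth in traced memory after ncalls partial parses."""
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(ncalls):
            parse_until_level1()
        # Anything still allocated after this was not reclaimed by the cyclic GC.
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    for ncalls in (1, 100, 1000):
        total = leaked_bytes(ncalls)
        print(f"{ncalls} calls: {total / ncalls / 1024:.1f} kB / call "
              f"(total: {total / 1024:.1f} kB)")
```

The explicit `gc.collect()` before the second reading is a design choice of this sketch: it separates memory that is merely waiting for a collection from memory the collector cannot reclaim.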
|
|
msg331875 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-12-14 22:06 |
The problem was with detecting a reference cycle containing a TreeBuilder. |
|
|
msg332051 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-12-18 11:20 |
Oops, my PR 11169 used the wrong issue number: bpo-35257 instead of bpo-35502. Anyway, I closed it, the change is too complex.

--

IMHO the root issue is the handling of the SyntaxError exception in XMLPullParser.feed(). I wrote a fix, but I don't have the bandwidth to write a unit test checking that the reference cycle is broken.

    commit 9f3354d36a89d7898bdb631e5119cc37e9a74840 (fix_etree_leak)
    Author: Victor Stinner <vstinner@redhat.com>
    Date:   Fri Dec 14 22:03:16 2018 +0100

        bpo-35257: Fix memory leak in XMLPullParser.feed()

        Fix memory leak in XMLPullParser.feed() of xml.etree: on syntax
        error, clear the traceback to break a reference cycle.

    diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py
    index c1cf483cf5..f17c52541b 100644
    --- a/Lib/xml/etree/ElementTree.py
    +++ b/Lib/xml/etree/ElementTree.py
    @@ -1266,6 +1266,8 @@ class XMLPullParser:
                 try:
                     self._parser.feed(data)
                 except SyntaxError as exc:
    +                # bpo-35502: Break reference cycle
    +                #exc.__traceback__ = None
                     self._events_queue.append(exc)

         def _close_and_return_root(self):

I don't see any behavior difference in XMLPullParser.read_events(), which raises the exception again:

    events = self._events_queue
    while events:
        event = events.popleft()
        if isinstance(event, Exception):
            raise event
        else:
            yield event

--

PR 11170 is also a nice enhancement (fix treebuilder_gc_traverse()), but maybe we should also prevent creating reference cycles in the first place?
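The kind of cycle described in this message (queued exception -> traceback -> frame -> `self` -> queue -> exception) can be reproduced in isolation. The sketch below is a minimal illustration, not the ElementTree code: `QueueingParser` and `survives_without_gc` are hypothetical names, and the result relies on CPython's reference counting.

```python
import gc
import weakref
from collections import deque


class QueueingParser:
    """Hypothetical stand-in for a parser that queues errors for later."""

    def __init__(self, clear_traceback):
        self.clear_traceback = clear_traceback
        self.events = deque()

    def feed(self, data):
        try:
            raise SyntaxError("bad input: %r" % (data,))
        except SyntaxError as exc:
            if self.clear_traceback:
                # Drop the traceback so the exception no longer references
                # this frame, whose locals include self.
                exc.__traceback__ = None
            self.events.append(exc)


def survives_without_gc(clear_traceback):
    """Return True if the parser outlives its last reference (CPython only)."""
    gc.collect()
    gc.disable()  # only reference counting may free objects from here on
    try:
        parser = QueueingParser(clear_traceback)
        parser.feed("<a>")
        ref = weakref.ref(parser)
        del parser
        return ref() is not None
    finally:
        gc.enable()


if __name__ == "__main__":
    print("traceback kept:   ", survives_without_gc(False))
    print("traceback cleared:", survives_without_gc(True))
```

When the traceback is kept, the object is only reclaimed once the cyclic garbage collector runs. That in turn requires every object on the cycle to implement GC traversal correctly.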
|
|
msg332052 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-12-18 11:53 |
It is not easy to avoid reference cycles if we use a generator function. And a generator function is much faster than an implementation as a class with a __next__ method. We need to access the iterator object from the code of the generator function, and this creates a cycle. |
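The trade-off described here can be illustrated with a simplified sketch. It is loosely shaped like a generator-backed iterator and is not the actual iterparse() source: `make_iterator`, `GeneratorBackedIterator`, and the `finished` attribute are hypothetical. The generator closes over the iterator object, and the iterator's class holds the generator's `__next__`, so the two reference each other.

```python
import gc
import weakref


def make_iterator(items):
    """Hypothetical generator-backed iterator with a built-in cycle."""

    def generate():
        for item in items:
            yield item
        # The generator needs the iterator object itself to publish a result.
        it.finished = True

    class GeneratorBackedIterator:
        # Reuse the generator's __next__ instead of a Python-level method.
        __next__ = generate().__next__

        def __iter__(self):
            return self

    it = GeneratorBackedIterator()
    it.finished = False
    return it


def alive_before_and_after_gc():
    """Report whether the iterator survives `del`, then a collection."""
    gc.collect()
    gc.disable()
    try:
        it = make_iterator([1, 2, 3])
        next(it)
        ref = weakref.ref(it)
        del it
        alive_before = ref() is not None
        gc.collect()
        alive_after = ref() is not None
        return alive_before, alive_after
    finally:
        gc.enable()


if __name__ == "__main__":
    print(list(make_iterator([1, 2, 3])))
    print(alive_before_and_after_gc())
```

In this sketch the iterator is freed only by the cyclic collector, which is why a traversal bug elsewhere on the cycle can turn such a design into a real leak.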
|
|
msg332078 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-12-18 20:29 |
New changeset d2a75c67830d7c9f59e4e9b60f36974234c829ef by Serhiy Storchaka in branch 'master': bpo-35502: Fix reference leaks in ElementTree.TreeBuilder. (GH-11170) https://github.com/python/cpython/commit/d2a75c67830d7c9f59e4e9b60f36974234c829ef |
|
|
msg332082 - (view) |
Author: miss-islington (miss-islington) |
Date: 2018-12-18 21:40 |
New changeset 60c919b58bd3cf8730947a00ddc6a527d6922ff1 by Miss Islington (bot) in branch '3.7': bpo-35502: Fix reference leaks in ElementTree.TreeBuilder. (GH-11170) https://github.com/python/cpython/commit/60c919b58bd3cf8730947a00ddc6a527d6922ff1 |
|
|
msg340991 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2019-04-27 15:39 |
This ticket looks like it's done for 3.7/8. Can it be closed? I guess 3.6 isn't relevant anymore, right? |
|
|
msg341005 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-04-27 18:00 |
The 3.6 branch no longer accepts bugfixes. |
|
|