Issue 6472: Update ElementTree with upstream changes

Created on 2009-07-13 00:55 by MLModel, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
issue6472_upstream_docs.diff flox,2009-12-14 08:44 Patch for documentation.
issue6472_upstream_py3k_v3.diff flox,2010-03-12 12:03 Patch, apply to 3.x
Messages (20)
msg90465 - (view) Author: Mitchell Model (MLModel) Date: 2009-07-13 00:55
I can't quite sort this out, because it's difficult to see what is intended. The documentation of xml.etree.ElementTree (19.11 in the Library doc) uses terms like "iterator", "tree iterator", "iterable", "list" in vague and perhaps not quite accurate ways. I can't tell from the documentation which functions/methods return lists, which return a generator, which return an unspecified kind of iterable, and so on.

Moreover, the results are different using ElementTree than they are using cElementTree. In particular, getiterator() returns a list in ElementTree and a generator in cElementTree. This can make a substantial difference in performance when iterating over a large number of nodes (in addition to cElementTree's parsing being what appears to be about 10x faster).

I think someone should go over the page and sort this out and make it clear what the user can expect. (I don't think it's fair to overgeneralize to things like "iterables" if the module is really meant to be making a commitment to a list or a generator.) I also think that the differences between the results returned by methods in the Python and C versions of the module should be highlighted.

I stumbled on this trying to parse and extract individual bits of information out of large XML files. I realize full well that there are better ways to do this (SAX, e.g.) and better ways to search than just iterating over all the tags of the type I'm interested in, but I should still know what to expect from ElementTree, especially because it is so wonderful!
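To see what a given call actually returns, one can simply inspect the result. The following is a minimal sketch run against a current Python 3 `xml.etree.ElementTree`, not the 2009 versions discussed here; it uses `Element.iter()` because `getiterator()` was later removed. The sample XML and the `describe` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterator

# Hypothetical sample document, not taken from the report.
root = ET.fromstring("<root><item/><item/></root>")

def describe(value):
    """Classify a return value as 'list', 'iterator', or 'other iterable'."""
    if isinstance(value, list):
        return "list"
    if isinstance(value, Iterator):
        return "iterator"
    return "other iterable"

found = describe(root.findall("item"))   # findall() builds a list
walked = describe(root.iter("item"))     # iter() yields elements lazily
print(found, walked)
```

Running the same check under both implementations is one way to spot the kind of mismatch described above.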
msg95990 - (view) Author: Milko Krachounov (milko.krachounov) Date: 2009-12-05 13:19
This isn't just a documentation issue. A function named getiterator(), for which the docs say that it returns an iterator, should return an iterator, not just an iterable. They have different semantics and can't be used interchangeably, so the behaviour of getiterator() in ElementTree is wrong. I was using this in my program:

    iterator = element.getiterator()
    next(iterator)
    subelement = next(iterator)

This broke when I tried switching to ElementTree from cElementTree, even though the docs tell me that I'll get an iterator there.

Also, for findall() and friends, is there any reason why we can't stick to either an iterator or a list, and not both? The API would be clearer if findall() always returned a list, or always an iterator, regardless of the implementation. It is currently not clear what will happen if I do:

    for x in tree.findall(path):
        mutate_tree(tree, x)
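The iterator/iterable distinction can be shown without ElementTree at all. This is a small sketch, with `first_two` as a hypothetical helper: calling `iter()` first makes the code work whether the library hands back a list or a true iterator.

```python
def first_two(iterable):
    """Return the first two items from any iterable, list or iterator."""
    iterator = iter(iterable)   # returns the iterator itself, or wraps a list
    first = next(iterator)
    second = next(iterator)
    return first, second

# A plain list is iterable but is not an iterator, so next() rejects it.
try:
    next(["a", "b"])
    list_supports_next = True
except TypeError:
    list_supports_next = False

pair_from_list = first_two(["a", "b", "c"])
pair_from_generator = first_two(x for x in ["a", "b", "c"])
```

For the mutation question, iterating over `list(tree.findall(path))` takes a snapshot first, which sidesteps the ambiguity regardless of what `findall()` returns.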
msg96000 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-05 19:32
There are many differences between the two implementations. I don't know if we can live with them or not.

~ $ ./python
Python 3.1.1+ (release31-maint:76650, Dec 3 2009, 17:14:50)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree import ElementTree as ET, cElementTree as cET
>>> from io import StringIO
>>> SAMPLE = ''
>>> IO_SAMPLE = StringIO(SAMPLE)

With ElementTree

>>> elt = ET.XML(SAMPLE)
>>> elt.getiterator()
[<Element root at 15cb920>]
>>> elt.findall('') # or '.'
[<Element root at 15cb920>]
>>> elt.findall('./')
[<Element root at 15cb920>]
>>> elt.items()
dict_items([])
>>> elt.keys()
dict_keys([])
>>> elt[:]
[]
>>> IO_SAMPLE.seek(0)
>>> next(ET.iterparse(IO_SAMPLE))
('end', <Element root at 15d60d0>)
>>> IO_SAMPLE.seek(0)
>>> list(ET.iterparse(IO_SAMPLE))
[('end', <Element root at 15583e0>)]

With cElementTree

>>> elt_c = cET.XML(SAMPLE)
>>> elt_c.getiterator()
<generator object getiterator at 0x15baae0>
>>> elt_c.findall('')
[]
>>> elt_c.findall('./')
[<Element 'root' at 0x15cf3a0>]
>>> elt_c.items()
[]
>>> elt_c.keys()
[]
>>> elt_c[:]
Traceback (most recent call last):
TypeError: sequence index must be integer, not 'slice'
>>> IO_SAMPLE.seek(0)
>>> next(cET.iterparse(IO_SAMPLE))
Traceback (most recent call last):
TypeError: iterparse object is not an iterator
>>> IO_SAMPLE.seek(0)
>>> list(cET.iterparse(IO_SAMPLE))
[(b'end', <Element 'root' at 0x15cf940>)]
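For comparison, here is a hedged sketch of the `iterparse` case on a current Python 3, where the module-level `iterparse()` result can be passed to `next()`. The `<root/>` input is an assumption, since the original SAMPLE string does not survive in the transcript above.

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample; the original SAMPLE string is not shown in the transcript.
source = io.BytesIO(b"<root/>")

events = ET.iterparse(source)
event, element = next(events)   # works because iterparse returns an iterator
remaining = list(events)        # nothing left after the single 'end' event
```

Note that the event name comes back as a `str` here, unlike the `b'end'` reported for cElementTree in the session above.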
msg96023 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-06 11:51
Proposed patch fixes most of the discrepancies between both implementations. It restores some features that were lost with Python 3:
* cElement slicing and extended slicing
* iterparse, cET.getiterator and cET.findall return an iterator (as documented)
Some tests were added to check these issues.
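As an illustration of what slicing and extended slicing on an element look like, here is a minimal sketch on a current Python 3. The four-child document is a made-up example, not one of the tests from the patch.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with four children.
root = ET.fromstring("<root><a/><b/><c/><d/></root>")

all_children = [child.tag for child in root[:]]      # plain slice
every_other = [child.tag for child in root[::2]]     # extended slice

# Slice assignment replaces a run of children in place.
root[1:3] = [ET.Element("x")]
after_assignment = [child.tag for child in root]
```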
msg96040 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-06 21:16
I fixed it differently, using the upstream modules (Thank you Fredrik).
* ElementTree 1.3a3-20070912
* cElementTree 1.0.6-20090110
It works. And it closes , too.
msg96048 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-07 11:10
The patch should have doc updates for new functionality, if any.
msg96049 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-07 12:38
I see some new features in the changelog. I will try to update the documentation during the week. (patch "py3k" fixed: support assignment of arbitrary sequences)
msg96181 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-09 21:30
Patch for the documentation. (source: upstream documentation)
msg96373 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-14 08:27
Small update of the patch for 3.2: the __cmp__ method is replaced with the __eq__ method (on CommentProxy and PIProxy).
msg97607 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-01-11 21:40
It would be nice to upgrade ElementTree for 2.7 and 3.2, at least.
msg99137 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-09 22:28
Patch updated, with upstream packages:
* ElementTree 1.3a3-20070912
* cElementTree 1.0.6-20090110
Now all tests are identical for the ElementTree part:
- ElementTree 2.x
- cElementTree 2.x
- ElementTree 3.x
- cElementTree 3.x
Waiting for some developer kind enough to review and merge in 2.7 and 3.2.
msg99138 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-09 23:22
Given the size of the patch, it's very difficult to review properly. In any case, could you upload it to http://codereview.appspot.com/ ?
msg99139 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-09 23:31
Ok, will do the upload to rietveld. In addition to the straight review of the patch itself, you could:
- diff against the upstream source code (very few changes)
- diff between 2.x and 3.x
- review the test_suite (there's only additions, no real change)
- hunt refleaks
Btw, I've backported the last tests (#2746, #6233) to all 4 test files (ET and cET, 2.x and 3.x).
msg99140 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-09 23:51
Here it is: * http://codereview.appspot.com/207048/show
msg99449 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-16 23:21
Update the 2.x patch with the last version uploaded to rietveld (patch set 5). Improved test coverage with upstream tests and test cases provided by Neil on issue #6232. Note: the patch for 3.x is obsolete.
msg99466 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-17 11:48
Strip out the experimental C API.
msg100856 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-11 14:40
Fixed on trunk with r78838. Some extra work is required to port it to 3.x. Thank you Fredrik and Antoine for reviewing this patch.
msg100881 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2010-03-11 19:02
W00t!
msg100928 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-12 12:03
Patch to merge ElementTree 1.3 in 3.x.
msg101037 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-14 01:45
Merged in 3.x with r78942 and r78945. See #8047 for a discussion about the `encoding` argument of the serializer (used for the .write() method and the tostring() and tostringlist() functions). Currently the output is not encoded by default in 3.1 and 3.x. It is encoded to ASCII in 2.6 and 2.x.
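To make the `encoding` question concrete, the sketch below shows how the serializer functions behave on a current Python 3, which may differ from the 3.1-era state described above: without an `encoding` argument `tostring()` returns bytes, and `encoding="unicode"` asks for a `str`. The `root` element is a hypothetical example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")   # hypothetical element for illustration

encoded = ET.tostring(root)                          # bytes (default encoding)
text = ET.tostring(root, encoding="unicode")         # str, no encoding applied
chunks = ET.tostringlist(root, encoding="unicode")   # list of str fragments
```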
History
Date User Action Args
2022-04-11 14:56:50 admin set github: 50721
2010-03-14 01:45:21 flox set status: open -> closed; messages: +
2010-03-12 12:04:06 flox set files: - issue6472_etree_upstream_v5a.diff
2010-03-12 12:03:59 flox set files: - issue6472_etree_upstream_py3k_v2.diff
2010-03-12 12:03:40 flox set files: + issue6472_upstream_py3k_v3.diff; messages: +
2010-03-11 19:02:19 effbot set messages: +
2010-03-11 15:57:40 flox link issue6266 superseder
2010-03-11 15:57:40 flox unlink issue6266 dependencies
2010-03-11 15:01:45 flox link issue6232 superseder
2010-03-11 15:01:45 flox unlink issue6232 dependencies
2010-03-11 15:00:15 flox link issue6265 superseder
2010-03-11 15:00:15 flox unlink issue6265 dependencies
2010-03-11 14:59:13 flox link issue6230 superseder
2010-03-11 14:59:13 flox unlink issue6230 dependencies
2010-03-11 14:57:27 flox link issue6565 superseder
2010-03-11 14:57:27 flox unlink issue6565 dependencies
2010-03-11 14:53:28 flox link issue3151 superseder
2010-03-11 14:53:28 flox unlink issue3151 dependencies
2010-03-11 14:51:35 flox unlink issue3475 dependencies
2010-03-11 14:51:35 flox link issue3475 superseder
2010-03-11 14:49:26 flox link issue1538691 superseder
2010-03-11 14:49:26 flox unlink issue1538691 dependencies
2010-03-11 14:40:17 flox set resolution: fixed; messages: +; stage: patch review -> resolved
2010-02-23 15:48:04 flox link issue7990 dependencies
2010-02-17 11:49:01 flox set files: + issue6472_etree_upstream_v5a.diff; messages: +
2010-02-17 11:47:35 flox set files: - issue6472_etree_upstream_v5.diff
2010-02-16 23:21:46 flox set files: + issue6472_etree_upstream_v5.diff; messages: +
2010-02-16 23:19:42 flox set files: - issue6472_etree_upstream_v2.diff
2010-02-16 21:58:35 flox link issue6266 dependencies
2010-02-16 13:17:50 flox link issue6232 dependencies
2010-02-16 13:13:41 flox link issue6265 dependencies
2010-02-16 13:11:28 flox link issue6230 dependencies
2010-02-16 12:13:29 flox link issue6565 dependencies
2010-02-16 11:58:48 flox link issue3151 dependencies
2010-02-16 11:46:13 flox link issue1777 superseder
2010-02-16 11:43:34 flox link issue1767933 dependencies
2010-02-13 16:01:18 flox link issue1538691 dependencies
2010-02-13 15:57:38 flox link issue3475 dependencies
2010-02-10 12:16:40 pitrou set title: Inconsistent use of "iterator" in ElementTree doc & diff between Py and C modules -> Update ElementTree with upstream changes
2010-02-09 23:51:19 flox set messages: +
2010-02-09 23:31:53 flox set messages: +
2010-02-09 23:22:20 pitrou set messages: +
2010-02-09 22:29:17 flox set files: + issue6472_etree_upstream_py3k_v2.diff
2010-02-09 22:28:22 flox set files: + issue6472_etree_upstream_v2.diff; messages: +
2010-02-09 22:22:16 flox set files: - issue6472_upstream_py3k_v2.diff
2010-02-09 22:22:10 flox set files: - issue6472_upstream.diff
2010-01-11 21:40:18 flox set messages: +; versions: - Python 2.6, Python 3.1
2009-12-14 08:44:14 flox set files: + issue6472_upstream_docs.diff
2009-12-14 08:43:14 flox set files: - issue6472_upstream_docs.diff
2009-12-14 08:28:11 flox set files: - issue6472_upstream_py3k.diff
2009-12-14 08:27:48 flox set files: + issue6472_upstream_py3k_v2.diff; messages: +
2009-12-09 21:31:02 flox set files: + issue6472_upstream_docs.diff; messages: +
2009-12-07 12:39:11 flox set files: - issue6472_upstream_py3k.diff
2009-12-07 12:38:59 flox set files: + issue6472_upstream_py3k.diff; messages: +
2009-12-07 11:11:13 pitrou link issue1143 superseder
2009-12-07 11:10:42 pitrou set priority: normal; nosy: + pitrou; messages: +; stage: patch review
2009-12-07 08:22:19 flox set files: - issue6472.diff
2009-12-07 08:22:14 flox set files: - issue6472_py3k.diff
2009-12-07 08:21:36 flox set files: + issue6472_upstream_py3k.diff; versions: - Python 3.0
2009-12-06 21:16:33 flox set files: + issue6472_upstream.diff; messages: +
2009-12-06 11:51:56 flox set files: + issue6472_py3k.diff
2009-12-06 11:51:21 flox set files: + issue6472.diff; keywords: + patch; messages: +
2009-12-05 19:32:49 flox set messages: +
2009-12-05 17:11:01 flox set nosy: + flox
2009-12-05 13:19:11 milko.krachounov set versions: + Python 2.6, Python 2.7; nosy: + milko.krachounov; messages: +; components: + Library (Lib); type: behavior
2009-07-13 02:24:16 benjamin.peterson set assignee: georg.brandl -> effbot
2009-07-13 01:32:33 jcsalterego set nosy: + effbot
2009-07-13 00:55:52 MLModel create