Issue 14007: xml.etree.ElementTree - XMLParser and TreeBuilder's doctype() method missing (original) (raw)

Issue14007

Created on 2012-02-14 04:16 by eli.bendersky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (25)
msg153328 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-02-14 04:19
The C accelerator of xml.etree.ElementTree (used by default since issue 13988) does not use or define or use the doctype() methods of the XMLParser and TreeBuilder classes, although this method is documented. As far as I can tell, this problem exists in 3.2 as well and wasn't introduced by the changes in issue 13988.
msg153330 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-02-14 04:51
The problem is deeper. _elementtree does not expose XMLParser and TreeBuilder as types at all, just as factory functions. XMLParser: not sure if it was meant to be subclassed. If not, it should at least be documented. In any case, the XMLParser in _elementtree does not use the doctype() method on its target. TreeBuilder: was definitely meant to be subclassed (it's documented explicitly), so it must be exposed as a type, not as a factory function.
msg153332 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-14 07:33
For the doctype() issue, it might be removed, since it was already deprecated in 3.2 (in compliance with PEP 387). However the deprecation cycle is under discussion for the 3.x serie. (issue 13248) For subclassing, you hit the same issue for all the functions exposed by _elementtree, including Element fatory.
msg153934 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-02-22 08:28
It doesn't really matter if something was *meant* to be subclassed. If it can be in 3.x, and can't be in 3.x+1, that's a sort of backwards compatibility bug we want to avoid pretty strongly because it's gratuitous breakage.
msg154329 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-26 11:55
the class versus factory issue is gone to issue 14128. The current issue is only about the doctype() method missing in the C implementation. I propose to drop this deprecated method from the Python version, instead of implementing the deprecated method in the C version.
msg154333 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-02-26 12:02
Florent, The deprecation should be probably raised separately on pydev. From the recent discussions on this and similar topics, I doubt that removal of these methods will be accepted.
msg154338 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-26 12:29
I understand the point about compatibility. However it is slightly different here, because the method is already deprecated in Python 2.7 and 3.2, with a warning in the documentation and a DeprecationWarning at runtime. This method was never available in the C version, and the documentation is clear about the recommended way of writing a custom doctype handler. Arguably, it is not the most popular feature of ElementTree. I am not opposed to adding the deprecated method in the C implementation, but it will need someone to do the patch, taking care of raising the deprecation warning correctly, and taking care of the case where XMLParser is subclassed. Is it worth the hassle? Please not that contrary to what is stated in the first message (), the doctype() method is not defined on the default TreeBuilder (Python) class. The documentation suggests to add it on custom TreeBuilder implementations.
msg154343 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-26 13:13
This last feature (doctype handler on custom TreeBuilder) does not have tests... So, it is certainly broken with the C implementation.
msg154398 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-26 21:49
Two other differences: * the C TreeBuilder has an undocumented and unused method "xml" * if you omit one of the TreeBuilder method (start(), end(), data(), close()) on you custom TreeBuilder implementation, the C XMLParser works fine, but the Python XMLParser raises an attribute error. And I confirm that if you implement the "doctype()" method on a custom TreeBuilder object, the C XMLParser ignores it, while the Python version works fine. I propose: - to drop the undocumented, empty "TreeBuilder().xml" method. - to relax the Python XMLParser, in order to accept incomplete TreeBuilder implementation (like the C version) - to implement the "doctype()" handler in the C XMLParser (this needs some work)
msg154408 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-02-26 22:18
On Sun, Feb 26, 2012 at 23:49, Florent Xicluna <report@bugs.python.org>wrote: Yes, these suggestions sound reasonable to me. Moving toward two more consistent implementations of the API, while not disabling existing features is the way to go. Things like relaxing the Python XMLParser just add features rather than restricting them, so they should not result in breakage due to backward-incompatibility. Eli
msg154935 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-05 09:43
New changeset 39cc025968f1 by Florent Xicluna in branch 'default': Issue #14007: drop unused TreeBuilder().xml. http://hg.python.org/cpython/rev/39cc025968f1 New changeset 47016103185f by Florent Xicluna in branch 'default': Issue #14007: accept incomplete TreeBuilder objects (missing start/end/data/close) for the Python implementation as well. Add disabled tests for the doctype() method. http://hg.python.org/cpython/rev/47016103185f
msg161873 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-05-29 12:46
New changeset 717632ae7b3f by Eli Bendersky in branch 'default': Issue #14007: make TreeBuilder an actual type exposed from _elementtree, and subclassable. http://hg.python.org/cpython/rev/717632ae7b3f
msg161875 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-05-29 13:07
I'm working on completing this issue. It will be done in stages, with each stage committed as a separate chunk of functionality. * Make TreeBuilder an actual exported type (subclassable) from _elementtree * TreeBuilder constructor should accept arguments as documented * Make XMLParser an actual exported type (subclassable) from _elementtree * Make XMLParser call doctype on a given target builder, and on itself, as is done in the Python version The first bullet was implemented and committed --^
msg161961 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-05-30 14:59
New changeset 20b8f0ee3d64 by Eli Bendersky in branch 'default': Issue #14007: implemented the 'element_factory' feature of TreeBuilder in http://hg.python.org/cpython/rev/20b8f0ee3d64
msg161987 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-05-31 09:38
* Another problem: the C implementation of XMLParser does not take 3 positional args, but only 2. Although the 'html' arg is documented as "unsupported", it should still be taken and silently ignored, similarly to the Python version
msg162048 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-06-01 04:16
New changeset a29ae1c2b8b2 by Eli Bendersky in branch 'default': Issue #14007: make XMLParser a real subclassable type exported from _elementtree. +cleanups http://hg.python.org/cpython/rev/a29ae1c2b8b2
msg162058 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-06-01 08:34
New changeset 6f9bfcc1896f by Eli Bendersky in branch 'default': Issue #14007: implement doctype() method calling in XMLParser of _elementtree. http://hg.python.org/cpython/rev/6f9bfcc1896f
msg162059 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-06-01 08:36
The latest commit completes this issue.
msg162179 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-02 21:31
This is causing buildbot failures on some of the buildbots: http://www.python.org/dev/buildbot/all/builders/x86%20Gentoo%203.x/builds/2529/steps/test/logs/stdio http://www.python.org/dev/buildbot/all/builders/x86%20Windows7%203.x/builds/5082/steps/test/logs/stdio
msg162189 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-06-03 03:11
Thanks. Fixed in changeset eb1d633fe307. I'll watch the bots to see no problems remain.
msg171604 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-09-30 02:53
It seems like your changes have introduced a segfault: #16089
msg201519 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2013-10-28 09:18
There is an issue in Python 2.7 (and 3.2 if that matters) with the DeprecationWarning for the doctype() method being triggered internally. It is a bit different from this issue but I think a solution has already been committed for 3.3, and the commit message references this issue. Let me know if I should raise a separate report. The warning is triggered for the Python version of the ElementTree module: $ python2.7 -Wall Python 2.7.5 (default, Sep 6 2013, 09:55:21) [GCC 4.8.1 20130725 (prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from xml.etree import ElementTree >>> ElementTree.XML(b'') /usr/lib/python2.7/xml/etree/ElementTree.py:1627: DeprecationWarning: This method of XMLParser is deprecated. Define doctype() method on the TreeBuilder target. DeprecationWarning, /usr/lib/python2.7/xml/etree/ElementTree.py:1627: DeprecationWarning: This method of XMLParser is deprecated. Define doctype() method on the TreeBuilder target. DeprecationWarning, <Element 'elem' at 0xd47910> >>> Possible solution is the patch hunk below, taken from this commit: http://hg.python.org/cpython/diff/47016103185f/Lib/xml/etree/ElementTree.py#l1.99 @@ -1636,7 +1627,7 @@ class XMLParser: pubid = pubid[1:-1] if hasattr(self.target, "doctype"): self.target.doctype(name, pubid, system[1:-1]) - elif self.doctype is not self._XMLParser__doctype: + elif self.doctype != self._XMLParser__doctype: # warn about deprecated call self._XMLParser__doctype(name, pubid, system[1:-1]) self.doctype(name, pubid, system[1:-1])
msg201533 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-10-28 12:53
Martin, do you see a way to work around the problem? I'm not sure it's serious enough to warrant committing to the 2.7.x branch at this point.
msg201538 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2013-10-28 13:00
The best way to work around it for me is just to ignore the warning. It doesn’t really worry me that much, I only noticed it while porting a program to Python 3 anyway. So if you don’t want to touch the 2.7 branch I can live with that :)
msg201539 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-10-28 13:06
On Mon, Oct 28, 2013 at 6:00 AM, Martin Panter <report@bugs.python.org>wrote: > > Martin Panter added the comment: > > The best way to work around it for me is just to ignore the warning. It > doesn’t really worry me that much, I only noticed it while porting a > program to Python 3 anyway. So if you don’t want to touch the 2.7 branch I > can live with that :) > I'd rather not. 2.7.x is in maintenance mode - we'll fix serious bugs and may change trivial things. But in this case, since the effects are hardly serious we should consider the danger of mucking with code in a very stable maintenance branch. While this issue may seem simple at the surface, it's not. That's because ET in 3.3+ is quite different from ET in 3.2- (including 2.7-). In 3.3 ET underwent major changes, and its testing infrastructure was greatly improved too. Back-porting across the 3.3 boundary is risky because it's hard to foresee the effects (i.e. interactions between the C accelerator and the Python module).
History
Date User Action Args
2022-04-11 14:57:26 admin set github: 58215
2013-10-28 13:06:10 eli.bendersky set messages: +
2013-10-28 13:00:00 martin.panter set messages: +
2013-10-28 12:53:47 eli.bendersky set messages: +
2013-10-28 09🔞43 martin.panter set nosy: + martin.pantermessages: +
2012-09-30 02:53:28 christian.heimes set nosy: + christian.heimesmessages: +
2012-06-03 03:12:00 eli.bendersky set status: open -> closedmessages: +
2012-06-02 21:31:51 r.david.murray set status: closed -> opennosy: + r.david.murraymessages: +
2012-06-01 08:36:56 eli.bendersky set status: open -> closedresolution: fixedmessages: + stage: test needed -> resolved
2012-06-01 08:34:31 python-dev set messages: +
2012-06-01 04:16:50 python-dev set messages: +
2012-05-31 09:38:09 eli.bendersky set messages: +
2012-05-30 14:59:11 python-dev set messages: +
2012-05-29 13:07:57 eli.bendersky set messages: +
2012-05-29 12:46:53 python-dev set messages: +
2012-04-27 21:59:38 Arfrever set priority: normal -> release blocker
2012-03-05 09:43:03 python-dev set nosy: + python-devmessages: +
2012-02-26 22🔞46 eli.bendersky set messages: +
2012-02-26 21:49:37 flox set messages: +
2012-02-26 13:13:33 flox set messages: + stage: needs patch -> test needed
2012-02-26 12:29:31 flox set priority: high -> normalmessages: +
2012-02-26 12:02:39 eli.bendersky set messages: +
2012-02-26 11:55:24 flox set messages: +
2012-02-22 08:28:19 georg.brandl set nosy: + georg.brandlmessages: +
2012-02-14 10:49:41 Arfrever set nosy: + Arfrever
2012-02-14 07:33:49 flox set messages: +
2012-02-14 04:51:46 eli.bendersky set messages: +
2012-02-14 04:19:50 eli.bendersky set messages: +
2012-02-14 04:19:29 eli.bendersky set -> (no value)messages: -
2012-02-14 04:16:26 eli.bendersky create