Issue 17088: ElementTree incorrectly refuses to write attributes without namespaces when default_namespace is used

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Garrett Birkel, Rafael Ascensao, SpecLad, carlsampurna, dtluttik, eli.bendersky, gene_wood, marczellm, martin.panter, mthuurne, rhettinger, scoder, silverbacknet, wiml
Priority:	normal	Keywords:	patch

Created on 2013-01-31 02:35 by silverbacknet, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description	Edit
bug17088_1.patch	wiml,2013-12-05 08:29	Patch and test case	review
bug17088_2.patch	wiml,2013-12-13 23:14	Improved patch and test case	review

Pull Requests
URL	Status	Linked	Edit
PR 11050	open	mthuurne,2018-12-09 13:34

Messages (17)
msg181005 - (view)	Author: Silverback Networks (silverbacknet)	Date: 2013-01-31 02:35
ET reads a default-namespaced (xmlns="whatever") file correctly but won't write it back out. The error given is:
    ValueError: cannot use non-qualified names with default_namespace option
The XML reference is reasonably clear on this: http://www.w3.org/TR/REC-xml-names/#defaulting
    "Default namespace declarations do not apply directly to attribute names;"
    "The namespace name for an unprefixed attribute name always has no value."
Therefore, it is not an error to write non-qualified _attribute_ names with a default namespace; they're just considered un-namespaced anyway. The trivial case where a file is read in with a default namespace and written out with the same one should make it obvious:
    from xml.etree.ElementTree import *
    register_namespace('svg', 'http://www.w3.org/2000/svg')
    svg = ElementTree(XML(""" """))
    svg.write('simple_new.svg',encoding='UTF-8',default_namespace='svg')
Yet this will fail with the error above. By leaving off default_namespace, every element is pointlessly prefixed by 'svg:' in the resulting file, but it does work.
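A minimal sketch of the behaviour described in this report, assuming a small stand-in document (the original SVG body is not shown above) and a hypothetical helper named try_write. It shows that the parser places element names in the default namespace while leaving unprefixed attribute names unqualified, and then attempts both serialisations. Whether the second call raises may depend on the Python version, so the error is caught rather than assumed.

```python
import io
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Stand-in document (an assumption; not the file from the report).
DOC = '<svg xmlns="%s" width="12cm"><rect x="1"/></svg>' % SVG_NS

root = ET.fromstring(DOC)
tree = ET.ElementTree(root)


def try_write(tree, **kwargs):
    """Hypothetical helper: serialise to memory, reporting ValueError as text."""
    buf = io.BytesIO()
    try:
        tree.write(buf, encoding="UTF-8", **kwargs)
    except ValueError as exc:
        return "ValueError: %s" % exc
    return buf.getvalue().decode("utf-8")


# Element names carry the default namespace in Clark notation ...
print(root.tag)
# ... but unprefixed attribute names stay unqualified.
print(sorted(root.attrib))

# Serialising without a default namespace works (elements get a prefix).
print(try_write(tree))
# Serialising with a default namespace is the case this issue is about.
print(try_write(tree, default_namespace=SVG_NS))
```

Note that default_namespace is given the namespace URI here, not a prefix.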
msg205281 - (view)	Author: Wim (wiml)	Date: 2013-12-05 08:29
I have run into this a few times although it is only recently that I've convinced myself I understood the XML namespace spec well enough to know what the right behavior was. (I came to the same interpretation as silverbacknet.) I have attached a patch which I believe fixes (and tests) the problem.
msg205951 - (view)	Author: Wim (wiml)	Date: 2013-12-12 10:45
FWIW: I noticed that my patch has a bug due to sharing the cache dict between element names and attribute names, although I think this is unlikely to crop up very often in practice. I'll submit a better patch if/when I get the time to put one together.
msg206156 - (view)	Author: Wim (wiml)	Date: 2013-12-13 23:14
Here's an improved patch (and improved testcase). It's a little more intrusive than the last patch because when a default namespace is being used, two distinct qname caches must be made.
msg206186 - (view)	Author: Stefan Behnel (scoder) *	Date: 2013-12-14 15:06
Note that the option is called "default_namespace", not "default_namespace_prefix". Could you try passing the namespace URI instead?
msg206217 - (view)	Author: Wim (wiml)	Date: 2013-12-15 07:14
Yes, the problem occurs regardless of whether the default_namespace parameter is the correct SVG namespace URI --- it's the fact of requesting a default namespace at all that exposes the bug.
msg209424 - (view)	Author: Wim (wiml)	Date: 2014-01-27 09:44
Ping
msg216061 - (view)	Author: Gene Wood (gene_wood)	Date: 2014-04-14 03:17
One workaround to this is described here: http://stackoverflow.com/a/4999510/168874
It involves prefixing all of the elements with the namespace like this:
    from xml.etree import ElementTree as ET
    # build a tree structure
    root = ET.Element("{http://www.company.com}STUFF")
    body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF")
    body.text = "STUFF EVERYWHERE!"
    # wrap it in an ElementTree instance, and save as XML
    tree = ET.ElementTree(root)
    tree.write("page.xml", xml_declaration=True,encoding='utf-8', method="xml",default_namespace='http://www.company.com')
msg216067 - (view)	Author: Stefan Behnel (scoder) *	Date: 2014-04-14 06:00
@gene_wood: that's unrelated. This ticket is about attributes being rejected incorrectly. Fixing the example of the OP:
    >>> from xml.etree.ElementTree import *
    >>> svg = ElementTree(XML(""" ... ... """))
    >>> tostring(svg.getroot()) # formatting is mine
    b'<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="4cm" version="1.1" viewBox="0 0 1200 400" width="12cm">\n <svg:rect fill="none" height="398" stroke="blue" stroke-width="2" width="1198" x="1" y="1" />\n '
    >>> svg.write('simple_new.svg',encoding='UTF-8',default_namespace='http://www.w3.org/2000/svg')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 826, in write
        qnames, namespaces = _namespaces(self._root, default_namespace)
      File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 942, in _namespaces
        add_qname(key)
      File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 920, in add_qname
        "cannot use non-qualified names with "
    ValueError: cannot use non-qualified names with default_namespace option
    >>> svg.write('simple_new.svg',encoding='UTF-8')
    >>>
So, it works without namespace defaulting and fails with an incorrect error when a default namespace is provided. Clearly a bug.
Regarding the proposed patch: it looks like the right thing to do in general, but it has a relatively high code impact. I would prefer a patch with lower churn. One thing that could be tried is to use only one tag cache dict and extend the key from the plain tag to (tag, is_attribute). Might have a performance impact on the already slow serialiser, though.
In any case, both approaches are quite wasteful, because they duplicate the entire namespace-prefix mapping just because there might be a single namespace that behaves differently for attributes. An alternative could be to split the value of the mapping in two: (element_prefix, attribute_prefix). This would keep the overhead at serialisation low, with only slightly more work when building the mapping. At first sight, I like that idea better.
This code returns a list in one case and a set-like view in another (Py3):
    + if default_namespace:
    +     prefixes_list = [ (default_namespace, "") ]
    +     prefixes_list.extend(namespaces.items())
    + else:
    +     prefixes_list = namespaces.items()
I can't see the need for this change. Why can't the default namespace be stored in the namespaces dict right from the start, as it was before?
As a minor nitpick, this lambda based sort key:
    key=lambda x: x[1]): # sort on prefix
is better expressed using operator.itemgetter(1). I'd also rename the "defaultable" flag to "is_attribute" and pass it as keyword argument (bare boolean parameters are unreadable in function calls).
Given the impact of this change, I'd also suggest not applying it to Py2.x anymore.
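To illustrate the single-cache idea from this review, here is a rough sketch of a qualified-name resolver whose cache key is (name, is_attribute). This is not the ElementTree implementation and not either proposed patch; make_qname_resolver, resolve, the error message, and the prefix table are all hypothetical. It also shows the keyword-only is_attribute flag and the operator.itemgetter(1) sort key mentioned in the nitpicks.

```python
from operator import itemgetter


def make_qname_resolver(prefixes, default_namespace=None):
    """Hypothetical sketch: map Clark-notation names to serialised names.

    prefixes maps namespace URI -> prefix. One cache serves both element and
    attribute names because the key includes the is_attribute flag.
    """
    cache = {}

    def resolve(name, *, is_attribute=False):
        key = (name, is_attribute)
        if key in cache:
            return cache[key]
        if name[:1] == "{":
            uri, local = name[1:].split("}", 1)
            if uri == default_namespace and not is_attribute:
                # Elements in the default namespace need no prefix.
                qname = local
            else:
                # Attributes never inherit the default namespace, so a
                # namespaced attribute always needs an explicit prefix.
                qname = "%s:%s" % (prefixes[uri], local)
        elif default_namespace and not is_attribute:
            # An unqualified element would silently move into the default
            # namespace, so only this case is rejected.
            raise ValueError("non-qualified element name: %r" % name)
        else:
            qname = name
        cache[key] = qname
        return qname

    return resolve


SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
prefixes = {XLINK_NS: "xlink", SVG_NS: "svg"}
resolve = make_qname_resolver(prefixes, default_namespace=SVG_NS)

print(resolve("{%s}rect" % SVG_NS))                       # element name
print(resolve("width", is_attribute=True))                # plain attribute
print(resolve("{%s}href" % XLINK_NS, is_attribute=True))  # prefixed attribute

# Sorting (uri, prefix) pairs on the prefix without a lambda.
print(sorted(prefixes.items(), key=itemgetter(1)))
```

Whether the tuple key costs measurable time in the real serialiser is the open question raised above; this sketch does not attempt to answer it.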
msg261785 - (view)	Author: Garrett Birkel (Garrett Birkel)	Date: 2016-03-14 22:24
Just hit up against this namespace bug. Has this patch been abandoned??
msg304120 - (view)	Author: Rafael Ascensao (Rafael Ascensao)	Date: 2017-10-11 09:02
what's the status on this?
msg331390 - (view)	Author: Maarten ter Huurne (mthuurne) *	Date: 2018-12-08 18:38
I was working on what I thought would be an elegant solution to this problem: for non-qualified attributes, add the element's namespace before accessing the cache and strip the namespace prefix after accessing the cache if it's equal to the element's prefix. However, this approach doesn't work: even though non-qualified attributes will be processed as if they were in the element's namespace, they are considered to have no namespace. This means <ns:x a="1" ns:a="2"/> is considered valid XML, even though it effectively defines the same attribute twice. https://www.w3.org/TR/REC-xml-names/#uniqAttrs In my opinion the spec made a silly choice here, but that's probably not something that can be fixed anymore. I haven't decided yet whether I'll make another attempt at fixing this issue. In any case, I hope this cautionary tale benefits someone.
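The point about <ns:x a="1" ns:a="2"/> can be checked with the parser itself. In this small sketch the namespace URI urn:example is an assumption chosen for illustration; the two attributes come back as distinct keys, one unqualified and one in Clark notation.

```python
import xml.etree.ElementTree as ET

NS = "urn:example"  # placeholder namespace URI (an assumption)

# Both "a" and "ns:a" are accepted on the same element.
elem = ET.fromstring('<ns:x xmlns:ns="%s" a="1" ns:a="2"/>' % NS)

# The unprefixed attribute has no namespace; the prefixed one does.
print(elem.get("a"))
print(elem.get("{%s}a" % NS))
print(len(elem.attrib))
```

This is why folding unqualified attributes into the element's namespace before the cache lookup would merge two names that must stay separate.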
msg331432 - (view)	Author: Maarten ter Huurne (mthuurne) *	Date: 2018-12-09 14:05
I think I have a good solution now, see the pull request for details. It does touch a lot of code, but I split all the changes into small consistent units, so it should be easier to verify whether they are correct.
msg354101 - (view)	Author: Maarten ter Huurne (mthuurne) *	Date: 2019-10-07 14:34
Can I please get a review of the pull request?
msg365760 - (view)	Author: Márton Marczell (marczellm)	Date: 2020-04-04 11:27
Can he please get a review of the pull request?
msg397793 - (view)	Author: Daan Luttik (dtluttik)	Date: 2021-07-19 12:47
Is there any workaround for this? This bug still seems to be present in Python 3.9.6.
msg397794 - (view)	Author: Stefan Behnel (scoder) *	Date: 2021-07-19 12:58
The obvious work-around is to not use a default namespace. The result is just a visual difference, not a semantic one. If someone wants to continue with the existing PR, I'll try to free some time to review any improvements.
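A short sketch of this workaround, assuming a stand-in document and a hypothetical helper named shape: the tree is serialised without default_namespace (so elements are prefixed), parsed again, and compared with the original. The prefix is registered with register_namespace, which changes module-wide state.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Use a readable prefix instead of an auto-generated one (global setting).
ET.register_namespace("svg", SVG_NS)

# Stand-in document (an assumption; not taken from the report).
original = ET.fromstring(
    '<svg xmlns="%s" width="12cm"><rect x="1"/></svg>' % SVG_NS
)

# Serialise WITHOUT default_namespace: elements are written with a prefix.
serialized = ET.tostring(original, encoding="unicode")
reparsed = ET.fromstring(serialized)


def shape(elem):
    """Hypothetical helper: namespace-aware (tag, attributes) per element."""
    return [(e.tag, dict(e.attrib)) for e in elem.iter()]


print(serialized)
print(shape(original) == shape(reparsed))
```

The prefixed output looks different from the input, but a namespace-aware consumer sees the same element and attribute names.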

History
Date	User	Action	Args
2022-04-11 14:57:41	admin	set	github: 61290
2021-07-19 12:58:02	scoder	set	messages: +; versions: + Python 3.10, Python 3.11, - Python 3.9
2021-07-19 12:47:05	dtluttik	set	nosy: + dtluttik; messages: +; versions: + Python 3.9, - Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5
2021-06-20 14:27:21	carlsampurna	set	nosy: + carlsampurna
2020-04-04 11:27:26	marczellm	set	nosy: + marczellm; messages: +
2019-10-07 14:34:58	mthuurne	set	messages: +
2018-12-09 14:05:14	mthuurne	set	messages: +
2018-12-09 13:34:07	mthuurne	set	stage: patch review; pull_requests: + pull_request10286
2018-12-08 20:25:13	rhettinger	set	nosy: + rhettinger
2018-12-08 18:38:17	mthuurne	set	nosy: + mthuurne; messages: +
2018-11-23 20:08:36	SpecLad	set	nosy: + SpecLad
2017-10-11 09:02:24	Rafael Ascensao	set	nosy: + Rafael Ascensao; messages: +
2016-03-14 22:24:40	Garrett Birkel	set	nosy: + Garrett Birkel; messages: +
2015-01-25 11:16:00	martin.panter	set	nosy: + martin.panter
2014-04-14 06:00:43	scoder	set	messages: +
2014-04-14 03:17:19	gene_wood	set	nosy: + gene_wood; messages: +
2014-01-27 09:44:09	wiml	set	messages: +
2013-12-15 07:14:11	wiml	set	messages: +
2013-12-14 15:06:23	scoder	set	messages: +
2013-12-14 14:51:37	scoder	set	nosy: + scoder, eli.bendersky
2013-12-13 23:14:49	wiml	set	files: + bug17088_2.patch; messages: +
2013-12-12 10:45:35	wiml	set	messages: +
2013-12-05 08:29:34	wiml	set	files: + bug17088_1.patch; nosy: + wiml; messages: +; keywords: + patch
2013-01-31 02:35:07	silverbacknet	create