ET reads a default-namespaced (xmlns="whatever") file correctly but won't write it back out. The error given is:
    ValueError: cannot use non-qualified names with default_namespace option
The XML reference is reasonably clear on this: http://www.w3.org/TR/REC-xml-names/#defaulting
    "Default namespace declarations do not apply directly to attribute names;"
    "The namespace name for an unprefixed attribute name always has no value."
Therefore, it is not an error to write non-qualified _attribute_ names with a default namespace; they're just considered un-namespaced anyway. The trivial case where a file is read in with a default namespace and written out with the same one should make it obvious:
    from xml.etree.ElementTree import *
    register_namespace('svg', 'http://www.w3.org/2000/svg')
    svg = ElementTree(XML(""" """))
    svg.write('simple_new.svg',encoding='UTF-8',default_namespace='svg')
Yet this will fail with the error above. By leaving off default_namespace, every element is pointlessly prefixed by 'svg:' in the resulting file, but it does work.
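To make the spec quotes concrete, here is a minimal sketch of how the parser already treats the two kinds of names. The tiny document is made up for illustration and is not the file from the report.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical minimal document using a default namespace declaration.
root = ET.fromstring('<svg xmlns="%s" width="12cm"><rect x="1"/></svg>' % SVG_NS)

# Element names pick up the default namespace ...
print(root.tag)
print(root[0].tag)

# ... but unprefixed attribute names stay un-namespaced.
print(root.attrib)
```

So the in-memory tree legitimately mixes qualified element names with non-qualified attribute names, which is exactly the combination the serialiser refuses to write.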
I have run into this a few times although it is only recently that I've convinced myself I understood the XML namespace spec well enough to know what the right behavior was. (I came to the same interpretation as silverbacknet.) I have attached a patch which I believe fixes (and tests) the problem.
FWIW: I noticed that my patch has a bug due to sharing the cache dict between element names and attribute names, although I think this is unlikely to crop up very often in practice. I'll submit a better patch if/when I get the time to put one together.
Here's an improved patch (and improved testcase). It's a little more intrusive than the last patch because when a default namespace is being used, two distinct qname caches must be made.
Yes, the problem occurs regardless of whether the default_namespace parameter is the correct SVG namespace URI --- it's the fact of requesting a default namespace at all that exposes the bug.
One workaround to this is described here: http://stackoverflow.com/a/4999510/168874
It involves prefixing all of the elements with the namespace like this:
    from xml.etree import ElementTree as ET
    # build a tree structure
    root = ET.Element("{http://www.company.com}STUFF")
    body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF")
    body.text = "STUFF EVERYWHERE!"
    # wrap it in an ElementTree instance, and save as XML
    tree = ET.ElementTree(root)
    tree.write("page.xml", xml_declaration=True, encoding='utf-8', method="xml", default_namespace='http://www.company.com')
@gene_wood: that's unrelated. This ticket is about attributes being rejected incorrectly. Fixing the example of the OP:
    >>> from xml.etree.ElementTree import *
    >>> svg = ElementTree(XML(""" ... ... """))
    >>> tostring(svg.getroot()) # formatting is mine
    b'<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="4cm" version="1.1" viewBox="0 0 1200 400" width="12cm">\n <svg:rect fill="none" height="398" stroke="blue" stroke-width="2" width="1198" x="1" y="1" />\n '
    >>> svg.write('simple_new.svg',encoding='UTF-8',default_namespace='http://www.w3.org/2000/svg')
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 826, in write
        qnames, namespaces = _namespaces(self._root, default_namespace)
      File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 942, in _namespaces
        add_qname(key)
      File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 920, in add_qname
        "cannot use non-qualified names with "
    ValueError: cannot use non-qualified names with default_namespace option
    >>> svg.write('simple_new.svg',encoding='UTF-8')
    >>>
So, it works without namespace defaulting and fails with an incorrect error when a default namespace is provided. Clearly a bug.
Regarding the proposed patch: it looks like the right thing to do in general, but it has a relatively high code impact. I would prefer a patch with lower churn. One thing that could be tried is to use only one tag cache dict and extend the key from the plain tag to (tag, is_attribute). Might have a performance impact on the already slow serialiser, though.
In any case, both approaches are quite wasteful, because they duplicate the entire namespace-prefix mapping just because there might be a single namespace that behaves differently for attributes. An alternative could be to split the *value* of the mapping in two: (element_prefix, attribute_prefix). This would keep the overhead at serialisation low, with only slightly more work when building the mapping.
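The last alternative, splitting the mapping value into (element_prefix, attribute_prefix), might look roughly like the sketch below. This is not the code in ElementTree.py or in the attached patch: build_prefix_map and qname are hypothetical names, and the ns0-style generated prefixes are assumed purely for illustration.

```python
def build_prefix_map(uris, default_namespace=None):
    """Map each namespace URI to (element_prefix, attribute_prefix)."""
    prefixes = {}
    for index, uri in enumerate(uris):
        attribute_prefix = "ns%d" % index
        # Elements in the default namespace are written without a prefix.
        # Attributes never inherit the default namespace, so a namespaced
        # attribute always keeps a real prefix.
        element_prefix = "" if uri == default_namespace else attribute_prefix
        prefixes[uri] = (element_prefix, attribute_prefix)
    return prefixes


def qname(name, prefixes, default_namespace=None, *, is_attribute=False):
    """Turn '{uri}local' into its serialised 'prefix:local' form."""
    if not name.startswith("{"):
        # Only a non-qualified *element* clashes with a default namespace.
        if default_namespace and not is_attribute:
            raise ValueError(
                "cannot use non-qualified element names with default_namespace")
        return name
    uri, local = name[1:].split("}", 1)
    element_prefix, attribute_prefix = prefixes[uri]
    prefix = attribute_prefix if is_attribute else element_prefix
    return "%s:%s" % (prefix, local) if prefix else local


SVG = "http://www.w3.org/2000/svg"
prefixes = build_prefix_map([SVG], default_namespace=SVG)
print(qname("{%s}rect" % SVG, prefixes, SVG))                     # rect
print(qname("width", prefixes, SVG, is_attribute=True))           # width
print(qname("{%s}href" % SVG, prefixes, SVG, is_attribute=True))  # ns0:href
```

One consequence of this layout is that a document with an attribute in the default namespace needs both the xmlns="..." declaration and a prefixed declaration for the same URI.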
At first sight, I like that idea better.
This code returns a list in one case and a set-like view in another (Py3):
    + if default_namespace:
    +     prefixes_list = [ (default_namespace, "") ]
    +     prefixes_list.extend(namespaces.items())
    + else:
    +     prefixes_list = namespaces.items()
I can't see the need for this change. Why can't the default namespace be stored in the namespaces dict right from the start, as it was before?
As a minor nitpick, this lambda based sort key:
    key=lambda x: x[1]): # sort on prefix
is better expressed using operator.itemgetter(1). I'd also rename the "defaultable" flag to "is_attribute" and pass it as keyword argument (bare boolean parameters are unreadable in function calls).
Given the impact of this change, I'd also suggest not applying it to Py2.x anymore.
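For reference, a small sketch of the itemgetter form of that sort key. The namespaces dict here is made-up sample data shaped like the serialiser's URI-to-prefix mapping; it is not taken from the patch.

```python
from operator import itemgetter

# Hypothetical URI -> prefix mapping.
namespaces = {
    "http://www.w3.org/2000/svg": "svg",
    "http://www.w3.org/1999/xlink": "xlink",
    "urn:example": "ex",
}

# Sort the (uri, prefix) pairs on the prefix, i.e. on item 1 of each pair.
by_prefix = sorted(namespaces.items(), key=itemgetter(1))
for uri, prefix in by_prefix:
    print(prefix, uri)
```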
I was working on what I thought would be an elegant solution to this problem: for non-qualified attributes, add the element's namespace before accessing the cache and strip the namespace prefix after accessing the cache if it's equal to the element's prefix. However, this approach doesn't work: even though non-qualified attributes will be processed as if they were in the element's namespace, they are considered to have no namespace. This means <ns:x a="1" ns:a="2"/> is considered valid XML, even though it effectively defines the same attribute twice. See https://www.w3.org/TR/REC-xml-names/#uniqAttrs
In my opinion the spec made a silly choice here, but that's probably not something that can be fixed anymore. I haven't decided yet whether I'll make another attempt at fixing this issue. In any case, I hope this cautionary tale benefits someone.
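The pitfall is easy to reproduce with the parser itself. In this minimal sketch, "urn:example" is a placeholder namespace URI chosen for illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:example"  # placeholder URI, not from the report

# Parses without error: 'a' and 'ns:a' are two different attribute names.
elem = ET.fromstring('<ns:x xmlns:ns="%s" a="1" ns:a="2"/>' % NS)

for key, value in sorted(elem.attrib.items()):
    print(key, value)
```

The element ends up with two attributes, one with no namespace and one in the element's namespace, so folding unqualified attribute names into the element's namespace would merge names that the spec keeps distinct.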
I think I have a good solution now, see the pull request for details. It does touch a lot of code, but I split all the changes into small consistent units, so it should be easier to verify whether they are correct.
The obvious work-around is to not use a default namespace. The result is just a visual difference, not a semantic one. If someone wants to continue with the existing PR, I'll try to free some time to review any improvements.
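A minimal sketch of that workaround, using a made-up document: register a readable prefix, write without default_namespace, and check that the round trip preserves names and attributes. The same_tree helper is a hypothetical name introduced only for this comparison.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Use 'svg:' rather than an auto-generated 'ns0:' prefix.
# Note that register_namespace changes global state.
ET.register_namespace("svg", SVG_NS)

original = ET.fromstring(
    '<svg xmlns="%s" width="12cm"><rect x="1"/></svg>' % SVG_NS)

# Serialise without default_namespace: every element gets the prefix.
data = ET.tostring(original)
reparsed = ET.fromstring(data)


def same_tree(a, b):
    """Compare two elements by qualified tag, attributes and children."""
    return (a.tag == b.tag and a.attrib == b.attrib and len(a) == len(b)
            and all(same_tree(x, y) for x, y in zip(a, b)))


print(data)
print(same_tree(original, reparsed))
```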