[Python-Dev] Fixing the XML batteries

Eli Bendersky eliben at gmail.com
Mon Feb 6 14:01:57 CET 2012


What should change?

a) The stdlib documentation should help users to choose the right tool right from the start. Instead of using the totally misleading wording that it uses now, it should be honest about the performance characteristics of MiniDOM and should actively suggest that those who don't know what to choose (or even that they can choose) should not use MiniDOM in the first place. I created a ticket (issue11379) for a minor step in this direction, but given the responses, I'm rather convinced that there's a lot more that can be done and should be done, and that it should be done now, right for the next release.

On one hand I agree that ET should be emphasized since it's the better API with a much faster implementation. But I also understand Martin's point of view that minidom has its place, so IMHO some sort of compromise should be reached. Perhaps we can recommend using ET for those not specifically interested in the DOM interface, but for those who are, minidom is still a good stdlib option (?).

Tying this doc clarification to an optimization in minidom does not make sense. It would just delay a much-needed change forever.
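To make the API difference concrete, here is a rough sketch that extracts the same data with both libraries. The sample document and the two helper function names are made up for illustration; only the stdlib calls themselves are real.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this comparison.
DOC = (
    "<library>"
    "<book id='1'><title>Dive In</title></book>"
    "<book id='2'><title>Fluent</title></book>"
    "</library>"
)


def titles_with_et(text):
    """Collect book titles using ElementTree."""
    root = ET.fromstring(text)
    # findtext() returns the text of the first matching child element.
    return [book.findtext("title") for book in root.iter("book")]


def titles_with_minidom(text):
    """Collect book titles using minidom (W3C DOM style)."""
    dom = xml.dom.minidom.parseString(text)
    titles = []
    for book in dom.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # Text lives in a separate text node below the element.
        titles.append(title.firstChild.data)
    return titles


et_titles = titles_with_et(DOC)
dom_titles = titles_with_minidom(DOC)
```

Both helpers return the same list. The minidom version is what someone who specifically wants the DOM interface would write, while the ET version is probably closer to what most newcomers expect.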

b) cElementTree should finally lose its "special" status as a separate library and disappear as an accelerator module behind ElementTree. This has been suggested a couple of times already, and AFAIR, there was some opposition because 1) ET was maintained outside of the stdlib and 2) the APIs of both were not identical. However, getting ET 1.3 into Py2.7 and 3.2 was a U-turn. Today, ET is only being maintained in the stdlib by Florent Xicluna (who is doing a good job with it), and ET 1.3 has basically made the APIs of both implementations compatible again. So, 3.3 would be the right milestone for fixing the "two libs for one" quirk.

This, at least in my view, is the more important point, which unfortunately got much less attention in the thread. I was a bit shocked to see that in 3.3 trunk we still have both the Python and C versions exposed and only formally document ElementTree (the Python version). The only reference to cElementTree is an un-emphasized note:

A C implementation of this API is available as xml.etree.cElementTree.

Is there anything that really blocks providing cElementTree on "import ElementTree" and removing the explicit cElementTree for 3.3 (or at least leaving it with a deprecation warning)?
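For context, this is roughly the idiom users have to write today to get the fast implementation when it is available. It is only a sketch of the common workaround, not a proposal for how the stdlib should do the switch internally; the sample XML is invented.

```python
# Prefer the C accelerator when it exists, otherwise fall back to the
# pure-Python module. Both expose the same API under the name ET.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Hypothetical sample input, just to show that the calling code does not
# care which implementation was imported.
root = ET.fromstring("<root><item>a</item><item>b</item></root>")
items = [el.text for el in root.findall("item")]
```

If a plain import of ElementTree picked up the accelerator on its own, the try/except would become unnecessary and the last two lines would be all that user code needs.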

Eli


