[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

Tim Delaney timothy.c.delaney at gmail.com
Tue Mar 19 09:10:32 EDT 2019


On Tue, 19 Mar 2019 at 23:13, David Mertz <mertz at gnosis.cx> wrote:

In a way, this case makes bugs worse because they are not only a Python internal matter. XML is used to communicate among many tools and programming languages, and relying on assumptions those other tools will not follow is a bad habit.

I have a recent example I encountered where the 3.7 behaviour (sorting attributes) results in a third-party tool behaving incorrectly, whereas maintaining attribute order works correctly. The particular case was using HTML tags for importing into Calibre for converting to an ebook. The most common symptom was that series indexes were sometimes being correctly imported, and sometimes not. Occasionally other tags would also fail to be correctly imported.

Turns out that the tag with the name attribute first gave consistently correct results, whilst the same tag with its attributes sorted was erratic. And whilst I'd specified the tags with the name attribute first, I was then passing the HTML through BeautifulSoup, which sorted the attributes.

Now Calibre is definitely in the wrong here - it should be able to import regardless of the order of attributes. But the fact is that there are a lot of tools out there that are semi-broken in a similar manner.
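As a rough illustration of what order-independent importing means, here is a minimal sketch using the standard library's html.parser. It is not how Calibre works; the MetaCollector class, the read_meta helper and the "series_index" attribute value are made up for the example. The point is only that looking attributes up by name gives the same result whichever order they were written in.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: collect <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs in document order;
        # turning it into a dict makes the lookup independent of that order.
        by_name = dict(attrs)
        if "name" in by_name:
            self.meta[by_name["name"]] = by_name.get("content")


def read_meta(html):
    """Hypothetical helper: return {name: content} for every meta tag."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta


# "series_index" is an illustrative name, not the real Calibre tag.
name_first = read_meta('<meta name="series_index" content="1">')
content_first = read_meta('<meta content="1" name="series_index">')
```

Both calls should produce the same dictionary, which is the behaviour a robust importer would want.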

This to me is an argument to default to maintaining order, but provide a way for the caller to control the order of attributes when formatting, e.g. by passing an ordering function. If you want sorted attributes, pass the built-in sorted function as your ordering function. But I think that's getting beyond the scope of this discussion.
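To make the ordering-function idea concrete, here is a minimal sketch. The format_start_tag function and its order parameter are hypothetical, not an existing ElementTree or BeautifulSoup API, and the "series_index" value is illustrative. By default the attributes come out in insertion order; passing the built-in sorted reproduces the 3.7-style sorted output.

```python
from xml.sax.saxutils import quoteattr


def format_start_tag(tag, attrs, order=None):
    """Render an opening tag from a dict of attributes.

    `order` is a hypothetical hook: a callable that takes the list of
    attribute names and returns them in the order they should be written.
    With order=None the dict's insertion order is kept.
    """
    names = list(attrs)  # dicts preserve insertion order (Python 3.7+)
    if order is not None:
        names = order(names)
    rendered = "".join(f" {name}={quoteattr(attrs[name])}" for name in names)
    return f"<{tag}{rendered}>"


# "series_index" is an illustrative name, not the real Calibre tag.
attrs = {"name": "series_index", "content": "1"}

preserved = format_start_tag("meta", attrs)
# '<meta name="series_index" content="1">'

alphabetical = format_start_tag("meta", attrs, order=sorted)
# '<meta content="1" name="series_index">'
```

Because the hook is just a callable on a list of names, a caller who needs some other fixed order (say, name always first) could pass their own function without the serializer having to know about it.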

Tim Delaney