[Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

Victor Stinner vstinner at redhat.com
Wed Mar 20 20:22:56 EDT 2019


Hi,

On Mon, Mar 18, 2019 at 23:41, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:

> We're having a super interesting discussion on https://bugs.python.org/issue34160 . It is now marked as a release blocker and warrants a broader discussion.

Thanks for starting a thread on python-dev. I'm the one who raised the priority to release blocker to trigger such discussion on python-dev.

> Our problem is that at least two distinct and important users have written tests that depend on exact byte-by-byte comparisons of the final serialization.

Sorry, but I don't think that's a good summary of the issue. IMHO the issue is more general: it is about how we introduce backward incompatible changes in Python.

The migration from Python 2 to Python 3 took around ten years. That's way too long, and it caused a lot of trouble in the Python community. IMHO one explanation is our patronizing behavior toward users, which I would summarize as "your code is wrong, you have to fix it" (whereas the code had been working well for 10 years with Python 2!).

I'm not opposed to backward incompatible changes, but I think that we must prepare the migration very carefully and do our best to help users migrate their code.

> 2). Go into every XML module and add attribute sorting options to each function that generate xml. (...)

Written like that, it sounds like a painful and huge project... But in practice, the implementation looks simple and straightforward: https://github.com/python/cpython/pull/12354/files

I don't understand why such a simple solution was rejected.

IMHO adding an optional sort parameter is just the bare minimum that we can do for our users.

Alternatives have been proposed, like a recipe to sort node attributes before serialization, but honestly, it's way too complex. I don't want to have to copy such a recipe into every project: add a new function, import it, use it wherever XML is written to a file, etc. Taken alone, maybe it's acceptable. But please remember that some companies are still porting their large Python 2 code bases to Python 3. This new backward incompatible change lands on top of the pile of other backward incompatible changes between 2.7 and 3.8.
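
To make the discussion concrete, a recipe of this kind could look roughly like the sketch below. The helper name sort_attributes and the sample element names are made up for illustration; only the xml.etree.ElementTree calls are real stdlib API. This is not necessarily the exact recipe proposed on the issue.

```python
import xml.etree.ElementTree as ET


def sort_attributes(root):
    """Hypothetical helper: sort the attributes of every element in place."""
    for elem in root.iter():
        attrib = elem.attrib
        if len(attrib) > 1:
            # Rebuild the attribute dict so that keys are inserted in sorted order.
            items = sorted(attrib.items())
            attrib.clear()
            attrib.update(items)


# Illustrative data only: attributes deliberately given in unsorted order.
root = ET.Element("item", {"zeta": "1", "alpha": "2"})
ET.SubElement(root, "child", {"b": "1", "a": "2"})

sort_attributes(root)
xml_bytes = ET.tostring(root)
print(xml_bytes)
```

Every place that writes XML would have to call such a helper before serializing, which is the copy-and-adapt burden described above. Note that this sketch modifies the tree in place.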

I would prefer to be able to "just add" sort=True. Don't forget that tests like "if sys.version_info >= (3, 8):" will be needed, which makes the overall fix more complicated.
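
As a rough sketch of what such a version test means in user code, assuming a project wants sorted attributes on every Python version: the wrapper name tostring_sorted is hypothetical, and the check uses sys.version_info (a tuple) because sys.version is a string and cannot be compared to (3, 8).

```python
import sys
import xml.etree.ElementTree as ET


def tostring_sorted(root):
    """Hypothetical wrapper: serialize with attributes in sorted order."""
    if sys.version_info >= (3, 8):
        # Python 3.8+ keeps attribute insertion order, so sort explicitly.
        # Older versions already sort attributes during serialization.
        for elem in root.iter():
            items = sorted(elem.attrib.items())
            elem.attrib.clear()
            elem.attrib.update(items)
    return ET.tostring(root)


# Illustrative data only.
doc = ET.Element("a", {"y": "1", "x": "2"})
result = tostring_sorted(doc)
print(result)
```

On 3.8 and later this wrapper reorders the attributes of the tree it is given as a side effect, which a real project might or might not find acceptable.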

Said differently, the stdlib should help the user to upgrade Python. The pain should not be only on the user side.

Victor

Night gathers, and now my watch begins. It shall not end until my death.


