bpo-14465: xml.etree.ElementTree pretty printing by dzeban · Pull Request #4016 · python/cpython

As for serialization performance, here are some simple tests. I compared current master against my branch, with and without pretty printing, by timing tostring serialization of a 207 KB XML file.

$ git checkout master
$ ./configure --enable-optimizations
$ make
$ ./python 
Python 3.7.0a2+ (heads/master:73c4708, Oct 22 2017, 00:03:16) 
[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux

$ ./python -m timeit -s "import xml.etree.ElementTree as ET; tree = ET.parse('xml.xml'); root = tree.getroot()" "ET.tostring(root, 'unicode', 'xml')"
20 loops, best of 5: 12.4 msec per loop

$ git checkout etree-pretty-print 
$ ./configure --enable-optimizations
$ make

$ ./python 
Python 3.7.0a1+ (heads/etree-pretty-print:f114e68, Oct 21 2017, 23:35:43) 
[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux

$ ./python -m timeit -s "import xml.etree.ElementTree as ET; tree = ET.parse('xml.xml'); root = tree.getroot()" "ET.tostring(root, 'unicode', 'xml')"
20 loops, best of 5: 14.2 msec per loop

$ ./python -m timeit -s "import xml.etree.ElementTree as ET; tree = ET.parse('xml.xml'); root = tree.getroot()" "ET.tostring(root, 'unicode', 'xml', pretty_print=True)"
20 loops, best of 5: 16.7 msec per loop
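For anyone who wants to repeat the measurement without a comparable input file, here is a minimal sketch that drives the same kind of timing from Python. The synthetic tree built by `build_tree` is only a stand-in for `xml.xml` (its size and shape are assumptions, not the file used above), and `best_msec` is a hypothetical helper that mimics the "best of 5" figure reported by `python -m timeit`. It only times the plain `tostring` call that exists in the standard library.

```python
import timeit
import xml.etree.ElementTree as ET


def build_tree(n_items=2000):
    """Build a synthetic tree; a stand-in for the real 'xml.xml' input."""
    root = ET.Element("root")
    for i in range(n_items):
        item = ET.SubElement(root, "item", {"id": str(i)})
        ET.SubElement(item, "name").text = f"name-{i}"
        ET.SubElement(item, "value").text = str(i * 3)
    return root


def best_msec(root, number=20, repeat=5):
    """Return the best per-loop time in msec over `repeat` runs of `number` loops."""
    timer = timeit.Timer(lambda: ET.tostring(root, "unicode", "xml"))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1000


if __name__ == "__main__":
    root = build_tree()
    print(f"20 loops, best of 5: {best_msec(root):.2f} msec per loop")
```

Running the same script under both builds should give numbers that are comparable to each other, though not to the figures above, since the input differs.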

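So the new version without pretty printing is about 14.5% slower than master (14.2 vs 12.4 msec). On the branch, pretty printing is about 17.6% slower than non-pretty printing (16.7 vs 14.2 msec). Overall, serialization with pretty printing enabled is about 34.7% slower than the original version from master (16.7 vs 12.4 msec).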
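The percentages follow directly from the three timings above. As a quick check, here is a small sketch that recomputes them; `slowdown_pct` is a helper name introduced only for this illustration.

```python
def slowdown_pct(baseline_ms, candidate_ms):
    """Relative slowdown of candidate versus baseline, in percent."""
    return (candidate_ms / baseline_ms - 1.0) * 100.0


# Best-of-5 timings from the timeit runs above, in msec per loop.
MASTER = 12.4
BRANCH_PLAIN = 14.2
BRANCH_PRETTY = 16.7

comparisons = [
    ("branch vs master (no pretty printing)", MASTER, BRANCH_PLAIN),
    ("pretty vs non-pretty on the branch", BRANCH_PLAIN, BRANCH_PRETTY),
    ("branch with pretty printing vs master", MASTER, BRANCH_PRETTY),
]

for label, base, cand in comparisons:
    print(f"{label}: {slowdown_pct(base, cand):.1f}% slower")
# branch vs master (no pretty printing): 14.5% slower
# pretty vs non-pretty on the branch: 17.6% slower
# branch with pretty printing vs master: 34.7% slower
```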

I'm not sure whether that is really scary. Do we have a comparable measurement for lxml?