Issue 32196: Rewrite plistlib with functional style

The proposed PR rewrites the plistlib module using a functional style. This speeds up loading and saving plist files by about 10% or more. Loading plist files in XML format has sped up almost twofold.

$ ./python -m timeit -s 'import plistlib; a = list(range(100))' -- 'plistlib.dumps(a, fmt=plistlib.FMT_XML)'
Unpatched: 1000 loops, best of 5: 228 usec per loop
Patched: 1000 loops, best of 5: 204 usec per loop

$ ./python -m timeit -s 'import plistlib; a = list(range(100))' -- 'plistlib.dumps(a, fmt=plistlib.FMT_BINARY)'
Unpatched: 1000 loops, best of 5: 234 usec per loop
Patched: 1000 loops, best of 5: 203 usec per loop

$ ./python -m timeit -s 'import plistlib; a = list(range(100)); p = plistlib.dumps(a, fmt=plistlib.FMT_XML)' -- 'plistlib.loads(p)'
Unpatched: 1000 loops, best of 5: 308 usec per loop
Patched: 2000 loops, best of 5: 155 usec per loop

$ ./python -m timeit -s 'import plistlib; a = list(range(100)); p = plistlib.dumps(a, fmt=plistlib.FMT_BINARY)' -- 'plistlib.loads(p)'
Unpatched: 2000 loops, best of 5: 116 usec per loop
Patched: 5000 loops, best of 5: 94.6 usec per loop

$ ./python -m timeit -s 'import plistlib; a = {"a%d" % i: i for i in range(100)}' -- 'plistlib.dumps(a, fmt=plistlib.FMT_XML)'
Unpatched: 500 loops, best of 5: 433 usec per loop
Patched: 1000 loops, best of 5: 384 usec per loop

$ ./python -m timeit -s 'import plistlib; a = {"a%d" % i: i for i in range(100)}' -- 'plistlib.dumps(a, fmt=plistlib.FMT_BINARY)'
Unpatched: 500 loops, best of 5: 616 usec per loop
Patched: 500 loops, best of 5: 560 usec per loop

$ ./python -m timeit -s 'import plistlib; a = {"a%d" % i: i for i in range(100)}; p = plistlib.dumps(a, fmt=plistlib.FMT_XML)' -- 'plistlib.loads(p)'
Unpatched: 500 loops, best of 5: 578 usec per loop
Patched: 1000 loops, best of 5: 308 usec per loop

$ ./python -m timeit -s 'import plistlib; a = {"a%d" % i: i for i in range(100)}; p = plistlib.dumps(a, fmt=plistlib.FMT_BINARY)' -- 'plistlib.loads(p)'
Unpatched: 1000 loops, best of 5: 257 usec per loop
Patched: 1000 loops, best of 5: 208 usec per loop

I don't have time to perform a review right now. I'm trying to get PEP 447 through review, and that takes most of my available time at the moment.

I'm not convinced that the speedup of plistlib is relevant for real-world code. Plist files are intended as simple configuration files; they tend to contain little data and should be read or written only sporadically.

That said, some people appear to abuse plistlib to process other files, which are probably NSKeyedArchiver archives, and those can be a lot larger. But I'm opposed to explicitly supporting that use case, because the format of NSKeyedArchiver files is completely undocumented.

I have made this PR because a functional style looks more suitable to me for this kind of task. For every serialization or deserialization we have a distinct set of functions with common state. The state can be passed between functions as attributes of a one-time object or as non-local variables. The latter looks syntactically cleaner to me and, as a side effect, is faster.
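
To make the two ways of sharing state concrete, here is a minimal sketch. It is a toy serializer invented for illustration, not plistlib's code or the PR's code: the names `_ClassWriter`, `dumps_class_style`, and `dumps_functional_style` are hypothetical. Both versions render nested lists as indented text and keep the output lines and the indentation depth as shared state.

```python
# Toy example only: NOT plistlib's implementation. All names are hypothetical.


class _ClassWriter:
    """One-time object: shared state lives in instance attributes."""

    def __init__(self):
        self.lines = []
        self.depth = 0

    def write_value(self, value):
        if isinstance(value, list):
            self.write_list(value)
        else:
            self.lines.append("  " * self.depth + str(value))

    def write_list(self, items):
        self.lines.append("  " * self.depth + "[")
        self.depth += 1
        for item in items:
            self.write_value(item)
        self.depth -= 1
        self.lines.append("  " * self.depth + "]")


def dumps_class_style(value):
    # A fresh writer object is created for every call and then discarded.
    writer = _ClassWriter()
    writer.write_value(value)
    return "\n".join(writer.lines)


def dumps_functional_style(value):
    """Closures: shared state lives in local variables of this call."""
    lines = []
    depth = 0

    def write_value(value):
        if isinstance(value, list):
            write_list(value)
        else:
            lines.append("  " * depth + str(value))

    def write_list(items):
        nonlocal depth  # rebinding an enclosing variable requires nonlocal
        lines.append("  " * depth + "[")
        depth += 1
        for item in items:
            write_value(item)
        depth -= 1
        lines.append("  " * depth + "]")

    write_value(value)
    return "\n".join(lines)


if __name__ == "__main__":
    print(dumps_functional_style([1, [2, 3]]))
```

Both functions produce the same text for the same input. One plausible reason for a speed difference in CPython is that reading a closure variable avoids the instance attribute lookup needed for `self.depth` or `self.lines`, though how much this matters depends on the workload.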