[Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
Victor Stinner victor.stinner at gmail.com
Tue Jan 28 11:22:40 CET 2014
- Previous message: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
- Next message: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
2014-01-28 "Martin v. Löwis" <martin at v.loewis.de>:
Debugging reveals that it is actually the many integer objects which trigger the sharing code. So a much simplified example of Victor's benchmarking code can use
data = [0]*10000000

The difference between version 2 and version 3 here is that v2 marshals a lot of "0" integers, whereas version 3 marshals a single one, and then a lot of references to this integer.
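A minimal sketch of the simplified example using the standard marshal module. The `sizes` helper and its smaller default `n` are illustrative choices, not from the thread. A small int is written as a type byte plus a 4-byte value, and a back-reference as a type byte plus a 4-byte index, so both records are 5 bytes. That is consistent with the version 2 and version 3 outputs having the same length even though version 3 writes references.

```python
import marshal


def sizes(n=1_000_000):
    """Serialize [0] * n with marshal versions 2 and 3; return both byte lengths."""
    data = [0] * n
    v2 = marshal.dumps(data, 2)  # every element written as a full int record
    v3 = marshal.dumps(data, 3)  # one int record, then back-references to it
    # Both encodings must load back to an equal list.
    assert marshal.loads(v2) == data
    assert marshal.loads(v3) == data
    return len(v2), len(v3)
```

For instance, `sizes()` returns a pair of equal lengths; the gain from version 3 references only shows up for objects whose full encoding is longer than 5 bytes.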
Since the output size looks to be the same, it may be worth special-casing small integers, or even integers and floats in general. Handling references to these numbers probably takes more CPU time, whereas the gain in file size is probably minor.
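To make the trade-off concrete, here is a toy encoder. It uses a hypothetical format, not marshal's real wire format; `encode`, `decode`, the tag values and the `share_ints` switch are all invented for illustration. Records marked with a flag bit enter a reference table, and later duplicates of the same object are written as 5-byte back-references. `share_ints=False` models the proposed special case: ints skip the table lookup and are always written in full.

```python
import struct

# Hypothetical toy format: 1-byte tag, optionally OR'ed with FLAG_REF when the
# object is registered in the reference table, followed by a payload.
TAG_INT = ord("i")   # 4-byte length, then little-endian two's complement bytes
TAG_REF = ord("r")   # 4-byte index into the reference table
TAG_LIST = ord("[")  # 4-byte item count, then the items
FLAG_REF = 0x80


def encode(obj, share_ints=True):
    out = bytearray()
    table = {}  # id(object) -> index in the reference table

    def write(o):
        register = share_ints or not isinstance(o, int)
        if register:
            index = table.get(id(o))
            if index is not None:  # seen before: emit a back-reference
                out.append(TAG_REF)
                out.extend(struct.pack("<I", index))
                return
            table[id(o)] = len(table)
        tag_flag = FLAG_REF if register else 0
        if isinstance(o, int):
            size = o.bit_length() // 8 + 1  # enough bytes for the sign bit
            out.append(TAG_INT | tag_flag)
            out.extend(struct.pack("<I", size))
            out.extend(o.to_bytes(size, "little", signed=True))
        elif isinstance(o, list):
            out.append(TAG_LIST | tag_flag)
            out.extend(struct.pack("<I", len(o)))
            for item in o:
                write(item)
        else:
            raise TypeError(f"unsupported type: {type(o).__name__}")

    write(obj)
    return bytes(out)


def decode(data):
    table = []  # filled in the same order the encoder registered objects
    pos = 0

    def read_u32():
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    def read():
        nonlocal pos
        code = data[pos]
        pos += 1
        tag, flagged = code & ~FLAG_REF, code & FLAG_REF
        if tag == TAG_REF:
            return table[read_u32()]
        if tag == TAG_INT:
            size = read_u32()
            value = int.from_bytes(data[pos:pos + size], "little", signed=True)
            pos += size
            if flagged:
                table.append(value)
            return value
        if tag == TAG_LIST:
            count = read_u32()
            result = []
            if flagged:
                table.append(result)  # register before the items, like the encoder
            result.extend(read() for _ in range(count))
            return result
        raise ValueError(f"unknown tag {tag:#x}")

    return read()
```

In this toy, `[0] * 1000` costs one extra byte per element without int sharing, while a list of 100 copies of `2 ** 1000` becomes about twenty times larger, which is the kind of drawback Victor notes for files with many duplicated huge numbers.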
I wrote a short patch: http://bugs.python.org/issue20416
"dumps v3 is 60% faster, loads v3 is also 14% faster."
"dumps v4 is 66% faster, loads v4 is 16% faster."
"file size (on version 3 and 4) is unchanged with my patch."
"So with the patch, the Python 3.4 default version (4) is faster (dump 20% faster, load 16% faster) and produces smaller files (10% smaller)."
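The figures above come from the patch on the tracker. A minimal timing harness along these lines could be used to compare marshal versions on a given interpreter; the `bench` name and its parameters are illustrative assumptions. Results depend on the machine and the CPython version, so no particular numbers should be expected from it.

```python
import marshal
import timeit


def bench(data, versions=(2, 3, 4), repeat=5):
    """Return {version: (best dumps seconds, best loads seconds, size in bytes)}."""
    results = {}
    for version in versions:
        payload = marshal.dumps(data, version)
        # Best of `repeat` single runs; the lambdas are used within this iteration,
        # so capturing the loop variables is safe.
        dump_time = min(timeit.repeat(lambda: marshal.dumps(data, version),
                                      number=1, repeat=repeat))
        load_time = min(timeit.repeat(lambda: marshal.loads(payload),
                                      number=1, repeat=repeat))
        results[version] = (dump_time, load_time, len(payload))
    return results
```

For example, `bench([0] * 10_000_000)` times the simplified case from Martin's message for each version.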
It looks like a win-win patch :-)
The drawback is that files storing many duplicated huge numbers will not be smaller with marshal version >= 3.
Victor