Issue 3873: Unpickling is really slow (original) (raw)
Created on 2008-09-15 16:37 by hagen, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (38)
Author: Hagen Fürstenau (hagen)
Date: 2008-09-15 16:37
Unpickling e.g. a large list seems to be really slow in Python 3.0.
The attached test script gives the following results for pickling and unpickling a list of 1M zeros, showing that although the C implementation seems to be used in Python 3.0, unpickling is even slower than with the "pickle" module of Python 2.6!
[hagenf@cluster-06 hagenf]$ python pickletst.py 2.71067500114 2.77484893799 [hagenf@cluster-06 hagenf]$ python3.0 pickletst.py 0.0925059318542 5.76121616364
Author: Antoine Pitrou (pitrou) *
Date: 2008-09-15 16:56
Do the numbers vary if you read the whole file at once and then unpickle the resulting bytes string? Large parts of the IO library are written in Python in 3.0, which might explain the discrepancy.
Author: Hagen Fürstenau (hagen)
Date: 2008-09-15 17:17
Yes, it gets much better, but even so (first reading file and timing only "loads") unpickling takes four times as long in Python 3.0 as with the old cPickle module:
[hagenf@cluster-06 hagenf]$ python pickletst2.py 0.0744678974152 0.0514161586761 [hagenf@cluster-06 hagenf]$ python3.0 pickletst3.py 0.0911619663239 0.208593845367
But I guess this can still be blamed on the BytesIO implementation...
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-09-15 17:24
Indeed. If I replace the file with f = io.BytesIO(open("tst", "rb").read()) timings are divided by 20...
After quick profiling, it seems that PyLong_New would benefit from a free list. len(bytearray) is called very often. To stay simple, it would be enough to only store longs of length 1 (<2**15).
Author: Antoine Pitrou (pitrou) *
Date: 2008-09-15 17:37
Gregory had patches for a freelist of long objects in #2013.
Author: Alexandre Vassalotti (alexandre.vassalotti) *
Date: 2008-10-07 21:12
The solution is to add a read buffer to Unpickler (Pickler already has a write buffer, so that why it is unaffected). I believe this would mitigate much of the (quite large) Python function call overhead.
cPickle has a performance hack to make it uses cStringIO and PyFile directly (via C function calls). In Python 3, the hack was removed since cStringIO.h and fileobject.h aren't public anymore. This explains the 0.02s deviation you are getting for dump speed.
Author: Hagen Fürstenau (hagen)
Date: 2008-12-05 09:21
I think a read buffer is not possible without being able to unread bytes from the stream. pickle shoudn't consume bytes after the end of a pickled object!
Author: Antoine Pitrou (pitrou) *
Date: 2009-01-18 12:13
Making this a duplicate of #4565 ("Rewrite the IO stack in C"). If anyone disagrees, please reopen!
Author: Hagen Fürstenau (hagen)
Date: 2009-01-18 13:40
With the io-c branch I see much better unpickling performance than before. But it still seems to be around 2 or 3 times slower than with cPickle in 2.6.
Is this expected at this point of io-c development? Otherwise perhaps this issue should stay open until it can be verified that nothing else can be done to get closer to the old cPickle performance.
Author: Antoine Pitrou (pitrou) *
Date: 2009-01-18 14:01
Hello,
With the io-c branch I see much better unpickling performance than before. But it still seems to be around 2 or 3 times slower than with cPickle in 2.6.
It's much closer here.
With 2.7 (trunk) and cPickle: 0.439934968948 0.669679880142 With the io-c branch: 0.136367082596 0.798221111298
Is this expected at this point of io-c development?
20-30% slower is expected for binary I/O (which is what I get above), not 2x or 3x slower. What is your system?
Author: Hagen Fürstenau (hagen)
Date: 2009-01-18 14:17
I uploaded a new pickletst.py which specifies protocol 2, otherwise we're comparing apples with oranges. With this I get:
for Python 2.6 and
on the io-c branch. Can you confirm this?
Author: Antoine Pitrou (pitrou) *
Date: 2009-01-18 14:25
Nice catch! I can confirm your figures with protocol=2 (and protocol=-1 as well).
Author: STINNER Victor (vstinner) *
Date: 2009-04-05 01:23
alexandre.vassalotti wrote:
The solution is to add a read buffer to Unpickler (...) would mitigate much of the (quite large) Python function call overhead. (...) cPickle has a performance hack to make it uses cStringIO and PyFile directly (via C function calls). In Python 3, the hack was removed (...)
Yes, unpickler_read() calls Buffered_read() through PyObject_Call+PyCFunction_Call which is expensive. And unpickle main loop starts with unpickler_read(self, &s, 1): just read one byte (the "opcode"). If Buffered_read() call is expensive, a solution is to avoid calling it (eg. read more bytes and... go backward at the end?).
Author: STINNER Victor (vstinner) *
Date: 2009-04-05 01:25
Unladen Swallow has a project to optimize pickle. Currently, it uses 3 benchmarks: pickle - use the cPickle module to pickle a variety of datasets. pickle_dict - microbenchmark; use the cPickle module to pickle a lot of dicts. pickle_list - microbenchmark; use the cPickle module to pickle a lot of lists.
Author: STINNER Victor (vstinner) *
Date: 2009-04-05 02:02
gprof (--enable-profiler) results:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
18.18 0.16 0.16 2011055 0.00 0.00 vgetargs1
7.95 0.23 0.07 2017033 0.00 0.00
PyObject_GenericGetAttr
6.82 0.29 0.06 2002006 0.00 0.00
_BufferedReader_read_unlocked
5.68 0.34 0.05 2012123 0.00 0.00 convertsimple
4.55 0.38 0.04 2013284 0.00 0.00 PyCFunction_Call
4.55 0.42 0.04 2004543 0.00 0.00 PyLong_AsSsize_t
3.41 0.45 0.03 2004565 0.00 0.00 PyNumber_Index
3.41 0.48 0.03 2002177 0.00 0.00 float_argument_error
3.41 0.51 0.03 1000497 0.00 0.00
PyLong_AsLongAndOverflow
2.27 0.53 0.02 2012123 0.00 0.00 convertitem
It looks like PyArg_ParseTuple() should be optimized ;-)
Author: STINNER Victor (vstinner) *
Date: 2009-04-06 11:52
Create a read buffer (4096 bytes) in unpickler class. Using [0]*10**7 or [1000]*10**7, load() is from 6 to 8 times faster.
I removed last_string attribute because it wasn't used.
If there are "tail" bytes, seek backward.
Author: STINNER Victor (vstinner) *
Date: 2009-04-06 11:53
I don't know why, but python-trunk is much slower than py3k (eg. with dump: 1000 ms vs 24 ms for py3k, or with load: 1500ms vs 186ms).
Author: STINNER Victor (vstinner) *
Date: 2009-04-06 11:56
My version of pickletest.py:
- make sure that file position is correct after the load()
- some benchmark. most interesting numbers:
without the patch :
version | data | dump ms | load ms | py3k | 0, 10^6 | 230 | 1500 | py3k | 1000, 10^6 | 255 | 1780 | py3k | 0, 10^7 | 2360 | 16044 | py3k | 1000, 10^7 | 2615 | 19380 |
with the patch:
version | data | dump ms | load ms | speed up: py3k | 0, 10^6 | 237 | 183 | x8 py3k | 1000, 10^6 | 241 | 248 | x7 py3k | 0, 10^7 | 2420 | 1860 | x8 py3k | 1000, 10^7 | 2850 | 3100 | x6
(data: 0, 10^6 means: [0]*10^6)
Author: STINNER Victor (vstinner) *
Date: 2009-04-06 12:02
Note about my patch: the buffer should be truncated after PyBytes_Concat(&self->buffer.pybytes, data) to avoid very long buffer. Something like: self->buffer.pybytes += data; self->buffer.pybytes = self->buffer.pybytes[index:]; self->buffer.index=0;
Author: Antoine Pitrou (pitrou) *
Date: 2009-04-06 12:09
I don't know why, but python-trunk is much slower than py3k (eg. with dump: 1000 ms vs 24 ms for py3k, or with load: 1500ms vs 186ms).
Perhaps you tried with the pure Python version (pickle) rather than the C one (cPickle)?
Author: Alexandre Vassalotti (alexandre.vassalotti) *
Date: 2009-04-08 17:28
Victor, Unpickler shouldn't raise an error if the given file object does support seek(); it should gracefully fall back to using only read() and readline(). Also, I think you could get a greater performance improvement by using peek() and avoiding backward seeks. As far as I know, a backward seek may invalidate the internal buffer of the file object, thus forcing a read from disk.
Incidentally, I think it would be nice to add to Python 3 some kind of file protocol as this would provide a way to mitigate the function call overhead for I/O operations.
Author: Antoine Pitrou (pitrou) *
Date: 2009-04-16 21:34
By the way, the patch won't work with unseekable files, which is probably bad.
Author: Mark Lawrence (BreamoreBoy) *
Date: 2010-07-18 21:27
Has this slipped under the radar? I believe that one way or the other any performance issue should be resolved if at all possible.
Author: STINNER Victor (vstinner) *
Date: 2010-07-28 22:58
New version of my patch:
- add "used" attribute to UnpicklerBuffer structure: disable the read buffer for not seekable file and for protocol 0 (at the first call to unpickle_readline)
- check if PyObject_GetAttrString(file, "seek") is NULL or not
- unpickle_readline() flushs also the buffer
- add a new patch specific to the read buffer: ensure that unpickler doesn't eat data at the end of the file
test_pickle pass without any error.
Disable read buffer at the first call to unpickle_readline() because unpickle_readline() have to flush the buffer. I will be very difficult to optimize protocol 0, but I hope that nobody uses it nowadays.
===========
Benchmark with [0]*10**6, Python compiled with pydebug.
Without the patch
Protocol 0:
- dump: 598.0 ms
- load (seekable=False): 3337.3 ms
- load (seekable=True): 3309.6 ms
Protocol 1:
- dump: 217.8 ms
- load (seekable=False): 864.2 ms
- load (seekable=True): 873.3 ms
Protocol 2:
- dump: 226.5 ms
- load (seekable=False): 867.8 ms
- load (seekable=True): 854.6 ms
With the patch
Protocol 0
- dump: 615.5 ms
- load (seekable=False): 3201.3 ms
- load (seekable=True): 3223.4 ms
Protocol 1
- dump: 219.8 ms
- load (seekable=False): 942.1 ms
- load (seekable=True): 175.2 ms
Protocol 2
- dump: 221.1 ms
- load (seekable=False): 943.9 ms
- load (seekable=True): 175.5 ms
Author: STINNER Victor (vstinner) *
Date: 2010-07-28 23:10
Same benchmark with Python 2.6.5+, so without the patch, but compiled with maximum compiler optimization (whereas pydebug means no optimization):
Protocol 0
- dump: 517.3 ms
- load: 876.6 ms <= because of the new I/O library, python3 is much slower
Protocol 1
- dump: 141.8 ms
- load: 255.2 ms <= with my patch, python3 is 1.5x faster
Protocol 2
- dump: 142.7 ms
- load: 262.1 ms <= with my patch, python3 is 1.5x faster
It would be better to redo all tests with the same compiler options, but the most important point is that Python3 is faster than Python2 with my patch ;-)
Author: STINNER Victor (vstinner) *
Date: 2010-07-28 23:11
bench_pickle.py: script used to produce last benchmarks.
Author: Alexandre Vassalotti (alexandre.vassalotti) *
Date: 2010-07-29 05:48
Victor, have you tried using peek() instead of seek()? I mentioned this previously in .
Author: STINNER Victor (vstinner) *
Date: 2010-07-29 16:38
Victor, have you tried using peek() instead of seek()? I mentioned this previously in .
In a file encoded in protocol 0, backward seek are needed to each call to unpickler_readline... and this function is called to read a number, a boolean, etc. (to read most, or all, data values). I choosed to disable the read buffer because it's slower with it.
For protocol 1 and 2, there is only one seek at the end (but a lot of read: file size / 4096). So I don't think that it does really matter to use peek or seek.
seek() is more natural (for me) and frequent than peek(), so I prefer to keep seek().
Author: Antoine Pitrou (pitrou) *
Date: 2010-09-23 22:10
Here is a fixed version of Victor's bench (didn't work on 2.x).
Author: Antoine Pitrou (pitrou) *
Date: 2010-09-23 22:17
And here is new performance patch (Victor's patch was outdated because of heavy changes incorporated from Unladen Swallow). Results of bench_pickle.py are as follows:
- Python 2.7 (cPickle):
Protocol 0
- dump: 189.8 ms
- load (seekable=False): 318.1 ms
- load (seekable=True): 321.2 ms Protocol 1
- dump: 58.2 ms
- load (seekable=False): 89.5 ms
- load (seekable=True): 89.4 ms Protocol 2
- dump: 58.0 ms
- load (seekable=False): 89.1 ms
- load (seekable=True): 89.1 ms
- Python 3.2 (unpatched):
Protocol 0
- dump: 144.5 ms
- load (seekable=False): 530.7 ms
- load (seekable=True): 530.5 ms Protocol 1
- dump: 27.8 ms
- load (seekable=False): 280.4 ms
- load (seekable=True): 279.9 ms Protocol 2
- dump: 27.8 ms
- load (seekable=False): 280.6 ms
- load (seekable=True): 281.9 ms
- Python 3.2 + patch:
Protocol 0
- dump: 146.4 ms
- load (seekable=False): 611.1 ms
- load (seekable=True): 194.2 ms Protocol 1
- dump: 27.3 ms
- load (seekable=False): 254.7 ms
- load (seekable=True): 31.2 ms Protocol 2
- dump: 27.3 ms
- load (seekable=False): 254.6 ms
- load (seekable=True): 31.2 ms
Author: Alexandre Vassalotti (alexandre.vassalotti) *
Date: 2010-09-24 18:12
I get this error with the patch:
python: /home/alex/src/python.org/py3k/Modules/_pickle.c:908: _Unpickler_ReadFromFile: Assertion `self->next_read_idx == 0' failed. Aborted
Author: Antoine Pitrou (pitrou) *
Date: 2010-09-24 18:24
Ah, thank you. I hadn't tested in debug mode and there was a wrong assert from the previous code. Here is a patch with the assert removed.
Author: Antoine Pitrou (pitrou) *
Date: 2010-09-27 20:52
One problem with the seek() approach is that some file-like objects have expensive seeks. One example is GzipFile, where seek(n) is O(n) (it first rewinds to the start of file, then reads n decompressed bytes). In the end, unpickling from a GzipFile becomes O(n**2).
I will try to use peek() instead.
Author: Alexandre Vassalotti (alexandre.vassalotti) *
Date: 2010-09-27 20:58
Didn't Victor say that only one seek at the end is necessary per pickle? If this is the case, I don't think expensive seeks will be an issue.
Author: Antoine Pitrou (pitrou) *
Date: 2010-09-27 22:06
Didn't Victor say that only one seek at the end is necessary per pickle? If this is the case, I don't think expensive seeks will be an issue.
If you are unpickling from a multi-megabyte gzip file and the seek at the end makes you uncompress all the gzip file again, it can certainly be a problem.
Another issue with seeking only at the end is that it would make readline() quite more complicated. And, apparently, readline() is not only used on protocol 0 but also on binary protocols.
Author: Antoine Pitrou (pitrou) *
Date: 2010-09-27 22:11
Here is an update bench_pickle which also makes the file unpeekable.
Author: Antoine Pitrou (pitrou) *
Date: 2010-09-27 22:23
Here is a patch using peek() rather than seek(). There are some inefficiencies around (such as using read() to skip the consumed prefetched bytes), but the benchmark results are still as good as with seek():
Protocol 0
- dump: 142.5 ms
- load (seekable=False): 598.8 ms
- load (seekable=True): 240.2 ms
Protocol 1
- dump: 28.9 ms
- load (seekable=False): 253.2 ms
- load (seekable=True): 25.7 ms
Protocol 2
- dump: 28.9 ms
- load (seekable=False): 253.3 ms
- load (seekable=True): 25.7 ms
Author: Antoine Pitrou (pitrou) *
Date: 2010-10-12 20:52
Patch committed in r85384.
History
Date
User
Action
Args
2022-04-11 14:56:39
admin
set
github: 48123
2011-03-19 03:00:12
jcea
set
nosy: + jcea
2010-10-12 20:52:02
pitrou
set
status: open -> closed
resolution: fixed
messages: +
stage: patch review -> resolved
2010-09-27 22:23:14
pitrou
set
files: + unpickleprefetch3.patch
messages: +
2010-09-27 22:11:48
pitrou
set
files: - bench_pickle.py
2010-09-27 22:11:43
pitrou
set
files: - bench_pickle.py
2010-09-27 22:11:36
pitrou
set
files: + bench_pickle.py
messages: +
2010-09-27 22:06:35
pitrou
set
messages: +
2010-09-27 20:58:06
alexandre.vassalotti
set
messages: +
2010-09-27 20:52:14
pitrou
set
messages: +
2010-09-24 18:25:00
pitrou
set
files: + unpickleprefetch2.patch
messages: +
versions: + Python 3.2, - Python 3.1
2010-09-24 18:12:44
alexandre.vassalotti
set
messages: +
2010-09-23 22:17:44
pitrou
set
files: + unpickleprefetch.patch
messages: +
2010-09-23 22:10:23
pitrou
set
files: + bench_pickle.py
messages: +
2010-07-29 16:46:20
belopolsky
set
nosy: + belopolsky
2010-07-29 16:38:51
vstinner
set
messages: +
2010-07-29 05:48:16
alexandre.vassalotti
set
messages: +
2010-07-28 23:11:19
vstinner
set
files: + bench_pickle.py
messages: +
2010-07-28 23:10:29
vstinner
set
messages: +
2010-07-28 22:58:02
vstinner
set
messages: +
2010-07-18 21:27:28
BreamoreBoy
set
nosy: + BreamoreBoy
messages: +
2009-04-16 21:34:41
pitrou
set
messages: +
2009-04-16 21:33:44
pitrou
set
priority: normal
stage: patch review
versions: + Python 3.1, - Python 3.0
2009-04-08 17:28:05
alexandre.vassalotti
set
messages: +
2009-04-06 16:37:42
collinwinter
set
nosy: + collinwinter
2009-04-06 12:09:02
pitrou
set
messages: +
2009-04-06 12:02:56
vstinner
set
messages: +
2009-04-06 11:56:49
vstinner
set
files: + pickletest.py
messages: +
2009-04-06 11:53:40
vstinner
set
messages: +
2009-04-06 11:52:25
vstinner
set
files: + pickle_read_buffer.patch
keywords: + patch
messages: +
2009-04-05 02:02:12
vstinner
set
messages: +
2009-04-05 01:25:19
vstinner
set
messages: +
2009-04-05 01:23:08
vstinner
set
nosy: + vstinner
messages: +
2009-01-18 14:25:43
pitrou
set
status: closed -> open
resolution: duplicate -> (no value)
superseder: Rewrite the IO stack in C ->
messages: +
2009-01-18 14:17:54
hagen
set
files: + pickletst.py
messages: +
2009-01-18 14:14:12
hagen
set
files: - pickletst.py
2009-01-18 14:01:19
pitrou
set
messages: +
2009-01-18 13:40:36
hagen
set
messages: +
2009-01-18 12:13:00
pitrou
set
status: open -> closed
resolution: duplicate
superseder: Rewrite the IO stack in C
messages: +
2008-12-05 09:21:17
hagen
set
messages: +
2008-10-07 21:12:43
alexandre.vassalotti
set
nosy: + alexandre.vassalotti
messages: +
2008-09-15 17:37:29
pitrou
set
messages: +
2008-09-15 17:24:06
amaury.forgeotdarc
set
nosy: + amaury.forgeotdarc
messages: +
2008-09-15 17:17:23
hagen
set
messages: +
2008-09-15 16:56:46
pitrou
set
nosy: + pitrou
messages: +
2008-09-15 16:37:16
hagen
create