[Python-Dev] Unpickling memory usage problem, and a proposed solution
Alexandre Vassalotti alexandre at peadrop.com
Fri Apr 23 20:38:16 CEST 2010
On Fri, Apr 23, 2010 at 2:11 PM, Dan Gindikin <dgindikin at gmail.com> wrote:
We were having performance problems unpickling a large pickle file: the 170s running time was fine, but memory usage peaked at 1100MB when it ought to have been about 300MB. The excess came from memory fragmentation caused by the many unnecessary "put" opcodes in the pickle stream.
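[Editor's note: here is a minimal sketch, mine rather than from the post, of how one can measure this with pickletools.genops: collect the memo ids written by PUT opcodes and the ids fetched by GET opcodes, and count the puts that are never fetched.]

```python
import pickle
import pickletools

# Many distinct objects: each one gets memoized with a PUT, but almost
# nothing is shared, so almost no GET ever fetches those memo slots.
data = pickle.dumps([{"i": i} for i in range(1000)], protocol=2)

put_ids, get_ids = [], set()
for opcode, arg, pos in pickletools.genops(data):
    if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT"):
        put_ids.append(arg)
    elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
        get_ids.add(arg)

unused = [i for i in put_ids if i not in get_ids]
print(len(put_ids), "puts,", len(unused), "never fetched")
```

Every unfetched put still costs a memo-table entry at load time, which is where the wasted memory comes from.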
We made a pickletools.optimize-inspired tool that ran directly on a pickle file, using pickletools.genops. This solved the unpickling problem (84s, 382MB), but the tool itself used too much time and memory (1100s, 470MB), so I recoded it to scan the pickle stream directly, without going through pickletools.genops, which brought it down to 240s and 130MB.
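[Editor's note: for illustration, a rough sketch of the kind of two-pass stripper described above, built on pickletools.genops. It is my own, and assumes protocol <= 2, since removing bytes would invalidate the frame lengths of protocol 4. It copies contiguous spans of the stream and skips any PUT whose memo id no GET ever fetches; the stdlib's pickletools.optimize performs essentially this transformation on an in-memory pickle string.]

```python
import pickle
import pickletools

def strip_unused_puts(pickled: bytes) -> bytes:
    """Drop every PUT opcode whose memo id is never fetched by a GET."""
    # Pass 1: which memo ids do GET opcodes actually use?
    fetched = set()
    for opcode, arg, pos in pickletools.genops(pickled):
        if opcode.name in ("GET", "BINGET", "LONG_BINGET"):
            fetched.add(arg)

    # Pass 2: copy the stream through, skipping PUTs for unfetched ids.
    # genops reports each opcode's start offset; an opcode's bytes run
    # from its own start to the start of the next opcode.
    out, keep_from = [], 0
    for opcode, arg, pos in pickletools.genops(pickled):
        if keep_from is None:  # the previous opcode was a dropped PUT
            keep_from = pos
        if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT") and arg not in fetched:
            out.append(pickled[keep_from:pos])
            keep_from = None
    out.append(pickled[keep_from:])  # tail; STOP is never a PUT
    return b"".join(out)

# Hypothetical usage: the stripped stream must still round-trip.
data = pickle.dumps([[i] * 10 for i in range(10000)], protocol=2)
slim = strip_unused_puts(data)
assert pickle.loads(slim) == pickle.loads(data)
print(len(data), "->", len(slim))
```

Scanning the raw opcodes directly, as the post describes, avoids even the per-opcode tuples that genops allocates; the span-copying above is a middle ground between that and materializing the whole opcode list.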
Collin Winter wrote a simple optimization pass for cPickle in Unladen Swallow. The code reads through the stream and removes all the unnecessary PUTs in-place.
Other people who deal with large pickle files are probably hitting similar problems, and since this comes up precisely when dealing with large data, it is exactly the situation in which pickletools.optimize and pickletools.genops are too expensive to use. This feels like functionality that ought to be added to pickletools; is there some way I can contribute it?
Just put your code on bugs.python.org and I will take a look.
-- Alexandre