



On Fri, Apr 23, 2010 at 11:11, Dan Gindikin <dgindikin@gmail.com> wrote:

We were having performance problems unpickling a large pickle file: we were getting a 170s running time (which was fine) but 1100 MB of memory usage. Memory usage ought to have been about 300 MB; the excess came from memory fragmentation caused by the many unnecessary "puts" in the pickle stream.
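
To see how many memo-store opcodes ("puts") a stream carries, a minimal sketch using pickletools.genops; count_puts and 'big.pickle' are illustrative names, not anything from the tool described here:

    import pickletools

    # Count PUT/BINPUT/LONG_BINPUT opcodes in a pickle byte string; each one
    # stores an object in the unpickler's memo whether or not it is ever used.
    def count_puts(data):
        puts = 0
        for opcode, arg, pos in pickletools.genops(data):
            if 'PUT' in opcode.name:
                puts += 1
        return puts

    with open('big.pickle', 'rb') as f:
        print(count_puts(f.read()))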



We made a pickletools.optimize-inspired tool that could run directly on a pickle file and used pickletools.genops. This solved the unpickling problem (84s, 382 MB).
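
The message doesn't include the tool's source; a minimal sketch of the same idea, assuming a genops-based pass that drops any "put" whose memo slot no "get" ever reads (optimize_pickle is an invented name):

    import pickletools

    # Drop PUT/BINPUT/LONG_BINPUT opcodes whose memo slot is never read by
    # any GET, which is essentially what pickletools.optimize does in memory.
    def optimize_pickle(data):
        ops = list(pickletools.genops(data))      # (opcode, arg, pos) triples
        used = {arg for op, arg, pos in ops if 'GET' in op.name}
        out = []
        for i, (op, arg, pos) in enumerate(ops):
            end = ops[i + 1][2] if i + 1 < len(ops) else len(data)
            if 'PUT' in op.name and arg not in used:
                continue                          # unused memo store: drop it
            out.append(data[pos:end])
        return b''.join(out)

Note that list(pickletools.genops(...)) materializes an object per opcode, which is roughly why a genops-based tool is itself expensive on a very large file.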



However, the tool itself was using too much time and memory (1100s, 470 MB), so I recoded it to scan through the pickle stream directly, without going through pickletools.genops, giving (240s, 130 MB).
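
The direct scan isn't shown in the message either; presumably it steps over opcodes by hand, along the lines of this sketch built on pickletools' opcode table. It handles only the common argument layouts, and code2op, UP_TO_NEWLINE, and TAKEN_FROM_ARGUMENT1/4 are undocumented pickletools internals:

    import pickletools

    # Walk a pickle byte stream directly, skipping over each opcode's
    # argument by size instead of decoding it the way genops does.
    def scan_opcodes(data):
        pos = 0
        while pos < len(data):
            info = pickletools.code2op[data[pos:pos + 1].decode('latin-1')]
            yield info, pos
            pos += 1                                  # step past opcode byte
            n = info.arg.n if info.arg else 0
            if n >= 0:
                pos += n                              # fixed-width argument
            elif n == pickletools.UP_TO_NEWLINE:
                pos = data.index(b'\n', pos) + 1      # text-protocol argument
            elif n == pickletools.TAKEN_FROM_ARGUMENT1:
                pos += 1 + data[pos]                  # 1-byte length prefix
            elif n == pickletools.TAKEN_FROM_ARGUMENT4:
                pos += 4 + int.from_bytes(data[pos:pos + 4], 'little')
            else:
                raise NotImplementedError(info.name)

Because nothing is decoded or buffered per opcode, a pass like this can run in one streaming sweep, which fits the reported drop to 240s and 130 MB.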



Other people who deal with large pickle files are probably having similar problems, and since this comes up when dealing with large data, it is precisely the situation in which you probably can't use pickletools.optimize or pickletools.genops. It feels like functionality that ought to be added to pickletools; is there some way I can contribute this?


The best next step is to open an issue at bugs.python.org and upload the patch. I can't make any guarantees on when someone will look at it or if it will get accepted, but putting the code there is your best bet for acceptance.


-Brett