[Python-Dev] Unpickling memory usage problem, and a proposed solution
Collin Winter collinwinter at google.com
Fri Apr 23 23:18:13 CEST 2010
On Fri, Apr 23, 2010 at 1:53 PM, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
> On Fri, Apr 23, 2010 at 3:57 PM, Dan Gindikin <dgindikin at gmail.com> wrote:
>> This wouldn't help our use case: your code needs the entire pickle stream to be in memory, which in our case would be about 475 MB, on top of the 300+ MB of data structures that generated the pickle stream.
> In that case, the best we could do is a two-pass algorithm to remove the unused PUTs. That won't be efficient, but it will satisfy the memory constraint. Another solution is to not generate the PUTs at all by setting the 'fast' attribute on Pickler. But that won't work if you have a recursive structure, or have code that requires the identity of objects to be preserved.
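The trade-off of the 'fast' attribute can be observed directly. The sketch below is illustrative only: it targets Python 3's `pickle` module (where `Pickler.fast` is documented as deprecated) rather than the 2010-era cPickle under discussion, it uses protocol 2 so the memo opcodes are the PUT family, and the helpers `dumps_with_fast` and `count_memo_stores` are hypothetical names. It counts memo-store opcodes with `pickletools.genops` and shows that in fast mode an object referenced twice comes back as two separate copies.

```python
import io
import pickle
import pickletools

# Opcode names that store an object into the unpickler's memo.
MEMO_STORE_OPS = {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"}


def dumps_with_fast(obj, fast, protocol=2):
    """Hypothetical helper: pickle obj, optionally with Pickler.fast set."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=protocol)
    pickler.fast = fast  # fast mode skips memoization, so no PUTs are written
    pickler.dump(obj)
    return buf.getvalue()


def count_memo_stores(data):
    """Hypothetical helper: count the memo-store opcodes in one pickle."""
    return sum(1 for op, _arg, _pos in pickletools.genops(data)
               if op.name in MEMO_STORE_OPS)


shared = [1, 2]
original = [shared, shared]  # two references to one inner list

normal_data = dumps_with_fast(original, fast=False)
fast_data = dumps_with_fast(original, fast=True)
print(count_memo_stores(normal_data), count_memo_stores(fast_data))

restored_normal = pickle.loads(normal_data)
restored_fast = pickle.loads(fast_data)
print(restored_normal[0] is restored_normal[1])  # True: identity preserved
print(restored_fast[0] is restored_fast[1])      # False: two equal copies
```

With `fast` set, a self-referencing list such as `x = []; x.append(x)` cannot be pickled at all, because no memo entry exists for the stream to refer back to.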
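A two-pass approach only needs to keep a set of memo indices in memory, not the whole stream. The following is a minimal sketch under stated assumptions, not the patch discussed in this thread: it assumes the input holds exactly one complete pickle written with protocol 0 to 3 (so every PUT carries an explicit index and there are no `MEMOIZE` or `FRAME` opcodes), and `strip_unused_puts` and its `open_src` parameter are hypothetical names. Pass one records which memo indices some GET reads; pass two copies the input opcode by opcode, skipping PUTs that nothing reads.

```python
import io
import pickle
import pickletools

PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}


def strip_unused_puts(open_src, dst):
    """Hypothetical helper: copy one pickle to dst, dropping unread PUTs.

    open_src: zero-argument callable returning a fresh binary stream at the
    start of the pickle, e.g. lambda: open(path, "rb"). Assumes a single
    protocol 0-3 pickle. Memory use grows with the number of memo indices
    that are read, not with the size of the pickle.
    """
    # Pass 1: collect every memo index that some GET opcode reads.
    with open_src() as f:
        used = {arg for op, arg, _pos in pickletools.genops(f)
                if op.name in GET_OPS}

    # Pass 2: genops parses one stream while raw bytes come from another.
    with open_src() as ops_f, open_src() as raw_f:
        pending = None  # (opcode, arg, start offset) not yet copied

        def flush(end):
            op, arg, start = pending
            chunk = raw_f.read(end - start)  # raw_f always sits at `start`
            if op.name in PUT_OPS and arg not in used:
                return  # no GET ever reads this slot, so drop the PUT
            dst.write(chunk)

        for op, arg, pos in pickletools.genops(ops_f):
            if pending is not None:
                flush(pos)
            pending = (op, arg, pos)
        # genops stops right after STOP, so tell() marks the end of the pickle.
        flush(ops_f.tell())


shared = [1, 2]
raw = pickle.dumps([shared, shared, "abc"], protocol=2)
out = io.BytesIO()
strip_unused_puts(lambda: io.BytesIO(raw), out)
print(len(raw) - len(out.getvalue()))  # bytes saved by the dropped PUTs
print(pickle.loads(out.getvalue()))
```

For a pickle that already fits in memory as a bytes object, the standard library's `pickletools.optimize` performs the same clean-up of PUT opcodes that have no matching GET. Like this sketch, it only looks at the single pickle it is given.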
I don't think it's possible in general to remove any PUTs if the pickle is being written to a file-like object. It is possible to reuse a single Pickler to pickle multiple objects: this causes the Pickler's memo dict to be shared between the objects being pickled. If you pickle foo, bar, and baz, foo may not have any GETs, but bar and baz may have GETs that reference data added to the memo by foo's PUT operations. Because you can't know what will be written to the file-like object later, you can't remove any of the PUT instructions in this scenario.
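The foo/bar/baz scenario can be reproduced with Python 3's `pickle` module. In this sketch the contents of `foo`, `bar` and `baz` are made up for illustration, protocol 2 is assumed, and `memo_reads` is a hypothetical helper. One Pickler writes three pickles to the same stream: `foo`'s pickle contains no GETs, yet `bar`'s pickle reads a memo entry created by a PUT in `foo`'s. Running `pickletools.optimize` on `foo`'s pickle alone, used here as a stand-in for any per-pickle PUT removal, leaves `bar`'s GET dangling.

```python
import io
import pickle
import pickletools

GET_OPS = {"GET", "BINGET", "LONG_BINGET"}


def memo_reads(data):
    """Hypothetical helper: memo indices read by GET opcodes in one pickle."""
    return [arg for op, arg, _pos in pickletools.genops(data)
            if op.name in GET_OPS]


foo = {"name": "foo"}
bar = [foo]        # refers to an object already written as part of foo
baz = (foo, bar)

buf = io.BytesIO()
pickler = pickle.Pickler(buf, protocol=2)
ends = []
for obj in (foo, bar, baz):
    pickler.dump(obj)       # the Pickler's memo is shared across dump() calls
    ends.append(buf.tell())
stream = buf.getvalue()
foo_pickle, rest = stream[:ends[0]], stream[ends[0]:]
bar_pickle = stream[ends[0]:ends[1]]

print(memo_reads(foo_pickle))  # []: foo never reads its own memo entries
print(memo_reads(bar_pickle))  # bar reads the entry that foo's PUT created

# Reading back with one Unpickler keeps the shared references intact.
unpickler = pickle.Unpickler(io.BytesIO(stream))
foo2, bar2, baz2 = unpickler.load(), unpickler.load(), unpickler.load()
print(bar2[0] is foo2, baz2[0] is foo2, baz2[1] is bar2)

# Stripping "unused" PUTs from foo's pickle in isolation breaks bar.
broken = pickle.Unpickler(io.BytesIO(pickletools.optimize(foo_pickle) + rest))
broken.load()  # foo itself still loads
try:
    broken.load()
except (KeyError, pickle.UnpicklingError) as exc:
    print("bar can no longer be loaded:", type(exc).__name__)
```

The exception type for a missing memo entry has varied between Python versions, which is why the sketch catches both `KeyError` and `pickle.UnpicklingError`.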
This kind of thing is done in real-world code like cvs2svn (which I broke when I was optimizing cPickle; don't break cvs2svn, it's not fun to fix :). I added some basic tests for this support in CPython's Lib/test/pickletester.py.
There might be room for app-specific optimizations that do this, but I'm not sure it would work for a general-usage cPickle that needs to stay compatible with the current system.
Collin Winter