[Python-Dev] Unpickling memory usage problem, and a proposed solution

Alexandre Vassalotti alexandre at peadrop.com
Fri Apr 23 22:53:52 CEST 2010


On Fri, Apr 23, 2010 at 3:57 PM, Dan Gindikin <dgindikin at gmail.com> wrote:

> This wouldn't help our use case: your code needs the entire pickle stream to be in memory, which in our case would be about 475 MB, on top of the 300 MB+ data structures that generated the pickle stream.

In that case, the best we could do is a two-pass algorithm to remove the unused PUTs. That won't be efficient, but it will satisfy the memory constraint. Another solution is to not generate the PUTs at all by setting the 'fast' attribute on Pickler. But that won't work if you have a recursive structure, or if you have code that requires the identity of objects to be preserved.
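To illustrate the recursive-structure caveat, here is a minimal sketch (not from the original message) that pickles a self-referential list with 'fast' set. The helper name dump_fast is hypothetical, and the exact exception type is an assumption that depends on which pickler implementation is in use, so the sketch catches both likely candidates.

```python
import io
import pickle


def dump_fast(obj):
    """Pickle obj with memoization disabled and return the resulting bytes."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=2)
    pickler.fast = True  # no PUT opcodes are emitted, so nothing is memoized
    pickler.dump(obj)
    return buf.getvalue()


cyclic = []
cyclic.append(cyclic)  # the list now contains itself

try:
    dump_fast(cyclic)
    outcome = "pickled"
except (ValueError, RecursionError) as exc:
    # Without a memo the pickler cannot emit a back-reference, so it keeps
    # descending into the same list until it gives up.
    outcome = type(exc).__name__

print(outcome)
```

Without the memo there is no way to refer back to an object that is still being written, which is why cycles cannot be represented in this mode.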

>>> import io, pickle, pickletools
>>> x = [1, 2]
>>> f = io.BytesIO()
>>> p = pickle.Pickler(f, protocol=-1)
>>> p.dump([x, x])
>>> pickletools.dis(f.getvalue())
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: (    MARK
    6: ]        EMPTY_LIST
    7: q        BINPUT     1
    9: (        MARK
   10: K            BININT1    1
   12: K            BININT1    2
   14: e            APPENDS    (MARK at 9)
   15: h        BINGET     1
   17: e        APPENDS    (MARK at 5)
   18: .    STOP
highest protocol among opcodes = 2
>>> [id(x) for x in pickle.loads(f.getvalue())]
[20966504, 20966504]

Now with the 'fast' mode enabled:

>>> f = io.BytesIO()
>>> p = pickle.Pickler(f, protocol=-1)
>>> p.fast = True
>>> p.dump([x, x])
>>> pickletools.dis(f.getvalue())
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: (    MARK
    4: ]        EMPTY_LIST
    5: (        MARK
    6: K            BININT1    1
    8: K            BININT1    2
   10: e            APPENDS    (MARK at 5)
   11: ]        EMPTY_LIST
   12: (        MARK
   13: K            BININT1    1
   15: K            BININT1    2
   17: e            APPENDS    (MARK at 12)
   18: e        APPENDS    (MARK at 3)
   19: .    STOP
highest protocol among opcodes = 2
>>> [id(x) for x in pickle.loads(f.getvalue())]
[20966504, 21917992]

As you can observe, the pickle stream generated with the fast mode might actually be bigger.
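For the two-pass approach mentioned earlier, a rough sketch could look like the following. It is an illustration only, not code from this thread: the function name strip_unused_puts and the file-path arguments are hypothetical, and it assumes a pickle written with protocol 3 or lower (no FRAME or MEMOIZE opcodes). It relies on pickletools.genops accepting a file object, so neither pass holds the whole stream in memory.

```python
import pickletools

GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def strip_unused_puts(src_path, dst_path):
    """Copy the pickle at src_path to dst_path, dropping PUTs that no GET uses."""
    # Pass 1: record every memo index that some GET opcode refers to.
    used = set()
    with open(src_path, "rb") as src:
        for opcode, arg, _pos in pickletools.genops(src):
            if opcode.name in GET_OPS:
                used.add(arg)

    # Pass 2: copy the stream one opcode at a time, skipping unreferenced PUTs.
    # 'ops' drives the opcode parser; 'raw' re-reads the same bytes for copying.
    with open(src_path, "rb") as ops, open(src_path, "rb") as raw, \
            open(dst_path, "wb") as dst:
        pending = None  # (start offset, keep flag) of the previous opcode
        for opcode, arg, pos in pickletools.genops(ops):
            if pending is not None:
                start, keep = pending
                chunk = raw.read(pos - start)  # bytes of the previous opcode
                if keep:
                    dst.write(chunk)
            keep = not (opcode.name in PUT_OPS and arg not in used)
            pending = (pos, keep)
        # The last opcode (STOP) extends to where the parser stopped reading.
        start, keep = pending
        chunk = raw.read(ops.tell() - start)
        if keep:
            dst.write(chunk)
```

The surviving PUTs keep their original memo indices, so the remaining GETs still resolve correctly. The cost is that the input is read twice (three file handles in total), which is the inefficiency traded for the lower memory footprint.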

By the way, it is weird that the total memory usage of the data structure is smaller than the size of its respective pickle stream. What pickle protocol are you using?

-- Alexandre


