[Python-Dev] On a new version of pickle [PEP 3154]: self-referential frozensets

M Stefan mstefanro at gmail.com
Sat Jun 23 12:19:05 CEST 2012


Hello,

I'm one of this year's Google Summer of Code students working on improving pickle by creating a new version. My name is Stefan and my mentor is Alexandre Vassalotti.

If you're interested, you can monitor the progress in the dedicated blog at [2] and the bitbucket repository at [3].

One of the goals for picklev4 is to add native opcodes for pickling of sets and frozensets. Currently these 4 opcodes were added:

While this design allows pickling of self-referential sets, self-referential frozensets are still problematic. For instance, trying to pickle `fs':

    a=A(); fs=frozenset([a]); a.fs = fs

(when unpickling, the object a has to be initialized before it is added to the frozenset)

The only way I can think of to make this work is to postpone the initialization of all the objects inside the frozenset until after UNION_FROZENSET. I believe this is doable, but there might be memory penalties if the approach is to simply store all the initialization opcodes in memory until pickling the frozenset is finished.
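The ordering problem can be sketched in plain Python, without pickle. This is only a hand simulation under assumptions: the class A below is a hypothetical stand-in, A.__new__ plays the role of NEWOBJ, updating __dict__ plays the role of BUILD, and how UNION_FROZENSET would actually fill the frozenset is not modelled.

```python
class A:
    """Hypothetical stand-in for the class A used in the example."""


# Eager order: the state is set while the frozenset is still empty,
# so the object keeps a reference to the empty frozenset.
empty = frozenset()
eager = A.__new__(A)                   # like NEWOBJ: object without state
eager.__dict__.update({"fs": empty})   # like BUILD: state set too early
eager_fs = empty | {eager}             # "filling" yields a new frozenset

# Postponed order: create the object, build the frozenset around it,
# and only then set the object's state.
late = A.__new__(A)                    # like NEWOBJ
late_fs = frozenset([late])            # frozenset is complete here
late.__dict__.update({"fs": late_fs})  # like BUILD, after the frozenset

print(eager.fs is eager_fs)  # eager order loses the self-reference
print(late.fs is late_fs)    # postponed order keeps it
```

The second half only works because the default hash of an A instance does not depend on its state; a class whose __hash__ reads attributes set later would need more care.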

Currently, pickle.dumps(fs,4) generates:

    EMPTY_FROZENSET
    BINPUT 0
    MARK
    BINGLOBAL_COMMON '0 A' # same as GLOBAL '__main__ A' in v3
    EMPTY_TUPLE
    NEWOBJ
    EMPTY_DICT
    SHORT_BINUNICODE 'fs'
    BINGET 0 # retrieves the frozenset which is empty at this point, and it
             # will never be filled because it's immutable
    SETITEM
    BUILD # a.__setstate__({'fs' : frozenset()})
    UNION_FROZENSET

By postponing the initialization of a, it should instead generate:

    EMPTY_FROZENSET
    BINPUT 0
    MARK
    BINGLOBAL_COMMON '0 A' # same as GLOBAL '__main__ A' in v3
    EMPTY_TUPLE
    NEWOBJ # create the object but don't initialize its state yet
    BINPUT 1
    UNION_FROZENSET
    BINGET 1
    EMPTY_DICT
    SHORT_BINUNICODE 'fs'
    BINGET 0
    SETITEM
    BUILD
    POP
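For anyone who wants to look at an opcode stream like the ones above on their own interpreter, the standard pickletools module can list the opcodes of any pickle. A minimal sketch, assuming an interpreter that accepts protocol 4; the opcode names it prints come from that interpreter and may differ from the draft names used in this message.

```python
import pickle
import pickletools

# Pickle a small frozenset of ints, which needs no user-defined class.
payload = pickle.dumps(frozenset([1, 2]), protocol=4)

# genops yields (opcode, argument, position) triples for each opcode.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
print(opcode_names)

# pickletools.dis(payload) prints an annotated, human-readable listing.
```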

While self-referential frozensets are uncommon, a far more problematic situation is with the self-referential objects created with REDUCE. While pickle uses the idea of creating empty collections and then filling them, __reduce__ typically creates already-filled objects. For instance:

    cnt = collections.Counter(); cnt[a]=3; a.cnt=cnt; cnt.__reduce__()
    (<class 'collections.Counter'>, ({<__main__.A object at 0x0286E8F8>: 3},))

where the A object contains a reference to the counter. Unpickling an object pickled with this __reduce__ function is not possible, because the __reduce__ function, which "explains" how to create the object, is asking for the object to exist before being created.

The fix here would be to pass Counter's dictionary in the state argument, as opposed to the "constructor parameters" one, as follows:

    (<class 'collections.Counter'>, (), {<__main__.A object at 0x0286E8F8>: 3})

When unpickling this, an empty Counter will be created first, and then __setstate__ will be called to fill it, at which point self-references are allowed.

I assume this modification has to be done in the implementations of the data structures rather than in pickle itself. Pickle could try to fix this by detecting when __reduce__ returns a class type as the first tuple arg and move the dict ctor parameter to the state, but this may not always be intended. It's also a bit strange that __getstate__ is never used anywhere in pickle directly.
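The state-based form of __reduce__ can be sketched as follows. The classes Bag and A are hypothetical and only for illustration; Bag is not collections.Counter. The sketch uses copy.deepcopy rather than pickle, since deepcopy consumes the same __reduce__ tuple and keeps a memo of already-created objects, and it does not require the classes to be importable by name.

```python
import copy


class Bag(dict):
    """Hypothetical Counter-like mapping; not the real collections.Counter."""

    def __reduce__(self):
        # (callable, constructor args, state): the contents travel as
        # state, so the empty Bag exists before its items are rebuilt.
        return (type(self), (), dict(self))

    def __setstate__(self, state):
        # Fill the already-created (empty) Bag.
        self.update(state)


class A:
    """Hypothetical stand-in for the class A in the example."""


a = A()
bag = Bag()
bag[a] = 3
a.bag = bag  # the key refers back to the mapping that contains it

clone = copy.deepcopy(bag)
(a_copy,) = clone  # the single key of the copied mapping
print(a_copy.bag is clone, clone[a_copy])
```

Because the empty Bag is created and recorded before its state is copied, the back-reference from the key resolves to the new Bag instead of requiring it to exist before it is constructed.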

I'm looking forward to hearing your suggestions and opinions on this matter.

Regards, Stefan

[1] http://www.python.org/dev/peps/pep-3154/
[2] http://pypickle4.wordpress.com/
[3] http://bitbucket.org/mstefanro/pickle4


