On Sat, Jun 23, 2012 at 3:19 AM, M Stefan <mstefanro@gmail.com> wrote:

* UNION_FROZENSET: like UPDATE_SET, but create a new frozenset

stack before: ... pyfrozenset mark stackslice

stack after : ... pyfrozenset.union(stackslice)
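
For illustration, the stack effect described above can be modelled with a plain Python list. This is only a toy sketch of the proposed opcode, not code from pickle.py; the names `MARK` and `union_frozenset` are hypothetical stand-ins.

```python
MARK = object()  # hypothetical stand-in for the unpickler's mark object


def union_frozenset(stack):
    """Model the proposed UNION_FROZENSET opcode on a list used as the stack."""
    # Find the topmost mark; everything above it is the stack slice.
    mark_index = len(stack) - 1
    while stack[mark_index] is not MARK:
        mark_index -= 1
    items = stack[mark_index + 1:]
    # Drop the mark and the slice, then replace the frozenset below them
    # with a new frozenset that also contains the slice items.
    del stack[mark_index:]
    base = stack.pop()
    stack.append(base.union(items))


stack = ["...", frozenset({1}), MARK, 2, 3]
union_frozenset(stack)
print(stack)
```

Note that `base.union(items)` builds a new object, so any reference to the old frozenset taken before the opcode runs would not see the added items.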

Since frozensets are immutable, could you explain how adding the UNION_FROZENSET opcode helps in pickling self-referential frozensets? Or are you only adding this one to follow the current style used for pickling dicts and lists in protocols 1 and onward?

While this design allows pickling of self-referential sets, self-referential
frozensets are still problematic. For instance, trying to pickle `fs':
a=A(); fs=frozenset([a]); a.fs = fs
(when unpickling, the object a has to be initialized before it is added to
the frozenset)

The only way I can think of to make this work is to postpone
the initialization of all the objects inside the frozenset until after UNION_FROZENSET.
I believe this is doable, but there might be memory penalties if the approach
is to simply store all the initialization opcodes in memory until pickling the frozenset is finished.

I don't think that's the only way. You could also emit a POP opcode to discard the frozenset from the stack and then emit a GET to fetch it back from the memo. This is how we currently handle self-referential tuples. Check out the save_tuple method in pickle.py to see how it is done. Personally, I would prefer that approach because it is already well-tested and proven to work.
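
To make the tuple case concrete, here is a small sketch that pickles a self-referential tuple and lists the opcodes with pickletools. The variable names and the choice of protocol 2 are mine; the exact opcode sequence may differ between protocols and Python versions.

```python
import pickle
import pickletools

# Build a tuple that refers to itself through a list: t -> inner -> t.
inner = []
t = (inner,)
inner.append(t)

data = pickle.dumps(t, protocol=2)

# List the opcode names; the stream should contain a POP followed by a
# memo fetch where the pickler notices the tuple was already memoized.
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]
print(opcode_names)

# The cycle survives the round trip.
restored = pickle.loads(data)
print(restored[0][0] is restored)
```

The tuple ends up built twice on the stack during unpickling; the second copy is discarded by POP and the memoized one is fetched instead.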

That said, your approach sounds good too. The memory trade-off could lead to smaller pickles and more efficient decoding (though these self-referential objects are rare enough that I don't think that any improvements there would matter much).

While self-referential frozensets are uncommon, a far more problematic
situation is with the self-referential objects created with REDUCE. While
pickle uses the idea of creating empty collections and then filling them,
reduce typically creates already-filled objects. For instance:
cnt = collections.Counter(); cnt[a]=3; a.cnt=cnt; cnt.__reduce__()
(<class 'collections.Counter'>, ({<__main__.A object at 0x0286E8F8>: 3},))
where the A object contains a reference to the counter. Unpickling an
object pickled with this reduce function is not possible, because the reduce
function, which "explains" how to create the object, is asking for the object
to exist before being created.
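
The cycle in the reduce arguments can be inspected without pickling anything. The following is a minimal sketch; the class `A` mirrors the one in the example above, and the other names are only for illustration.

```python
import collections


class A:
    pass


a = A()
cnt = collections.Counter()
cnt[a] = 3
a.cnt = cnt

# Counter.__reduce__ returns the class and a one-element argument tuple
# holding a plain dict of the counter's contents.
factory, args = cnt.__reduce__()
(contents,) = args
key = next(iter(contents))

print(factory is collections.Counter)
# The key inside the constructor arguments points back at the counter
# that those same arguments are supposed to construct.
print(key.cnt is cnt)
```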

Your example seems to work on Python 3. I am not sure if I follow what you are trying to say. Can you provide a working example?

$ python3

Python 3.1.2 (r312:79147, Dec 9 2011, 20:47:34)

[GCC 4.4.3] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>> import pickle, collections

>>> c = collections.Counter()

>>> class A: pass

...

>>> a = A()

>>> c[a] = 3

>>> a.cnt = c

>>> b = pickle.loads(pickle.dumps(a))

>>> b in b.cnt

True

Pickle could try to fix this by detecting when reduce returns a class type as the first tuple arg and move the dict ctor parameter to the state, but this may not always be intended. It's also a bit strange that __getstate__ is never used anywhere in pickle directly.

I would advise against any such change. The reduce protocol is already fairly complex. Further, I don't think changing it this way would give us any extra flexibility.

The documentation has a good explanation of how __getstate__ works under the hood:

http://docs.python.org/py3k/library/pickle.html#pickling-class-instances

And if you need more, PEP 307 (http://www.python.org/dev/peps/pep-0307/) provides some of the design rationales of the API.
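
As a rough illustration of where __getstate__ comes into play, the default object.__reduce_ex__ calls it and passes the result along as the state element of the reduce tuple. The `Point` class and its `cache` attribute below are made up for this sketch.

```python
import copyreg


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.cache = {}  # hypothetical attribute we do not want pickled

    def __getstate__(self):
        # Return the instance dict without the cache.
        state = self.__dict__.copy()
        del state["cache"]
        return state


p = Point(1, 2)

# Protocol 2 and later use the __newobj__-style reduction from PEP 307.
reduced = p.__reduce_ex__(2)
print(reduced[0] is copyreg.__newobj__)
print(reduced[1])  # arguments for the constructor helper
print(reduced[2])  # the state produced by __getstate__
```

So pickle reaches __getstate__ indirectly, through the object's own reduce implementation, rather than calling it by name.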