Issue 8738: cPickle dumps(tuple) != dumps(loads(dumps(tuple)))

Created on 2010-05-17 10:45 by Alberto.Planas.Domínguez, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg105896 - Author: Alberto Planas Domínguez (Alberto.Planas.Domínguez) Date: 2010-05-17 10:45
Sometimes, when I use cPickle to serialize tuples of strings, I get a different dumps() result for the same tuple:

    import cPickle
    t = ('', 'JOHN')
    s1 = cPickle.dumps(t)
    s2 = cPickle.dumps(cPickle.loads(cPickle.dumps(t)))
    assert s1 == s2  # AssertionError

With cPickle, it doesn't matter which protocol I use for dumps(). The assertion passes if I use the pickle module instead of cPickle. This means that I can't use a serialized object as a key in a map/dict object.
msg105898 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-17 10:50
I don't think you can expect serialized results to always be equal. It can depend on specifics of the internal algorithm, such as optimizations or dict iteration order.
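The dict iteration order point can be illustrated with a minimal sketch. It assumes Python 3.7 or later (where dicts keep insertion order) and the standard pickle module rather than the 2.x cPickle discussed in this issue; the helper names same_pickle and same_value are hypothetical and exist only for this illustration.

```python
import pickle

# Two dicts that compare equal but were built in a different key order.
a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}


def same_pickle(left, right, protocol=0):
    """Return True if both objects serialize to identical bytes."""
    return pickle.dumps(left, protocol) == pickle.dumps(right, protocol)


def same_value(left, right, protocol=0):
    """Return True if both objects compare equal after a pickle round trip."""
    left_copy = pickle.loads(pickle.dumps(left, protocol))
    right_copy = pickle.loads(pickle.dumps(right, protocol))
    return left_copy == right_copy


print(a == b)             # the objects themselves are equal
print(same_pickle(a, b))  # the serialized bytes follow iteration order
print(same_value(a, b))   # the round-tripped values are equal again
```

Comparing the unpickled values, rather than the pickled bytes, sidesteps any dependence on how the pickler happens to lay out its output.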
msg110345 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-15 04:58
There seems to be a bug somewhere in 2.x cPickle. Here is a somewhat simpler way to demonstrate the bug: the following code

    from pickletools import dis
    import cPickle
    t = 1L,  # use long for easy 3.x comparison
    s1 = cPickle.dumps(t)
    s2 = cPickle.dumps(cPickle.loads(s1))
    print(s1 == s2)
    dis(s1)
    dis(s2)

prints

    False
        0: (    MARK
        1: L        LONG       1L
        5: t        TUPLE      (MARK at 0)
        6: p    PUT        1
        9: .    STOP
    highest protocol among opcodes = 0
        0: (    MARK
        1: L        LONG       1L
        5: t        TUPLE      (MARK at 0)
        6: .    STOP
    highest protocol among opcodes = 0

The difference is probably immaterial because nothing in the pickle uses the tuple again and PUT is redundant, but the difference does not show up when the Python pickle module is used instead of cPickle, and it is not present in py3k.

The comparable py3k code:

    from pickletools import dis
    import pickle
    t = 1,
    s1 = pickle.dumps(t, 0)
    s2 = pickle.dumps(pickle.loads(s1), 0)
    print(s1 == s2)
    dis(s1)
    dis(s2)

produces

    True
        0: (    MARK
        1: L        LONG       1
        5: t        TUPLE      (MARK at 0)
        6: p    PUT        0
        9: .    STOP
    highest protocol among opcodes = 0
        0: (    MARK
        1: L        LONG       1
        5: t        TUPLE      (MARK at 0)
        6: p    PUT        0
        9: .    STOP
    highest protocol among opcodes = 0

Most likely the bug is benign and not worth fixing, but I would like to figure out what's going on and what changed in 3.x.
msg110347 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-15 05:22
OK, the 2.7 behavior is explainable and correct. cPickle checks the reference count and does not generate PUT for objects that are not referenced elsewhere:

    >>> from pickletools import dis
    >>> from cPickle import dumps
    >>> dis(dumps(tuple([1])))
        0: (    MARK
        1: I        INT        1
        4: t        TUPLE      (MARK at 0)
        5: .    STOP
    highest protocol among opcodes = 0
    >>> t = 1,
    >>> dis(dumps(t))
        0: (    MARK
        1: I        INT        1
        4: t        TUPLE      (MARK at 0)
        5: p    PUT        1
        8: .    STOP
    highest protocol among opcodes = 0

This optimization is not available from Python, of course, so pickle.py behaves differently. The remaining question is why this optimization was removed from 3.x.
msg110348 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-15 05:50
I am speculating here; Alexandre probably knows the answer. The optimization that skips PUT for unreferenced objects was probably removed because doing so makes the _pickle module behave more like pickle, and because pickletools now has an optimize function, which can provide a more thorough removal of unused PUT opcodes. Closing as "invalid".
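As a rough illustration of the pickletools route mentioned above, the following sketch assumes Python 3 and protocol 0, matching the py3k example earlier in the thread. The helper names opcode_names and dump_and_optimize are hypothetical. It only shows that pickletools.optimize drops PUT opcodes that nothing refers to; it is not a claim that optimized pickles are canonical or safe to compare byte for byte.

```python
import pickle
import pickletools


def opcode_names(data):
    """List the opcode names found in a pickle byte string."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


def dump_and_optimize(obj, protocol=0):
    """Pickle obj, then strip unused PUT opcodes with pickletools.optimize."""
    raw = pickle.dumps(obj, protocol)
    return raw, pickletools.optimize(raw)


raw, optimized = dump_and_optimize((1,))
print(opcode_names(raw))        # opcodes as emitted by the pickler
print(opcode_names(optimized))  # opcodes after removing unused PUTs
```

The optimized pickle still loads to the same value; only the memo bookkeeping that nothing refers to is removed.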
History
Date User Action Args
2022-04-11 14:57:01 admin set github: 52984
2010-07-15 05:50:20 belopolsky set status: open -> closed; versions: + Python 2.7, - Python 3.2; messages: +; resolution: not a bug; stage: resolved
2010-07-15 05:22:54 belopolsky set messages: +; versions: + Python 3.2, - Python 2.7
2010-07-15 04:58:56 belopolsky set versions: - Python 2.6; nosy: + belopolsky; messages: +; assignee: belopolsky
2010-05-17 10:50:30 pitrou set priority: normal -> low; versions: + Python 2.7; nosy: + alexandre.vassalotti, pitrou; messages: +
2010-05-17 10:45:17 Alberto.Planas.Domínguez create