Issue 5180: 3.1 cannot unpickle 2.7-created pickle (original) (raw)

Created on 2009-02-07 22:28 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (25)

msg81351 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2009-02-07 22:28

The pickle was generated by pybench. You can try to display it by running:

./python Tools/pybench/pybench.py --debug -s 27.bench

It fails with the following traceback:

Traceback (most recent call last): File "Tools/pybench/pybench.py", line 954, in PyBenchCmdline() File "/home/antoine/py3k/svn/Tools/pybench/CommandLine.py", line 349, in init rc = self.main() File "Tools/pybench/pybench.py", line 888, in main bench = pickle.load(f) File "/home/antoine/py3k/svn/Lib/pickle.py", line 1335, in load return Unpickler(file, encoding=encoding, errors=errors).load() _pickle.UnpicklingError: bad pickle data

msg81352 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2009-02-07 22:35

Here is the pickle file.

msg108768 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-27 01:25

If I disable _pickle, I get a more meaningful trace:

File "Tools/pybench/pybench.py", line 954, in PyBenchCmdline() File "/Users/sasha/Work/python-svn/py3k-commit/Tools/pybench/CommandLine.py", line 349, in init rc = self.main() File "Tools/pybench/pybench.py", line 888, in main bench = pickle.load(f) File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1321, in load encoding=encoding, errors=errors).load() File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 830, in load dispatchkey[0] File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1055, in load_inst klass = self.find_class(module, name) File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1115, in find_class import(module, level=0) File "/Users/sasha/Work/python-svn/py3k-commit/Tools/pybench/Unicode.py", line 17 s = unicode(u''.join(map(str,range(100)))) ^ SyntaxError: invalid syntax

Apparently, pybench/Unicode.py has never been converted to py3k. This is likely not the only problem:

from pybench import * import pickle pickle.load(open('27.bench')) Traceback (most recent call last): File "", line 1, in File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1321, in load encoding=encoding, errors=errors).load() File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 829, in load assert isinstance(key, bytes_types) AssertionError

This looks like the infamous bytes vs string problem ...

msg108769 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-27 01:31

The bytes/string issu was a red herring: with pickle.load(open('27.bench', 'b')), I get the same stack trace as from command line pybench invocation.

pickle.load(open('27.bench', 'rb')) Traceback (most recent call last): File "", line 1, in File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1321, in load encoding=encoding, errors=errors).load() File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 830, in load dispatchkey[0] File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1055, in load_inst klass = self.find_class(module, name) File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1115, in find_class import(module, level=0) File "/Users/sasha/Work/python-svn/py3k-commit/Tools/pybench/Unicode.py", line 17 s = unicode(u''.join(map(str,range(100)))) ^ SyntaxError: invalid syntax

msg108771 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-27 01:47

Hmm. It looks like another pickle vs. _pickle issue. Attached patch is a result of 2to3 applied to Tools/pybench (and a minor manual fix for pickle import) with _pickle import disabled.

With the patch applied,

$ ./python Tools/pybench/pybench.py --debug -s 27.bench

runs fine and produces results identical to those for a similar run from the trunk.

Looks like it is time to fire up gdb.

msg108796 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2010-06-27 16:34

Please keep the Python3 version of pybench compatible to Python 2.6 and 2.7.

msg108799 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-27 17:22

I have found which deals with running pybench. I suggest that we reopen that issue and move discussion of .diff there.

I really like the idea to keep single source for 2.x and 3.x pybench, but we need to add some machinery to run 2.x pybench under python3.x transparently.

Marc-Andre,

are you taking over the pybench or the _pickle issue?

msg108812 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-28 00:43

It looks like I was able to get to the root of the problem. I am attaching two files that demonstrate the issue:

==> pickle-bug.py <== import pickle import sys class Bug: pass bug = Bug() f = open(sys.argv[1], 'w') pickle.Pickler(f, protocol=0).dump(bug) f.close()

==> unpickle-bug.py <== import pickle import sys class Bug: pass bug = pickle._Unpickler(open(sys.argv[1], 'rb')).load() # works print(bug) bug = pickle.Unpickler(open(sys.argv[1], 'rb')).load() # doesn't print(bug)

$ python2 pickle-bug.py /tmp/bug.pkl $ python3 unpickle-bug.py /tmp/bug.pkl <__main__.Bug object at 0x1006b6f40> Traceback (most recent call last): File "unpickle-bug.py", line 7, in bug = pickle.Unpickler(open(sys.argv[1], 'rb')).load() # doesn't _pickle.UnpicklingError: bad pickle data

The problematic pickle is really small, so I hope I'll report that I have a fix soon.

0: (    MARK
1: i        INST       '__main__ Bug' (MARK at 0)

15: p PUT 0 18: ( MARK 19: d DICT (MARK at 18) 20: p PUT 1 23: b BUILD 24: . STOP highest protocol among opcodes = 0

msg108814 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-28 01:27

As promised, here is the fix (-pickle.diff) for the "unpickle-bug.py" issue. Unfortunately, it looks like more bugs remain. 27.bench is still not loadable.

msg108815 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-28 02:11

The remaining bug is a bit harder. If you try to run unpickle-bug-2.py on the same pickle, you get

$ python3 unpickle-bug-2.py /tmp/bug.pkl <__main__.Bug object at 0x1006a8f40> Traceback (most recent call last): File "unpickle-bug-2.py", line 9, in bug = pickle.Unpickler(open(sys.argv[1], 'rb')).load() # doesn't TypeError: ('init() takes exactly 2 arguments (1 given)', <class '__main__.Bug'>, ())

The problem is acknowledged in the source code of instantiate function in _pickle.c:

/* XXX: The pickle.py module does not create instances this way when the                                             
   args tuple is empty. See Unpickler._instantiate(). */

Clearly, this is wrong because pickle.py way succeeds where _pickle does not.

PS: I did not mean to unassign MAL, but I think this is more of alexandre.vassalotti's issue.

msg108817 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-28 03:00

I am attaching a patch which makes python3 read 27.bench without errors. I think this should be applied while a complete solution for unpickling old style class instances from text mode (protocol = 0) pickles is found.

msg108819 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2010-06-28 10:20

Alexander Belopolsky wrote:

Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:

I am attaching a patch which makes python3 read 27.bench without errors. I think this should be applied while a complete solution for unpickling old style class instances from text mode (protocol = 0) pickles is found.

Please use a different strategy for "porting" the Unicode tests to Python3:

Create a new test modules Bytes (which reuses the Strings tests) and a new test module Strings3 which reuses the Unicode tests. The Strings3 module should then use the patches you applied to the Unicode module.

Finally, let the two test module Strings and Unicode fail with an ImportError in Python3.

msg108821 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2010-06-28 11:30

Alexander Belopolsky wrote:

Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:

The bytes/string issu was a red herring: with pickle.load(open('27.bench', 'b')), I get the same stack trace as from command line pybench invocation.

pickle.load(open('27.bench', 'rb')) Traceback (most recent call last): File "", line 1, in File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1321, in load encoding=encoding, errors=errors).load() File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 830, in load dispatchkey[0] File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1055, in load_inst klass = self.find_class(module, name) File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1115, in find_class import(module, level=0) File "/Users/sasha/Work/python-svn/py3k-commit/Tools/pybench/Unicode.py", line 17 s = unicode(u''.join(map(str,range(100)))) ^ SyntaxError: invalid syntax

Note that pybench is written in a way that it can handle ImportErrors and SyntaxErrors gracefully. This is per design, since test modules can well be written for more recent Python versions that support a certain features or syntax not present in earlier versions.

In the above case, it may be a good idea to redirect pickle to the new Strings3 module (see my other message). Likewise, the Strings test module should redirect to the new Bytes test module in Python3.

The Unicode and Strings modules should not be loaded in Python3. Instead, the Bytes and Strings3 modules should be used. This can be changed in Setup.py of pybench.

msg108846 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-28 17:20

I am attaching a patch which focuses on fixing _pickle behavior. I opened a separate to deal with pybench specific problems.

Marc-Andre,

I am reassigning this issue to myself and assigning to you. I hope you don't mind.

My patch attempts to emulate 2.x PyInstance_NewRaw with a call to tp_alloc. I considered using tp_new, but although new is not special for classic classes, there is nothing to prevent users to define new in their classic classes or add new when they port these classes to 3.x. (BTW, I wonder what 2to3 does if classic class has new defined.) This is slightly unorthodox use of tp_alloc, so I will follow up with a question on python-dev.

msg108866 - (view)

Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer)

Date: 2010-06-28 20:58

Thank you for the nice investigative work!

I will try my best to review this patch by next week.

msg108879 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-06-28 23:32

My patch attempts to emulate 2.x PyInstance_NewRaw with a call to tp_alloc.

This is certainly the wrong thing to do. You could at least try PyBaseObject_Type.tp_new (see object_new() in typeobject.c), but even this is not necessarily right (for example if the class is derived from an extension type defining its own tp_new).

So, IMO, the right thing to do would be to choose the first base type that isn't a Python-defined class and use its tp_new:

staticbase = subtype;
while (staticbase && (staticbase->tp_flags & Py_TPFLAGS_HEAPTYPE))
    staticbase = staticbase->tp_base;

(these two lines are from tp_new_wrapper() in typeobject.c)

That way you choose the right constructor function, yet don't call the Python-defined new function. That's the only reasonable way of doing it I can imagine. It also follows the following behaviour:

class C(int): ... def new(cls, *a): ... print "new", a ... return int.new(cls, *a) ... C(5) new (5,) 5 s = pickle.dumps(C(5)) new (5,) x = pickle.loads(s) x 5

As you can see, int.new has been called on unpickling but not C.new (no print).

msg108880 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-06-28 23:34

(and to follow on my example, I'd point that the unpickled instance is still an instance of the right class:

type(x) <class '__main__.C'>

)

msg108885 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-29 01:45

On Mon, Jun 28, 2010 at 7:32 PM, Antoine Pitrou <report@bugs.python.org> wrote:

My patch attempts to emulate 2.x PyInstance_NewRaw with a call to tp_alloc.

This is certainly the wrong thing to do. You could at least try PyBaseObject_Type.tp_new (see object_new() in typeobject.c), but even this is not necessarily right (for example if the class is derived from an extension type defining its own tp_new).

Well, this is not so obviously wrong as you make it sound. If you look closer at object_new(), you will see that in sane situations it reduces to type->tp_alloc(type, 0) call. Remember, this is called in very special circumstances: when unpickling old style class instances from a 2.x generated pickle. No arguments are available for tp_new call, so excess_args check does not apply. The other check is for an ABC, but the situation where an old-style class becomes an ABC in 2.x to 3.x transition is hardly sane. If such sanity check is necessary, I would rather check the flag explicitly in _pickle.c rather than having to generate dummy args and kwds to satisfy tp_new signature.

So, IMO, the right thing to do would be to choose the first base type that isn't a Python-defined class and use its tp_new:

   staticbase = subtype;    while (staticbase && (staticbase->tp_flags & Py_TPFLAGS_HEAPTYPE))        staticbase = staticbase->tp_base;

(these two lines are from tp_new_wrapper() in typeobject.c)

This looks like overengineering to me. I think 3.x should simply refuse to unpickle old style class instances into anything that is not subclassed in python directly from object. It would be helpful at this point if you could provide a test case where tp_alloc logic does not work. Regardless of what algorithm we ultimately choose, I would like to include such test in regrtest.

PS: Note that in my patch I've eliminated one control path that was calling instantiate(..) erroneously, but I have not reviewed the code carefully enough to eliminate the possibility that instantiate(..) would be called to instantiate new-style class instances.

PPS: What is you opinion on the pickle.py trick of monkey patching class on an instance of an empty class? Do you think this can be fooled?

msg108897 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-06-29 08:36

Well, this is not so obviously wrong as you make it sound. If you look closer at object_new(), you will see that in sane situations it reduces to type->tp_alloc(type, 0) call.

Today, yes. But tomorrow it may entail additional operations.

If such sanity check is necessary, I would rather check the flag explicitly in _pickle.c rather than having to generate dummy args and kwds to satisfy tp_new signature.

I find it better to "generate dummy args and kwds to satisfy tp_new signature". I see no point in a partial inlining of an internal function when we could call it instead.

So, IMO, the right thing to do would be to choose the first base type that isn't a Python-defined class and use its tp_new:

staticbase = subtype; while (staticbase && (staticbase->tp_flags & Py_TPFLAGS_HEAPTYPE)) staticbase = staticbase->tp_base;

(these two lines are from tp_new_wrapper() in typeobject.c)

This looks like overengineering to me. I think 3.x should simply refuse to unpickle old style class instances into anything that is not subclassed in python directly from object.

Right, but it would lead to almost the same code... Since you have to find the first non-Python base type and check that it is "object".

It would be helpful at this point if you could provide a test case where tp_alloc logic does not work.

I'll try to find one.

PPS: What is you opinion on the pickle.py trick of monkey patching class on an instance of an empty class? Do you think this can be fooled?

I don't think so.

msg110383 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-07-15 17:50

Antoine,

I think I have found a better solution. Since we are dealing with classic classes, they should not define new. If in porting to 3.x, users introduce new that requires arguments, they will not be able to unpickle 2.x pickles no matter what we do.

Therefore, I propose to replace _EmptyClass trick in pickle.py with a call to klass.new:

and do the same in _pickle.c's instantiate:

r = PyObject_CallMethod(cls, "new", "O", cls);

Note that I am not even calling tp_new here directly, so all the hoops in tp_new_wrapper get jumped through.

The advantage of this scheme is that python and C code will work exactly the same and users that have corner cases for which the scheme does not work will be able to figure out what is going on by looking at the python code.

Attaching issue5180a.diff

msg110536 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-07-17 02:03

Antoine said on IRC that he is ok with the latest approach. Does anyone want to review the patch before it goes in?

msg110541 - (view)

Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer)

Date: 2010-07-17 07:52

I have fixed some style nits in your patch. It would be nice to have tests for the different control paths in instantiate(). But, I realize that would be a bit annoying.

Apart from that, the patch looks good.

msg110559 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-07-17 14:28

Alexandre,

I am not sure your change form PyObject_Size(args) to Py_SIZE(args) is correct. As far as I can tell, args come from pickle machine stack without any type checks. The equivalent code in 2.x cPickle uses PyObject_Size and checks for errors. I'll try to come up with a test case for the error situation, but it is a bit tricky because pickle should be produced in 2.x in order for this function to be used.

msg110592 - (view)

Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer)

Date: 2010-07-17 18:27

The args argument is always a tuple created with Pdata_poptuple(). You can add an explicit type check. If this check fails a RuntimeError should be raised, because this would indicate a programming error in pickle.

msg110613 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-07-17 23:02

Committed issue5180b.diff with minor additions in r82937 (r82938 for 3.1):

  1. Added an assert that args is a tuple in instantiate() with an explanation why it is true.

  2. Added a test for the other code branch instantiate. (Not all conditions are tested, but I think this is now good enough.)

The explanation of INST and OBJ opcodes in pickletools is now out of date, so I am adding #9267 as a superseder.

History

Date

User

Action

Args

2022-04-11 14:56:45

admin

set

github: 49430

2010-07-17 23:02:54

belopolsky

set

status: open -> closed
resolution: accepted -> fixed
messages: +

superseder: Update pickle opcode documentation in pickletools for 3.x
stage: commit review -> resolved

2010-07-17 18:27:04

alexandre.vassalotti

set

messages: +

2010-07-17 14:28:19

belopolsky

set

messages: +

2010-07-17 07:52:35

alexandre.vassalotti

set

files: + issue5180b.diff

messages: +

2010-07-17 02:03:33

belopolsky

set

resolution: accepted
messages: +
stage: patch review -> commit review

2010-07-15 18:14:04

belopolsky

link

issue9267 dependencies

2010-07-15 17:50:03

belopolsky

set

files: + issue5180a.diff

messages: +

2010-06-29 08:36:47

pitrou

set

messages: +

2010-06-29 01:45:41

belopolsky

set

messages: +

2010-06-28 23:34:50

pitrou

set

messages: +

2010-06-28 23:32:39

pitrou

set

messages: +

2010-06-28 20:58:18

alexandre.vassalotti

set

messages: +

2010-06-28 18:38:48

belopolsky

set

stage: test needed -> patch review

2010-06-28 18:38:03

belopolsky

set

files: - issue5180-fix.diff

2010-06-28 18:37:50

belopolsky

set

files: - issue5180.diff

2010-06-28 17:20:53

belopolsky

set

files: + issue5180.diff

nosy: + mark.dickinson
messages: +

assignee: lemburg -> belopolsky

2010-06-28 16:08:22

belopolsky

link

issue9102 dependencies

2010-06-28 11:30:30

lemburg

set

messages: +

2010-06-28 10:20:27

lemburg

set

messages: +

2010-06-28 03:00:03

belopolsky

set

files: + issue5180-fix.diff

messages: +

2010-06-28 02:11:11

belopolsky

set

files: + unpickle-bug-2.py
assignee: lemburg
messages: +

2010-06-28 01:27:09

belopolsky

set

files: + issue5180-pickle.diff

messages: +
stage: test needed

2010-06-28 00:43:37

belopolsky

set

files: + unpickle-bug.py

2010-06-28 00:43:09

belopolsky

set

files: + pickle-bug.py
assignee: lemburg -> (no value)
messages: +

2010-06-27 17:22:49

belopolsky

set

messages: +

2010-06-27 16:34:22

lemburg

set

messages: +

2010-06-27 15:04:19

georg.brandl

set

assignee: lemburg

nosy: + lemburg

2010-06-27 01:48:02

belopolsky

set

files: + issue5180.diff
keywords: + patch
messages: +

2010-06-27 01:31:27

belopolsky

set

messages: +

2010-06-27 01:25:21

belopolsky

set

nosy: + belopolsky
messages: +

2010-05-11 20:45:37

terry.reedy

set

versions: + Python 3.2, - Python 3.0

2009-02-07 22:35:24

pitrou

set

files: + 27.bench
messages: +

2009-02-07 22:28:53

pitrou

create