| msg186705 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-04-13 11:41 |
| Python 2 allows pickling and unpickling non-ASCII persistent IDs. In Python 3, the C implementation of pickle saves persistent IDs with protocol version 0 as UTF-8-encoded strings and loads them as bytes.

>>> import pickle, io
>>> class MyPickler(pickle.Pickler):
...     def persistent_id(self, obj):
...         if isinstance(obj, str):
...             return obj
...         return None
...
>>> class MyUnpickler(pickle.Unpickler):
...     def persistent_load(self, pid):
...         return pid
...
>>> f = io.BytesIO(); MyPickler(f).dump('\u20ac'); data = f.getvalue()
>>> MyUnpickler(io.BytesIO(data)).load()
'€'
>>> f = io.BytesIO(); MyPickler(f, 0).dump('\u20ac'); data = f.getvalue()
>>> MyUnpickler(io.BytesIO(data)).load()
b'\xe2\x82\xac'
>>> f = io.BytesIO(); MyPickler(f, 0).dump('a'); data = f.getvalue()
>>> MyUnpickler(io.BytesIO(data)).load()
b'a'

The Python implementation in Python 3 doesn't work with non-ASCII persistent IDs at all. |
|
|
| msg186789 - (view) |
Author: Alexandre Vassalotti (alexandre.vassalotti) *  |
Date: 2013-04-13 18:35 |
| In protocol 0, the persistent ID is restricted to alphanumeric strings because of the problems that arise when the persistent ID contains newline characters. _pickle should likely be changed to use the ASCII decoder. And perhaps we should check for embedded newline characters too. |
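To illustrate the newline problem mentioned above, here is a minimal sketch of a newline-terminated text record of the kind protocol 0 uses for persistent IDs. The helpers `write_persid` and `read_persid` are hypothetical toy functions written for this illustration; they are not the pickle module's internals.

```python
import io


def write_persid(pid: str) -> bytes:
    # Toy model: opcode byte, the ID as ASCII text, then a terminating newline.
    # Encoding as ASCII raises UnicodeEncodeError for non-ASCII IDs.
    return b"P" + pid.encode("ascii") + b"\n"


def read_persid(data: bytes) -> str:
    stream = io.BytesIO(data)
    opcode = stream.read(1)
    if opcode != b"P":
        raise ValueError("not a persistent ID record")
    # The reader stops at the first newline, so it cannot tell a terminator
    # from a newline that was part of the ID itself.
    return stream.readline()[:-1].decode("ascii")


plain = read_persid(write_persid("abc123"))
truncated = read_persid(write_persid("abc\n123"))
print(repr(plain), repr(truncated))
```

In this toy model an ID with an embedded newline comes back cut off at the newline, and the rest of the ID would be misread as further opcodes by a real unpickler.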
|
|
| msg186816 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-04-13 20:07 |
| Even for alphanumeric strings Python 3 has a bug: it saves strings but loads them as bytes objects. |
|
|
| msg186881 - (view) |
Author: Alexandre Vassalotti (alexandre.vassalotti) *  |
Date: 2013-04-14 04:01 |
| Here's a patch that fixes the bug. |
|
|
| msg186894 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-04-14 08:33 |
| I think a string with character codes < 256 would be better for test_protocol0_is_ascii_only(). It can be Latin-1 encoded (Python 2 allows any 8-bit strings). PyUnicode_AsASCIIString() can be slower than _PyUnicode_AsStringAndSize() (actually PyUnicode_AsUTF8AndSize()) because the latter can use a cached value. You can check whether the persistent ID contains only ASCII characters by checking PyUnicode_GET_LENGTH(pid_str) == size. And what are you going to do about the fact that Python 2 can pickle non-ASCII persistent IDs, which then cannot be unpickled in Python 3? |
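The length comparison suggested above can be sketched in pure Python: a string is ASCII-only exactly when its UTF-8 encoding has as many bytes as the string has code points. The function name `is_ascii_by_length` is made up for this illustration, and the sketch assumes the string contains no lone surrogates (which cannot be encoded as UTF-8).

```python
def is_ascii_by_length(text: str) -> bool:
    # Every non-ASCII code point needs at least two bytes in UTF-8, so equal
    # lengths mean every character is ASCII. This mirrors comparing
    # PyUnicode_GET_LENGTH(pid_str) with the size of the UTF-8 buffer.
    encoded = text.encode("utf-8")
    return len(encoded) == len(text)


ascii_id = is_ascii_by_length("abc123")
euro_id = is_ascii_by_length("\u20ac")
latin1_id = is_ascii_by_length("\xe9")

# A character below 256 fits in one Latin-1 byte but still takes two in UTF-8.
latin1_size = len("\xe9".encode("latin-1"))
utf8_size = len("\xe9".encode("utf-8"))
print(ascii_id, euro_id, latin1_id, latin1_size, utf8_size)
```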
|
|
| msg235881 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-02-13 08:32 |
| The patch is updated to current sources. Also optimized writing ASCII strings and fixed tests. |
|
|
| msg268851 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-06-19 12:03 |
| Ping. |
|
|
| msg269874 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-07-06 09:31 |
| Ping again. |
|
|
| msg270619 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-07-17 08:36 |
| New changeset f6a41552a312 by Serhiy Storchaka in branch '3.5':
Issue #17711: Fixed unpickling by the persistent ID with protocol 0.
https://hg.python.org/cpython/rev/f6a41552a312

New changeset df8857c6f3eb by Serhiy Storchaka in branch 'default':
Issue #17711: Fixed unpickling by the persistent ID with protocol 0.
https://hg.python.org/cpython/rev/df8857c6f3eb |
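For reference, the behaviour can be checked with a sketch along these lines, assuming an interpreter that includes these changesets. The class names `IdPickler` and `IdUnpickler` and the helper `roundtrip` are illustrative only and are not taken from the patch or its tests.

```python
import io
import pickle


class IdPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Treat every str as its own persistent ID; pickle everything else normally.
        return obj if isinstance(obj, str) else None


class IdUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Return the persistent ID unchanged so its type can be inspected.
        return pid


def roundtrip(obj, protocol):
    buf = io.BytesIO()
    IdPickler(buf, protocol).dump(obj)
    return IdUnpickler(io.BytesIO(buf.getvalue())).load()


# With the fix, an ASCII persistent ID saved with protocol 0 loads as str.
ascii_result = roundtrip("a", 0)

# A non-ASCII persistent ID is rejected at pickling time in protocol 0.
try:
    roundtrip("\u20ac", 0)
    non_ascii_rejected = False
except pickle.PicklingError:
    non_ascii_rejected = True

# Binary protocols pickle the ID as an ordinary object, so non-ASCII is fine.
binary_result = roundtrip("\u20ac", 2)
print(type(ascii_result).__name__, non_ascii_rejected, binary_result == "\u20ac")
```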
|
|