[Python-Dev] Unpickling py2 str as py3 bytes (and vice versa)

[Python-Dev] Unpickling py2 str as py3 bytes (and vice versa) - implementation (issue #6784)

Merlijn van Deen valhallasw at arctus.nl
Tue Mar 13 12:44:58 CET 2012


http://bugs.python.org/issue6784 ("byte/unicode pickle incompatibilities between python2 and python3")

Hello all,

Currently, pickle unpickles python2 'str' objects as python3 'str' objects, where the encoding to use is passed to the Unpickler. However, there are cases where it makes more sense to unpickle a python2 'str' as python3 'bytes' - for instance when it is actually binary data, and not text.

Currently, the mapping is as follows, when reading a pickle:
    python2 'str' -> python3 'str' (using an encoding supplied to Unpickler)
    python2 'unicode' -> python3 'str'

or, when creating a pickle using protocol <= 2:
    python3 'str' -> python2 'unicode'
    python3 'bytes' -> python2 'builtins.bytes object'
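The reading side of this mapping can be seen with a small example. This is only a sketch: the pickle bytes below are written out by hand to stand in for what python2 would produce for a two-byte 'str', and the names PY2_STR_PICKLE, as_text, as_latin1 and recovered are made up for illustration. It also shows the usual workaround for binary data, which is to decode as latin-1 and encode back again.

```python
import pickle

# Hand-written protocol 2 pickle standing in for the python2 str '\xc3\xa9':
# PROTO 2, SHORT_BINSTRING (length 2), BINPUT 0, STOP.
PY2_STR_PICKLE = b"\x80\x02U\x02\xc3\xa9q\x00."

# The python2 'str' payload is decoded with the encoding given to the
# unpickler, so the result is a python3 'str'.
as_text = pickle.loads(PY2_STR_PICKLE, encoding="utf-8")

# latin-1 maps every byte to exactly one code point, so encoding the
# result again gives back the original bytes. This only helps when the
# caller knows which values were binary data in the first place.
as_latin1 = pickle.loads(PY2_STR_PICKLE, encoding="latin-1")
recovered = as_latin1.encode("latin-1")

# With the default encoding (ASCII) the same pickle cannot be decoded.
try:
    pickle.loads(PY2_STR_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True
```

The round trip has to be applied by hand to every value that was really binary, which is what a flag on the Unpickler would avoid.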

This issue proposes adding a flag that changes the behaviour as follows:
    a) python2 'str' -> python3 'bytes'
    b) python3 'bytes' -> python2 'str'

The open question is how to pass this flag. To quote Antoine (with permission), replying to my mail about this issue on core-mentorship:

I haven't answered because I'm unsure about the approach itself - do we want to add yet another argument to pickle methods, especially this late in the 3.x development cycle?

Currently, I have implemented it using an extra argument for the Pickler and Unpickler objects ('bytestr'), which toggles the behaviour. I.e.:

    pickled = Pickler(data, bytestr=True)
    unpickled = Unpickler(data, bytestr=True)

This is the approach used in pickle_bytestr.patch [1].

Another option would be to implement separate Pickler/Unpickler classes, such that

    pickled = BytestrPickler(data, bytestr=True)
    unpickled = BytestrUnpickler(data, bytestr=True)

This is the approach I initially implemented [2].

Alternatively, there is the option of implementing only the Unpickler side and leaving the Pickler as it is. This allows

    unpickled = Unpickler(data, encoding=bytes)

where the bytes type is used as a special 'flag'.

And, of course, there is the option of not implementing this in the stdlib at all.

What are your ideas on this?

Best, Merlijn

[0] http://bugs.python.org/issue6784
[1] http://bugs.python.org/file24719/pickle_bytestr.patch
[2] https://github.com/valhallasw/py2/blob/master/bytestrpickle.py


