[Python-Dev] Unpickling py2 str as py3 bytes (and vice versa) - implementation (issue #6784)

Michael Foord fuzzyman at voidspace.org.uk
Tue Mar 13 20:42:20 CET 2012


On 13 Mar 2012, at 04:44, Merlijn van Deen wrote:

http://bugs.python.org/issue6784 ("byte/unicode pickle incompatibilities between python2 and python3")

Hello all,

Currently, pickle unpickles python2 'str' objects as python3 'str' objects, where the encoding to use is passed to the Unpickler. However, there are cases where it makes more sense to unpickle a python2 'str' as python3 'bytes' - for instance when it is actually binary data, and not text.

Currently, the mapping is as follows, when reading a pickle:
python2 'str' -> python3 'str' (using an encoding supplied to Unpickler)
python2 'unicode' -> python3 'str'

or, when creating a pickle using protocol <= 2:
python3 'str' -> python2 'unicode'
python3 'bytes' -> python2 'builtins.bytes object'
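To make the reading side of that mapping concrete, here is a minimal sketch run on Python 3. The byte string `PY2_PICKLE` is an assumption: it is written out by hand to match what Python 2's `pickle.dumps('\xff\x00', 2)` should produce, rather than captured from a real Python 2 session.

```python
import pickle

# Assumed output of Python 2's pickle.dumps('\xff\x00', 2):
# PROTO 2, SHORT_BINSTRING of length 2, BINPUT 0, STOP.
PY2_PICKLE = b'\x80\x02U\x02\xff\x00q\x00.'

# With the default encoding (ASCII) the binary payload cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# Supplying an encoding yields a python3 'str', not 'bytes'.
as_text = pickle.loads(PY2_PICKLE, encoding='latin-1')

# latin-1 maps each byte to one code point, so the original bytes
# can be recovered by encoding again, but only as a manual extra step.
recovered = as_text.encode('latin-1')
```

The latin-1 round trip works because that codec is a one-to-one byte mapping; it is a workaround, not a way of telling the unpickler that the data was binary in the first place.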

It does seem unfortunate that by default it is impossible for a developer to "do the right thing" as regards pickling / unpickling here. Binary data on Python 2 being unpickled as Unicode on Python 3 is presumably for the convenience of developers doing the wrong thing (and only works for ascii anyway).

All the best,

Michael Foord

This issue proposes adding a flag to change the behaviour as follows:
a) python2 'str' -> python3 'bytes'
b) python3 'bytes' -> python2 'str'
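As a rough illustration of what direction b) amounts to at the opcode level, the sketch below hand-assembles a protocol 2 pickle that stores a bytes value with the opcodes Python 2 uses for 'str'. The helper `dumps_as_py2_str` is hypothetical and is not taken from either patch; it only handles a single bytes object. The result is checked with `encoding='bytes'`, a value that current Python 3 documents for reading such strings back as bytes.

```python
import pickle
import struct

def dumps_as_py2_str(data: bytes) -> bytes:
    """Hypothetical helper: pickle one bytes object as a python2 'str'."""
    if len(data) < 256:
        # SHORT_BINSTRING takes a 1-byte length.
        body = pickle.SHORT_BINSTRING + bytes([len(data)]) + data
    else:
        # BINSTRING takes a 4-byte little-endian signed length.
        body = pickle.BINSTRING + struct.pack('<i', len(data)) + data
    return pickle.PROTO + b'\x02' + body + pickle.STOP

payload = b'\xff\x00binary'
blob = dumps_as_py2_str(payload)

# Reading the result back as bytes confirms the pickle is well formed.
roundtrip = pickle.loads(blob, encoding='bytes')

# Same check for a payload long enough to need BINSTRING.
long_payload = bytes(300)
long_roundtrip = pickle.loads(dumps_as_py2_str(long_payload), encoding='bytes')
```

A real implementation would live inside the Pickler so that bytes nested in lists, dicts and instances are handled too; this sketch only shows the wire format involved.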

The question on this is how to pass this flag. To quote Antoine (with permission) on my mail about this issue on core-mentorship:

I haven't answered because I'm unsure about the approach itself - do we want to add yet another argument to pickle methods, especially this late in the 3.x development cycle?

Currently, I have implemented it using an extra argument for the Pickler and Unpickler objects ('bytestr'), which toggles the behaviour. I.e.:

    pickled = Pickler(data, bytestr=True); unpickled = Unpickler(data, bytestr=True)

This is the approach used in picklebytestr.patch [1].

Another option would be to implement a separate Pickler/Unpickler object, such that

    pickled = BytestrPickler(data, bytestr=True); unpickled = BytestrUnpickler(data, bytestr=True)

This is the approach I initially implemented [2].

Alternatively, there is the option only to implement the Unpickler, leaving the Pickler as it is. This allows

    unpickled = Unpickler(data, encoding=bytes)

where the bytes type is used as a special 'flag'.

And, of course, there is the option not to implement this in the stdlib at all.

What are your ideas on this?

Best,
Merlijn

[0] http://bugs.python.org/issue6784
[1] http://bugs.python.org/file24719/picklebytestr.patch
[2] https://github.com/valhallasw/py2/blob/master/bytestrpickle.py
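The Unpickler-only option can be approximated with the pickle module as it exists in current Python 3, which documents the string `'bytes'` as a special `encoding` value. The sketch below is not the patch from the issue: `make_unpickler` is a hypothetical wrapper that accepts the `bytes` type as the flag, as suggested above, and `PY2_LIST` is hand-written to mimic a Python 2 protocol 2 pickle of `['ab', u'cd']`.

```python
import io
import pickle

def make_unpickler(stream, encoding='ASCII'):
    """Hypothetical wrapper: treat the bytes type as a special 'flag'."""
    if encoding is bytes:
        # The stdlib spells the flag as the string 'bytes'.
        encoding = 'bytes'
    return pickle.Unpickler(stream, encoding=encoding)

# Assumed Python 2 pickle of ['ab', u'cd'] at protocol 2:
# a list holding one 'str' (SHORT_BINSTRING) and one 'unicode' (BINUNICODE).
PY2_LIST = b'\x80\x02]q\x00(U\x02abq\x01X\x02\x00\x00\x00cdq\x02e.'

# Default behaviour: python2 'str' is decoded to python3 'str'.
as_text = make_unpickler(io.BytesIO(PY2_LIST)).load()

# With the flag: python2 'str' stays bytes, python2 'unicode' is still 'str'.
as_bytes = make_unpickler(io.BytesIO(PY2_LIST), encoding=bytes).load()
```

Note that the flag only affects python2 'str' values; python2 'unicode' values come back as python3 'str' either way, which matches mapping a) without touching the Pickler.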



-- http://www.voidspace.org.uk/

May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html


