[Python-Dev] Increasing the C-optimized pickle extensibility

Guido van Rossum guido at python.org
Fri Apr 26 11:15:02 EDT 2019


I think it's better not to introduce a new opcode, for the reason you stated -- you don't want your pickles to be unreadable by older Python versions, if you can help it.

On Fri, Apr 26, 2019 at 5:59 AM Pierre Glaser <pierre.glaser at inria.fr> wrote:

Hi All,

We (Antoine Pitrou, Olivier Grisel and myself) have recently spent some effort on enabling pickle extensions to extend the C-optimized Pickler instead of the pure-Python one. Pickle extensions play a crucial role in many distributed computing libraries: cloudpickle (https://github.com/cloudpipe/cloudpickle), for example, is vendored in dask, pyspark, ray, and joblib.

Early benchmarks show that relying on the C-optimized pickle yields significant serialization speed improvements (up to 30x faster). (Draft PR of the CPickler-backed version of cloudpickle: https://github.com/cloudpipe/cloudpickle/pull/253)

To make extending the C Pickler possible, we are currently moving forward with a few enhancements to the public pickle API:

* First, we are enabling Pickler subclasses to implement a reducer_override method, which will have priority over the reducers registered in the dispatch_table and over the default handling of classes and functions. (PR link: https://github.com/python/cpython/pull/12499)

* Then, we are adding a new keyword argument to save_reduce called state_setter (consequently, a reducer's return value may now have a 6th item). This state setter callable is useful for programmatically overriding the state-updating behavior of an object, which would otherwise be restricted to its static __setstate__ method. (PR link: https://github.com/python/cpython/pull/12588)

The review process for these PRs is in progress, and anyone is welcome to chime in and share some thoughts.

The first addition is very non-invasive. We estimated that the second point did not require introducing a new opcode, as this change could be implemented as a simple sequence of standard pickle instructions. We therefore think that it is not necessary to make this change dependent on the new protocol 5 proposed in PEP 574.
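A minimal sketch of the first addition, assuming Python 3.8 or later, where reducer_override is available on pickle.Pickler (the C-accelerated class when _pickle is present). Point and rebuild_point are hypothetical names used only for illustration; because pickle stores rebuild_point by reference, the sketch assumes it is run as a script so that __main__.rebuild_point can be found again at load time.

```python
import io
import pickle


class Point:
    """Hypothetical example class, not part of the proposal."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def rebuild_point(x, y):
    # Hypothetical reconstruction helper referenced by the custom reducer.
    return Point(x, y)


class PointPickler(pickle.Pickler):
    def reducer_override(self, obj):
        # Consulted before dispatch_table reducers and the default handling
        # of classes and functions; exact instances of some built-in types
        # (int, str, list, dict, ...) may bypass it for speed.
        if isinstance(obj, Point):
            # Same shape as a __reduce__ return value: (callable, args).
            return rebuild_point, (obj.x, obj.y)
        # NotImplemented makes the pickler fall back to its normal logic.
        return NotImplemented


def dumps(obj, protocol=None):
    buf = io.BytesIO()
    PointPickler(buf, protocol=protocol).dump(obj)
    return buf.getvalue()


data = dumps([Point(1, 2), Point(3, 4)])
restored = pickle.loads(data)
print([(p.x, p.y) for p in restored])  # [(1, 2), (3, 4)]
```

Returning NotImplemented for everything else keeps the subclass cheap to add on top of the C Pickler: only the types the extension cares about take the Python-level path.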
The key advantage of not creating a new opcode is that this makes our change backward-compatible, meaning that pickles written by 3.8 will not break because of our change if read using earlier Python versions. OTOH, one might argue that a new opcode might:

* make the code a little bit cleaner
* make it easier to interpret disassembled pickle strings.

If you are interested, here is an example of a disassembled pickle string using our currently proposed solution: https://github.com/pierreglaser/cpython/pull/2#issuecomment-486243350

Does anyone have an opinion on this?

Thanks,
Pierre
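A minimal sketch of the state_setter addition, assuming Python 3.8 or later, where a 6-item reduce value (callable, args, state, listitems, dictitems, state_setter) is accepted. Counter and restore_counter are hypothetical names. The pickle is written at protocol 2 and its opcodes are collected with pickletools.genops, to check that the state setter is encoded with pre-existing opcodes (TUPLE2, REDUCE, POP) instead of BUILD or any new opcode.

```python
import pickle
import pickletools


class Counter:
    """Hypothetical class whose state is restored by an external function."""

    def __init__(self):
        self.count = 0

    def __reduce__(self):
        # 6-item reduce value; the last item is the state setter.
        return Counter, (), {"count": self.count}, None, None, restore_counter


def restore_counter(obj, state):
    # Hypothetical state setter, called as restore_counter(obj, state)
    # in place of obj.__setstate__(state) / the BUILD opcode.
    obj.count = state["count"]


c = Counter()
c.count = 7
data = pickle.dumps(c, protocol=2)

# genops yields (opcode_info, argument, position) triples.
opnames = {op.name for op, arg, pos in pickletools.genops(data)}
restored = pickle.loads(data)

print(restored.count)                # 7
print("BUILD" in opnames)            # False
print({"REDUCE", "POP"} <= opnames)  # True
```

Since only protocol-2 opcodes appear, an older interpreter should be able to load such a pickle, provided Counter and restore_counter are importable there; only producing the 6-item reduce value requires the newer pickler.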



--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him/his (why is my pronoun here?) <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>


