[Python-Dev] PEP 3154 - pickle protocol 4
Antoine Pitrou solipsis at pitrou.net
Fri Aug 12 12:58:46 CEST 2011
- Previous message: [Python-Dev] Status of the PEP 400? (deprecate codecs.StreamReader/StreamWriter)
- Next message: [Python-Dev] PEP 3154 - pickle protocol 4
Hello,
This PEP is an attempt to foster a number of small incremental improvements in a future pickle protocol version. The PEP process is used in order to gather as many improvements as possible, because the introduction of a new protocol version should be a rare occurrence.
Feel free to suggest any additions.
Regards
Antoine.
http://www.python.org/dev/peps/pep-3154/
PEP: 3154
Title: Pickle protocol version 4
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis at pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2011-08-11
Python-Version: 3.3
Post-History:
Resolution: TBD
Abstract
Data serialized using the pickle module must be portable across Python versions. It should also support the latest language features as well as implementation-specific features. For this reason, the pickle module knows about several protocols (currently numbered from 0 to 3), each of which appeared in a different Python version. Using a low-numbered protocol version allows exchanging data with old Python versions, while using a high-numbered protocol gives access to newer features and sometimes more efficient resource use (both CPU time required for (de)serializing, and disk size / network bandwidth required for data transfer).
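As a rough illustration of the trade-off described above, the following sketch pickles the same object with protocol 0 and protocol 3 and compares the resulting sizes. The sample data and variable names are made up for the example, and it assumes a Python 3 interpreter in which both protocols are available.

```python
import pickle

# Hypothetical sample data, chosen only for illustration.
data = {"numbers": list(range(100)), "name": "example"}

# Protocol 0 is the oldest, text-based protocol; protocol 3 is binary.
text_pickle = pickle.dumps(data, protocol=0)
binary_pickle = pickle.dumps(data, protocol=3)

# Both protocols round-trip the object unchanged.
assert pickle.loads(text_pickle) == data
assert pickle.loads(binary_pickle) == data

size_text = len(text_pickle)
size_binary = len(binary_pickle)
print(size_text, size_binary)
```

For this particular object the binary pickle is the smaller of the two; the exact sizes depend on the data being pickled.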
Rationale
The most recent protocol, coincidentally named protocol 3, appeared with Python 3.0 and supports the new incompatible features in the language (mainly, Unicode strings by default and the new bytes object). The opportunity was not taken at the time to improve the protocol in other ways.
This PEP is an attempt to foster a number of small incremental improvements in a future new protocol version. The PEP process is used in order to gather as many improvements as possible, because the introduction of a new protocol version should be a rare occurrence.
Improvements in discussion
64-bit compatibility for large objects
Current protocol versions export object sizes for various built-in types (str, bytes) as 32-bit ints. This prevents serialization of very large objects [1]_. New opcodes are required to support very large bytes and str objects.
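The 32-bit size field can be observed directly in a protocol 3 pickle. The sketch below is only an inspection aid, not part of the proposal: it pickles a short str and reads back the 4-byte length that follows the BINUNICODE opcode. The byte offsets assume the layout produced by CPython's pickler for this input (a 2-byte PROTO header, then the opcode).

```python
import pickle
import pickletools
import struct

payload = pickle.dumps("abc", protocol=3)

# List the opcodes that make up the pickle.
opcode_names = [op.name for op, arg, pos in pickletools.genops(payload)]

# Bytes 0-1 hold the PROTO opcode and the protocol number.
# Byte 2 is the BINUNICODE opcode (b"X"); bytes 3-6 hold the length
# of the UTF-8 data as a 4-byte little-endian integer.
string_opcode = payload[2:3]
(declared_length,) = struct.unpack("<I", payload[3:7])

print(opcode_names)
print(string_opcode, declared_length)
```

Because the length is stored in four bytes, a str or bytes object whose encoded size does not fit in that field cannot be represented with this opcode.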
Native opcodes for sets and frozensets
Many common built-in types (such as str, bytes, dict, list, tuple) have dedicated opcodes to improve resource consumption when serializing and deserializing them; however, sets and frozensets don't. Adding such opcodes would be an obvious improvement. Also, dedicated set support could help remove the current impossibility of pickling self-referential sets [2]_.
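The difference can be seen by listing the opcodes emitted for a list and for a set under protocol 3. This is a minimal sketch using `pickletools.genops`; the helper name `opcode_names` is made up for the example.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=3):
    """Return the names of the opcodes used to pickle obj."""
    payload = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, arg, pos in pickletools.genops(payload)]


list_ops = opcode_names([1, 2, 3])
set_ops = opcode_names({1, 2, 3})

# The list uses a dedicated opcode (EMPTY_LIST) and needs no class lookup.
# The set has no dedicated opcode: the pickler emits a GLOBAL lookup of the
# set class followed by REDUCE, which calls it on a list of the elements.
print(list_ops)
print(set_ops)
```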
Binary encoding for all opcodes
The GLOBAL opcode, which is still used in protocol 3, uses the so-called "text" mode of the pickle protocol, which involves looking for newlines in the pickle stream. Looking for newlines is difficult to optimize on a non-seekable stream, and therefore a new version of GLOBAL (BINGLOBAL?) could use a binary encoding instead.
It seems that all other opcodes emitted when using protocol 3 already use binary encoding.
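To make the "text" mode of GLOBAL concrete, the sketch below pickles a reference to a class under protocol 3 and reads the opcode back by hand. The module and attribute names are stored as two newline-terminated lines, so the reader has to scan for the newlines. `collections.OrderedDict` is used only as a convenient example of a global.

```python
import collections
import io
import pickle

payload = pickle.dumps(collections.OrderedDict, protocol=3)
stream = io.BytesIO(payload)

# Skip the 2-byte PROTO header (opcode plus protocol number).
stream.read(2)

# GLOBAL is the single byte b"c", followed by two text lines:
# the module name and the attribute name, each ending in a newline.
opcode = stream.read(1)
module = stream.readline().rstrip(b"\n").decode("utf-8")
name = stream.readline().rstrip(b"\n").decode("utf-8")

print(opcode, module, name)
```

A binary variant would presumably carry explicit lengths instead, so a reader would know in advance how many bytes to consume; the exact encoding is left open in the text above.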
Acknowledgments
(...)
References
.. [1] "pickle not 64-bit ready": http://bugs.python.org/issue11564
.. [2] "Cannot pickle self-referencing sets": http://bugs.python.org/issue9269
Copyright
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: