[Python-checkins] peps: Add draft for PEP 3154, "Pickle protocol version 4"

antoine.pitrou python-checkins at python.org
Thu Aug 11 20:12:39 CEST 2011


http://hg.python.org/peps/rev/074a90b5bcbf
changeset:   3920:074a90b5bcbf
user:        Antoine Pitrou <solipsis at pitrou.net>
date:        Thu Aug 11 20:10:41 2011 +0200
summary:
  Add draft for PEP 3154, "Pickle protocol version 4"

files:
  pep-3154.txt |  107 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 107 insertions(+), 0 deletions(-)


diff --git a/pep-3154.txt b/pep-3154.txt
new file mode 100644
--- /dev/null
+++ b/pep-3154.txt
@@ -0,0 +1,107 @@
+PEP: 3154
+Title: Pickle protocol version 4
+Version: $Revision$
+Last-Modified: $Date$
+Author: Antoine Pitrou <solipsis at pitrou.net>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 2011-08-11
+Python-Version: 3.3
+Post-History:
+Resolution: TBD
+
+
+Abstract
+========
+
+Data serialized using the pickle module must be portable across Python
+versions.  It should also support the latest language features as well as
+implementation-specific features.  For this reason, the pickle module knows
+about several protocols (currently numbered from 0 to 3), each of which
+appeared in a different Python version.  Using a low-numbered protocol
+version makes it possible to exchange data with old Python versions, while using a
+high-numbered protocol allows access to newer features and sometimes more
+efficient resource use (both CPU time required for (de)serializing, and
+disk size / network bandwidth required for data transfer).
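As an illustration, here is a minimal sketch that uses only the standard ``pickle`` module. The protocol is chosen per call through the ``protocol`` argument of ``pickle.dumps``, and ``pickle.loads`` reads any of these versions back. The helper names ``pickle_with_each_protocol`` and ``declared_protocol`` are hypothetical. The second helper relies on protocols 2 and above starting with a PROTO opcode (byte 0x80) followed by the version number, while protocols 0 and 1 carry no such header. Later Python releases define protocols beyond 3, so the range is pinned explicitly here.

```python
import pickle

# Protocols discussed in this draft; later Python releases add more.
PROTOCOLS = range(4)


def pickle_with_each_protocol(obj):
    """Serialize obj with protocols 0-3 and return {protocol: bytes}."""
    return {proto: pickle.dumps(obj, protocol=proto) for proto in PROTOCOLS}


def declared_protocol(data):
    """Return the version announced by a leading PROTO opcode, if any.

    Protocols 2 and above begin with PROTO (0x80) plus a version byte;
    protocols 0 and 1 have no such header, so None is returned for them.
    """
    if data[:1] == b"\x80":
        return data[1]
    return None


sample = {"name": "spam", "values": [1, 2, 3], "payload": b"\x00\x01"}
for proto, data in pickle_with_each_protocol(sample).items():
    # Every protocol round-trips the same object...
    assert pickle.loads(data) == sample
    # ...but the header and the encoded size differ between versions.
    print(proto, declared_protocol(data), len(data))
```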
+
+
+Rationale
+=========
+
+The most recent protocol, coincidentally numbered 3, appeared with
+Python 3.0 and supports the new backwards-incompatible features of the
+language (mainly, unicode strings by default and the new bytes object).
+The opportunity was not taken at the time to improve it in other ways.
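The standard ``pickletools`` module can decode a pickle into its opcodes, which shows what protocol 3 added for bytes. In this sketch the helper names ``opcode_names`` and ``global_args`` are hypothetical, and the behaviour shown is that of CPython 3's pickle module. A bytes object pickled with protocol 3 uses a dedicated bytes opcode. Protocol 2 has no such opcode, so the pickler falls back to a GLOBAL reference to an encoding function, which rebuilds the bytes from a latin-1 str at load time.

```python
import pickle
import pickletools


def opcode_names(data):
    """List the opcode names in a pickle, in stream order."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


def global_args(data):
    """Return the 'module name' arguments of any GLOBAL opcodes."""
    return [arg for opcode, arg, pos in pickletools.genops(data)
            if opcode.name == "GLOBAL"]


payload = b"abc"

# Protocol 3 has dedicated opcodes for bytes (BINBYTES / SHORT_BINBYTES).
v3 = pickle.dumps(payload, protocol=3)
print(opcode_names(v3))

# Protocol 2 predates the bytes type: the pickle references a function
# that re-encodes a latin-1 str into bytes when it is loaded.
v2 = pickle.dumps(payload, protocol=2)
print(global_args(v2))

# Both encodings load back to the same bytes object.
assert pickle.loads(v2) == pickle.loads(v3) == payload
```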
+
+This PEP is an attempt to foster a number of small incremental improvements
+in a future protocol version.  The PEP process is used in order to gather
+as many improvements as possible, because the introduction of a new protocol
+version should be a rare occurrence.
+
+
+Improvements under discussion
+=============================
+
+64-bit compatibility for large objects
+--------------------------------------
+
+Current protocol versions encode the sizes of some built-in types (str,
+bytes) as 32-bit integers.  This prevents serialization of very large
+objects [1]_.  New opcodes are required to support very large bytes and
+str objects.
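The size limit can be seen by decoding a length prefix by hand. This sketch assumes the protocol 3 layout for a single str: PROTO 3, then the BINUNICODE opcode (``X``), then the UTF-8 length as a 4-byte little-endian unsigned integer, then the UTF-8 data. The helper ``binunicode_length`` and the constant names are hypothetical. A 4-byte field cannot hold any value above 2**32 - 1.

```python
import pickle
import struct

PROTO_3_HEADER = b"\x80\x03"   # PROTO opcode followed by the version byte
BINUNICODE = b"X"              # opcode protocol 3 uses for str objects
MAX_UINT32 = 2**32 - 1         # largest value a 4-byte length can hold


def binunicode_length(data):
    """Return the length field of a protocol 3 pickle of a single str.

    Assumes the layout PROTO 3, BINUNICODE, 4-byte little-endian unsigned
    length, UTF-8 payload (followed by memo and STOP opcodes).
    """
    if data[:2] != PROTO_3_HEADER or data[2:3] != BINUNICODE:
        raise ValueError("not a protocol 3 pickle of a str")
    (length,) = struct.unpack("<I", data[3:7])
    return length


print(binunicode_length(pickle.dumps("h\u00e9llo", protocol=3)))  # → 6

# Any size above MAX_UINT32 simply does not fit in the 4-byte field.
try:
    struct.pack("<I", MAX_UINT32 + 1)
except struct.error as exc:
    print("does not fit in 32 bits:", exc)
```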
+
+Native opcodes for sets and frozensets
+--------------------------------------
+
+Many common built-in types (such as str, bytes, dict, list, tuple) have
+dedicated opcodes to improve resource consumption when serializing and
+deserializing them; however, sets and frozensets don't.  Adding such opcodes
+would be an obvious improvement.  Dedicated set support could also make it
+possible to pickle self-referential sets, which is currently impossible
+[2]_.
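Disassembling a pickled set shows what happens without dedicated opcodes. In CPython with protocol 3, the set goes through the generic copy protocol. The pickle references ``builtins.set`` with GLOBAL, builds a list of the elements, and applies the callable with REDUCE. The helper name ``describe`` is hypothetical. The order of the elements inside the list is not guaranteed.

```python
import pickle
import pickletools


def describe(data):
    """Return (opcode name, argument) pairs for a pickle, in stream order."""
    return [(opcode.name, arg) for opcode, arg, pos in pickletools.genops(data)]


ops = describe(pickle.dumps({1, 2, 3}, protocol=3))
for name, arg in ops:
    print(name, "" if arg is None else arg)

names = [name for name, arg in ops]
# No set-specific opcode: the set is rebuilt by calling builtins.set
# (pushed by GLOBAL) on a list of its elements (applied by REDUCE).
assert ("GLOBAL", "builtins set") in ops
assert "REDUCE" in names
```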
+
+Binary encoding for all opcodes
+-------------------------------
+
+The GLOBAL opcode, which is still used in protocol 3, uses the so-called
+"text" mode of the pickle protocol, which involves looking for newlines
+in the pickle stream.  Looking for newlines is difficult to optimize on
+a non-seekable stream, and therefore a new version of GLOBAL (BINGLOBAL?)
+could use a binary encoding instead.
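The following sketch contrasts the two styles of reading a GLOBAL argument. ``read_global_text`` parses the real protocol 3 encoding, in which the module and the name are each terminated by a newline, so the reader has to scan for terminators. ``encode_global_binary`` and ``read_global_binary`` are entirely hypothetical. They use a made-up layout that puts a 1-byte length before each field, which stands in for whatever a BINGLOBAL opcode might use. They show that a length-prefixed argument can be consumed with exact-size reads.

```python
import io
import pickle


def read_global_text(stream):
    """Read a GLOBAL argument in text mode: two newline-terminated fields.

    readline() must scan the incoming bytes for the newline terminator;
    the reader cannot know in advance how many bytes to request.
    """
    module = stream.readline().rstrip(b"\n").decode("utf-8")
    name = stream.readline().rstrip(b"\n").decode("utf-8")
    return module, name


def encode_global_binary(module, name):
    """HYPOTHETICAL layout: each field is prefixed by a 1-byte length.

    Not the encoding of any released pickle protocol; it only illustrates
    the idea of a binary (BINGLOBAL-style) opcode argument.
    """
    out = bytearray()
    for field in (module, name):
        raw = field.encode("utf-8")
        if len(raw) > 0xFF:
            raise ValueError("field too long for a 1-byte length prefix")
        out.append(len(raw))
        out += raw
    return bytes(out)


def read_global_binary(stream):
    """Decode encode_global_binary() output using exact-size reads only."""
    fields = []
    for _ in range(2):
        prefix = stream.read(1)
        if not prefix:
            raise EOFError("truncated length prefix")
        raw = stream.read(prefix[0])
        if len(raw) != prefix[0]:
            raise EOFError("truncated field")
        fields.append(raw.decode("utf-8"))
    return tuple(fields)


# A real protocol 3 pickle of the set type starts with PROTO, then GLOBAL.
stream = io.BytesIO(pickle.dumps(set, protocol=3))
stream.read(2)                   # skip PROTO (0x80) and the version byte
assert stream.read(1) == b"c"    # the GLOBAL opcode
print(read_global_text(stream))  # → ('builtins', 'set')

binary = encode_global_binary("builtins", "set")
print(read_global_binary(io.BytesIO(binary)))  # → ('builtins', 'set')
```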
+
+It seems that all other opcodes emitted when using protocol 3 already use
+binary encoding.
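The observation that GLOBAL is the only remaining text-mode opcode in protocol 3 can be checked against a given object with ``pickletools``. This sketch relies on a naming convention in ``pickletools``: argument descriptors for newline-terminated arguments contain ``nl`` in their name, for example ``stringnl_noescape_pair`` for GLOBAL and ``decimalnl_short`` for INT. The helper ``text_mode_opcodes`` and the ``sample`` object are hypothetical, and one sample cannot cover every opcode.

```python
import pickle
import pickletools


def text_mode_opcodes(data):
    """Return the names of opcodes in data whose argument is newline-terminated.

    Relies on pickletools naming such argument descriptors with "nl".
    """
    return {
        opcode.name
        for opcode, arg, pos in pickletools.genops(data)
        if opcode.arg is not None and "nl" in opcode.arg.name
    }


# Mixes ints, floats, str, bytes, tuples, a dict, a set and a class reference.
sample = [1, 2.5, "text", b"raw", (None, True), {"key": {1, 2}}, set]

for proto in (0, 3):
    print(proto, sorted(text_mode_opcodes(pickle.dumps(sample, protocol=proto))))
```

On CPython, the protocol 0 line lists several text-mode opcodes. The protocol 3 line should list only GLOBAL.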
+
+
+
+Acknowledgments
+===============
+
+(...)
+
+
+References
+==========
+
+.. [1] "pickle not 64-bit ready":
+   http://bugs.python.org/issue11564
+
+.. [2] "Cannot pickle self-referencing sets":
+   http://bugs.python.org/issue9269
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:

-- 
Repository URL: http://hg.python.org/peps

