[Python-checkins] python/nondist/peps pep-0307.txt,1.9,1.10
gvanrossum@users.sourceforge.net
Tue, 04 Feb 2003 11:12:28 -0800
Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1:/tmp/cvs-serv26330
Modified Files:
pep-0307.txt
Log Message:
Introduce extension codes.
Index: pep-0307.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0307.txt,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** pep-0307.txt 4 Feb 2003 17:53:55 -0000 1.9
--- pep-0307.txt 4 Feb 2003 19:12:25 -0000 1.10
***************
*** 489,492 ****
--- 489,568 ----
+ The extension registry
+
+ Protocol 2 supports a new mechanism to reduce the size of pickles.
+
+ When class instances (classic or new-style) are pickled, the full
+ name of the class (module name including package name, and class
+ name) is included in the pickle. Especially for applications that
+ generate many small pickles, this is a lot of overhead that has to
+ be repeated in each pickle. For large pickles, when using
+ protocol 1, repeated references to the same class name are
+ compressed using the "memo" feature; but each class name must be
+ spelled in full at least once per pickle, and this causes a lot of
+ overhead for small pickles.
+
+ The extension registry allows one to represent the most frequently
+ used names by small integers, which are pickled very efficiently:
+ an extension code in the range 1-255 requires only two bytes
+ including the opcode, and one in the range 256-65535 requires only
+ three bytes including the opcode.
+
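As a rough illustration of the byte counts described above, here is a sketch of how an extension code could be encoded. It assumes the EXT1, EXT2 and EXT4 opcode constants exposed by today's pickle module, which postdate this checkin; the helper name encode_ext is hypothetical and is not part of the PEP text.

```python
import pickle
import struct


def encode_ext(code):
    """Return the opcode plus argument bytes for one extension code.

    Hypothetical helper for illustration; it is not part of the pickle API.
    """
    if not 1 <= code <= 0x7FFFFFFF:
        raise ValueError("extension code out of range")
    if code <= 0xFF:
        # One opcode byte plus a one-byte argument.
        return pickle.EXT1 + struct.pack("<B", code)
    if code <= 0xFFFF:
        # One opcode byte plus a two-byte little-endian argument.
        return pickle.EXT2 + struct.pack("<H", code)
    # One opcode byte plus a four-byte little-endian argument.
    return pickle.EXT4 + struct.pack("<i", code)


# Encoded sizes at the boundaries of each range.
sizes = [len(encode_ext(c)) for c in (1, 255, 256, 65535, 65536)]
```

Codes above 65535 fall outside the two cases the text mentions and take five bytes in this sketch.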
+ One of the design goals of the pickle protocol is to make pickles
+ "context-free": as long as you have installed the modules
+ containing the classes referenced by a pickle, you can unpickle
+ it, without needing to import any of those classes ahead of time.
+
+ Unbridled use of extension codes could jeopardize this desirable
+ property of pickles. Therefore, the main use of extension codes
+ is reserved for a set of codes to be standardized by some
+ standard-setting body. This being Python, the standard-setting
+ body is the PSF. From time to time, the PSF will decide on a
+ table mapping extension codes to class names (or occasionally
+ names of other global objects; functions are also eligible). This
+ table will be incorporated in the next Python release(s).
+
+ However, for some applications, like Zope, context-free pickles
+ are not a requirement, and waiting for the PSF to standardize
+ some codes may not be practical. Two solutions are offered for
+ such applications.
+
+ First of all, a few ranges of extension codes are reserved for
+ private use. Any application can register codes in these ranges.
+ Two applications exchanging pickles using codes in these ranges
+ need to have some out-of-band mechanism to agree on the mapping
+ between extension codes and names.
+
+ Second, some large Python projects (e.g. Zope or Twisted) can be
+ assigned a range of extension codes outside the "private use"
+ range that they can assign as they see fit.
+
+ The extension registry is defined as a mapping between extension
+ codes and names. When an extension code is unpickled, it ends up
+ producing an object, but this object is obtained by interpreting the
+ name as a module name followed by a class (or function) name. The
+ mapping from names to objects is cached. It is quite possible
+ that certain names cannot be imported; that should not be a
+ problem as long as no pickle containing a reference to such names
+ has to be unpickled. (The same issue already exists for direct
+ references to such names in pickles that use protocols 0 or 1.)
+
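The registry described above can be tried out with the copyreg module of current Python releases (copy_reg in Python 2), which provides add_extension() and remove_extension(). The sketch below is only an illustration under assumptions: the code 240 is an arbitrary pick from the proposed private-use range, and fractions.Fraction stands in for a frequently pickled class.

```python
import copyreg
import pickle
from fractions import Fraction

CODE = 240  # assumption: arbitrary code from the proposed private-use range
value = Fraction(1, 3)

# Without a registry entry, the class name is spelled out in the pickle.
plain = pickle.dumps(value, protocol=2)

# Map ("fractions", "Fraction") to the extension code, then pickle again.
copyreg.add_extension("fractions", "Fraction", CODE)
try:
    compact = pickle.dumps(value, protocol=2)
    # Unpickling looks the code up in the registry and imports the name.
    restored = pickle.loads(compact)
finally:
    # Leave the process-wide registry as we found it.
    copyreg.remove_extension("fractions", "Fraction", CODE)
```

Both sides of an exchange need the same registry entry; once the entry is removed, the compact pickle can no longer be loaded in this process.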
+ Here is the proposed initial assignment of extension code ranges:
+
+     First       Last Count  Purpose
+
+         0          0     1  Reserved -- will never be used
+         1        127   127  Reserved for Python standard library
+       128        191    64  Reserved for Zope 3
+       192        239    48  Reserved for 3rd parties
+       240        255    16  Reserved for private use (will never be assigned)
+       256        Max   Max  Reserved for future assignment
+
+ 'Max' stands for 2147483647, or 2**31-1. This is a hard
+ limitation of the protocol as currently defined.
+
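The proposed ranges above can be written down as a small lookup table. This is a sketch for illustration only; the names MAX_CODE, RANGES and purpose_of are hypothetical and do not appear in the PEP.

```python
MAX_CODE = 2**31 - 1  # 'Max' in the table: 2147483647

# (first, last, purpose) rows copied from the proposed assignment.
RANGES = [
    (0, 0, "Reserved -- will never be used"),
    (1, 127, "Reserved for Python standard library"),
    (128, 191, "Reserved for Zope 3"),
    (192, 239, "Reserved for 3rd parties"),
    (240, 255, "Reserved for private use (will never be assigned)"),
    (256, MAX_CODE, "Reserved for future assignment"),
]


def purpose_of(code):
    """Return the purpose of the range that contains the given code."""
    for first, last, purpose in RANGES:
        if first <= code <= last:
            return purpose
    raise ValueError("extension code outside 0..2**31-1")
```

For example, purpose_of(130) falls in the Zope 3 range and purpose_of(250) in the private-use range.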
+ No specific extension codes have been assigned yet.
+
+
TBD