PEP 285 and pickle compatibility

Paul Rubin phr-n2002a at nightsong.com
Fri Apr 5 07:30:25 EST 2002


Here's a kludge that might keep PEP 285 from making pickles
incompatible between new and old Python versions.  The issue is
that if you pickle the value of 1==1 in a new version, it should
come back as a bool rather than an int, but the obvious method
(adding a new pickle opcode for bool objects) will cause breakage
if you try to read the pickle with an old version of Python that
doesn't know about bools.

Proposed fix: in new pickles, dump boolean True as "I01" and boolean
False as "I00".  The current unpickler should load these as the integer
values 1 and 0, so everything should keep working.  But the current
pickler never generates these strings, since it doesn't emit leading
zeros (it writes the integer 1 as "I1").
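A minimal sketch of the writer side, assuming a toy text format that only handles ints and bools. `dump_scalar` and `old_load_int` are hypothetical names, not the real pickle.py routines; `old_load_int` stands in for a pre-bool unpickler that just calls `int()` on whatever follows the `I` opcode.

```python
def dump_scalar(value):
    """Encode an int or bool as a text-mode 'I' record (toy sketch)."""
    # Check bool first: bool is a subclass of int, so isinstance(True, int) is True.
    if isinstance(value, bool):
        # The leading zero marks a bool; a real int is never written this way.
        return "I01\n" if value else "I00\n"
    if isinstance(value, int):
        return "I%d\n" % value
    raise TypeError("toy sketch only handles int and bool")


def old_load_int(record):
    """Stand-in for a pre-bool unpickler: everything after 'I' is parsed as an int."""
    if not (record.startswith("I") and record.endswith("\n")):
        raise ValueError("not an I record: %r" % (record,))
    return int(record[1:-1])


# An old reader just sees plain integers.
print(old_load_int(dump_scalar(True)))   # 1
print(old_load_int(dump_scalar(False)))  # 0
print(repr(dump_scalar(1)))              # 'I1\n'
```

Because `int("01")` is simply 1, the old reader needs no changes at all; the leading zero is only meaningful to a reader that looks for it.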

In the new unpickler, put a special hack into the load_int routines
in cPickle.c and pickle.py that notices the leading 0 and pushes the
appropriate bool value instead of an int.
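The reader side of the hack might look like the following sketch. `new_load_int` is a hypothetical name, and it assumes the whole `I` record (opcode, digits, newline) has already been read as a string. It special-cases only the two exact spellings `01` and `00`, which the current pickler never emits.

```python
def new_load_int(record):
    """Toy 'I' record loader that understands the leading-zero bool marker."""
    if not (record.startswith("I") and record.endswith("\n")):
        raise ValueError("not an I record: %r" % (record,))
    digits = record[1:-1]
    # Only these exact strings are bools; the old pickler writes the
    # integers 1 and 0 as "1" and "0", so those still load as ints.
    if digits == "01":
        return True
    if digits == "00":
        return False
    return int(digits)


print(repr(new_load_int("I01\n")))  # True
print(repr(new_load_int("I1\n")))   # 1
print(repr(new_load_int("I00\n")))  # False
```

Matching the exact string rather than "any leading zero" keeps the check cheap and leaves every pickle an old pickler could have written with its old meaning.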

I think the ugliness of this kludge is outweighed by avoiding an
incompatible change to the pickle format.  Sooner or later, though,
a real incompatible change will be needed, so I'd like to propose
a further change now that shouldn't cause compatibility problems:

The pickle dump/dumps functions currently take a "bin" arg
specifying whether to dump in text or binary format.  Unfortunately,
this arg is treated as a bool: the dumps routine only cares whether
it's true or false, so it can't be stretched to select a third format.
So I think a further optional arg should be added, giving a "format
code" that says which unpickler versions have to be able to read the
pickle being created:

  dump(object, file, format=0)  # same as dump(object, file): text-format pickle
  dump(object, file, format=1)  # same as dump(object, file, bin=1): binary format
  dump(object, file, format=2)  # new, incompatible format

Pickles created with format=2 might, for example, contain a real bool
opcode instead of the I00/I01 hack, so old unpicklers couldn't read them.
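A sketch of how a format code could pick what the reader must support. `dumps_with_format` is a hypothetical wrapper, not an existing pickle function; as an assumption for illustration it leans on the `protocol` argument that later Python versions' `pickle.dumps` accepts, mapping format 0, 1 and 2 onto protocols 0, 1 and 2.

```python
import pickle


def dumps_with_format(obj, format=0):
    """Hypothetical wrapper: 'format' names the oldest reader that must cope.

    0 -> text pickle, 1 -> binary pickle (the old bin=1),
    2 -> newer opcodes that old unpicklers don't understand.
    """
    if format not in (0, 1, 2):
        raise ValueError("unknown format code: %r" % (format,))
    # Stand-in: later Python releases let 'protocol' play this role directly.
    return pickle.dumps(obj, protocol=format)


for fmt in (0, 1, 2):
    data = dumps_with_format(True, format=fmt)
    print(fmt, data, pickle.loads(data))
```

In current CPython, formats 0 and 1 write True as `b'I01\n.'`, the leading-zero record from the kludge above, while format 2 writes a one-byte bool opcode that an older unpickler would not recognize.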

If a future pickler/unpickler introduces an incompatible format, one
feature I'd hope to see added is efficient binary pickling of long
ints.  Currently they're pickled as ASCII decimal strings (even in
binary pickles), which are very slow to convert to and from because
of all the multiprecision arithmetic needed to change base.
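One possible binary layout for long ints, sketched under assumptions: a 4-byte little-endian length followed by the value in little-endian two's complement. `encode_long_bin` and `decode_long_bin` are hypothetical names and this layout is an illustration, not a defined pickle opcode; the point is that it copies the number's bits rather than converting them to and from base 10.

```python
import struct


def encode_long_bin(x):
    """Encode an int as <4-byte little-endian length><two's-complement bytes>."""
    # One extra bit for the sign, rounded up to whole bytes.
    nbytes = (x.bit_length() + 8) // 8
    body = x.to_bytes(nbytes, "little", signed=True)
    return struct.pack("<I", nbytes) + body


def decode_long_bin(data):
    """Inverse of encode_long_bin; returns (value, bytes_consumed)."""
    (nbytes,) = struct.unpack_from("<I", data)
    body = data[4:4 + nbytes]
    return int.from_bytes(body, "little", signed=True), 4 + nbytes


big = 2 ** 1000
encoded = encode_long_bin(big)
print(len(encoded), len(str(big)))         # 130 302
print(decode_long_bin(encoded)[0] == big)  # True
```

Returning the number of bytes consumed lets a loader keep reading the following opcodes from the same buffer.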

Paul


