[Python-Dev] Pickler/Unpickler API clarification

Michael Haggerty mhagger at alum.mit.edu
Sat Mar 7 19:22:44 CET 2009


Guido van Rossum wrote:
> On Sat, Mar 7, 2009 at 8:04 AM, Michael Haggerty <mhagger at alum.mit.edu> wrote:
>> Typically, the purpose of a database is to persist data across program
>> runs.  So typically, your suggestion would only help if there were a way
>> to persist the primed Pickler across runs.
> 
> I haven't followed all this, but isn't it at least possible to
> conceive of the primed pickler as being recreated from scratch from
> constant data each run?

If there were a guarantee that pickling the same data would result in
the same memo ID -> object mapping, that would also work.  But that
doesn't seem to be a realistic guarantee to make.  AFAIK the memo IDs
are integers assigned consecutively in the order that objects are
pickled, which by itself would be deterministic.  But compound objects
are a problem.  For example, when pickling a map (a dict), the entries
would have to be pickled
in an order that remains consistent across runs (and even across Python
versions).  Even worse, all user-written __getstate__() methods would
have to return exactly the same result, even across program runs.
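
To illustrate the ordering problem, here is a minimal sketch.  It is not
from cvs2svn; it assumes Python 3 and pickle protocol 2 (chosen because
that protocol writes explicit memo IDs), and string_memo_ids() is a
hypothetical helper.  Two dicts that compare equal but iterate in a
different order end up with different memo IDs for the same strings:

```python
import pickle
import pickletools


def string_memo_ids(obj, protocol=2):
    """Map each string in the pickle of obj to the memo ID stored for it.

    Hypothetical helper: it scans the opcode stream and pairs every
    string opcode with the PUT opcode that immediately follows it.
    """
    ids = {}
    last_string = None
    for opcode, arg, _pos in pickletools.genops(pickle.dumps(obj, protocol)):
        if opcode.name in ("UNICODE", "BINUNICODE", "SHORT_BINUNICODE"):
            last_string = arg
        elif opcode.name in ("PUT", "BINPUT", "LONG_BINPUT"):
            if last_string is not None:
                ids[last_string] = arg
            last_string = None
        else:
            last_string = None
    return ids


# Same contents, different iteration order.
first = {"a": [1], "b": [2]}
second = {"b": [2], "a": [1]}

print(first == second)
print(string_memo_ids(first))
print(string_memo_ids(second))
```

The dicts are equal, yet "a" is memoized before "b" in one pickle and
after it in the other, so a memo built from one would not match a
pickle that refers to the other.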

>> (The primed Unpickler is not quite so important because it can be primed
>> by reading a pickle of the primer, which in turn can be stored somewhere
>> in the DB.)
>>
>> In the particular case of cvs2svn, each of our databases is in fact
>> written in a single pass, and then in later passes only read, not
>> written.  So I suppose we could do entirely without pickleable Picklers,
>> if they were copyable within a single program run.  But that constraint
>> would make the feature even less general.
> 
> Being copyable is mostly equivalent to being picklable, but it's
> probably somewhat weaker because it's easier to define it as a pointer
> copy for some types that aren't easily picklable.

Indeed.  And pickling the memo should not present any fundamental
problems, since by construction it can only contain pickleable objects.
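
For the Unpickler side, priming by reading a pickle of the primer could
look roughly like the sketch below.  It assumes Python 3 and relies only
on the behaviour that a Pickler or Unpickler keeps its memo across
dump() or load() calls; the primer contents and variable names are made
up for illustration:

```python
import io
import pickle

# Hypothetical primer: objects that many later records refer to.
primer = ["trunk", "branches", "tags"]

# Writer: dump the primer first, then a record, through one Pickler so
# that the record can refer to the primer via the shared memo.
out = io.BytesIO()
pickler = pickle.Pickler(out, 2)
pickler.dump(primer)
primer_end = out.tell()
pickler.dump({"path": primer[0], "all": primer})

primer_bytes = out.getvalue()[:primer_end]
record_bytes = out.getvalue()[primer_end:]

# A fresh unpickler cannot decode the record alone, because the record
# refers to memo entries that only exist after the primer is loaded.
try:
    pickle.loads(record_bytes)
    standalone_failed = False
except (pickle.UnpicklingError, KeyError):
    standalone_failed = True

# Reader: load the primer first to prime the memo, then the record.
unpickler = pickle.Unpickler(io.BytesIO(primer_bytes + record_bytes))
loaded_primer = unpickler.load()
loaded_record = unpickler.load()

print(standalone_failed)
print(loaded_record["all"] is loaded_primer)
```

The record's reference to the primer list resolves to the very object
that the first load() returned, which is what lets the primer be stored
once rather than with every record.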

Michael

