[Python-Dev] Pickler/Unpickler API clarification

Guido van Rossum guido at python.org
Thu Mar 5 22:10:32 CET 2009


On Thu, Mar 5, 2009 at 12:07 PM, Collin Winter <collinw at gmail.com> wrote:
> I'm working on some performance patches for cPickle, and one of the
> bigger wins so far has been replacing the Pickler's memo dict with a
> custom hashtable (and hence removing memo's getters and setters). In
> looking over this, Jeffrey Yasskin commented that this would break
> anyone who was accessing the memo attribute.
>
> I've found a few examples of code using the memo attribute ([1], [2],
> [3]), and there are probably more out there, but the memo attribute
> doesn't look like part of the API to me. It's only documented in
> http://docs.python.org/library/pickle.html as "you used to need this
> before Python 2.3, but don't anymore". However, I don't believe you
> should ever need this attribute.
>
> The usages of memo I've seen break down into two camps: clearing the
> memo, and wanting to explicitly populate the memo with predefined
> values. Clearing the memo is recommended as part of reusing Pickler
> objects, but I can't fathom when you would want to reuse a Pickler
> *without* clearing the memo. Reusing the Pickler without clearing the
> memo will produce pickles that are, as best I can see, invalid -- at
> least, pickletools.dis() rejects this, which is the closest thing we
> have to a validator.

I can explain this, as I invented this behavior. The use case was to
have a long-lived session between a client and a server which were
communicating repeatedly using pickles. The idea was that values that
had been transferred once wouldn't have to be sent across the wire
again -- they could just be referenced.
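
For readers who want to see the mechanism, here is a minimal sketch. It
is not from the original thread: it is written against Python 3's pickle
module, and the payload and variable names are made up for illustration.
One Pickler and one Unpickler are kept alive for the whole "session", and
neither memo is ever cleared.

```python
import io
import pickle

# Stand-in for the wire between client and server (an assumption; a real
# session would use a socket or similar).
wire = io.BytesIO()

# One long-lived Pickler on the sending side. Protocol 2 is chosen here
# only to keep the output free of protocol 4+ framing.
pickler = pickle.Pickler(wire, protocol=2)

# Hypothetical payload that gets sent more than once during the session.
payload = {"name": "shared configuration", "values": list(range(50))}

pickler.dump(payload)
first_size = wire.tell()

# The memo was not cleared, so the second dump only emits a reference
# to the object already recorded in the memo.
pickler.dump(payload)
second_size = wire.tell() - first_size

# One long-lived Unpickler on the receiving side, reading both pickles
# in order from the same stream.
wire.seek(0)
unpickler = pickle.Unpickler(wire)
first = unpickler.load()
second = unpickler.load()

print(first_size, second_size)
print(second is first)
```

The second pickle is only meaningful to an Unpickler that has already
seen the first one, which is exactly why it looks invalid when examined
in isolation.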

This was a bad idea (*), and I'd be happy to ban it -- but we'd
probably have to bump the pickle protocol version in order to maintain
backwards compatibility.

> Explicitly setting memo values has the same
> problem: an easy, very brittle way to produce invalid data.

Agreed, this is just preposterous. It was never part of the plan.

> So my questions are these:
> 1) Should Pickler/Unpickler objects automatically clear their memos
> when dumping/loading?

Alas, there could be backwards compatibility issues. Bumping the
protocol could mitigate this.
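
Until then, code that reuses a Pickler can clear the memo itself. The
sketch below is an illustration only (Python 3; the helper names
is_self_contained and dump_twice are hypothetical, not part of any
library). It compares the second pickle produced by a reused Pickler with
and without a call to clear_memo(), using pickletools.dis() as the
validator mentioned above.

```python
import io
import pickle
import pickletools


def is_self_contained(data):
    """Return True if pickletools.dis() accepts data as a standalone pickle."""
    try:
        # Send the disassembly to a throwaway buffer; only the verdict matters.
        pickletools.dis(data, out=io.StringIO())
    except ValueError:
        # Raised, for example, when a GET refers to a memo key that this
        # pickle never stored.
        return False
    return True


def dump_twice(obj, clear_between):
    """Dump obj twice with one Pickler and return only the second pickle."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=2)
    pickler.dump(obj)
    split = buf.tell()
    if clear_between:
        pickler.clear_memo()
    pickler.dump(obj)
    return buf.getvalue()[split:]


value = {"key": [1, 2, 3]}
shared = dump_twice(value, clear_between=False)
standalone = dump_twice(value, clear_between=True)

print(is_self_contained(shared))
print(is_self_contained(standalone))
```

With the memo cleared, the second pickle can be loaded on its own with
pickle.loads(); without it, the pickle depends on state held by the
original Pickler/Unpickler pair.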

> 2) Is memo an intentionally exposed, supported part of the
> Pickler/Unpickler API, despite the lack of documentation and tests?

The exposure is unintentional, but for historical reasons we can't just
remove it. :-(

> Thanks,
> Collin
>
> [1] - http://google.com/codesearch/p?hl=en#Qx8E-7HUBTk/trunk/google/appengine/api/memcache/__init__.py&q=lang:py%20%5C.memo
> [2] - http://google.com/codesearch/p?hl=en#M-DDI-lCOgE/lib/python2.4/site-packages/cvs2svn_lib/primed_pickle.py&q=lang:py%20%5C.memo
> [3] - http://google.com/codesearch/p?hl=en#l_w_cA4dKMY/AtlasAnalysis/2.0.3-LST-1/PhysicsAnalysis/PyAnalysis/PyAnalysisUtils/python/root_pickle.py&q=lang:py%20pick.*%5C.memo%5Cb

__________
(*) http://code.google.com/p/googleappengine/issues/detail?id=417

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)