[Python-Dev] The memo of pickle

Guido van Rossum guido@python.org
Thu, 08 Aug 2002 23:55:30 -0400


Martin quoted a complaint about cPickle performance:

http://groups.google.de/groups?hl=en&lr=&ie=UTF-8&selm=mailman.1026940226.16076.python-list%40python.org

But if you read the full thread, it's clear that this complaint came
about because the author wasn't using binary pickle mode.  In binary
mode his times became acceptable.  I've run the test and I haven't
seen abnormal memory behavior -- the process grows to 26 MB just to
create the test data, and then adds about 1 MB during pickling.  The
loading almost doubles the process size, because another copy of the
test data is read (the test data isn't thrown away).
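To make the text-versus-binary comparison concrete, here is a minimal sketch using today's `pickle` module, where protocol 0 is the text mode and `pickle.HIGHEST_PROTOCOL` is a binary one. The test data and helper names (`make_test_data`, `time_roundtrip`) are hypothetical. Current Python 3 already defaults to a binary protocol and no longer unpickles text-mode strings through eval(), so this script will not reproduce the 2002 numbers; it only shows how the mode is chosen and how to measure it.

```python
import pickle
import time


def make_test_data(n=20_000):
    # Hypothetical test data: many short strings, the case where
    # text-mode string unpickling costs the most per byte.
    return [f"item-{i}" for i in range(n)]


def time_roundtrip(data, protocol):
    """Pickle and unpickle `data`; return (elapsed seconds, pickle size in bytes)."""
    start = time.perf_counter()
    blob = pickle.dumps(data, protocol=protocol)
    restored = pickle.loads(blob)
    elapsed = time.perf_counter() - start
    assert restored == data  # sanity check: the round trip is lossless
    return elapsed, len(blob)


if __name__ == "__main__":
    data = make_test_data()
    for proto in (0, pickle.HIGHEST_PROTOCOL):  # 0 = text mode, highest = binary
        secs, size = time_roundtrip(data, proto)
        print(f"protocol {proto}: {size} bytes, {secs:.4f} s")
```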

The slowdown of text-mode pickle is due to the extremely expensive way
strings are unpickled in text mode: it invokes eval() (well,
PyRun_String()) to parse the string literal!  (After first checking that
there's really only a string literal there, to prevent trojan horses.)
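The "check, then eval" pattern described above can be sketched in Python as follows. This is an illustration of the idea, not cPickle's C code: the function names are hypothetical, and the guard uses the `ast` module, which did not exist in this form in 2002. Note that the guard itself already parses the text once, and eval() then compiles it again, which is exactly the kind of per-string overhead being complained about.

```python
import ast


def is_single_string_literal(rep):
    """True if `rep` consists of nothing but a string literal (the anti-trojan-horse check)."""
    try:
        tree = ast.parse(rep, mode="eval")
    except (SyntaxError, ValueError):  # ValueError: e.g. NUL bytes on older versions
        return False
    node = tree.body
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def load_text_string(rep):
    """Decode a text-mode pickled string by handing it to the full evaluator."""
    if not is_single_string_literal(rep):
        raise ValueError("insecure string pickle")
    # Running the compiler for every single string is what makes this path slow.
    return eval(rep, {"__builtins__": {}})
```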

So I'm not sure that the memo size is worth pursuing.  I didn't look
at the other complaints referenced by Martin, but I bet they're more
of the same.

What might be worth looking into:

(1) Make binary pickling the default (in cPickle as well as pickle).
    This would definitely give the most bang for the buck.

(2) Replace the PyRun_String() call in cPickle with something faster.
    Maybe the algorithm of parsestr() in compile.c can be exposed,
    although the error reporting would then have to be done differently.
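Idea (2) amounts to decoding the literal directly in one pass instead of going through the compiler. The sketch below is written in that spirit; it is not the actual parsestr() algorithm from compile.c. All names are hypothetical, and it handles only a subset of Python's escapes (simple escapes, `\xHH`, and up to three octal digits; no `\u`, `\U`, `\N`, or line continuations). Errors are reported as an exception carrying the offending offset, which is one possible answer to "the error reporting must be done differently".

```python
# Single-character escapes and the characters they stand for (a subset of
# Python's string-literal escapes; \u, \U, \N and line continuations are omitted).
_SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", '"': '"', "n": "\n", "t": "\t",
    "r": "\r", "a": "\a", "b": "\b", "f": "\f", "v": "\v",
}
_OCTAL = "01234567"
_HEX = "0123456789abcdefABCDEF"


class StringLiteralError(ValueError):
    """Bad literal; the message carries an offset instead of a SyntaxError traceback."""


def decode_string_literal(rep):
    """Decode a quoted literal such as 'a\\nb' in one pass, without eval()."""
    if len(rep) < 2 or rep[0] not in "'\"" or rep[-1] != rep[0]:
        raise StringLiteralError(f"not a quoted string: {rep!r}")
    quote, body = rep[0], rep[1:-1]
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch == quote:
            # An unescaped delimiter would have ended the literal early.
            raise StringLiteralError(f"unescaped quote at offset {i}")
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise StringLiteralError(f"trailing backslash at offset {i}")
        esc = body[i + 1]
        if esc in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "x":
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(c not in _HEX for c in digits):
                raise StringLiteralError(f"bad \\x escape at offset {i}")
            out.append(chr(int(digits, 16)))
            i += 4
        elif esc in _OCTAL:
            # Up to three octal digits, as in Python source.
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in _OCTAL:
                j += 1
            out.append(chr(int(body[i + 1:j], 8)))
            i = j
        else:
            # Unknown escapes are kept verbatim, backslash included.
            out.append(ch + esc)
            i += 2
    return "".join(out)
```

Checking it against eval() on a handful of samples, and timing both with `timeit`, would be the natural way to see whether such a replacement pays off; no timings are claimed here.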

--Guido van Rossum (home page: http://www.python.org/~guido/)