[Python-Dev] The memo of pickle

Guido van Rossum guido@python.org
Fri, 09 Aug 2002 00:04:45 -0400


> The slowdown of text-mode pickle is due to the extremely expensive way
> of unpickling pickled strings in text-mode: it invokes eval() (well,
> PyRun_String()) to parse the string literal!  (After checking that
> there's really only a string literal there to prevent trojan horses.)

On re-reading the quoted thread, I see that another phenomenon was
remarked upon there: the slow text-mode pickle used less memory.  I
noticed this too when I ran the test program.  The explanation is that
the strings in the test program were "key0", "key1", ... "key24" and
"value0" ... "value24", over and over (each test dict has the same
keys and values).  Because these literals look like identifiers, they
are interned when the text-mode unpickler compiles them via eval(), so
the unpickled data structure shares the string references -- while the
original test data has 10,000 copies of each string!
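
To make the sharing effect concrete, here is a minimal sketch (not the
original test program).  It assumes a modern Python where the builtin
intern() has become sys.intern(); the helper names and the copy count
are made up for illustration.  It builds many equal strings at run
time and then counts how many distinct objects remain after interning.

```python
import sys

def count_distinct_objects(strings):
    # Count objects by identity, not by value.
    return len({id(s) for s in strings})

def build_keys(copies=1000, n_keys=25):
    # Strings formatted at run time; in CPython each one is typically
    # a separate object even when the values are equal.
    return ["key%d" % i for _ in range(copies) for i in range(n_keys)]

keys = build_keys()
distinct_before = count_distinct_objects(keys)

# Interning maps every equal string to one shared object.
interned = [sys.intern(k) for k in keys]
distinct_after = count_distinct_objects(interned)

print(distinct_before, distinct_after)
```

After interning there is exactly one object per distinct value, which
is the memory saving seen with the text-mode pickle.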

If we really want this as a feature, a call to
PyString_InternFromString() could be made under certain conditions in
load_short_binstring() (e.g. when the length is at most 10 and
all_name_chars() from compile.c returns true).
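
As a rough Python-level sketch of that condition (the real change
would be in C inside load_short_binstring()): maybe_intern() and
NAME_CHARS below are hypothetical names, the character set is my
assumption of what all_name_chars() accepts, and sys.intern() stands
in for PyString_InternFromString().

```python
import string
import sys

# Assumed approximation of the set accepted by all_name_chars().
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")

def maybe_intern(s, max_len=10):
    # Intern only short strings made entirely of identifier characters;
    # everything else is returned unchanged.
    if len(s) <= max_len and all(c in NAME_CHARS for c in s):
        return sys.intern(s)
    return s

# Two equal strings built separately at run time.
a = maybe_intern("".join(["key", "0"]))
b = maybe_intern("".join(["key", "0"]))
print(a is b)
```

Strings that are too long or contain other characters pass through
untouched, so only identifier-like keys would pay the interning cost.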

I'm not sure that this is a desirable feature though.

--Guido van Rossum (home page: http://www.python.org/~guido/)