[Python-Dev] Pickler/Unpickler API clarification

Antoine Pitrou solipsis at pitrou.net
Fri Mar 6 15:05:55 CET 2009


On Friday, 6 March 2009 at 13:44 +0100, Michael Haggerty wrote:
> Antoine Pitrou wrote:
> > Michael Haggerty <mhagger <at> alum.mit.edu> writes:
> >> It is easy to optimize the pickling of instances by giving them
> >> __getstate__() and __setstate__() methods.  But the pickler still
> >> records the type of each object (essentially, the name of its class) in
> >> each record.  The space for these strings constituted a large fraction
> >> of the database size.
> > 
> > If these strings are not interned, then perhaps they should be.
> > There is a similar optimization proposal (w/ patch) for attribute names:
> > http://bugs.python.org/issue5084
> 
> If I understand correctly, this would not help:
> 
> - on writing, the strings are identical anyway, because they are read
> out of the class's __name__ and __module__ fields.  Therefore the
> Pickler's usual memoizing behavior will prevent the strings from being
> written more than once.

Then why did you say that "the space for these strings constituted a
large fraction of the database size", if they are already shared? Are
your objects so tiny that even the space taken by the pointer to the
type name grows the size of the database significantly?
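
To make the question concrete, here is a minimal sketch of how memoization
affects the size of the class reference. It assumes the records are either
written through one pickle stream (one shared memo) or pickled one by one
(a fresh memo per record); which of these matches the database in question
is not stated in this thread. fractions.Fraction is only a stand-in for
the real classes.

```python
import pickle
from fractions import Fraction

# Stand-in objects; the real classes from the thread are not shown here.
records = [Fraction(1, 2), Fraction(3, 4), Fraction(5, 6)]

# Case 1: everything goes through a single dump, so one memo is shared.
# The class reference is written in full once, then referred to by memo index.
combined = pickle.dumps(records, protocol=2)

# Case 2: each record is pickled on its own (e.g. one database value each).
# Every record starts with an empty memo, so the class reference is repeated.
separate = [pickle.dumps(r, protocol=2) for r in records]

class_name = b"Fraction"
combined_count = combined.count(class_name)
separate_counts = [s.count(class_name) for s in separate]

print("combined:", len(combined), "bytes, class name written", combined_count, "time(s)")
print("separate:", sum(len(s) for s in separate), "bytes, class name written",
      sum(separate_counts), "time(s)")
```

If the records are stored the second way, the full module and class name
is paid for in every record, which would account for the space overhead
even though the strings are shared in memory.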



