[issue5084] unpickling does not intern attribute names

Jake McGuire report at bugs.python.org
Tue Jan 27 22:52:20 CET 2009


New submission from Jake McGuire <jake at youtube.com>:

Instance attribute names are normally interned - this is done in 
PyObject_SetAttr (among other places).  Unpickling (in pickle and 
cPickle) directly updates __dict__ on the instance object.  This 
bypasses the interning so you end up with many copies of the strings 
representing your attribute names, which wastes a lot of space, both in 
RAM and in pickles of sequences of objects created from pickles.  Note 
that the native python memcached client uses pickle to serialize 
objects.

>>> import pickle
>>> class C(object):
...   def __init__(self, x):
...     self.long_attribute_name = x
...
>>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None), 
pickle.HIGHEST_PROTOCOL)) for i in range(100)], 
pickle.HIGHEST_PROTOCOL))
3658
>>> len(pickle.dumps([C(None) for i in range(100)], 
pickle.HIGHEST_PROTOCOL))
1441
>>>

Interning the strings on unpickling makes the pickles smaller, and at 
least for cPickle actually makes unpickling sequences of many objects 
slightly faster.  I have included proposed patches to cPickle.c and 
pickle.py, and would appreciate any feedback.

----------
components: Library (Lib)
files: cPickle.c.diff
keywords: patch
messages: 80670
nosy: jakemcguire
severity: normal
status: open
title: unpickling does not intern attribute names
type: resource usage
Added file: http://bugs.python.org/file12879/cPickle.c.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5084>
_______________________________________


More information about the Python-bugs-list mailing list