[issue5084] unpickling does not intern attribute names
Jake McGuire
report at bugs.python.org
Tue Jan 27 22:52:20 CET 2009
New submission from Jake McGuire <jake at youtube.com>:
Instance attribute names are normally interned; this is done in
PyObject_SetAttr (among other places). Unpickling (in both pickle and
cPickle) updates __dict__ on the instance object directly. This
bypasses the interning, so you end up with many copies of the strings
representing your attribute names. That wastes a lot of space, both in
RAM and in pickles of sequences of objects that were themselves created
from pickles. Note that the native Python memcached client uses pickle
to serialize objects.
>>> import pickle
>>> class C(object):
...     def __init__(self, x):
...         self.long_attribute_name = x
...
>>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None),
...     pickle.HIGHEST_PROTOCOL)) for i in range(100)],
...     pickle.HIGHEST_PROTOCOL))
3658
>>> len(pickle.dumps([C(None) for i in range(100)],
...     pickle.HIGHEST_PROTOCOL))
1441
>>>
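The size gap in the session above is consistent with the pickler's memo being keyed on object identity: a string object it has already written is replaced by a short back-reference, while an equal but distinct string object is written out in full again. The following is a minimal sketch of that effect in isolation, not part of the report. It is written for Python 3, where the built-in intern() has moved to sys.intern(), and it uses plain dicts as a stand-in for instance __dict__ objects. The helper fresh_name() is hypothetical, and the exact byte counts depend on the Python version and pickle protocol, so none are shown.

```python
import pickle
import sys


def fresh_name():
    # Build the string at run time so that each call returns a new,
    # non-interned object with the same value.
    return "".join(["long_attribute", "_name"])


# 100 dicts whose keys are equal but distinct string objects, standing in
# for the __dict__ of instances restored without interning.
distinct = [{fresh_name(): None} for _ in range(100)]

# The same data, but every key is the single interned copy.
shared = [{sys.intern(fresh_name()): None} for _ in range(100)]

size_distinct = len(pickle.dumps(distinct, pickle.HIGHEST_PROTOCOL))
size_shared = len(pickle.dumps(shared, pickle.HIGHEST_PROTOCOL))

# The first pickle repeats the key text once per dict; the second stores it
# once and refers back to it afterwards.
print(size_distinct, size_shared)
```

The two lists compare equal, so the difference in pickle size comes only from whether the keys are one object or many.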
Interning the strings on unpickling makes the pickles smaller and, at
least for cPickle, actually makes unpickling sequences of many objects
slightly faster. I have included proposed patches to cPickle.c and
pickle.py, and would appreciate any feedback.
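For code that cannot wait for a patched unpickler, the same idea can be approximated per class. The sketch below is not the attached patch. It assumes Python 3 (sys.intern() rather than the Python 2 built-in intern()) and a class without __slots__ or a custom __getstate__, so that the pickled state is the instance __dict__. The helper intern_keys() is a hypothetical name introduced here for illustration.

```python
import sys


def intern_keys(state):
    """Return a copy of `state` whose plain-str keys are interned.

    Hypothetical helper for illustration only.
    """
    return {
        # sys.intern() accepts only exact str instances, so any other key
        # (including str subclasses) is passed through unchanged.
        (sys.intern(key) if type(key) is str else key): value
        for key, value in state.items()
    }


class C(object):
    def __init__(self, x):
        self.long_attribute_name = x

    def __setstate__(self, state):
        # When a class defines __setstate__, the unpickler calls it with the
        # saved state instead of updating __dict__ itself, which gives the
        # class a place to intern the attribute names.
        self.__dict__.update(intern_keys(state))
```

With this in place, instances of C restored by pickle.loads() should share a single string object per attribute name, at the cost of one extra dict copy per object.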
----------
components: Library (Lib)
files: cPickle.c.diff
keywords: patch
messages: 80670
nosy: jakemcguire
severity: normal
status: open
title: unpickling does not intern attribute names
type: resource usage
Added file: http://bugs.python.org/file12879/cPickle.c.diff
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5084>
_______________________________________