Pickling limitation with instances defining __cmp__/__hash__?

Erik Max Francis max at alcyone.com
Mon Jun 27 21:13:46 EDT 2005


I've come across a limitation in unpickling certain types of complex 
data structures which involve instances that override __hash__, and was 
wondering if it was known (basic searches didn't seem to come up with 
anything similar) and if there is a workaround for it short of 
restructuring the data structures in question.

The fundamental issue arises with classes that override __cmp__ and 
__hash__ so that their instances can be used as dictionary keys (and as 
elements of sets).  __cmp__ and __hash__ are defined in terms of a single 
attribute of the class, which never changes for the lifetime of an 
object.  In a simplified form:

	class C:
	
	    def __init__(self, x):
	        self.x = x
	
	    def __cmp__(self, other):
	        return cmp(self.x, other.x)
	
	    def __hash__(self):
	        return hash(self.x)

Even if C has other attributes that are modified, which makes it 
technically mutable, it is still legal to use as a dictionary key, since 
the one attribute used by __cmp__ and __hash__ (in this example, x) never 
changes after the object is created.  (Formally, the attribute in 
question is a name which is guaranteed to be unique.)
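For readers on Python 3, where __cmp__ no longer exists, a rough equivalent of C defines __eq__ and __hash__ on the same attribute. This is a sketch, not the original code: it covers only equality (the original __cmp__ also provided ordering, which would need __lt__ and friends, for example via functools.total_ordering), and the attribute name `counter` is purely hypothetical. It also checks the claim above that changing other attributes does not disturb dictionary lookups.

```python
class C:
    """Python 3 sketch of C: equality and hashing depend only on x."""

    def __init__(self, x):
        self.x = x  # never reassigned after construction

    def __eq__(self, other):
        if not isinstance(other, C):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        return hash(self.x)


c = C("alpha")
table = {c: 42}

# Mutating an attribute that __eq__/__hash__ ignore (the hypothetical
# "counter") leaves the hash unchanged, so lookups keep working.
c.counter = 0
c.counter += 1

print(table[c])           # prints 42
print(table[C("alpha")])  # an equal but distinct instance finds it too: 42
```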

The difficulty arises when the data structures built up inside an 
instance of C contain a circular reference to that instance as a 
dictionary key.  In my particular case the situation is rather involved, 
but the simplest example which reproduces the problem (using C) would be:

	c = C(1)
	c.m = {c: '1'}

So far this is fine and behaves as expected.  Pickling the object c 
results in no problems.  Unpickling it, however, results in an error:

	import pickle
	
	data = pickle.dumps(c)
	d = pickle.loads(data) # line 25

Traceback (most recent call last):
   File "/home/max/tmp/hash.py", line 25, in ?
     d = pickle.loads(data)
   File "/usr/local/lib/python2.4/pickle.py", line 1394, in loads
     return Unpickler(file).load()
   File "/usr/local/lib/python2.4/pickle.py", line 872, in load
     dispatch[key](self)
   File "/usr/local/lib/python2.4/pickle.py", line 1218, in load_setitem
     dict[key] = value
   File "/home/max/tmp/hash.py", line 15, in __hash__
     return hash(self.x)
AttributeError: C instance has no attribute 'x'
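The same failure can be reproduced on Python 3. This is a sketch assuming a current CPython, with __eq__ standing in for __cmp__; the exact exception text will differ from the 2.4 traceback above.

```python
import pickle


class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, C):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        return hash(self.x)


c = C(1)
c.m = {c: "1"}          # the instance is a key in a dict it refers to
data = pickle.dumps(c)  # pickling succeeds

try:
    pickle.loads(data)
    outcome = "unpickled fine"
except AttributeError as exc:
    # __hash__ runs while the new instance still has an empty __dict__
    outcome = "AttributeError: %s" % exc

print(outcome)
```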

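By poking around, one can see that the error occurs because the 
unpickler tries to use the instance as a dictionary key before the 
instance has been completely initialized (in fact, the __dict__ of this 
object is still the empty dictionary!).  The unpickler creates the bare 
instance first, then rebuilds the dictionary holding its attributes 
(which includes c.m, keyed by c itself), and only after that installs 
those attributes on the instance, so __hash__ runs before self.x exists.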
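One way to see this ordering directly is to list the opcodes in the pickle with the standard pickletools module. In the Python 3 sketch below (protocol 2 is chosen only to keep the stream short), every dictionary insertion, including the one that hashes c as a key of c.m, appears before the single BUILD opcode that installs the instance's restored attributes.

```python
import pickle
import pickletools


class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, C):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        return hash(self.x)


c = C(1)
c.m = {c: "1"}
data = pickle.dumps(c, protocol=2)

# genops yields (opcode_info, argument, position) for each opcode
names = [op.name for op, arg, pos in pickletools.genops(data)]
print(" ".join(names))

set_positions = [i for i, n in enumerate(names) if n in ("SETITEM", "SETITEMS")]
build_position = names.index("BUILD")
# True: the dicts (and so the hash of c) are filled in before BUILD runs
print(max(set_positions) < build_position)
```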

The error happens regardless of whether pickle or cPickle is used (I 
used pickle above to get a more meaningful traceback), and regardless of 
whether the protocol is 0 or HIGHEST_PROTOCOL.
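The claim that neither the implementation nor the protocol matters can be checked on Python 3 with a loop like the one below. It is a sketch that assumes CPython: pickle.loads uses the C accelerator that replaced cPickle, while pickle._loads is CPython's private pure-Python unpickler, roughly the counterpart of the old pure-Python pickle module.

```python
import pickle


class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, C):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        return hash(self.x)


def round_trips(loads, protocol):
    """Return True if a self-keyed C instance survives dumps/loads."""
    c = C(1)
    c.m = {c: "1"}
    data = pickle.dumps(c, protocol=protocol)
    try:
        loads(data)
    except AttributeError:
        return False
    return True


loaders = {"C accelerator": pickle.loads, "pure Python": pickle._loads}
results = {
    (name, proto): round_trips(loads, proto)
    for name, loads in loaders.items()
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}
print(results)  # every entry is expected to be False
```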

First, is this issue known?  I don't see any mention of this kind of 
circularity in section 3.14.4 of the Python Library Reference.  Second, 
is there any reasonably straightforward workaround for this limitation, 
short of reworking things so that these self-referenced objects aren't 
used as dictionary keys?
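For later readers, one possible workaround (not from the original post, and only a Python 3 sketch under two assumptions: x is the only constructor argument, and calling __init__ again during unpickling is harmless) is a __reduce__ method that rebuilds the object as C(self.x). pickle then creates the instance with x already set and only afterwards restores the rest of its state, so hashing it as a key of m succeeds.

```python
import pickle


class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, C):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        return hash(self.x)

    def __reduce__(self):
        # Recreate via C(self.x) so that x exists before anything hashes the
        # instance; the full __dict__ (including m) is restored afterwards
        # as ordinary state.
        return (C, (self.x,), self.__dict__)


c = C(1)
c.m = {c: "1"}

for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    d = pickle.loads(pickle.dumps(c, protocol=proto))
    # The copy's dict is keyed by the copy itself, not by the original c.
    assert d.x == 1 and d.m[d] == "1" and next(iter(d.m)) is d
print("round trip ok")
```

The trade-off is that __init__ now runs on every unpickle, so this only fits classes whose constructor has no side effects beyond setting x.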

-- 
Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
   You'll learn / Life is worth it / Watch the tables turn
   -- TLC
