dictionary interface

Tue Oct 4 20:18:14 EDT 2005

On Tue, 4 Oct 2005, Robert Kern wrote:

> Antoon Pardon wrote:
>
>>   class Tree:
>>
>>     def __lt__(self, term):
>>       return set(self.iteritems()) < set(term.iteritems())
>>
>>     def __eq__(self, term):
>>       return set(self.iteritems()) == set(term.iteritems())
>>
>> Would this be a correct definition of the desired behaviour?
>
> No.
>
> In [1]: {1:2} < {3:4}
> Out[1]: True
>
> In [2]: set({1:2}.iteritems()) < set({3:4}.iteritems())
> Out[2]: False
>
>> Anyone a reference?
>
> The function dict_compare in dictobject.c .

Well there's a really helpful answer. I'm intrigued, Robert - since you 
know the real answer to this question, why did you choose to tell the 
Antoon that he was wrong, not tell him in what way he was wrong, certainly 
not tell him how to be right, but just tell him to read the source, rather 
than simply telling him what you knew? Still, at least you told him which 
file to look in. And if he knows python but not C, or gets lost in the 
byzantine workings of the interpreter, well, that's his own fault, i 
guess.

So, Antoon, firstly, your implementation of __eq__ is, i believe, correct.

Your implementation of __lt__ is, sadly, not. While sets take "<" to mean 
"is a proper subset of", for dicts, it's a more conventional comparison 
operation, which constitutes a total ordering over all dicts (so you can 
sort with it, for example). However, since dicts don't really have a 
natural total ordering, it is ever so slightly arbitrary.

The rules for ordering on dicts are, AFAICT:

- If one dict has fewer elements than the other, it's the lesser
- If not, find the smallest key for which the two dicts have different 
values (counting 'not present' as a value)
-- If there is no such key, the dicts are equal
-- If the key is present in one dict but not the other, the dict in which 
it is present is the lesser
-- Otherwise, the dict in which the value is lesser is itself the lesser

In code:

def dict_cmp(a, b):
 	diff = cmp(len(a), len(b))
 	if (diff != 0):
 		return diff
 	for key in sorted(set(a.keys() + b.keys())):
 		if (key not in a):
 			return 1
 		if (key not in b):
 			return -1
 		diff = cmp(a[key], b[key])
 		if (diff != 0):
 			return diff
 	return 0

I assume your tree has its items sorted by key value; that means there's 
an efficient implementation of this using lockstep iteration over the two 
trees being compared.

Another way of looking at it is in terms of list comparisons: comparing 
two dicts is the same as comparing the sorted list of keys in each dict, 
breaking ties by looking at the list of values, in order of their keys. 
There's a quirk, in that a shorter dict is always less than a longer dict, 
regardless of the elements.

In code:

def dict_cmp_alternative(a, b):
 	diff = cmp(len(a), len(b))
 	if (diff != 0):
 		return diff
 	ka = sorted(a.keys())
 	kb = sorted(b.keys())
 	diff = cmp(ka, kb)
 	if (diff != 0):
 		return diff
 	va = [a[k] for k in ka]
 	vb = [b[k] for k in kb]
 	return cmp(va, vb)

Hope this helps.

tom

PS In case it's of any use to you, here's the code i used to test these:

import random

def rnd(n):
 	return random.randint(0, n)

def randomdict(maxlen=20, range=100):
 	return dict((rnd(range), rnd(range)) for x in xrange(rnd(maxlen)))

def test(cmp2, n=1000):
 	for i in xrange(n):
 		a = randomdict()
 		b = randomdict()
 		if ((cmp(a, b)) != (cmp2(a, b))):
 			raise AssertionError, (a, b)

-- 
What we learn about is not nature itself, but nature exposed to our methods of questioning. -- Werner Heisenberg