What's the cleanest way to compare 2 dictionary?

Thu Aug 10 18:06:47 EDT 2006

John Machin wrote:
> John Henry wrote:
> > Hi list,
> >
> > I am sure there are many ways of doing the comparison, but I'd like to see
> > what you would do if you have 2 dictionaries (containing lots of
> > data - like 20000 keys, with each key holding a dozen or so records)
> > and you want to build a list of differences between the two.
> >
> > I'd like to end up with 3 lists: what's in A and not in B, what's in B
> > and not in A, and of course, what's in both A and B.
> >
> > What do you think is the cleanest way to do it?  (I am sure you will
> > come up with ways that astonish me  :=) )
> >
>
> Paddy has already pointed out a necessary addition to your requirement
> definition: common keys with different values.
>
> Here's another possible addition: you say that "each key contains a
> dozen or so of records". I presume that you mean like this:
>
> a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b']}  # "dozen" -> 2 to save typing :-)
>
> Now what happens if the other dictionary contains:
>
> b = {1: ['rec1a', 'rec1b'], 42: ['rec42b', 'rec42a']}
>
> Key 42 would be marked as different by Paddy's classification, but the
> values are the same, just not in the same order. How do you want to
> treat that? avalue == bvalue? sorted(avalue) == sorted(bvalue)? Oh, and
> are you sure the buckets don't contain duplicates? Maybe you need
> set(avalue) == set(bvalue). What about 'rec1a' vs 'Rec1a' vs 'REC1A'?
>
> All comparisons are equal, but some comparisons are more equal than
> others :-)
>
> Cheers,
> John

Hi Johns,
Here is my attempt at giving more detailed comparison info.
Assume you have parsed your data into two dicts, a and b,
each of whose values is a dict representing a record.
Further assume you have one function that decides whether two record-level
dicts are the same, and another that decides whether two values
within a record-level dict are the same.
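
For illustration only, the two assumed functions might look something
like the sketch below. The names value_equal and record_equal are
hypothetical, and so is the choice to ignore order and letter case inside
a list of strings (prompted by John's questions above); substitute
whatever notion of "same" suits your data.

```python
def value_equal(x, y):
    """Hypothetical value-level check: lists of strings compare equal
    regardless of order and letter case; anything else uses ==."""
    if isinstance(x, list) and isinstance(y, list):
        try:
            return sorted(s.lower() for s in x) == sorted(s.lower() for s in y)
        except AttributeError:
            # Some element is not a string, so fall back to plain equality.
            return x == y
    return x == y


def record_equal(rec_a, rec_b):
    """Hypothetical record-level check: same field names, and every
    field passes value_equal."""
    if set(rec_a) != set(rec_b):
        return False
    return all(value_equal(rec_a[f], rec_b[f]) for f in rec_a)
```

Sorting rather than converting to sets keeps duplicates significant, so
['a'] and ['a', 'a'] still count as different.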

With a slight modification of my earlier program we get:

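def komparator(a, b, check_equal):
    # Split the keys of a and b into four sets: only in a, only in b,
    # in both with equal values, and in both with unequal values.
    keya = set(a)
    keyb = set(b)
    a_xclusive = keya - keyb
    b_xclusive = keyb - keya
    _common = keya & keyb
    # check_equal(x, y) decides whether two values count as the same.
    common_eq = set(k for k in _common if check_equal(a[k], b[k]))
    common_neq = _common - common_eq
    return (a_xclusive, b_xclusive, common_eq, common_neq)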

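# First pass: classify the top-level keys, comparing whole records.
a_xclusive, b_xclusive, common_eq, common_neq = komparator(
    a, b, record_dict__equality_checker)

# Second pass: for each differing record, classify its fields the same way.
common_neq = [(key, komparator(a[key], b[key], value__equality_checker))
              for key in common_neq]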

Now we get extra info on intra record differences with little extra
code.
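
To show the shape of the result, here is a small self-contained run. The
sample data is made up, and plain == (via operator.eq) stands in for both
equality checkers, which is an assumption; real data may need something
smarter.

```python
import operator


def komparator(a, b, check_equal):
    # Same classification as above, written compactly.
    keya = set(a)
    keyb = set(b)
    common = keya & keyb
    common_eq = set(k for k in common if check_equal(a[k], b[k]))
    return (keya - keyb, keyb - keya, common_eq, common - common_eq)


# Made-up sample data: each value is a record-level dict.
a = {1: {'name': 'ann', 'age': 30},
     2: {'name': 'bob', 'age': 41},
     3: {'name': 'cy', 'age': 25}}
b = {2: {'name': 'bob', 'age': 42, 'city': 'Oslo'},
     3: {'name': 'cy', 'age': 25},
     4: {'name': 'di', 'age': 19}}

a_only, b_only, same, differ = komparator(a, b, operator.eq)

# Drill into each differing record, comparing field by field.
details = [(key, komparator(a[key], b[key], operator.eq)) for key in differ]

print(a_only, b_only, same)   # {1} {4} {3}
print(details)                # [(2, (set(), {'city'}, {'name'}, {'age'}))]
```

For key 2 the inner tuple reads: no fields only in a, 'city' only in b,
'name' equal in both, 'age' present in both but different.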

Look out though, you could get swamped with data :-)

- Paddy.