Strategy for determining difference between 2 very large dictionaries

python at bdurham.com
Wed Dec 24 03:23:00 EST 2008


Hi Gabriel,

Thank you very much for your feedback!

> k1 = set(dict1.iterkeys())

I noticed you suggested .iterkeys() rather than .keys(). Is there any
advantage to using an iterator instead of a list as the basis for
creating a set? I understand that an iterator makes sense if you're
working with a large set of items one at a time, but if you're creating
an unfiltered collection, I don't see the advantage of an iterator over
a list. I'm sure I'm missing a subtle point here :)
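A short sketch of the point, assuming Python 3 (where .iterkeys() no longer exists and .keys() returns a lightweight view rather than a list): set() accepts any iterable, so the only cost of going through a list is a temporary copy of every key. In Python 2, .keys() built that full list before set() ever saw it, while .iterkeys() (or simply set(dict1)) fed the keys straight in. The dictionary `d` is made-up sample data.

```python
# Sample data, made up for illustration.
d = {"a": 1, "b": 2, "c": 3}

# set() accepts any iterable; iterating a dict yields its keys.
from_dict = set(d)            # no intermediate list
from_view = set(d.keys())     # Python 3: keys() is a view, not a copy
from_list = set(list(d))      # builds a temporary list of every key first

# All three produce the same set; only the memory used along the way differs.
assert from_dict == from_view == from_list == {"a", "b", "c"}
```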

>> can this last step be done via a simple list comprehension?

> Yes; but isn't a dict comprehension more appropriate?
>
> {key: (dict1[key], dict2[key]) for key in common_keys if
>  dict1[key] != dict2[key]}

Cool!! I'm relatively new to Python and totally missed the ability to
work with dictionary comprehensions. Yes, your dictionary comprehension
technique is much better than the list comprehension approach I was
struggling with. Your dictionary comprehension statement describes
exactly what I wanted to write.
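Dictionary comprehensions arrived in Python 2.7 and 3.0; on older interpreters the same result can be had by passing a generator of (key, value) pairs to dict(). A minimal sketch comparing the two, with made-up sample dictionaries and common_keys computed as in Gabriel's message:

```python
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 2, "c": 30, "d": 4}
common_keys = set(dict1) & set(dict2)

# Dict comprehension (Python 2.7+ / 3.0+).
diff_comp = {key: (dict1[key], dict2[key])
             for key in common_keys if dict1[key] != dict2[key]}

# Equivalent for older Pythons: dict() built from a generator of pairs.
diff_gen = dict((key, (dict1[key], dict2[key]))
                for key in common_keys if dict1[key] != dict2[key])

assert diff_comp == diff_gen == {"c": (3, 30)}
```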

Regards,
Malcolm


----- Original message -----
From: "Gabriel Genellina" <gagsl-py2 at yahoo.com.ar>
To: python-list at python.org
Date: Wed, 24 Dec 2008 05:46:04 -0200
Subject: Re: Strategy for determining difference between 2 very large dictionaries

On Wed, 24 Dec 2008 05:16:36 -0200, <python at bdurham.com> wrote:

> I'm looking for suggestions on the best ('Pythonic') way to
> determine the difference between 2 very large dictionaries
> containing simple key/value pairs.
> By difference, I mean a list of keys that are present in the
> first dictionary, but not the second. And vice versa. And a list
> of keys in common between the 2 dictionaries whose values are
> different.
> The 2 strategies I'm considering are:
> 1. Brute force: Iterate through first dictionary's keys and
> determine which keys it has that are missing from the second
> dictionary. If keys match, then verify that the 2 dictionaries
> have identical values for the same key. Repeat this process for
> the second dictionary.
> 2. Use sets: Create sets from each dictionary's list of keys and
> use Python's set methods to generate a list of keys present in
> one dictionary but not the other (for both dictionaries) as well
> as a set of keys the 2 dictionaries have in common.
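For comparison, strategy 1 might look like the sketch below (the function name brute_force_diff is hypothetical, and the data is made up). It walks each dictionary once and relies on `in`, an average O(1) hash lookup, so it does roughly the same hashing work as the set approach; it simply avoids building the key sets. Note that the second pass only needs to find missing keys, since shared values were already compared.

```python
def brute_force_diff(dict1, dict2):
    """Return (keys only in dict1, keys only in dict2, {key: (v1, v2)} for differing values)."""
    only_in_1 = []
    changed = {}
    for key, value in dict1.items():
        if key not in dict2:
            only_in_1.append(key)
        elif value != dict2[key]:
            changed[key] = (value, dict2[key])
    # Second pass: values were already compared, so only look for missing keys.
    only_in_2 = [key for key in dict2 if key not in dict1]
    return only_in_1, only_in_2, changed


old = {"a": 1, "b": 2, "c": 3}
new = {"b": 2, "c": 30, "d": 4}
print(brute_force_diff(old, new))  # (['a'], ['d'], {'c': (3, 30)})
```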

I cannot think of any advantage to the first approach, so I'd use sets.

k1 = set(dict1.iterkeys())
k2 = set(dict2.iterkeys())
k1 - k2 # keys in dict1 not in dict2
k2 - k1 # keys in dict2 not in dict1
k1 & k2 # keys in both
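In Python 3, .iterkeys() is gone, but dict.keys() returns a view that supports set operators directly, so the same three results can be computed without building k1 and k2 first. A small sketch with made-up data:

```python
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 2, "c": 30, "d": 4}

# Key views support -, &, | and ^, and each operation returns an ordinary set.
only_in_1 = dict1.keys() - dict2.keys()    # keys in dict1 not in dict2
only_in_2 = dict2.keys() - dict1.keys()    # keys in dict2 not in dict1
common_keys = dict1.keys() & dict2.keys()  # keys in both

print(only_in_1, only_in_2, sorted(common_keys))  # {'a'} {'d'} ['b', 'c']
```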

> Using the set
> of keys in common, compare values across dictionaries to
> determine which keys have different values (can this last step be
> done via a simple list comprehension?)

Yes; but isn't a dict comprehension more appropriate?

{key: (dict1[key], dict2[key]) for key in common_keys if
 dict1[key] != dict2[key]}

(where common_keys=k1&k2 as above)
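If the dictionaries are very large, the intersection set can be skipped altogether: iterate over one dictionary's items and look each key up in the other. This is a variation, not what is written above; the name changed_values is hypothetical and the data is made up. A sentinel object is used so that a stored value of None is not mistaken for a missing key.

```python
def changed_values(dict1, dict2):
    """Map each key present in both dicts to (dict1 value, dict2 value) when they differ."""
    missing = object()  # sentinel: distinguishes "key absent" from a stored None
    result = {}
    for key, value in dict1.items():
        other = dict2.get(key, missing)
        if other is not missing and other != value:
            result[key] = (value, other)
    return result


print(changed_values({"a": 1, "b": 2, "c": 3}, {"b": 2, "c": 30, "d": 4}))  # {'c': (3, 30)}
```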

-- 
Gabriel Genellina
