Doing set operation on non-hashable objects

Carl Banks pavlovevidence at gmail.com
Wed Dec 24 15:52:02 EST 2008


On Dec 24, 1:16 pm, 5lvqbw... at sneakemail.com wrote:
> Hi,
>
> I'm writing an application which is structured roughly as follows:
>
> "db" is a dict, where the values are also dicts.
> A function searches through db and returns a list of values, each of
> which is a dict as described above.
> I need to perform set operations on these lists (intersection and
> union)
> However the objects themselves are not hashable, and therefore can't
> be in a set, because they are dicts.
> I'm not sure how large each set will be, but the point is I'm trying
> to avoid anything looking like an O(n^2) algorithm, so I can't just
> use naive double-looping to check for intersection/union on a pair of
> lists.
>
> The only way I can think of to do this right is to hash the dicts by
> freezing them, turning them all into tuples, which can then be
> hashed.  But this is a problem because more dicts might be nested
> inside.  At any rate, I'd need to thaw them out when I'm done and turn
> them back into dicts, which seems complicated and annoying, and all
> this leads me to believe there is a better way.
>
> What I really need is a set of pointers, so at the end of the
> operation I have the actual objects pointed to.  Can I somehow use the
> object IDs as set elements, then recreate a list with the actual
> objects they point to?
>
> How do you get the object back from an ID in python?

Quick and dirty way is to reference the dicts with a lightweight
hashable object that uses the dict's identity.  For instance:


class Identity(object):
    def __init__(self,obj):
        self.obj = obj
    def __hash__(self):
        return hash(id(self.obj))
    def __eq__(self,other):
        return self.obj is other.obj


Then, instead of "s.add(d)" use "s.add(Identity(d))".

Instead of "d = s.pop()" use "d = s.pop().obj".

Instead of "d in s" use "Identity(d) in s".

And so on.


To do it a bit better, you could create an IdentitySet class that
subclasses set and wraps the methods so that they automatically apply
wrap and unwrap the arguments on their way in and out (I'd bet there's
already a cookbook recipe to do that).


Carl Banks



More information about the Python-list mailing list