set, dict and other structures

Mon Jan 31 20:58:15 EST 2005

Raymond Hettinger wrote:

> If set-ifying a dictionary turns out to be an essential and
> time-critical operation, it is possible to add some special case code
> for this.  The question is whether it is worth it.   If you find the
> need is pressing, file a feature request explaining why it is
> essential.  Then, I'll implement it for you.  On the other hand, if
> this is a theoretical nice-to-have, then why clutter up the code
> (with pre-sizing and forcing all values to True).

Just today I was writing some code where I wanted to use sets for the
abstraction (intersection, etc.), but also carry some values with them to
process. So, yes, I believe that having set-like abstraction for dictionaries
would be great. In fact, for a moment I wondered if dict.keys() was already of
type set in 2.4, because it could have made sense.

My concrete situation was a routine that was updating a current snapshot of
data (stored in a dictionary) with a new snapshot of data (another dictionary).
With 'updating' I mean some kind of algorithm where you have to do different
things for:

- items that were present in the current snapshot but not in the new snapshot
anymore (obsolete items)
- items that were not present in the current snapshot but are presente in the
new snapshot (new items)
- items that were present in both the current and the new snapshot (possibly
modified items)

So, this was a perfect suit for sets. Given my dictionaries d1 and d2, I would
have liked to be able to do:

for k,v in (d1 - d2).iteritems():
    ...

for k,v in (d2 - d1).iteritems():
    ...

for k,v in (d1 & d2).iteritems():   # uhm, not fully correct
    ...

Of course, there are some nitpicks. For instance, in the last case the
dictionary at the intersection should probably hold a tuple of two values, one
coming from each dictionary (since the intersection finds the items whose key
is present in both dictionaries, but the value can obviously be different). So
you'd probably need something:

for k,(v1,v2) in (d1 & d2).iteritems():
    ...

Another solution would be to say that set-like operations, like (d1 & d2),
return an iterator to a sequence of keys *only*, in this case the sequence of
keys available in both dictionaries.

I also tried a different approach, that is putting my data in some structure
that was suitable for a set (with __hash__ and __key__ methods that were caring
of the key part only):

class FakeForSet:
    def __init__(self, k, v):
        self.key = key
        self.value = value
    def __hash__(self):
         return hash(self.key)
    def __cmp__(self, other):
         return cmp(self.key, other.key)

but this looked immediatly weird because I was lying to Python somehow. It then
became clear that it's not going to work, because when you go through (s1 & s2)
you do not know if the items you get are coming from s1 or s2, and that is
important because the value member is going to be different (even if you're
lying to Python saying that those items are equal anyway).

I know of course there are other ways of doing the same thing, like:

   # intersection
   for k in d1:
       if k not in d2:
           continue
       v1, v2 = d1[k], d2[k]
       ...

But I don't think there is anything wrong in providing an abstraction for this,
especially since we already decided that set-like abstractions are useful.

So, FWIW, I would find set-like operations on dictionaries an useful addition
to Python. Besides, I guess they can be easily experimented and studied by
subclassing dict.
-- 
Giovanni Bajo