ANN: Set-like methods on dicts

Beni Cherniavsky cben at techunix.technion.ac.il
Mon Jun 9 14:14:38 EDT 2003


Dictionary users of the Python world, unite (or intersect)!

Everybody knows that dicts are poor man's sets.  The set classes are
the ruling classes when it comes to high-level operations - their
users bath in convenient operators and methods, while we, users of the
dict class, are oppressed.  Yet society ultimately depends on us - a
set might have the key but only a dict can give value!

I claim that any program heavily using sets one day would need to
associate some value with the keys.  I've recently faced just that in
a program manipulating graphs of sets.  The original naive
implementation was all sets - a set of nodes (which are sets) and a
set of edges (which are sets of 2 nodes).  One day I had to associate
a value (guess of what kind - right, sets!) with every edge.  I also
saw that associating the set of adjucent nodes with each node would be
a good optimisation (and it would make the program more symetric).

So I converted all uses of sets into dicts (even some that could stay
sets, I wanted to feel how it'll like without the `sets` module).
`dict.fromkeys()` was my friend and some set methods were replaced by
uglier code but it was bearable.  Then I started evaluting sets of
changes to the graph.  These were dictionary too because the changes
too record the values associated with the nodes/edges.  Imagine
merging two dictionaries, so that when both have the same key, you
should take the union of the values (they are sets).  Imagine the same
with intersections or (a)symmetris differences.

I've had enough.  Sets should be man's poor dictionaries, not the
other way around!  I wrote a `dsets` module that subclasses `dict` and
backports all set methods to it.  The design extrapolates
`dict.fromkeys` - any iterables are accepted and treated as dicts
whose values are all None.  As long as you only care for the keys, it
should work precisely like `Set`.

The question is what to do about collisions - when two dictionaries
share the same key, which value should we choose?  At first I chose
one of them (the first, since `dict.update()` already uses the second
and it generally seemed more useful).  Since this made the methods
non-commutative, I left out the operator overloading - only the
alphabetic methods were "backported".

The savings from using the new module were surprisingly small.  This
wasn't flexible enough - I needed to merge both values in some way
(e.g. set union).  So the next step was to add a `collision` optional
argument to all relevant methods, specifying a callable of two
arguments.  It's called for every collision with the two values and
return the [combined] value to use.  With this extension, the module
became *extremely* useful.  Finally, I got all the power on I dict I
alsways wanted!

Two convenience functions, `first` and `second`, returning the first
or second argument give easy access to the old behavior.  Unbound set
methods covered the rest of my use cases (not quite - I mixed mutable
and immutable set so I aliased in my application `opearator.__or__` as
`union`, etc.).

When the default argument is None, `self.collision` is called with
the two values instead.  This was designed to allow overriding
`self.collision` in a subclass to avoid specifying the collision
handling in every call.  This turned out useless because different
collision handling is needed for different opeartions (and sometimes
for the same opeartion in different contexts).  I think I'll remove
this.

The default collision handler raises an exception.  I think this is
right.  There is no single most sensible default so whenever you want
any handling you should spell it out.  from this import
explicit_better, errors_should_never_pass, etc.  The only price is
that you can't simply substitute `DSet` for `Set` (assuming you didn't
use the overloaded operators) - even if you never look at the values
it would raise an exception at the first collision, unless you specify
some collision handling.

One thing that I considre adding is the ability to use unbound methods
of DSet class on two `dict`s, not a `DSet` and something.  Another
question is naming.  `dsets.DSet` could just as well be `sdict.SDict`
or anything else.  `collision` is quite descriptive but too long -
perhaps `merge`?

The module is availiable at

http://www.technion.ac.il/~cben/python/dsets.py

Enjoy!  It's documented (could be imporved) and heavily doctested (I
just started using doctest so I'm over-excited ;-).

I think this is useful enough to go into the standard library or
perhaps into `dict` itself.  Comments welcome.

-- 
Beni Cherniavsky <cben at users.sf.net>






More information about the Python-list mailing list