use cases for a defaultdict

Wed Jan 18 12:43:10 EST 2006

Steven Bethard wrote:
 > Agreed.  I really hope that Python 3.0 applies Raymond Hettinger's
 > suggestion "Improved default value logic for Dictionaries" from
 >      http://wiki.python.org/moin/Python3%2e0Suggestions
 >
 > This would allow you to make the setdefault() call only once, instead
 > of on every lookup:
 >
 >      class meh(dict):
 >          def __init__(self, *args, **kwargs):
 >              super(meh, self).__init__(*args, **kwargs)
 >              self.setdefault(function=meh)

Steve Holden wrote:
 > In fact, why not go one better and also add a "default" keyword
 > parameter to dict()?

Steven Bethard wrote:
 > It's not backwards compatible:
 >
 >  >>> dict(default=4)
 > {'default': 4}

Steve Holden wrote:
 > Nyargle. Thanks, you're quite right, of course: I was focussing on the
 > list-of-pairs argument style when I wrote that. So the best we could
 > do is provide a subtype, defaultdict(default, *args, **kw).

Steven D'Aprano wrote:
 > I vote to leave dict just as it is, and add a subclass, either in a
 > module or as a built in (I'm not fussed either way) for
 > dicts-with-defaults.

Yeah, I like that idea too.  That's a lot of Stevens agreeing on this 
-- do we really need agreement from people with other names too? ;)

I do think defaultdict() is probably the right way to go, though I'm not 
certain about the signature -- it needs to support value-based defaults 
(e.g. 0) and function-based defaults (e.g. a new list each time). 
Perhaps we need to introduce two new collections objects, 
defaultvaluedict and defaultfunctiondict:

 >>> class defaultdict(dict):
...     def __init__(self, default, *args, **kwargs):
...         super(defaultdict, self).__init__(*args, **kwargs)
...         self._default = default
...     def __repr__(self):
...         type_name = type(self).__name__
...         super_str = super(defaultdict, self).__repr__()
...         return '%s(%r, %s)' % (type_name, self._default, super_str)
...
 >>> class defaultvaluedict(defaultdict):
...     def __getitem__(self, key):
...         if key not in self:
...             self[key] = self._default
...         return super(defaultvaluedict, self).__getitem__(key)
...
 >>> class defaultfunctiondict(defaultdict):
...     def __getitem__(self, key):
...         if key not in self:
...             self[key] = self._default()
...         return super(defaultfunctiondict, self).__getitem__(key)
...
 >>> counts = defaultvaluedict(0)
 >>> counts['Steve'] += 1
 >>> counts['Steve'] += 1
 >>> counts['Steven'] += 1
 >>> counts
defaultvaluedict(0, {'Steve': 2, 'Steven': 1})
 >>> groups = defaultfunctiondict(list)
 >>> groups['Steve'].append('Holden')
 >>> groups['Steve'].append('Bethard')
 >>> groups['Steven'].append("D'Aprano")
 >>> groups
defaultfunctiondict(<type 'list'>, {'Steve': ['Holden', 'Bethard'], 'Steven': ["D'Aprano"]})

I didn't override get() or setdefault(), which means they don't use 
the default associated with the dict.  I think this is right because the 
only time you'd use get() or setdefault() with a defaultdict is if you 
wanted to override the default normally associated with it.
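
For illustration, here is a minimal sketch of that behaviour.  It 
repeats a condensed copy of the defaultfunctiondict class from above so 
that it runs on its own; the variable names (missing, fallback, and so 
on) are made up for the example.  It relies on the fact that CPython's 
dict.get() and dict.setdefault() do not go through an overridden 
__getitem__().

```python
class defaultfunctiondict(dict):
    # Condensed copy of the class sketched earlier in this post.
    def __init__(self, default, *args, **kwargs):
        super(defaultfunctiondict, self).__init__(*args, **kwargs)
        self._default = default

    def __getitem__(self, key):
        if key not in self:
            self[key] = self._default()
        return super(defaultfunctiondict, self).__getitem__(key)


groups = defaultfunctiondict(list)

# get() is inherited from dict unchanged: it ignores the stored default
# and does not insert the key.
missing = groups.get('Steve')
fallback = groups.get('Steve', ())
still_absent = 'Steve' not in groups

# setdefault() inserts the value passed in, not self._default().
first = groups.setdefault('Steve', ['Holden'])

# Plain indexing is the only path that uses the stored default.
created = groups['Steven']
```

So get() and setdefault() remain the way to say "not the usual default, 
this one instead", while indexing with [] always falls back to the 
default given at construction time.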

The question, I guess, is whether or not there are enough use cases to 
merit introducing these types into the collections module.  Some of my 
use cases:

* Counting numbers of items, using defaultvaluedict(0).  Currently, I 
support this by having a counts() function in my dicttools module:

     def counts(iterable, key=None):
         result = {}
         for item in iterable:
             # apply key function if necessary
             if key is None:
                 k = item
             else:
                 k = key(item)
             # increment key's count
             try:
                 result[k] += 1
             except KeyError:
                 result[k] = 1
         return result

Raymond Hettinger has also proposed a bag_ recipe for similar purposes.

.. _bag: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/259174

* Grouping items into lists, using defaultfunctiondict(list). 
Currently, I support this by having a groupby() function in my dicttools 
module:

     def groupby(iterable, key=None, value=None):
         result = {}
         for item in iterable:
             # apply key function if necessary
             if key is None:
                 k = item
             else:
                 k = key(item)
             # apply value function if necessary
             if value is None:
                 v = item
             else:
                 v = value(item)
             # append value to key's list
             try:
                 result[k].append(v)
             except KeyError:
                 result[k] = [v]
         return result

Note that for both of my use cases, the appropriate defaultdict() could 
take the try/except (equivalent to a setdefault() call) out of my code. 
It's nice, but not a huge gain -- I definitely can't drop my code 
completely, since the key and value function handling would still be 
needed...
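
The simplification described above might look roughly like the 
following.  This is only a sketch: it assumes the two classes proposed 
earlier in this post (condensed here so the snippet is self-contained), 
not anything that exists in the collections module.

```python
class defaultvaluedict(dict):
    # Condensed version of the value-based class proposed above.
    def __init__(self, default, *args, **kwargs):
        super(defaultvaluedict, self).__init__(*args, **kwargs)
        self._default = default

    def __getitem__(self, key):
        if key not in self:
            self[key] = self._default
        return super(defaultvaluedict, self).__getitem__(key)


class defaultfunctiondict(dict):
    # Condensed version of the function-based class proposed above.
    def __init__(self, default, *args, **kwargs):
        super(defaultfunctiondict, self).__init__(*args, **kwargs)
        self._default = default

    def __getitem__(self, key):
        if key not in self:
            self[key] = self._default()
        return super(defaultfunctiondict, self).__getitem__(key)


def counts(iterable, key=None):
    result = defaultvaluedict(0)
    for item in iterable:
        # apply key function if necessary
        if key is None:
            k = item
        else:
            k = key(item)
        # increment key's count; missing keys start at 0
        result[k] += 1
    return result


def groupby(iterable, key=None, value=None):
    result = defaultfunctiondict(list)
    for item in iterable:
        # apply key function if necessary
        if key is None:
            k = item
        else:
            k = key(item)
        # apply value function if necessary
        if value is None:
            v = item
        else:
            v = value(item)
        # append value to key's list; missing keys start as []
        result[k].append(v)
    return result
```

One difference from the originals: these versions return the 
defaulting dict itself, so a caller who later indexes a missing key 
will silently insert it instead of getting a KeyError.  Wrapping the 
result in dict() before returning would avoid that.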

Do others have compelling use cases for a defaultdict-like class?

STeVe