[Python-Dev] tally (and other accumulators)

Jess Austin jaustin at post.harvard.edu
Wed Apr 5 08:47:07 CEST 2006


Alex wrote:
> On Apr 4, 2006, at 10:53 PM, Jess Austin wrote:
> > Alex wrote:
> >> import collections
> >> def tally(seq):
> >>      d = collections.defaultdict(int)
> >>      for item in seq:
> >>          d[item] += 1
> >>      return dict(d)
> >
> > I'll stop lurking and submit the following:
> >
> > def tally(seq):
> >     return dict((group[0], len(tuple(group[1])))
> >                 for group in itertools.groupby(sorted(seq)))
> >
> > In general itertools.groupby() seems like a very clean way to do this
> > sort of thing, whether you want to end up with a dict or not.  I'll go
> > so far as to suggest that the existence of groupby() obviates the
> > proposed tally().  Maybe I've just coded too much SQL and it has  
> > warped my brain...
> >
> > OTOH the latter definition of tally won't cope with iterables, and it
> > seems like O(nlogn) rather than O(n).
> 
> It will cope with any iterable just fine (not sure why you think  
> otherwise), but the major big-O impact seems to me to rule it out as  
> a general solution.

You're right in that it won't raise an exception on an iterator, but the
sorted() means that it's much less memory efficient than your version
for iterators.  Another reason to avoid sorted() for this application,
besides the big-O.  Anyway, I still like groupby() for this sort of
thing, with the aforementioned caveats.  Functional code seems a little
clearer to me, although I realize that preference is not held
universally.

cheers,
Jess


More information about the Python-Dev mailing list