itertools.groupby

sjdevnull at yahoo.com sjdevnull at yahoo.com
Tue Jun 5 13:17:30 EDT 2007


tutufan at gmail.com wrote:
> On May 27, 7:50 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> > The groupby itertool came out in Py2.4 and has had remarkable
> > success (people seem to get what it does and like using it, and
> > there have been no bug reports or reports of usability problems).
>
> With due respect, I disagree.  Bug ID #1212077 is either a bug report
> or a report of a usability problem, depending on your point of view.
> You may disagree on whether or not this is a problem that needs to be
> fixed, but it *is* a report.
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1212077&group_id=5470&atid=105470
>
>
> I think the semantics of the itertools groupby are too tricky for
> naive users

Itertools isn't targeted primarily at naive users.  It can be useful
to them, but it's really there to allow sophisticated work on
iterables without reading them all in at once (indeed, it works
properly on infinite iterables).  That's pretty much _the_ defining
characteristic of itertools.

Anyone who's doing that knows you can't do infinite lookahead, so you
can't do a sort or a group-by over the entire data set.  IOW, for
anyone who would be looking to use itertools for what it's designed
for, the kind of operation you specify below would be very unexpected.
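To make that concrete, here is a minimal sketch (Python 3 syntax, made-up data) of what the itertools groupby does: it merges only adjacent items with equal keys, which is why it can run over an infinite iterable without ever looking ahead.

```python
from itertools import count, groupby, islice

# groupby merges only *adjacent* items with equal keys; it never looks ahead.
data = ["a", "a", "b", "a"]
runs = [(k, list(g)) for k, g in groupby(data)]
# "a" appears twice as a key because its two runs are not adjacent.

# Because it is lazy, it also works on an infinite iterable:
# group 0, 1, 2, ... by tens and take only the first three groups.
by_tens = groupby(count(), key=lambda n: n // 10)
first_three = [(k, list(g)) for k, g in islice(by_tens, 3)]
```

If you want one group per key over a finite data set, sort by the same key before calling groupby; that step is exactly the part that cannot be done on an unbounded stream.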

>--I find them confusing myself, and I've been using Python
> for quite a while.  I still hope that Python will someday gain a
> groupby function suitable for ordinary use.  Until that happens, I
> recommend the following cookbook entry:
>
> # from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/259173
>
> class groupby(dict):
>     def __init__(self, seq, key=lambda x:x):
>         for value in seq:
>             k = key(value)
>             self.setdefault(k, []).append(value)
>     __iter__ = dict.iteritems

The itertools groupby is incredibly useful for writing SQL object
mappers.  It's exactly what I wanted when I first started looking in
itertools to see if there was a way to consolidate rows.
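A rough sketch of the row-consolidation case, in Python 3 syntax. The rows, column layout, and the consolidate helper are hypothetical; the only real assumption is that the query already orders rows by the grouping columns (ORDER BY), so equal keys arrive adjacent to each other.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical joined rows (order_id, customer, item), already sorted by order_id.
rows = [
    (1, "alice", "widget"),
    (1, "alice", "gadget"),
    (2, "bob", "widget"),
]

def consolidate(rows):
    """Yield one (order_id, customer, [items]) record per run of equal (order_id, customer)."""
    for (order_id, customer), group in groupby(rows, key=itemgetter(0, 1)):
        yield order_id, customer, [item for _, _, item in group]

orders = list(consolidate(rows))
```

Since consolidate is a generator, rows can come straight from a database cursor and only one group's worth of rows is held at a time.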

Also, that recipe goes against the spirit of itertools--if I'm going
out of my way to use itertools, it usually means I'm working with
very large data sets that I can't read into memory.  It's a useful
recipe, but it has to read the whole input before it can return
anything, so it's likely to be unusable in the context of
itertools-related problem domains.
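The difference in how much input each approach reads can be sketched like this. dict_groupby is my own Python 3 rendering of the recipe's idea, not the recipe itself, and source is a hypothetical generator that records what has been pulled from it.

```python
from collections import defaultdict
from itertools import groupby

def dict_groupby(seq, key=lambda x: x):
    """Collect every item into a dict of lists, as the recipe does."""
    groups = defaultdict(list)
    for value in seq:
        groups[key(value)].append(value)
    return dict(groups)

consumed = []

def source():
    # Stand-in for a large stream; records each item as it is read.
    for n in [1, 1, 2, 2, 3]:
        consumed.append(n)
        yield n

# itertools.groupby: the first group costs only that run plus one item past it.
grouped = groupby(source())
k, g = next(grouped)
first = list(g)
lazy_count = len(consumed)

# The dict-based version must exhaust the source before returning anything.
consumed.clear()
eager = dict_groupby(source())
eager_count = len(consumed)
```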



