itertools.groupby

Mon May 28 02:34:55 EDT 2007

Raymond Hettinger <python at rcn.com> writes:
> On May 27, 8:28 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> > I use the module all the time now and it is great.  
> Thanks for the accolades and the great example.

Thank YOU for the great module ;).  Feel free to use the example in the
docs if you want.  The question someone coincidentally posted about
finding sequences of capitalized words also made a nice example.

Here's yet another example that came up in something I was working on:
you are indexing a book and you want to print a list of page numbers
for pages that refer to George Washington.  If Washington occurs on
several consecutive pages you want to print those numbers as a 
hyphenated range, e.g.

   Washington, George: 5, 19, 37-45, 82-91, 103

This is easy with groupby (this version is untested, but it's pretty close
to what I wrote in the real program).  Again it works by Bates numbering,
but a little more subtly (enumerate generates the Bates numbers):

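   import operator
   from itertools import groupby

   snd = operator.itemgetter(1)   # as before: picks the page out of (index, page)

   def page_ranges():
      # contains_washington and all_page_numbers come from the indexing program
      pages = sorted(filter(contains_washington, all_page_numbers))
      # group on i-p, the offset between Bates number and page number
      for d,g in groupby(enumerate(pages), lambda ip: ip[0]-ip[1]):
        h = list(map(snd, g))
        if len(h) > 1:
           yield '%d-%d'% (h[0], h[-1])
        else:
           yield '%d'% h[0]
   print(', '.join(page_ranges()))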

See what has happened: for a sequence of p's that are consecutive, i-p
stays constant, and groupby splits out the clusters where this occurs.
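
To make the i-p trick concrete, here is a small self-contained sketch. The page list is invented for illustration, and page_ranges takes the pages as an argument (an assumption made so the example runs on its own, without the indexing program's contains_washington or all_page_numbers). It prints the i-p keys first, then the formatted ranges.

```python
from itertools import groupby

def page_ranges(pages):
    """Yield 'a-b' for runs of consecutive pages and 'a' for isolated pages."""
    pages = sorted(pages)  # assumption: page numbers are distinct
    # Key each (index, page) pair by index - page; every page in a run of
    # consecutive pages gets the same value of that difference.
    for _, group in groupby(enumerate(pages), key=lambda ip: ip[0] - ip[1]):
        run = [page for _, page in group]
        if len(run) > 1:
            yield '%d-%d' % (run[0], run[-1])
        else:
            yield '%d' % run[0]

# Hypothetical page list, made up for this example.
sample_pages = [5, 19, 37, 38, 39, 82, 83, 103]

keys = [i - p for i, p in enumerate(sample_pages)]
result = ', '.join(page_ranges(sample_pages))
print(keys)    # [-5, -18, -35, -35, -35, -77, -77, -96]
print(result)  # 5, 19, 37-39, 82-83, 103
```

Pages 37, 38, 39 all get the key -35 and pages 82, 83 both get -77, so groupby hands each run back as one group.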

> FWIW, I checked in a minor update to the docs: ...

The uniq example certainly should be helpful for Unix users.
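
For anyone who hasn't used uniq: it collapses runs of adjacent identical lines, and with -c it also reports how long each run was. A minimal sketch of the analogy with groupby follows. The helper names uniq and uniq_c and the sample input are made up here, and this is not the wording that went into the docs.

```python
from itertools import groupby

def uniq(items):
    """Collapse each run of adjacent equal items to one item, like `uniq`."""
    return [key for key, _ in groupby(items)]

def uniq_c(items):
    """Pair each run with its length, roughly like `uniq -c`."""
    return [(len(list(group)), key) for key, group in groupby(items)]

sample = ['a', 'a', 'b', 'a', 'c', 'c', 'c']  # hypothetical input
collapsed = uniq(sample)
counted = uniq_c(sample)
print(collapsed)  # ['a', 'b', 'a', 'c']
print(counted)    # [(2, 'a'), (1, 'b'), (1, 'a'), (3, 'c')]
```

As with uniq, only adjacent duplicates are merged, which is why 'a' shows up twice in the output; sort first if you want each value once.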