itertools.groupby

Paul Rubin http
Mon May 28 18:49:35 EDT 2007


Raymond Hettinger <python at rcn.com> writes:
> I think the OP would have been better-off with plain
> vanilla Python such as:
> 
>    See http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/259173

But that recipe generates the groups in an arbitrary order determined
by the dict hashing, instead of keeping them in the original sequence's
order, which the OP's application might well require.
itertools.groupby really is the right thing.  I agree that itertools
is not the easiest module in the world for beginning programmers to
understand, but every serious Python user should spend some time
figuring it out sooner or later.  Iterators and itertools really turn
Python into a higher-level language than it was before, giving
powerful and streamlined general-purpose mechanisms that replace a lot
of special-purpose hand-coding that usually ends up being a lot more
work to debug in addition to bloating the user's code.  Itertools
should by no means be thought of as just a performance hack.  It makes
programs smaller and sharper.  It quickly becomes the One Obvious Way
To Do It.
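
To make the ordering point concrete, here is a minimal sketch of
itertools.groupby on a small list.  The helper name `runs` and the sample
data are made up for illustration and are not from the OP's code.  Note
that groupby only merges consecutive items with equal keys, so a key that
reappears later starts a new group.

```python
from itertools import groupby

def runs(items, key=None):
    """Return [(key, [items...]), ...] for consecutive runs, in input order.

    `runs` is a hypothetical helper name used only for this illustration.
    """
    return [(k, list(g)) for k, g in groupby(items, key)]

# Sample data (assumed for the example): group words by first letter.
words = ["apple", "avocado", "banana", "blueberry", "cherry", "apricot"]
grouped = runs(words, key=lambda w: w[0])

# The groups come out in the order they appear in the input.  "apricot"
# is not adjacent to the other "a" words, so it forms its own group.
for letter, members in grouped:
    print(letter, members)
```

If all items with the same key should land in one group, the input has to
be sorted by that key first, which of course gives up the original order.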

In my past few kloc of Python, I think I've written just one or two
"class" statements.  I used to use class instances all the time, to
maintain little bits of state that had to be held between different
operations in a program.  Using itertools means I now tend to organize
entire programs as iterator pipelines so that all the data runs
"through the goose" exactly once and there is almost no need to
maintain any state anywhere outside the scope of simple function
invocations.  There are just fewer user-written moving parts when a
program is written that way, and therefore fewer ways for the program
to go wrong.  Messy edge cases that used to take a lot of thought to
handle correctly now need no attention at all--they just handle
themselves.
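
A rough sketch of what such a pipeline can look like, under assumed
input: lines of the form "name value", with blank lines and "#" comments
to skip.  The function names `parse` and `totals` and the data are
hypothetical.  Each stage is a generator, so every record passes through
once and no stage keeps state outside its own function body.

```python
from itertools import groupby

def parse(lines):
    """Yield (name, value) pairs, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            name, value = line.split()
            yield name, int(value)

def totals(records):
    """Yield (name, sum of values) for each consecutive run of one name."""
    for name, group in groupby(records, key=lambda r: r[0]):
        yield name, sum(value for _, value in group)

# Hypothetical input; in a real program this could be an open file.
log = ["# hypothetical input", "alice 3", "alice 4", "", "bob 10", "alice 1"]

# Stages are chained lazily; list() pulls everything through once.
result = list(totals(parse(log)))
print(result)
```

The empty-input case needs no special handling here: if `log` is empty,
both generators simply yield nothing.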

Also, I think it's generally better to use a documented standard
library routine than a purpose-written routine or even a downloaded
recipe, since the stdlib routine will stay easily available from one
project to another, and as the user gains experience with it, it will
become more and more powerful in his or her hands.

Also, these days I think I'd write that recipe with a defaultdict
instead of setdefault, but that's new with Python 2.5.
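
For comparison, a sketch of the two dict-based styles side by side.  This
is not the cookbook recipe itself, just the general grouping idea; the
function names and sample data are invented for the example.

```python
from collections import defaultdict

def group_setdefault(items, key):
    """Group items into a plain dict using dict.setdefault."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups

def group_defaultdict(items, key):
    """Same grouping with collections.defaultdict (Python 2.5 and later)."""
    groups = defaultdict(list)  # missing keys start out as an empty list
    for item in items:
        groups[key(item)].append(item)
    return groups

# Sample data (assumed for the example): group words by first letter.
words = ["apple", "banana", "avocado", "cherry", "blueberry"]
first_letter = lambda w: w[0]

by_setdefault = group_setdefault(words, first_letter)
by_defaultdict = group_defaultdict(words, first_letter)
print(by_setdefault == by_defaultdict)
```

Unlike groupby, both versions collect non-adjacent items with the same key
into one group.  On Python 3.7 and later, dicts keep insertion order, so
the groups come back in order of first appearance; on the interpreters
current when this thread was written, the order was arbitrary.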


