itertools.groupby

Raymond Hettinger python at rcn.com
Tue May 29 06:02:33 EDT 2007


On May 28, 8:02 pm, Gordon Airporte <JHoo... at fbi.gov> wrote:
> "Each" seems to imply uniqueness here.

Doh!  This sort of micro-massaging the docs misses the big picture.
If "each" meant unique across the entire input stream, then how the
heck could the function work without reading in the entire data stream
all at once?  An understanding of iterators and itertools philosophy
reveals the correct interpretation.  Without that understanding, it is
a fool's errand to try to inject all of the attendant knowledge into
the docs for each individual function.  Without that understanding, a
user would be *much* better off using list-based functions (e.g. using
zip() instead of izip() so that they will have a thorough understanding
of what their code actually does).
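
To make the streaming argument concrete, here is a minimal sketch (not
taken from the docs) that runs groupby() over an input that never ends.
The helper name first_groups and the key function x // 3 are
illustrative assumptions.  If "each" meant unique across the whole
stream, this call could never return.

```python
from itertools import count, groupby, islice

def first_groups(n):
    """Return the first n (key, members) pairs from an endless stream."""
    endless = count()  # 0, 1, 2, ... and it never stops
    grouped = groupby(endless, key=lambda x: x // 3)
    # Materialize each group before asking groupby() for the next one,
    # because advancing groupby() invalidates the previous group.
    return [(key, list(group)) for key, group in islice(grouped, n)]

print(first_groups(3))
# [(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8])]
```

Only as much of the input is read as is needed to close the groups
that were requested.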

The itertools module necessarily requires an understanding of
iterators.  The module has a clear philosophy and unifying theme.  It
is about consuming data lazily, writing out results in small bits,
keeping as little as possible in memory, and being a set of composable
functional-style tools running at C speed (often making it possible to
avoid the Python eval-loop entirely).
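
A small sketch of that lazy, composable style follows.  It assumes
Python 3, where the built-in map() and filter() are already lazy (the
roles imap() and ifilter() played in the itertools of this era).  The
names noisy_source and pulled are made up for the illustration; they
only record how much of the input was actually consumed.

```python
from itertools import count, islice

pulled = []  # records every value the pipeline actually requested

def noisy_source():
    """Endless stream 1, 2, 3, ... that logs each value it hands out."""
    for n in count(1):
        pulled.append(n)
        yield n

# Compose three lazy stages; nothing runs until a consumer asks for data.
squares = map(lambda n: n * n, noisy_source())
even_squares = filter(lambda s: s % 2 == 0, squares)
first_three = list(islice(even_squares, 3))

print(first_three)  # [4, 16, 36]
print(pulled)       # [1, 2, 3, 4, 5, 6]
```

The source is infinite, yet only six values are ever produced, and no
intermediate list is built along the way.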

The docs intentionally include an introduction that articulates the
philosophy and unifying theme.  Likewise, there is a reason for the
examples page and the recipes page.  Taken together, those three
sections and the docs on the individual functions guide a programmer
to a clear sense of what the tools are for, when to use them, how to
compose them, their inherent strengths and weaknesses, and a good
intuition about how they work under the hood.

Given that context, it is a trivial matter to explain what groupby()
does:  it is an itertool (with all that implies) that emits groups
from the input stream whenever the key(x) function changes or the
stream ends.
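
As a quick illustration of that description (the sample string is an
arbitrary choice), note that a key which reappears later in the stream
starts a new group.  Sorting first is one common way to get a single
group per key, at the cost of reading the whole input.

```python
from itertools import groupby

data = "AAABBAAC"

# A new group starts every time the key changes, so 'A' shows up twice.
runs = [(key, len(list(group))) for key, group in groupby(data)]
print(runs)    # [('A', 3), ('B', 2), ('A', 2), ('C', 1)]

# Sorting brings equal keys together, giving one group per distinct key.
merged = [(key, len(list(group))) for key, group in groupby(sorted(data))]
print(merged)  # [('A', 5), ('B', 2), ('C', 1)]
```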

Without the context, someone somewhere will find a way to get confused
no matter how the individual function docs are worded.  When the OP
said that he hadn't read the examples, it is not surprising that he
found a way to get confused about the most complex tool in the
toolset.*

Debating the meaning of "each" is a sure sign of ignoring context and
editing with tunnel vision instead of holistic thinking.  Similar
issues arise in the socket, decimal, threading and regular expression
modules.  For users who do not grok those modules' unifying concepts,
no massaging of the docs for individual functions can prevent
occasional bouts of confusion.


Raymond


* -- FWIW, the OP then did the RightThing (tm) by experimenting at the
interactive prompt to observe what the function actually does and then
posted on comp.lang.python in a further effort to resolve his
understanding.



