itertools.groupby

Raymond Hettinger python at rcn.com
Tue May 29 02:34:36 EDT 2007


On May 28, 8:36 pm, "Carsten Haese" <cars... at uniqsys.com> wrote:
> And while
> we're at it, it probably should be keyfunc(value), not key(value).

No dice.  The itertools.groupby() function is typically used
in conjunction with sorted().  It would be a mistake to call
it keyfunc in one place and not in the other.  The mental
association is essential.  The key= nomenclature is used
throughout Python -- see min(), max(), sorted(), list.sort(),
itertools.groupby(), heapq.nsmallest(), and heapq.nlargest().
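
A minimal sketch of that association, for anyone following
along.  The word list and the first_letter() helper are made
up for illustration; only sorted(), max(), and groupby() are
the real APIs.  The same key function is handed to both the
sort and the grouping:

```python
from itertools import groupby

# Hypothetical sample data, not taken from the docs.
words = ["apple", "bob", "cat", "avocado", "banana", "cherry"]

def first_letter(word):
    """Illustrative key function shared by sorted() and groupby()."""
    return word[0]

# Pre-sort with the key, then group with the very same key.
ordered = sorted(words, key=first_letter)
grouped = {
    letter: list(group)
    for letter, group in groupby(ordered, key=first_letter)
}

# The key= spelling carries over unchanged to max(), min(), etc.
longest = max(words, key=len)
```

Because the spelling is key= in both calls, the pairing is
hard to get wrong.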

Really.  People need to stop making up random edits to the docs.
For the most part, the current wording is there for a reason.
The poster who wanted to rename the function to telescope() did
not participate in the extensive python-dev discussions on the
subject, did not consider the implications of unnecessarily
breaking code between versions, did not consider that the term
telescope() would mean A LOT of different things to different
people, did not consider the useful mental associations with SQL, etc.

I recognize that the naming of things and the wording
of documentation is something *everyone* has an opinion
about.  Even on python-dev, it is typical that posts with
technical analysis or use case studies are far outnumbered
by posts from folks with strong opinions about how to
name things.

I also realize that you could write a book on the subject
of this particular itertool and someone somewhere would still
find it confusing.  In response to this thread, I've put in
additional documentation (described in an earlier post).
I think it is time to call this one solved and move on.
It currently has a one-paragraph plain-English description,
a pure Python equivalent, an example, advice on when to
list out the iterator, triply repeated advice to pre-sort
using the same key function, an alternate description as
a tool that groups whenever key(x) changes, a comparison to
UNIX's uniq filter, a contrast against SQL's GROUP BY clauses,
and two worked-out examples on the next page which show
sample inputs and outputs. It is now one of the most
thoroughly documented individual functions in the language.
If someone reads all that, runs a couple of experiments
at the interactive prompt, and still doesn't get it,
then god help them when they get to the threading module
or to regular expressions.
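
For the record, here is the sort of experiment meant above,
as a rough sketch.  The input string is an arbitrary made-up
sample.  Without pre-sorting, groupby() starts a new group
each time the key changes (like uniq); with pre-sorting, the
result looks more like SQL's GROUP BY:

```python
from itertools import groupby

data = "AAABBBAAC"  # arbitrary sample input

# Unsorted: one group per run of equal keys, so 'A' shows up twice.
runs = [(key, len(list(group))) for key, group in groupby(data)]

# Keeping only the keys mimics UNIX uniq (adjacent duplicates dropped).
uniq_like = [key for key, _ in groupby(data)]

# Pre-sorted: each key appears once, as with SQL's GROUP BY.
totals = [(key, len(list(group))) for key, group in groupby(sorted(data))]
```

Here runs comes out as [('A', 3), ('B', 3), ('A', 2), ('C', 1)]
while totals comes out as [('A', 5), ('B', 3), ('C', 1)].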

If the posters on this thread have developed an interest
in the subject, I would find it useful to hear their
ideas on new and creative ways to use groupby().  The
analogy to UNIX's uniq filter was found only after the
design was complete.  Likewise, the page numbering trick
(shown above by Paul and in the examples in the docs)
was found afterwards.  I have a sense that there are entire
classes of undiscovered use cases which would emerge
if serious creative effort were focused on new and
interesting key= functions (the page numbering trick
ought to serve as inspiration in this regard).
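
For readers who missed it, a sketch of the page numbering
trick, assuming it refers to the usual recipe for collapsing
consecutive integers into ranges.  The page list is made up.
The key is value minus position, which stays constant across
a run of consecutive numbers and changes at every gap:

```python
from itertools import groupby

pages = [1, 2, 3, 7, 8, 10, 15, 16, 17, 18]  # hypothetical page numbers

ranges = []
# enumerate() pairs each page with its position; page - position
# is the same for every member of a consecutive run.
for _, group in groupby(enumerate(pages), key=lambda pair: pair[1] - pair[0]):
    run = [page for _, page in group]
    ranges.append((run[0], run[-1]))
```

That leaves ranges as [(1, 3), (7, 8), (10, 10), (15, 18)].
Nothing in the key mentions ranges at all, which is what
makes it a good model for inventing other key= functions.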

The gauntlet has been thrown down.  Any creative thinkers
up to the challenge?  Give me cool recipes.


Raymond



