itertools.groupby

Mon May 28 11:34:11 EDT 2007

On May 27, 6:50 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> On May 27, 2:59 pm, Steve Howell <showel... at yahoo.com> wrote:
>
> > These docs need work.  Please do not defend them;
> > please suggest improvements.
>
> FWIW, I wrote those docs.  Suggested improvements are
> welcome; however, I think they already meet a somewhat
> high standard of quality:
>
> - there is an accurate, succinct one-paragraph description
>   of what the itertool does.
>
> - there is advice to pre-sort the data using the same
>   key function.
>
> - it also advises when to list-out the group iterator
>   and gives an example with working code.
>
> - it includes a pure python equivalent which shows precisely
>   how the function operates.
>
> - there are two more examples on the next page.  those two
>   examples also give sample inputs and outputs.
>
> This is most complex itertool in the module.  I believe
> the docs for it can be usable, complete, and precise,
> but not idiot-proof.
>
> The groupby itertool came-out in Py2.4 and has had remarkable
> success (people seem to get what it does and like using it, and
> there have been no bug reports or reports of usability problems).
> All in all, that ain't bad (for what 7stud calls a poster child).
>
> Raymond

>- there is an accurate, succinct one-paragraph description
>  of what the itertool does.

As is often the case, the specifics of the description may only be
meaningful to someone who already knows what groupby does.  There are
many terms and concepts that experienced programmers use to describe
programming problems, but often the terms and concepts only ring true
with people who already understand the problem, and they are not at
all helpful for someone who is trying to learn about the concept.

Sometimes when you describe a function accurately, the description
becomes almost impossible to read because of all the detail.  What is
really needed is a general, simple description of the primary use of
the function, so that a reader can immediately employ the function in
a basic way.  Code snippets are extremely helpful in that regard.
Subsequently, the details and edge cases can be fleshed out in the
rest of the description.

>- there is advice to pre-sort the data using the same
>  key function.

But you have to know why that is relevant in the first place--
otherwise it is just a confusing detail.  Two short code examples
could flesh out the relevance of that comment.  I think I now
understand why pre-sorting is necessary: groupby only groups similar
items that are adjacent to each other in the sequence, and similar
items that are elsewhere in the sequence will be in a different group.
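
A minimal sketch of that point. The word list and the first_letter key function are made up for illustration; they are not taken from the docs:

```python
from itertools import groupby

# Hypothetical sample data: words to be grouped by their first letter.
words = ["apple", "bob", "avocado", "banana", "cherry"]

def first_letter(word):
    return word[0]

# Unsorted: only adjacent words with the same key land in one group,
# so the keys "a" and "b" each turn up more than once.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, first_letter)]

# Sorted with the same key function: equal keys are now adjacent,
# so each key produces exactly one group.
presorted = sorted(words, key=first_letter)
sorted_groups = [(k, list(g)) for k, g in groupby(presorted, first_letter)]

print(unsorted_groups)
print(sorted_groups)
```

Sorting with one key and grouping with another would bring back the split groups, which is presumably why the docs advise using the same key function for both steps.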

>- it includes a pure python equivalent which shows precisely
>  how the function operates.

It's too complicated.  Once again, it's probably most useful to an
experienced programmer who is trying to figure out some edge case. So
the code example is certainly valuable to one group of readers--just
not a reader who is trying to get a basic idea of what groupby does.

>- there are two more examples on the next page.  those two
>  examples also give sample inputs and outputs.

I didn't see those.

> people seem to get what it does and like
> using it, and
> there have been no bug reports or reports of
> usability problems

Wouldn't you get the same results if not many people used groupby
because they couldn't understand what it does?

I don't think you even need good docs if you allow users to attach
comments to the docs because all the issues will get fleshed out by
the users.  I appreciate the fact that it must be difficult to write
the docs--that's why I think user comments can help.

How about this for the beginning of the description of groupby in the
docs:

groupby divides a sequence into groups of similar elements.

Compare to:

> Make an iterator that returns consecutive keys and groups
> from the iterable.

Huh?

Continuing with a kinder, gentler description:

With a starting sequence like this:

lst = [1, 2, 2, 2, 1, 1, 3]

groupby divides the sequence into groups like this:

[1], [2, 2, 2], [1, 1], [3]

groupby takes similar elements that are adjacent to each other and
gathers them into a group.  If you sort the sequence beforehand, then
all the similar elements in a sequence will be adjacent to one
another, and therefore they will all end up in one group.
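
As a rough check of the description above, here is the same lst grouped as-is and then grouped after sorting (no key function is passed, so elements are compared directly):

```python
from itertools import groupby

lst = [1, 2, 2, 2, 1, 1, 3]

# Grouping the sequence as-is: the 1s are split into two groups
# because they are not all adjacent.
as_is = [list(g) for k, g in groupby(lst)]

# Grouping after sorting: all equal elements are adjacent,
# so each value ends up in a single group.
after_sort = [list(g) for k, g in groupby(sorted(lst))]

print(as_is)       # [[1], [2, 2, 2], [1, 1], [3]]
print(after_sort)  # [[1, 1, 1], [2, 2, 2], [3]]
```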

Optionally, you can specify a key function func, which groupby will use
to determine which elements in the sequence are similar (if func isn't
specified or is None, it defaults to the identity function, which
returns the element unchanged). An example:

------
import itertools

lst = [1, 2, 2, 2, 1, 1, 3]

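def func(num):
    # The key function: 2 maps to "a", everything else maps to "b".
    if num == 2:
        return "a"
    else:
        return "b"

keys = []
groups = []
for k, g in itertools.groupby(lst, func):
    keys.append(k)
    groups.append(list(g))  # list(g) copies the group out of its iterator

# print(x) with a single argument works in both Python 2 and Python 3.
print(keys)
print(groups)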

---output:---
['b', 'a', 'b']
[[1], [2, 2, 2], [1, 1, 3]]

When func is applied to an element in the list and the return value
(the key) is equal to "a", the element is considered similar to other
elements whose key is also "a".  As a result, the adjacent elements
that all have a key equal to "a" are put in a group; likewise, the
adjacent elements that all have a key equal to "b" are put in a group.

RETURN VALUE: groupby returns an iterator; each item it produces is a
tuple consisting of:
1) the key for the current group; all the elements of a group have the
same key

2) an iterator for the current group, which you normally pass to
list() to get the current group as a list.
-----------------
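
A small sketch of why list(g) is the usual idiom. It assumes the behaviour described in the itertools docs, where a group's iterator shares the underlying sequence with groupby and stops being usable once groupby moves on to the next group:

```python
from itertools import groupby

lst = [1, 2, 2, 2, 1, 1, 3]

pairs = groupby(lst)           # an iterator of (key, group) tuples

key1, group1 = next(pairs)     # first group
first_group = list(group1)     # copy it out right away: [1]

key2, group2 = next(pairs)     # second group, key 2 (not copied out)
key3, group3 = next(pairs)     # advancing skips past the rest of group 2

# group2 was never listed out, and groupby has already moved on,
# so it no longer yields the three 2s it originally stood for.
stale = list(group2)

print(key1, first_group)
print(key2, stale)
```

Calling list(g) inside the loop, as in the example above, avoids the problem because each group is copied before the next one is requested.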

That description probably contains some inaccuracies, but sometimes a
less accurate description can be more useful.