groupby() seems slow

Raymond Hettinger python at rcn.com
Tue Oct 16 16:04:41 EDT 2007


On Oct 15, 8:02 pm, 7stud <bbxx789_0... at yahoo.com> wrote:
> t = timeit.Timer("test3()", "from __main__ import test3, key, data")
> print t.timeit()
> t = timeit.Timer("test1()", "from __main__ import test1, data")
> print t.timeit()
>
> --output:---
> 42.791079998
> 19.0128788948
>
> I thought groupby() would be faster.  Am I doing something wrong?

The groupby() function is not where you are losing speed.  In test1,
you've inlined the code for computing the key.  In test3, groupby()
calls a pure Python key function once per element, and those repeated
function calls are expensive.  For an apples-to-apples comparison, try
something like this:

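def test4():
    # Hand-written loop that calls key() for every element, as groupby()
    # does, instead of inlining the key computation.
    master_list = []
    row = []
    for elem in data:
        if key(elem) == 'a':
            row.append(elem)    # extend the current run
        elif row:
            # A non-matching element ends the run: store it and reset.
            master_list.append(' '.join(row))
            del row[:]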
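
To see how much of the gap comes from the key function itself, a rough,
self-contained sketch along these lines can time all three variants side
by side.  SAMPLE and key() below are hypothetical stand-ins, since the
original data and key are not shown in this message.  The sketch is written
for Python 3, and unlike test4 it flushes a trailing run and returns the
list, so that the three variants can be checked against each other.

```python
import timeit
from itertools import groupby

# Hypothetical stand-ins: the thread's real `data` and `key` are not shown.
SAMPLE = ['a1', 'a2', 'b1', 'a3', 'c1', 'a4', 'a5']

def key(elem):
    # Assumed key: the first character of each string.
    return elem[:1]

def group_inline(items):
    # Key computation written directly in the loop (no function call).
    out, row = [], []
    for elem in items:
        if elem[:1] == 'a':
            row.append(elem)
        elif row:
            out.append(' '.join(row))
            row = []
    if row:  # flush a trailing run
        out.append(' '.join(row))
    return out

def group_keyfunc(items):
    # Same loop, but paying for one key() call per element.
    out, row = [], []
    for elem in items:
        if key(elem) == 'a':
            row.append(elem)
        elif row:
            out.append(' '.join(row))
            row = []
    if row:  # flush a trailing run
        out.append(' '.join(row))
    return out

def group_groupby(items):
    # groupby() also makes one key() call per element.
    return [' '.join(run) for k, run in groupby(items, key) if k == 'a']

def time_all(items, number=100):
    # Seconds per variant; absolute numbers depend on the machine.
    variants = (group_inline, group_keyfunc, group_groupby)
    return {f.__name__: timeit.timeit(lambda f=f: f(items), number=number)
            for f in variants}

if __name__ == '__main__':
    for name, seconds in time_all(SAMPLE * 100).items():
        print(name, seconds)
```

If the key() calls are what dominate, the second and third timings should
sit closer to each other than to the first.  The exact numbers will vary
with the machine, the Python version, and the real data.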


Raymond




