groupby() seems slow

Tue Oct 16 02:34:56 EDT 2007

On Oct 15, 11:02 pm, 7stud <bbxx789_0... at yahoo.com> wrote:
> I'm applying groupby() in a very simplistic way to split up some data,
> but when I timeit against another method, it takes twice as long.  The
> following groupby() code groups the data between the "</tr>" strings:
>
> data = [
> "1.5","</tr>","2.5","3.5","4.5","</tr>","</tr>","5.5","6.5","</tr>",
> "1.5","</tr>","2.5","3.5","4.5","</tr>","</tr>","5.5","6.5","</tr>",
> "1.5","</tr>","2.5","3.5","4.5","</tr>","</tr>","5.5","6.5","</tr>",
> ]
>
> import itertools
>
> def key(s):
>     if s[0] == "<":
>         return 'a'
>     else:
>         return 'b'
>
> def test3():
>
>     master_list = []
>     for group_key, group in itertools.groupby(data, key):
>         if group_key == "b":
>             master_list.append(list(group) )
>
> def test1():
>     master_list = []
>     row = []
>
>     for elmt in data:
>         if elmt[0] != "<":
>             row.append(elmt)
>         else:
>             if row:
>                 master_list.append(" ".join(row) )
>                 row = []
>
> import timeit
>
> t = timeit.Timer("test3()", "from __main__ import test3, key, data")
> print t.timeit()
> t = timeit.Timer("test1()", "from __main__ import test1, data")
> print t.timeit()
>
> --output:---
> 42.791079998
> 19.0128788948
>
> I thought groupby() would be faster.  Am I doing something wrong?

Yes and no. Yes, the groupby version can be improved a little by
calling a builtin method instead of a Python function. No, test1 still
beats it hands down (and with Psyco even further); it is almost good
as it gets in pure Python.

FWIW, here's a faster and more compact version with groupby:

def test3b(data):
    join = ' '.join
    return [join(group) for key,group in
            itertools.groupby(data, "</tr>".__eq__)
            if not key]

George