Split a string by length

Peter Otten __peter__ at web.de
Thu Mar 25 09:00:26 EST 2004


Yermat wrote:

> take care with that kind of comparisons...

I'll add a disclaimer next time :-)

> Especially look at the last two comparison ! the only difference is the
> construction of the list...
> 
> so what ? "Beautiful is better than ugly"
> Make your choice ;-)

I should have stated my "result" explicitly: timing doesn't matter much (I
think) in this case.

> timeit.py -s"import re" "f = re.compile('.{2}')" "f.findall('aabbccddee')"
> 100000 loops, best of 3: 7.67 usec per loop

For a fair comparison, the 2 would have to be an argument rather than a
constant. You could then argue that re.compile() has to be timed, too.
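A minimal sketch of what such a parameterized variant might look like. The
name chunk_re() is hypothetical and not part of either benchmark; it builds
the pattern from the chunk length on every call, so the compile step is
included in whatever gets timed:

```python
import re

def chunk_re(s, n):
    # Hypothetical helper: the chunk length is an argument, and the
    # pattern is built inside the call so that step is part of the timing.
    # Note that the re module caches compiled patterns, so repeated calls
    # with the same n mostly pay for a cache lookup, not a full compile.
    pattern = re.compile(".{%d}" % n)
    return pattern.findall(s)

result = chunk_re("aabbccddee", 2)
```

Like the original pattern, this silently drops a trailing chunk shorter than
n, and '.' does not match newlines unless re.DOTALL is given.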

> timeit.py -s"from chunk import chunks" "list(chunks('aabbccddee', 2))"
> 100000 loops, best of 3: 7.3 usec per loop
> 
> timeit.py -s"from chunk import chunklist" "chunklist('aabbccddee', 2)"
> 100000 loops, best of 3: 6.3 usec per loop

The difference between chunks() and chunklist() is that you do not need to
exhaust the chunks() generator. Of course that advantage vanishes in my
simplistic benchmark...
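To illustrate, here is a sketch with stand-in definitions. The chunk module
itself is not shown in this thread, so both functions below are assumptions
about what slicing-based versions could look like, not the timed code:

```python
from itertools import islice

def chunks(s, n):
    # Assumed generator version: slices are produced one at a time, on demand.
    for i in range(0, len(s), n):
        yield s[i:i + n]

def chunklist(s, n):
    # Assumed list version: every slice is built before the call returns.
    return [s[i:i + n] for i in range(0, len(s), n)]

# Taking only the first two chunks: the generator never slices the rest.
first_two = list(islice(chunks("aabbccddee", 2), 2))
all_chunks = chunklist("aabbccddee", 2)
```

Wrapping the generator in list(), as the benchmark does, forces every chunk
to be produced and so hides exactly this difference.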

> timeit.py -s"from chunk import chunksiter"
> "list(chunksiter('aabbccddee', 2))"
> 100000 loops, best of 3: 11.8 usec per loop
> 
> timeit.py -s"from chunk import chunksiter" "[ x for x in
> chunksiter('aabbccddee', 2)]"
> 100000 loops, best of 3: 15.2 usec per loop

These last two timings are useless. The only sane use case is the following
(or, equivalently, two nested loops with no continue in either loop and no
break in the inner loop):

>>> [list(ch) for ch in chunks.chunksiter("aabbccdd", 2)]
[['a', 'a'], ['b', 'b'], ['c', 'c'], ['d', 'd']]
>>>

Looks reasonable. Now let's separate the above into two steps:

>>> lst = [ch for ch in chunks.chunksiter("aabbccdd", 2)]
>>> map(list, lst)
[['a'], ['a'], ['b'], ['b'], ['c'], ['c'], ['d'], ['d']]
>>>

Did you expect that? Error-prone indeed. Of course you could use tee() from
the itertools example page to make it more robust, but I would expect even
more overhead...
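For reference, a minimal sketch of how the surprise above can arise. The
chunksiter() here is a hypothetical reconstruction, not the code that was
timed; it assumes every sub-iterator draws from one shared iterator:

```python
from itertools import islice

def chunksiter(seq, size):
    # Hypothetical reconstruction: each chunk is a lazy generator that
    # pulls its remaining items from the same underlying iterator.
    it = iter(seq)

    def chunk(first):
        yield first
        # islice simply stops early if the shared iterator is exhausted.
        yield from islice(it, size - 1)

    for first in it:
        yield chunk(first)

# Consuming each chunk before asking for the next one works as intended.
eager = [list(ch) for ch in chunksiter("aabbccdd", 2)]

# Collecting the chunks first drains the shared iterator: every character
# becomes the "first" item of its own chunk, and nothing is left over.
pending = list(chunksiter("aabbccdd", 2))
deferred = [list(ch) for ch in pending]
```

With this sketch, eager comes out as four two-item lists while deferred
comes out as eight one-item lists, matching the two sessions above.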

Peter




