Split a string by length
Peter Otten
__peter__ at web.de
Thu Mar 25 07:42:38 EST 2004
David McNab wrote:
> Now, who's gonna benchmark these n different approaches?
Make that n+1 :-)
<chunks.py>
import itertools

def chunks(s, cl):
    for i in xrange(0, len(s), cl):
        yield s[i:i+cl]

def chunklist(s, cl): # aka divide()
    return [s[i:i+cl] for i in xrange(0, len(s), cl)]

def _chunkiter(item, it, N):
    yield item
    for item in itertools.islice(it, N):
        yield item

def chunksiter(iterable, N):
    """ How would you do that without a helper func?
        Don't hesitate to show me
    """
    N -= 1
    it = iter(iterable)
    for n in it:
        yield _chunkiter(n, it, N)
</chunks.py>
$ timeit.py -s"from re import findall" "findall('.{2}', 'aabbccddee')"
100000 loops, best of 3: 9.28 usec per loop
$ timeit.py -s"from chunks import chunks" "list(chunks('aabbccddee', 2))"
100000 loops, best of 3: 7.7 usec per loop
$ timeit.py -s"from chunks import chunklist" "chunklist('aabbccddee', 2)"
100000 loops, best of 3: 7.01 usec per loop
And finally the black sheep:
$ timeit.py -s"from chunks import chunksiter" "[''.join(ch) for ch in chunksiter('aabbccddee', 2)]"
10000 loops, best of 3: 27.4 usec per loop
To me the greatest difference seems to be that chunks() and chunklist()/divide()
can handle arbitrary sequence types, while re.findall() is limited to strings.
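
A quick sketch of that difference, not part of the timings above. It assumes Python 3, so range() stands in for xrange(); the sample lists and tuples are made up for illustration. It also shows one side effect I am sure of: the '.{2}' pattern silently drops a trailing short chunk, while slicing keeps it.

```python
import re

def chunklist(seq, size):
    # Python 3 spelling of chunklist(): range() replaces xrange().
    # Works for anything that supports len() and slicing.
    return [seq[i:i + size] for i in range(0, len(seq), size)]

from_str = chunklist('aabbccddee', 2)      # chunks are strings
from_list = chunklist([1, 2, 3, 4, 5], 2)  # chunks are lists, last one short
from_tuple = chunklist((1, 2, 3), 2)       # chunks are tuples

# re.findall() only accepts strings (or bytes-like objects).
try:
    re.findall('.{2}', [1, 2, 3, 4])
    findall_accepts_list = True
except TypeError:
    findall_accepts_list = False

# With an odd-length string the regex loses the final character.
regex_odd = re.findall('.{2}', 'abcde')
slice_odd = chunklist('abcde', 2)
```

Whether the dropped remainder matters depends on the data, but it is worth knowing before picking the regex for anything other than even-length strings.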
My attempt to tackle iterables is both slow and error-prone so far (you need
to exhaust every chunk before you can safely advance to the next).
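
To make the pitfall concrete, here is a minimal sketch, again assuming Python 3. chunksiter() is the version from chunks.py with only the loop variable renamed; chunks_from_iterable() is a hypothetical name for one possible way to do without the helper function, at the price of building each chunk as a list instead of handing out a lazy iterator.

```python
import itertools

def _chunkiter(item, it, n):
    yield item
    for item in itertools.islice(it, n):
        yield item

def chunksiter(iterable, n):
    # Lazy version from chunks.py: every chunk shares the iterator `it`.
    n -= 1
    it = iter(iterable)
    for first in it:
        yield _chunkiter(first, it, n)

def chunks_from_iterable(iterable, n):
    # Hypothetical alternative: pull n items at a time into a list,
    # so a chunk is complete before the next one is started.
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

# Pitfall: list() advances to every next chunk before any chunk is read,
# so each chunk ends up holding only its first item.
stale = [''.join(ch) for ch in list(chunksiter('aabbccddee', 2))]

# Reading each chunk before advancing gives the intended result.
lazy_ok = [''.join(ch) for ch in chunksiter('aabbccddee', 2)]

# The list-building variant does not care about the order of consumption.
safe = [''.join(ch) for ch in list(chunks_from_iterable(iter('aabbccddee'), 2))]
```

I have not timed the list-building variant, so no claim about how it compares with the numbers above.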
Peter