Split a string by length

Thu Mar 25 07:42:38 EST 2004

David McNab wrote:

> Now, who's gonna benchmark these n different approaches?

Make that n+1 :-)

<chunks.py>
import itertools

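def chunks(s, cl):
    # Lazily yield slices of length cl; the last one may be shorter.
    # (xrange() is Python 2; on Python 3 use range().)
    for i in xrange(0, len(s), cl):
        yield s[i:i+cl]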

def chunklist(s, cl): # aka divide()
    return [s[i:i+cl] for i in xrange(0, len(s), cl)]

def _chunkiter(item, it, N):
    # Yield the item already consumed, then up to N more from the shared iterator.
    yield item
    for item in itertools.islice(it, N):
        yield item

def chunksiter(iterable, N):
    """ How would you do that without a helper func?
        Don't hesitate to show me
    """
    N -= 1
    it = iter(iterable)
    for n in it:
        yield _chunkiter(n, it, N)
</chunks.py>

$ timeit.py -s"from re import findall" "findall('.{2}', 'aabbccddee')"
100000 loops, best of 3: 9.28 usec per loop
$ timeit.py -s"from chunks import chunks" "list(chunks('aabbccddee', 2))"
100000 loops, best of 3: 7.7 usec per loop
$ timeit.py -s"from chunks import chunklist" "chunklist('aabbccddee', 2)"
100000 loops, best of 3: 7.01 usec per loop

And finally the black sheep:

$ timeit.py -s"from chunks import chunksiter" "[''.join(ch) for ch in chunksiter('aabbccddee', 2)]"
10000 loops, best of 3: 27.4 usec per loop
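
For anyone who would rather rerun the comparison from inside Python than
from the shell, here is a rough sketch using the timeit module. It is
written for Python 3, so range() stands in for xrange(); best_usec() is a
hypothetical helper, not part of chunks.py, and the lambda wrappers add
some call overhead, so the numbers will not match the ones above.

```python
import re
import timeit

def chunklist(s, cl):
    # Same slicing approach as chunklist() in chunks.py,
    # with range() standing in for Python 2's xrange().
    return [s[i:i+cl] for i in range(0, len(s), cl)]

def best_usec(func, number=10000, repeat=3):
    """Hypothetical helper: best per-call time in microseconds."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e6

data = 'aabbccddee'
results = {
    'findall': best_usec(lambda: re.findall('.{2}', data)),
    'chunklist': best_usec(lambda: chunklist(data, 2)),
}

for name, usec in sorted(results.items()):
    print('%-10s %.2f usec per loop' % (name, usec))
```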

To me the greatest difference seems to be that chunks() and chunklist()/divide()
can handle arbitrary sequence types, while re.findall() is limited to strings.
My attempt to tackle iterables is both slow and error-prone so far: all chunks
share one underlying iterator, so you need to exhaust every chunk before you
can safely advance to the next.
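
One way around the exhaustion problem, sketched here for Python 3 and only
as an illustration, is to materialize each chunk before yielding it, so no
chunk depends on the shared iterator afterwards. The name
chunks_from_iterable() is made up for this example; the price is that each
chunk is built eagerly as a tuple instead of being a lazy sub-iterator.

```python
from itertools import islice

def chunks_from_iterable(iterable, n):
    """Yield successive tuples of up to n items from any iterable."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while True:
        # Pull the next n items eagerly; the tuple no longer depends on `it`.
        chunk = tuple(islice(it, n))
        if not chunk:
            return  # underlying iterator is exhausted
        yield chunk

if __name__ == "__main__":
    # Works for strings as well as one-shot iterators.
    print([''.join(ch) for ch in chunks_from_iterable('aabbccddee', 2)])
    print(list(chunks_from_iterable(iter(range(5)), 2)))
```

Since every chunk is a finished tuple, the chunks can be collected first
and consumed later in any order.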

Peter