[Python-ideas] itertools.chunks()

Mon Apr 8 07:31:50 CEST 2013

Oscar Benjamin <oscar.j.benjamin at ...> writes:

> 
> On 7 April 2013 10:37, Wolfgang Maier
> <wolfgang.maier at ...> wrote:
> >>Also I find myself often writing helper functions like these:
> >>
> >>def chunked(sequence,size):
> >>       i = 0
> >>       while True:
> >>               j = i
> >>               i += size
> >>               chunk = sequence[j:i]
> >>               if not chunk:
> >>                       return
> >>               yield chunk
> >
> > This is just an alternate version of the grouper recipe from the itertools
> > documentation, just that grouper should be way faster and will also work
> > with iterators.
> 
> It's not quite the same as grouper as it doesn't use fill values; I've
> never found that I wanted fill values in this situation.
> 
> Also I'm not sure why you think that grouper would be "way faster". If
> sequence is a concrete sequence with efficient random access (e.g. a
> list or tuple) then grouper will just be extracting slices from it. If
> it can do that faster than the sequence.__getslice__ method then
> there's probably something wrong with the implementation of sequence.
> 
> I've written a generator function like the above before and it was
> intended for numpy ndarrays. Since ndarray slices are views into the
> original array, using a slice is a zero copy operation. This means
> that using slices has time complexity of O(number of chunks) rather
> than O(number of elements) for grouper. It also has a constant memory
> requirement rather than O(chunk size) for grouper.
> 
> Oscar
> 

Hi,
I didn't mean to imply that slicing is faster or slower than iteration. Rather,
I thought that this particular example would run slower than the grouper
recipe because of the rest of the Python code (an assignment, an increment and
a truth test on the chunk every time through the loop). I have not tried to
time it, but all of this should make it slower than grouper, which spends most
of its time in C. For the special case of ndarrays your argument sounds
convincing, though!
Regarding the differences between this code and grouper, I am well aware of
them. That is why I mentioned the earlier thread
"zip_strict() or similar in itertools" again, where Peter Otten shows an
elegant alternative.
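To make the difference concrete, here is a minimal sketch that puts the two side by side. It assumes Python 3 (zip_longest rather than izip_longest) and uses the grouper recipe roughly as it appears in the itertools documentation; the sample data is made up for illustration, and this is not Peter Otten's alternative.

```python
from itertools import zip_longest


def chunked(sequence, size):
    """Yield successive slices of `sequence`; the last one may be shorter."""
    i = 0
    while True:
        j = i
        i += size
        chunk = sequence[j:i]
        if not chunk:
            return
        yield chunk


def grouper(iterable, n, fillvalue=None):
    """Collect items into tuples of length n, padding the last with fillvalue."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


data = list(range(7))  # illustrative input only

sliced = list(chunked(data, 3))
grouped = list(grouper(data, 3))

print(sliced)   # [[0, 1, 2], [3, 4, 5], [6]]
print(grouped)  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```

chunked needs an object that supports slicing and keeps the short final chunk, while grouper accepts any iterable and pads the final tuple with the fill value.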
Best,
Wolfgang