[Python-ideas] itertools.chunks()
Oscar Benjamin
oscar.j.benjamin at gmail.com
Mon Apr 8 13:57:06 CEST 2013
On 8 April 2013 06:31, Wolfgang Maier
<wolfgang.maier at biologie.uni-freiburg.de> wrote:
> Oscar Benjamin <oscar.j.benjamin at ...> writes:
>>
>> On 7 April 2013 10:37, Wolfgang Maier
>> <wolfgang.maier at ...> wrote:
>> >>Also I find myself often writing helper functions like these:
>> >>
>> >>def chunked(sequence, size):
>> >>    i = 0
>> >>    while True:
>> >>        j = i
>> >>        i += size
>> >>        chunk = sequence[j:i]
>> >>        if not chunk:
>> >>            return
>> >>        yield chunk
>> >
>> > This is just an alternate version of the grouper recipe from the itertools
>> > documentation, just that grouper should be way faster and will also work
>> > with iterators.
>>
>> It's not quite the same as grouper as it doesn't use fill values; I've
>> never found that I wanted fill values in this situation.
>>
>> Also I'm not sure why you think that grouper would be "way faster".
[snip]
>
> I didn't want to imply that slicing was faster/slower than iteration. Rather
> I thought that this particular example would run slower than the grouper
> recipe because of the rest of the python code (assign, increment,
> testing for False every time through the loop). I have not tried to time it,
> but all this should make things slower than grouper, which spends most of
> its time in C. For the special case of ndarrays your argument sounds
> convincing though!
Fair enough. I was making the assumption that the chunk size is large,
in which case the time is dominated by creating the slice.
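For anyone following along, the grouper recipe being discussed is the one
from the itertools documentation; it transposes n copies of the same
iterator with zip_longest, so the chunking loop runs in C:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    # The same iterator repeated n times: each zip step pulls n items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
```

Note that unlike the slicing version above, grouper always pads the final
chunk with fillvalue, which is exactly the difference being debated here.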
> Regarding the differences between this code and grouper, I am well aware of
> them. It was for that reason that I was mentioning the earlier thread
> *zip_strict() or similar in itertools again, where Peter Otten shows an
> elegant alternative.
Sorry, I didn't read that thread but I have now; I see that you raised
precisely this issue. For what it's worth I agree that the fact a
generator is needed here suggests that there is some kind of primitive
missing from itertools.
Also, here's a version of the same from my own code (modified a
little) that uses islice instead of zip_longest. I haven't timed it
but it was intended to be fast for large chunk sizes and I'd be
interested to know how it compares:
from itertools import islice

def chunked(iterable, size, **kwargs):
    '''Breaks an iterable into chunks

    Usage:
    >>> list(chunked('qwertyuiop', 3))
    [['q', 'w', 'e'], ['r', 't', 'y'], ['u', 'i', 'o'], ['p']]
    >>> list(chunked('qwertyuiop', 3, fillvalue=None))
    [['q', 'w', 'e'], ['r', 't', 'y'], ['u', 'i', 'o'], ['p', None, None]]
    >>> list(chunked('qwertyuiop', 3, strict=True))
    Traceback (most recent call last):
    ...
    ValueError: Invalid chunk size
    '''
    # Bind to locals to avoid repeated global lookups in the loop.
    list_, islice_ = list, islice
    iterator = iter(iterable)
    chunk = list_(islice_(iterator, size))
    while len(chunk) == size:
        yield chunk
        chunk = list_(islice_(iterator, size))
    # At this point chunk is the final, possibly partial, chunk.
    if not chunk:
        return
    elif kwargs.get('strict', False):
        raise ValueError('Invalid chunk size')
    elif 'fillvalue' in kwargs:
        yield chunk + (size - len(chunk)) * [kwargs['fillvalue']]
    else:
        yield chunk
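Since the question of how this compares with grouper is left open, a rough
timeit harness (my own sketch, not from the thread; the chunked version
here is stripped of the fillvalue/strict handling for brevity) might look
like:

```python
from itertools import islice, zip_longest
from timeit import timeit

def grouper(iterable, n, fillvalue=None):
    # itertools docs recipe: n references to one iterator, zipped.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def chunked(iterable, size):
    # islice-based version; yields a short final chunk instead of padding.
    iterator = iter(iterable)
    chunk = list(islice(iterator, size))
    while chunk:
        yield chunk
        chunk = list(islice(iterator, size))

if __name__ == '__main__':
    data = list(range(100000))
    # Compare a small and a large chunk size, as the thread suggests the
    # winner may depend on it.
    for size in (3, 1000):
        t_grouper = timeit(lambda: list(grouper(data, size)), number=5)
        t_chunked = timeit(lambda: list(chunked(data, size)), number=5)
        print('size=%d: grouper=%.3fs chunked=%.3fs'
              % (size, t_grouper, t_chunked))
```

Which one wins will depend on the chunk size and on the cost of building
the intermediate lists, so actual numbers are worth measuring rather than
guessing.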
Oscar