[Python-Dev] [Python-ideas] itertools.chunks(iterable, size, fill=None)

Terry Reedy tjreedy at udel.edu
Wed Jul 4 20:31:19 CEST 2012


On 7/4/2012 5:57 AM, anatoly techtonik wrote:
> On Fri, Jun 29, 2012 at 11:32 PM, Georg Brandl <g.brandl at gmx.net> wrote:

>> Anatoly, so far there were no negative votes -- would you care to go
>> another step and propose a patch?
>
> Was about to say "no problem",

Did you read that there *are* strong negative votes? And that this idea 
has been rejected before? I summarized the objections in my two 
responses and pointed to the tracker issues. One of the objections is 
that there are 4 different things one might want if the sequence length 
is not an even multiple of the chunk size. Your original 'idea' did not 
specify which one it meant.

> For now the best thing I can do (I don't risk even to mention anything
> with 3.3) is to copy/paste code from the docs here:
>
> from itertools import izip_longest
> def chunks(iterable, size, fill=None):
>      """Split an iterable into blocks of fixed-length"""
>      # chunks('ABCDEFG', 3, 'x') --> ABC DEF Gxx
>      args = [iter(iterable)] * size
>      return izip_longest(fillvalue=fill, *args)

Python ideas is about Python 3 ideas. Please post Python 3 code.

This is actually a one liner

     return zip_longest(*[iter(iterable)]*size, fillvalue=fill)

We don't generally add such to the stdlib.

> BTW, this doesn't work as expected (at least for strings). Expected is:
>    chunks('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'
> got:
>    chunks('ABCDEFG', 3, 'x') --> ('A' 'B' 'C') ('D' 'E' 'F') ('G' 'x' 'x')

One of the problems with the idea of 'add a chunker' is that there are at 
least a dozen variants that different people want. I discussed the 
return type issue in my responses. I showed how to get the 
'expected' response above using grouper, but also suggested that grouper 
is the wrong basis for splitting strings. Repeated slicing makes more 
sense for concrete sequence types.

def seqchunk_odd(s, size):
     # include odd size left over
     for i in range(0, len(s), size):
         yield s[i:i+size]

print(list(seqchunk_odd('ABCDEFG', 3)))
#
['ABC', 'DEF', 'G']

def seqchunk_even(s, size):
     # only include even chunks
     for i in range(0, size*(len(s)//size), size):
         yield s[i:i+size]

print(list(seqchunk_even('ABCDEFG', 3)))
#
['ABC', 'DEF']

def strchunk_fill(s, size, fill):
     # fill odd chunks
     q, r = divmod(len(s), size)
     even = size * q
     for i in range(0, even, size):
         yield s[i:i+size]
     if r:  # a partial chunk is left over
         yield s[even:] + fill * (size - r)

print(list(strchunk_fill('ABCDEFG', 3, 'x')))
#
['ABC', 'DEF', 'Gxx']

Because the 'fill' value is necessarily a sequence for strings, 
strchunk_fill would only work for lists and tuples if the fill value 
were either required to be given as a tuple or list of length 1 or if it 
were internally converted inside the function. Skipping that for now.
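
For illustration, here is a minimal sketch of the internal-conversion 
option. The name seqchunk_fill is hypothetical, and the sketch assumes 
that type(s) can be built from a list of items, which holds for list and 
tuple but not for str.

```python
def seqchunk_fill(s, size, fill):
    # Hypothetical helper: 'fill' is a single item, not a sequence.
    # Assumes type(s)(list_of_items) works, as it does for list and tuple.
    q, r = divmod(len(s), size)
    even = size * q
    for i in range(0, even, size):
        yield s[i:i+size]
    if r:
        # Wrap the fill item in a sequence of the same type as s.
        pad = type(s)([fill] * (size - r))
        yield s[even:] + pad

print(list(seqchunk_fill([1, 2, 3, 4, 5, 6, 7], 3, 0)))
print(list(seqchunk_fill(tuple('ABCDEFG'), 3, 'x')))
```

Strings would still need the str-specific version, since str(['x', 'x']) 
returns the repr of the list rather than 'xx'.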

Having written the fill version based on the even version, it is easy to 
select among the three behaviors by modifying the fill version.

def strchunk(s, size, fill=NotImplemented):
     # no fill: drop the odd chunk; '': keep it as is; else pad it
     q, r = divmod(len(s), size)
     even = size * q
     for i in range(0, even, size):
         yield s[i:i+size]
     if r and fill is not NotImplemented:
         yield s[even:] + fill * (size - r)

print(*strchunk('ABCDEFG', 3))
print(*strchunk('ABCDEFG', 3, ''))
print(*strchunk('ABCDEFG', 3, 'x'))
#
ABC DEF
ABC DEF G
ABC DEF Gxx

I already described how something similar could be done by checking each 
grouper output tuple for a fill value, but that requires that the fill 
value be a sentinel that could not otherwise appear in the tuple. One 
could modify grouper to fill with a private object() and check the last 
item of each group for that sentinel and act accordingly (delete, 
truncate, or replace). A generic api needs some thought, though.
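
A rough sketch of that sentinel idea follows. The function name 
grouper_tail and its 'tail' parameter are hypothetical, chosen only for 
illustration; this is one possible shape for such an api, not a proposal.

```python
from itertools import zip_longest

def grouper_tail(iterable, size, tail='truncate', fill=None):
    # Hypothetical api: tail is 'delete', 'truncate', or 'replace'.
    if tail not in ('delete', 'truncate', 'replace'):
        raise ValueError('unknown tail option: %r' % (tail,))
    sentinel = object()  # private, so it cannot appear in the input
    for group in zip_longest(*[iter(iterable)] * size, fillvalue=sentinel):
        if group[-1] is sentinel:
            # Only the final group can be short.
            if tail == 'delete':
                return
            real = tuple(x for x in group if x is not sentinel)
            if tail == 'replace':
                real += (fill,) * (size - len(real))
            yield real
        else:
            yield group

print(*grouper_tail('ABCDEFG', 3, 'delete'))
print(*grouper_tail('ABCDEFG', 3, 'truncate'))
print(*grouper_tail('ABCDEFG', 3, 'replace', 'x'))
```

Because the sentinel is created inside the function, None or any other 
value in the input is passed through untouched.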

---
An issue I did not previously mention is that people sometimes want 
overlapping chunks rather than contiguous disjoint chunks. The slice 
approach trivially adapts to that.

def seqlap(s, size):
     for i in range(len(s)-size+1):
         yield s[i:i+size]

print(*seqlap('ABCDEFG', 3))
#
ABC BCD CDE DEF EFG

A sliding window for a generic iterable requires a deque or ring buffer 
approach that is quite different from the zip_longest-based grouper 
approach.
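
As a minimal sketch of the deque approach, assuming tuples are an 
acceptable output type (the name iterlap is hypothetical):

```python
from collections import deque
from itertools import islice

def iterlap(iterable, size):
    # Hypothetical helper: overlapping windows over any iterable.
    it = iter(iterable)
    # Prime the window with the first 'size' items.
    window = deque(islice(it, size), maxlen=size)
    if len(window) == size:
        yield tuple(window)
    for item in it:
        window.append(item)  # maxlen discards the oldest item
        yield tuple(window)

print(*(''.join(w) for w in iterlap(iter('ABCDEFG'), 3)))
```

An input shorter than size yields nothing, which matches seqlap.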

-- 
Terry Jan Reedy




