[Python-ideas] itertools.chunks(iterable, size, fill=None)

Terry Reedy tjreedy at udel.edu
Sat Jun 30 23:09:36 CEST 2012


On 6/29/2012 4:32 PM, Georg Brandl wrote:
> On 26.06.2012 10:03, anatoly techtonik wrote:
>> Now that Python 3 is all about iterators (which is a user killer
>> feature for Python according to StackOverflow -
>> http://stackoverflow.com/questions/tagged/python) would it be nice to
>> introduce more first class functions to work with them? One function
>> to be exact to split string into chunks.

Nothing special about strings.

>>      itertools.chunks(iterable, size, fill=None)

This is a renaming of the grouper recipe in 9.1.2. Itertools Recipes. You 
should have mentioned this. I think of 'blocks' rather than 'chunks', 
but I notice several SO questions with 'chunk(s)' in the title.

>> Which is the 33th most voted Python question on SO -
>> http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python/312464

I am curious how you get that number. I do note that there are about 15 
other Python SO questions that seem to be variations on the theme. There 
might be more if 'blocks' and 'groups' were searched for.

> Anatoly, so far there were no negative votes -- would you care to go
> another step and propose a patch?

That is because Raymond H. is not reading either list right now ;-)
Hence the Cc:. Also because I did not yet respond to a vague, very 
incomplete idea.

From Raymond's first message on http://bugs.python.org/issue6021 , add 
grouper:

"This has been rejected before.

* It is not a fundamental itertool primitive.  The recipes section in
the docs shows a clean, fast implementation derived from zip_longest().

* There is some debate on a correct API for odd lengths.  Some people
want an exception, some want fill-in values, some want truncation, and
some want a partially filled-in tuple.  That alone is reason enough not
to set one behavior in stone.

* There is an issue with having too many itertools.  The module taken as
a whole becomes more difficult to use as new tools are added."

---
This is not to say that the question should not be re-considered. Given 
the StackOverflow experience in addition to that of the tracker and 
python-list (and maybe python-ideas), a special exception might be made 
in relation to points 1 and 3.

---
In regard to point 2: many 'proposals', including Anatoly's, neglect 
this detail. But the function has to do *something* when seqlen % 
grouplen != 0. So an 'idea' is not really a concrete programmable 
proposal until 'something' is specified.

Exception -- not possible for an itertool until the end of the iteration 
(see below). To raise immediately for sequences, one could wrap grouper.

def exactgrouper(sequence, k):  # untested
    # grouper is the itertools recipe (defined further down), not a
    # function in the itertools module
    if len(sequence) % k:
        raise ValueError('Sequence length {} must be a multiple of group '
                         'length {}'.format(len(sequence), k))
    return grouper(k, sequence)

Of course, sequences can also be directly sequentially sliced (but 
should the result be an iterable or sequence of blocks?). But we do not 
have a seqtools module and I do not think there should be another method 
added to the seq protocol.
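
A minimal sketch of the direct slicing mentioned above, assuming one wants 
an iterator of blocks and accepts a short final block. The name seq_blocks 
is hypothetical, not part of any module:

```python
def seq_blocks(sequence, k):
    """Yield successive length-k slices of a sequence (hypothetical helper).

    Each block has the same type as the input (str -> str, list -> list);
    the last block is shorter when len(sequence) % k != 0.
    """
    if k < 1:
        raise ValueError('block length must be at least 1')
    for start in range(0, len(sequence), k):
        yield sequence[start:start + k]

print(*seq_blocks('ABCDEFG', 3))          # ABC DEF G
print(list(seq_blocks([1, 2, 3, 4], 2)))  # [[1, 2], [3, 4]]
```

Wrapping the call in list() gives a sequence of blocks instead, which is 
one of the output choices that would have to be pinned down.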

Fill -- grouper always does this, with a default of None.

Truncate, Remainder -- grouper (zip_longest) cannot directly do this and 
no recipes are given in the itertools docs. (More could be, see below.)

Discussions on python-list give various implementations, either for 
sequences or iterables. For the latter, one approach is "it = 
iter(iterable)" followed by repeated islice of the first n items. 
Another is to use a sentinel for the 'fill' to detect a final incomplete 
block (tuple for grouper).

def grouper_x(n, iterable):  # untested
    sentinel = object()
    for g in grouper(n, iterable, sentinel):
        if g[-1] is not sentinel:
            yield g
        else:
            # final, incomplete block: pick one of the behaviors below
            pass  # to truncate
            # yield g[:g.index(sentinel)]  # for remainder
            # raise ValueError  # for delayed exception
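
The islice approach mentioned above might look like the following sketch. 
It assumes 'remainder' behavior (a short final tuple); the name 
islice_blocks is made up for illustration:

```python
from itertools import islice

def islice_blocks(iterable, n):
    """Yield tuples of up to n items from any iterable (hypothetical helper).

    The final tuple is shorter when the input length is not a multiple of n.
    """
    if n < 1:
        raise ValueError('block length must be at least 1')
    it = iter(iterable)
    while True:
        block = tuple(islice(it, n))  # take the next n items, or fewer at the end
        if not block:                 # iterator exhausted
            return
        yield block

print(*islice_blocks('ABCDEFG', 3))
# ('A', 'B', 'C') ('D', 'E', 'F') ('G',)
```

To truncate instead, one would skip any block with len(block) < n; to get 
a delayed exception, raise ValueError at that point.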

---
The above discussion of point 2 touches on point 4, which Raymond 
neglected in the particular message above but which has come up before: 
What are the allowed input and output types? An idea is not a 
programmable proposal until the domain, range, and mapping are specified.

Possible inputs are a specific sequence (string, for instance), any 
sequence, any iterable. Possible outputs are a sequence or iterator of 
sequence or iterator. The various python-list and StackOverflow 
questions ask for various combinations. zip_longest and hence grouper 
takes any iterable and returns an iterator of tuples. (An iterator of 
maps might be more useful as a building block.) This is not what one 
usually wants with string input, for instance, nor with range input. To 
illustrate:

import itertools as it

def grouper(n, iterable, fillvalue=None):
     "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
     args = [iter(iterable)] * n
     return it.zip_longest(*args, fillvalue=fillvalue)

print(*(grouper(3, 'ABCDEFG', 'x')))  # probably not wanted
print(*(''.join(g) for g in grouper(3, 'ABCDEFG', 'x')))
#
('A', 'B', 'C') ('D', 'E', 'F') ('G', 'x', 'x')
ABC DEF Gxx
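
The same point for range input, as a rough sketch: grouper returns padded 
tuples, whereas slicing the range (one possible alternative, not a 
recommendation) keeps range blocks and leaves the last one short.

```python
import itertools as it

def grouper(n, iterable, fillvalue=None):
    "Same recipe as above, repeated so this snippet runs on its own."
    args = [iter(iterable)] * n
    return it.zip_longest(*args, fillvalue=fillvalue)

r = range(7)
print(*grouper(3, r))  # tuples, padded with None
# (0, 1, 2) (3, 4, 5) (6, None, None)
print(*(r[i:i + 3] for i in range(0, len(r), 3)))  # range blocks, no padding
# range(0, 3) range(3, 6) range(6, 7)
```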

--
What to do? One could easily write 20 different functions. So more 
thought is needed before adding anything. -1 on the idea as is.

For the doc, I think it would be helpful here and in most module 
subchapters if there were a subchapter table of contents at the top 
(under 9.1 in this case). Even though just 2 lines here (currently, but 
see below), it would let people know that there *is* a recipes section. 
After the appropriate tables, mention that there are example uses in the 
recipe section. Possibly add similar tables in the recipe section.

Another addition could be a new subsection on grouping (chunking) that 
would discuss post-processing of grouper (as discussed above), as well 
as other recipes, including ones specific to strings and sequences. It 
would essentially be a short how-to. Call it 9.1.3 "Grouping, Blocking, 
or Chunking Sequences and Iterables". The synonyms will help external 
searching. A toc would let people who have found this doc know to look 
for this at the bottom.

-- 
Terry Jan Reedy



