[Python-ideas] itertools.chunks(iterable, size, fill=None)
Terry Reedy
tjreedy at udel.edu
Sat Jun 30 23:09:36 CEST 2012
On 6/29/2012 4:32 PM, Georg Brandl wrote:
> On 26.06.2012 10:03, anatoly techtonik wrote:
>> Now that Python 3 is all about iterators (which is a user killer
>> feature for Python according to StackOverflow -
>> http://stackoverflow.com/questions/tagged/python) would it be nice to
>> introduce more first class functions to work with them? One function
>> to be exact to split string into chunks.
Nothing special about strings.
>> itertools.chunks(iterable, size, fill=None)
This is a renaming of itertools.grouper in 9.1.2. Itertools Recipes. You
should have mentioned this. I think of 'blocks' rather than 'chunks',
but I notice several SO questions with 'chunk(s)' in the title.
>> Which is the 33th most voted Python question on SO -
>> http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python/312464
I am curious how you get that number. I do note that there are about 15
other Python SO questions that seem to be variations on the theme. There
might be more if 'blocks' and 'groups' were searched for.
> Anatoly, so far there were no negative votes -- would you care to go
> another step and propose a patch?
That is because Raymond H. is not reading either list right now ;-)
Hence the Cc:. Also because I did not yet respond to a vague, very
incomplete idea.
From Raymond's first message on http://bugs.python.org/issue6021 , add
grouper:
"This has been rejected before.
* It is not a fundamental itertool primitive. The recipes section in
the docs shows a clean, fast implementation derived from zip_longest().
* There is some debate on a correct API for odd lengths. Some people
want an exception, some want fill-in values, some want truncation, and
some want a partially filled-in tuple. The alone is reason enough not
to set one behavior in stone.
* There is an issue with having too many itertools. The module taken as
a whole becomes more difficult to use as new tools are added."
---
This is not to say that the question should not be re-considered. Given
the StackOverflow experience in addition to that of the tracker and
python-list (and maybe python-ideas), a special exception might be made
in relation to points 1 and 3.
---
In regard to point 2: many 'proposals', including Anatoly's, neglect
this detail. But the function has to do *something* when seqlen %
grouplen != 0. So an 'idea' is not really a concrete programmable
proposal until 'something' is specified.
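To make the difference concrete, here is a minimal sketch (Python 3 assumed, variable names purely illustrative) of two of the possible behaviors for a 7-item input and groups of 3. It relies on the fact that zip() stops at the first exhausted argument, so swapping it in for zip_longest() happens to give truncation.

```python
from itertools import zip_longest

data = 'ABCDEFG'   # 7 items, so len(data) % n != 0
n = 3

# Fill: the grouper recipe pads the final group with fillvalue.
filled = list(zip_longest(*[iter(data)] * n, fillvalue='x'))

# Truncate: zip() stops as soon as the shared iterator runs dry,
# so the incomplete final group ('G', ...) is silently dropped.
truncated = list(zip(*[iter(data)] * n))

print(filled)
print(truncated)
```

Neither result is obviously the right one, which is the point: the choice has to be part of the proposal.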
Exception -- not possible for an itertool until the end of the iteration
(see below). To raise immediately for sequences, one could wrap grouper.
def exactgrouper(sequence, k): # untested
    if len(sequence) % k:
        raise ValueError('Sequence length {} must be a multiple of group '
                         'length {}'.format(len(sequence), k))
    else:
        # grouper is the itertools docs recipe: grouper(n, iterable)
        return grouper(k, sequence)
Of course, sequences can also be directly sequentially sliced (but
should the result be an iterable or sequence of blocks?). But we do not
have a seqtools module and I do not think there should be another method
added to the seq protocol.
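For reference, direct slicing of a sequence might look like the sketch below. The name chunk_sequence is hypothetical, not an existing function, and the choice to yield blocks lazily and to keep a short final block is only one of the possible answers to the question above.

```python
def chunk_sequence(seq, n):
    """Yield successive length-n slices of seq; the last may be shorter.

    Hypothetical helper, not part of itertools. It needs len() and
    slicing, so it works for sequences but not for arbitrary iterables.
    """
    if n < 1:
        raise ValueError('n must be at least 1')
    for start in range(0, len(seq), n):
        yield seq[start:start + n]

print(list(chunk_sequence('ABCDEFG', 3)))
print(list(chunk_sequence([1, 2, 3, 4, 5], 2)))
print(list(chunk_sequence(range(7), 3)))
```

Because each block is a slice, it has the same type as the input: strings give strings, lists give lists, ranges give ranges.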
Fill -- grouper always does this, with a default of None.
Truncate, Remainder -- grouper (zip_longest) cannot directly do this and
no recipes are given in the itertools docs. (More could be, see below.)
Discussions on python-list give various implementations either for
sequences or iterables. For the latter, one approach is "it =
iter(iterable)" followed by repeated islice of the first n items.
Another is to use a sentinel for the 'fill' to detect a final incomplete
block (tuple for grouper).
def grouper_x(n, iterable): # untested
    sentinel = object()
    for g in grouper(n, iterable, sentinel):
        if g[-1] is not sentinel:
            yield g
        else:
            # final, incomplete block: choose one of the following
            pass # to truncate
            # yield g[:g.index(sentinel)] for remainder
            # raise ValueError for delayed exception
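The other approach mentioned above, repeated islice on a shared iterator, might look like the following sketch. The name chunk_iterable is hypothetical, and yielding a short final tuple (the 'remainder' behavior) is an assumption made for illustration, not a recommendation.

```python
from itertools import islice

def chunk_iterable(iterable, n):
    """Yield tuples of up to n items from any iterable.

    Hypothetical helper, not part of itertools. The final tuple is
    shorter than n when the input length is not a multiple of n.
    """
    if n < 1:
        raise ValueError('n must be at least 1')
    it = iter(iterable)
    while True:
        block = tuple(islice(it, n))   # take the next n items, or fewer
        if not block:                  # iterator exhausted
            return
        yield block

print(list(chunk_iterable('ABCDEFG', 3)))
```

Unlike the sentinel version, this never needs a fill value, so nothing in the input can be mistaken for padding.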
---
The above discussion of point 2 touches on point 4, which Raymond
neglected in the particular message above but which has come up before:
What are the allowed input and output types? An idea is not a
programmable proposal until the domain, range, and mapping are specified.
Possible inputs are a specific sequence (string, for instance), any
sequence, any iterable. Possible outputs are a sequence or iterator of
sequence or iterator. The various python-list and StackOverflow posts
and questions ask for various combinations. zip_longest and hence grouper
take any iterable and return an iterator of tuples. (An iterator of
maps might be more useful as a building block.) This is not what one
usually wants with string input, for instance, nor with range input. To
illustrate:
import itertools as it

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return it.zip_longest(*args, fillvalue=fillvalue)

print(*(grouper(3, 'ABCDEFG', 'x'))) # probably not wanted
print(*(''.join(g) for g in grouper(3, 'ABCDEFG', 'x')))
#
('A', 'B', 'C') ('D', 'E', 'F') ('G', 'x', 'x')
ABC DEF Gxx
--
What to do? One could easily write 20 different functions. So more
thought is needed before adding anything. -1 on the idea as is.
For the doc, I think it would be helpful here and in most module
subchapters if there were a subchapter table of contents at the top
(under 9.1 in this case). Even though just 2 lines here (currently, but
see below), it would let people know that there *is* a recipes section.
After the appropriate tables, mention that there are example uses in the
recipe section. Possibly add similar tables in the recipe section.
Another addition could be a new subsection on grouping (chunking) that
would discuss post-processing of grouper (as discussed above), as well
as other recipes, including ones specific to strings and sequences. It
would essentially be a short how-to. Call it 9.1.3 "Grouping, Blocking,
or Chunking Sequences and Iterables". The synonyms will help external
searching. A toc would let people who have found this doc know to look
for this at the bottom.
--
Terry Jan Reedy