[Python-ideas] zip_strict() or similar in itertools ?
Peter Otten
__peter__ at web.de
Thu Apr 4 14:24:54 CEST 2013
Wolfgang Maier wrote:
> Dear all,
> the itertools documentation has the grouper() recipe, which returns
> consecutive tuples of a specified length n from an iterable. To do this,
> it uses zip_longest(). While this is an elegant and fast solution, my
> problem is that I sometimes don't want my tuples to be filled with a
> fillvalue (which happens if len(iterable) % n != 0), but I would prefer an
> error instead. This is important, for example, when iterating over the
> contents of a file and you want to make sure that it's not truncated.
> I was wondering whether itertools, in addition to the built-in zip() and
> zip_longest(), shouldn't provide something like zip_strict(), which would
> raise an error if its arguments aren't of equal length.
> zip_strict() could then be used in an alternative grouper() recipe.
>
> By the way, right now, I am using the following workaround for this
> problem:
>
> def iblock(iterable, bsize, strict=False):
>     """Return consecutive lists of bsize items from an iterable.
>
>     If strict is True, raises a ValueError if the size of the last block
>     in iterable is smaller than bsize. If strict is False, it returns the
>     truncated list instead."""
>
>     it=iter(iterable)
>     i=[it]*(bsize-1)
>     while True:
>         try:
>             result=[next(it)]
>         except StopIteration:
>             # iterator exhausted, end the generator
>             break
>         for e in i:
>             try:
>                 result.append(next(e))
>             except StopIteration:
>                 # iterator exhausted after returning at least one item,
>                 # but before returning bsize items
>                 if strict:
>                     raise ValueError("only %d value(s) left in iterator, expected %d" % (len(result),bsize))
>                 else:
>                     pass
>         yield result
>
> , which works well, but is about 3-4 times slower than the grouper()
> recipe. If you have alternative, faster solutions that I wasn't thinking
> of, I'd be very interested to hear about them.
>
> Best,
> Wolfgang
A simple approach is
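from itertools import zip_longest

def strict_grouper(items, size, strict):
    fillvalue = object()  # unique sentinel that cannot occur in items
    args = [iter(items)]*size
    chunks = zip_longest(*args, fillvalue=fillvalue)
    try:
        prev = next(chunks)
    except StopIteration:
        # empty input: nothing to yield (a bare next() here would
        # surface as RuntimeError on Python 3.7+)
        return
    for chunk in chunks:
        yield prev
        prev = chunk
    # only the last chunk can contain the fillvalue
    if prev[-1] is fillvalue:
        if strict:
            raise ValueError
        else:
            prev = prev[:prev.index(fillvalue)]
    yield prev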
If that's fast enough it might be a candidate for the recipes section.
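
Whether it is "fast enough" can be checked with a small timing harness. The sketch below is only an illustration: grouper() is written out as in the itertools recipes, strict_grouper() repeats the logic above (with a guard for empty input), and the consume() helper, the data size, and the repeat count are arbitrary choices, not anything from the original posts. No timings are quoted because they depend on the machine and the Python version.

```python
from itertools import zip_longest
from timeit import timeit

def grouper(iterable, n, fillvalue=None):
    # grouper() as in the itertools recipes: pads the last tuple
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def strict_grouper(items, size, strict):
    # same logic as the function above, with a guard for empty input
    fillvalue = object()
    chunks = zip_longest(*[iter(items)] * size, fillvalue=fillvalue)
    prev = next(chunks, None)
    if prev is None:
        return
    for chunk in chunks:
        yield prev
        prev = chunk
    if prev[-1] is fillvalue:
        if strict:
            raise ValueError("last group is incomplete")
        prev = prev[:prev.index(fillvalue)]
    yield prev

def consume(iterator):
    # hypothetical helper: exhaust an iterator and count its groups
    return sum(1 for _ in iterator)

data = range(30000)  # arbitrary size, an exact multiple of 3
t_plain = timeit(lambda: consume(grouper(data, 3)), number=20)
t_strict = timeit(lambda: consume(strict_grouper(data, 3, True)), number=20)
print("grouper: %.4fs  strict_grouper: %.4fs" % (t_plain, t_strict))
```

The extra cost of strict_grouper() comes from the Python-level loop that re-yields each chunk; zip_longest() itself still does the grouping in C.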
A partial solution I wrote a while ago is
http://code.activestate.com/recipes/497006-zip_exc-a-lazy-zip-that-ensures-that-all-iterables/
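
For comparison, here is a minimal sketch of what the zip_strict() asked for in the original post might look like, built on zip_longest() and a private sentinel. It is not the linked recipe and not an itertools function; the names zip_strict, grouper_strict and _MISSING are made up for this illustration. (On Python 3.10 and later the built-in zip() accepts strict=True, which raises ValueError on a length mismatch.)

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel; assumed never to appear in the data

def zip_strict(*iterables):
    """Like zip(), but raise ValueError if the iterables differ in length."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        # a sentinel in the tuple means at least one iterable ran out early
        if any(item is _MISSING for item in combo):
            raise ValueError("iterables have different lengths")
        yield combo

def grouper_strict(iterable, n):
    """grouper() variant that raises instead of padding the last tuple."""
    args = [iter(iterable)] * n
    return zip_strict(*args)
```

With this, list(grouper_strict("abcdef", 3)) gives two full tuples, while grouper_strict("abcdefg", 3) raises ValueError once the incomplete last group is reached. The per-tuple any() check runs in Python, so this is likely slower than the plain grouper() recipe.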