[Python-ideas] Batching/grouping function for itertools

Steven D'Aprano steve at pearwood.info
Sun Dec 8 10:25:17 CET 2013


On Sun, Dec 08, 2013 at 05:02:05PM +1000, Nick Coghlan wrote:

> The windowing problem is too ill-defined - there are enough degrees of
> freedom that any API flexible enough to cover them all is harder to
> learn than just building out your own version that works the way you
> want it to, and a more restrictive API that *doesn't* cover all the
> variants introduces a sharp discontinuity between the "blessed"
> variant and the alternatives.

Playing Devil's Advocate here, I wonder if that is true though. It seems 
to me that there are two basic windowing variants: sliding windows, and 
discrete windows. That is, given a sequence [a, b, c, d, e, f, g, h, i] 
and a window size of 3, the two obvious, common results are:

# sliding window
(a,b,c), (b,c,d), (c,d,e), (d,e,f), (e,f,g), (f,g,h), (g,h,i)

# discrete windows
(a,b,c), (d,e,f), (g,h,i)
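
As a rough sketch of those two results in Python (the function names
sliding and discrete are only illustrative, not an existing itertools
API, and this discrete silently drops any leftover items):

```python
from collections import deque
from itertools import islice


def sliding(iterable, window_size):
    """Yield overlapping tuples, advancing one item at a time."""
    it = iter(iterable)
    # Pre-fill the window; maxlen makes append() drop the oldest item.
    window = deque(islice(it, window_size), maxlen=window_size)
    if len(window) == window_size:
        yield tuple(window)
    for item in it:
        window.append(item)
        yield tuple(window)


def discrete(iterable, window_size):
    """Yield non-overlapping tuples; leftover items are dropped."""
    # The same iterator repeated window_size times, so zip() pulls
    # consecutive items into each tuple.
    return zip(*[iter(iterable)] * window_size)
```

With the nine-item sequence above and a window size of 3, sliding yields
seven tuples and discrete yields three.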


Surely anything else is exotic enough that there is no question about 
leaving it up to the individual programmer.

In the second case, there is a question about what to do with sequences 
whose length is not a multiple of the window size. Similar to zip(), 
there are two things one might do:

- pad with some given object;
- raise an exception.

If you want to just ignore extra items, just catch the exception and 
continue. So that's a maximum of three window functions:

sliding(iterable, window_size)
discrete(iterable, window_size, pad=None)
strict_discrete(iterable, window_size)

or just two, if you combine discrete and strict_discrete:

discrete(iterable, window_size [, pad])
# raise if pad is not given
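
A minimal sketch of that combined version, under some assumptions of my
own: a private sentinel (here called _MISSING) to tell "no pad given"
apart from pad=None, and ValueError as the exception, since nothing
above fixes which exception it should be.

```python
from itertools import islice

_MISSING = object()  # hypothetical sentinel: "pad was not supplied"


def discrete(iterable, window_size, pad=_MISSING):
    """Yield non-overlapping tuples of window_size items.

    A short final window is padded with pad if one was given;
    otherwise ValueError is raised (an assumed choice of exception).
    """
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, window_size))
        if not chunk:
            return
        if len(chunk) < window_size:
            if pad is _MISSING:
                raise ValueError(
                    "iterable length is not a multiple of window_size")
            chunk += (pad,) * (window_size - len(chunk))
        yield chunk
```

Because this is a generator, the exception only surfaces once the short
final window is reached, so wrapping the consuming loop in try/except
ValueError is enough to ignore the extra items.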


What other varieties are there? Surely none that are common. Once, for a 
lark, I tried to come up with one that was fully general -- as well as a 
window size, you could specify how far to advance the window each step. 
The sliding variety would advance by 1 each step, the discrete variety 
would advance by the window size. But I never found any reason to use it 
with any other step sizes. Surely anything else is more useful in theory 
than in practice.
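
For illustration only, here is one way such a fully general version
could look. This is a reconstruction, not the code I wrote back then;
the name windows and the step parameter are made up for the example,
and incomplete trailing windows are simply dropped.

```python
from collections import deque
from itertools import islice


def windows(iterable, window_size, step=1):
    """Yield tuples of window_size items, advancing step items each time.

    step=1 gives sliding windows; step=window_size gives discrete ones.
    """
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be at least 1")
    it = iter(iterable)
    window = deque(islice(it, window_size), maxlen=window_size)
    while len(window) == window_size:
        yield tuple(window)
        # Pull the next step items; stop if the input runs out first.
        new_items = list(islice(it, step))
        if len(new_items) < step:
            return
        # maxlen discards the oldest items, even when step > window_size.
        window.extend(new_items)
```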

(That's three times I've said something is "surely" true, always a sign 
my argument is weak *grin*)

Given that this windowing problem keeps coming up, there's no doubt in 
my mind that it is a useful, if not fundamental, iterator operation. 
Ruby's Enumerable module includes both:

http://ruby-doc.org/core-2.0.0/Enumerable.html

each_cons is what I've been calling a sliding window, and each_slice is 
what I've been calling discrete windows.



-- 
Steven

