itertools comments [Was: Re: RELEASED: Python 2.3a2]

Mon Feb 24 13:06:43 EST 2003

On 2003-02-21, Steven Taschuk wrote:

> Quoth Raymond Hettinger:
> > "Alexander Schmolck"
>   [...]
> > > def xwindow(iter, n=2, s=1):
> > >     r"""Move an `n`-item (default 2) windows `s` steps (default 1) at a
> > >     time over `iter`.
> >
I'm +1 on both xwindow and xgroup (and I actually find xwindow the more
useful of the two).

> > I've needed this one only one time in twenty-five years of programming
> > (for the markov algorithm).  Again, solid use cases are needed.
>
> Schmolck's xgroup(it, n) is xwindow(it, n, n).
>
> More seriously:
>
> Go pairwise to compute a sequence's first differences (for its own
> sake, or to, say, approximate a function's derivative or compute
> its Newton series).
>
Had this use case and other pairwaise cases (e.g. the vertices->edges
case), in several different programs.

> Generalizing, go k-wise to compute g(n) = f(n) - f(n-k).  (Some
> stock analysts use this as an indicator; they call it "momentum",
> I think.  They also use g(n) = f(n)/f(n-k).)
>
> Go pairwise on a token stream to get one-token lookahead in a
> parser.  (This one's my favourite.  But see below re "fading".)
>
I had this use case too.

> Go 3-wise to smooth f(0), f(1), ... by replacing f(x) with the
> average of f(x-1), f(x), and f(x+1).
>
> Go 3-wise to implement Euler's sequence accelerator:
> 	t(n) = s(n+1) - (s(n+1) - s(n))**2/(s(n-1) - 2*s(n) + s(n+1))
>
> Go 3-wise to compute linear automata in which the next state of a
> cell depends on its neighbours' states.
>
> These can be summarized by saying that going k-wise with step 1
> implements a FIFO buffer k objects wide.  I don't think the
> usefulness of such buffers is controversial, though it's certainly
> open to question whether xwindow() would be the best way to
> implement them.
>
xwindow is actually a generalized convolution (but please don't call it
[ix]conv :-).  Since so many things are expressible as convolution even
within the constraints of linearity, xwindow is cetainly useful.

> All of the above (except for implementing xgroup() in terms of
> xwindow()) use a step of 1.  I can only think of one case where a
> step other than 1 or the window size would be useful: drawing a
> Bezier spline on a polyline by first going pairwise to insert
> midpoints, then going 3-wise with step 2 for the actual spline
> computation.  But this is dubious as a use case, since it's pretty
> much as easy, and probably clearer, to go 3-wise with step 1 on
> the original data.  (More complicated splines than the one I have
> in mind might do better.)
>
> So perhaps the step argument is unnecessary.
>
The step argument indeed seems like a rarely-needed optimization [1]_ The
only case where I see it being worthy is xgroup, so I'm +1 on dropping the
step from xwindow.

.. [1] It's equivallent ``islice(xwindow(...), 0, <infinity>, step)``.
       BTW, why is there no way to give infinite stop argument to islice?
       None would do, or just allow keyword args...

> Note also that these use cases differ in their desired behaviour
> at the end of the sequence:
>     - Most want the window to stop when its leading edge goes past the
>       end of the underlying data.
>     - The parser with lookahead wants the window to "fade out", stopping
>       only when its *trailing* edge goes past the end of the underlying
>       data.
>     - A smoothing algorithm might want either of the above.
>     - As Ratzleff pointed out, some use cases involving polygons want
>       the leading edge to cycle back to the beginning of the sequence,
>       stopping when the trailing edge reaches the end of the sequence.
> The same point might be made for the beginning of the sequence: a
> smoothing algorithm that wants fade-out might also want fade-in.
> When fading out (in), there's also the question of whether the
> window ought to shrink (grow) or be padded out with, say, None.
>
Matlab's convolution function has 3 modes: full - returns A + B - 1
elements, same - returns A elements, valid - returns A - B + 1 elements.
For the first two cases, it pads with zeros.  For python, it would make
sense to either pad with None or shorten the tuple instead.  There is also
the option to pad with the first/last value but that's rarely used.  And
of course there is cyclic padding.

> One interpretation of this embarrassment of variation is that
> xwindow() is an over-generalization.
>
I think this edge-effects business deserves generalization in itself.

Make xwindow() use the "valid" mode - only process tuples that fit
completely inside the list.  E.g.:

>>> x = range(5)
>>> list(xwindow(x, 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4)]

This is the most basic and sensible mode.

Now all the other modes (except for shortening the tuple) can be simulated
by chaining (+1 on chain :-) something to the iterator passed to xwindow:

>>> list(xwindow(chain(x, times(2, None)), 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, None), (4, None, None)]
>>> list(xwindow(chain(x, times(2, x[-1])), 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 4), (4, 4, 4)]
>>> list(xwindow(chain(x, x[:2]), 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 0), (4, 0, 1)]

Here of course I [ab]use the fact that x is a list that I can arbitrarily
subscript.  If x was a real iterable, we would need separate functions to
pad it the last two ways:

# *Please*, somebody add "yield every" to Python; should I write a PEP?

def padlast(n, i):
    """Yield the values of `i`, followed by the last one `n` more times."""
    for val in i:
       yield val
    for i in xrange(n):
       yield val

def padcycle(n, i):
    """Yield the values of `i`, followed by the `n` first ones again."""
    i = iter(i)
    vals = []
    for val in islice(i, n):
        yield val
        vals.append(val)
    for val in i:
        yield val
    for k in xrange(n):
        yield vals[k % len(vals)]  # cycles repeatedly when len(vals) < n

>>> list(xwindow(padlast(2, x), 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 4), (4, 4, 4)]
>>> list(xwindow(padcycle(2, x), 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 0), (4, 0, 1)]

I'm not sure these functions should be included but they are pretty handy
in conjunction with xwindow...

-- 
Beni Cherniavsky <cben at tx.technion.ac.il>

The mind of a good coder knows what his computer would do for any of his
programs.  The computer of a good hacker knows what his mind would do if
it weren't for his programs.