time series calculation in list comprehension?

Peter Otten __peter__ at web.de
Sat Mar 11 05:21:03 EST 2006


Lonnie Princehouse wrote:

> You really want to use the value calculated for the i_th term in the
> (i+1)th term's evaluation.   

It may sometimes be necessary to recalculate the average on every iteration
to avoid accumulating floating-point error. Another tradeoff of your
optimization is that it becomes harder to switch the accumulation function
from average to, say, max.
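To make the error-accumulation point concrete, here is a small sketch (ported to modern Python 3; the values and window size are arbitrary illustrations, not from the original post). The incremental running sum carries rounding error from every add/subtract, while recomputing the final window's sum from scratch does not:

```python
# Python 3 sketch: compare the incremental update against full recomputation.
values = [(-1) ** i * 0.1 * i for i in range(10001)]  # arbitrary test data
n = 5

# incremental running sum, as in the optimized version
acc = sum(values[:n])
incremental_last = acc / n
for i in range(1, len(values) - n + 1):
    acc += values[i + n - 1] - values[i - 1]
    incremental_last = acc / n

# direct recomputation of the final window
direct_last = sum(values[-n:]) / n
print(incremental_last - direct_last)  # drift between the two methods
```

The drift is usually tiny, but for long series or ill-conditioned data it can grow, which is why recomputing per window is sometimes the safer choice.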

> While it's not easy (or pretty) to store state between iterations in a
> list comprehension, this is the perfect use for a generator:
> 
>   def generator_to_list(f):
>     return lambda *args,**keywords: list(f(*args,**keywords))
> 
>   @generator_to_list
>   def moving_average(sequence, n):
>     assert len(sequence) >= n and n > 0
>     average = sum(sequence[:n]) / n
>     yield average
>     for i in xrange(1, len(sequence)-n+1):
>       average += (sequence[i+n-1] - sequence[i-1]) / n
>       yield average

Here are two more that work with arbitrary iterables:

from __future__ import division

from itertools import islice, tee, izip
from collections import deque

def window(items, n):
    it = iter(items)
    w = deque(islice(it, n-1))
    for item in it:
        w.append(item)
        yield w # for a robust implementation:
                # yield tuple(w)
        w.popleft()
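For illustration, here is window() in action, ported to Python 3 and using the robust tuple(w) variant. The port is my own; the snapshot matters because yielding the deque itself would hand every consumer the same mutating object:

```python
from collections import deque
from itertools import islice

def window(items, n):
    it = iter(items)
    w = deque(islice(it, n - 1))   # prime the window with the first n-1 items
    for item in it:
        w.append(item)
        yield tuple(w)             # snapshot; yielding w reuses one deque
        w.popleft()

print(list(window("abcde", 3)))
# [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]
```

Without the tuple() snapshot, collecting the results with list() would show the same (by then mutated) deque in every slot.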

def moving_average1(items, n):
    return (sum(w)/n for w in window(items, n))

def moving_average2(items, n):
    first_items, last_items = tee(items)
    accu = sum(islice(last_items, n-1))
    for first, last in izip(first_items, last_items):
        accu += last
        yield accu/n
        accu -= first
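A quick check of moving_average2(), ported to Python 3 (izip becomes zip, and true division is the default, so the __future__ import goes away). tee() buffers the items that last_items has consumed ahead of first_items, so a one-pass iterable works too:

```python
from itertools import islice, tee

def moving_average2(items, n):
    first_items, last_items = tee(items)
    accu = sum(islice(last_items, n - 1))  # sum of the first n-1 items
    for first, last in zip(first_items, last_items):
        accu += last      # slide the window forward by one item
        yield accu / n
        accu -= first     # drop the item that fell out of the window

print(list(moving_average2([40, 30, 50, 46, 39, 44], 3)))
# [40.0, 42.0, 45.0, 43.0]
```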

While moving_average1() is even slower than your inefficient variant,
moving_average2() seems to be a tad faster than the efficient one.

Peter



