[Python-ideas] Introduce collections.Reiterable

Thu Sep 19 23:25:25 CEST 2013

On 9/19/2013 8:28 AM, Steven D'Aprano wrote:
> On Thu, Sep 19, 2013 at 06:31:12AM -0400, Terry Reedy wrote:
>> On 9/19/2013 4:59 AM, Neil Girdhar wrote:
>>> Well, generators are iterable, but if you write a function like:
>>>
>>> def f(s):
>>>       for x in s:
>>>               do_something(x)
>>>       for x in s:
>>>               do_something_else(x)
>>
>> This strikes me as bad design. It should perhaps a) be two functions or
>> b) take two iterable arguments or c) jam the two loops together.
>
> Perhaps, but sometimes there are hidden loops. Here's an example near
> and dear to my heart... *wink*
>
> def variance(data):
>      # Don't do this.
>      sumx = sum(data)
>      sumx2 = sum(x**2 for x in data)
>      ss = sumx2 - (sumx**2)/n
>      return ss/(n-1)
>
>
> Ignore the fact that this algorithm is numerically unstable.

Let's not ;-)

> It fails
> for iterator arguments, because sum(data) consumes the iterator and
> leaves sumx2 always equal to zero.

This is doubly bad design: the two 'hidden' loops can be trivially
jammed together into one explicit loop, and using Reiterable would not
remove the numerical instability anyway. It may seem that a numerically
stable solution needs two loops (the second to sum (x - sumx/n)**2, the
squared deviations from the mean). Even so, the two loops can still be
jammed together with the Method of Provisional Means.
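A minimal sketch of the "trivially jammed" version, assuming plain numeric input. The function name is hypothetical. Because Steven's snippet never defines n, the count is accumulated inside the same loop. This sketch fixes the iterator problem only; it keeps the sum-of-squares formula, so it is exactly as numerically unstable as the original.

```python
def variance_one_pass_naive(data):
    """Sample variance computed in a single pass over `data`.

    Works for iterators and generators because `data` is traversed once,
    but it still uses the unstable textbook formula sum(x**2) - sum(x)**2/n.
    """
    n = 0
    sumx = 0.0
    sumx2 = 0.0
    for x in data:  # the only loop: count, sum, and sum of squares together
        n += 1
        sumx += x
        sumx2 += x * x
    if n < 2:
        raise ValueError("variance requires at least two data points")
    ss = sumx2 - (sumx * sumx) / n
    return ss / (n - 1)


# A generator is consumed only once, so sumx2 is no longer silently zero.
print(variance_one_pass_naive(x for x in [1, 2, 3, 4]))
```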

http://www.stat.wisc.edu/~larget/math496/mean-var.html
http://www.statistical-solutions-software.com/BMDP-documents/BMDP-Formula1.pdf

It is also called the 'online algorithm' and the 'Weighted incremental algorithm' in
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
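A minimal sketch of the Method of Provisional Means in its common Welford form, assuming numeric input. The function name is hypothetical and this is not code from the linked pages. It keeps a running mean and a running sum of squared deviations (m2), so one pass suffices and there is no large sum of squares to cancel.

```python
def variance_provisional_means(data):
    """Sample variance in one pass via provisional (running) means."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean          # deviation from the previous provisional mean
        mean += delta / n         # update the provisional mean
        m2 += delta * (x - mean)  # uses both the old and the updated mean
    if n < 2:
        raise ValueError("variance requires at least two data points")
    return m2 / (n - 1)


# Works on a generator, and a large common offset does not wreck the result.
offset = 1e9
print(variance_provisional_means(offset + v for v in [4, 7, 13, 16]))  # 30.0
```

With these four values the running mean and m2 stay exactly representable, which is why the result is exactly 30.0, the same as for [4, 7, 13, 16] without the offset. Subtracting the running mean before squaring is what avoids the cancellation in sumx2 - sumx**2/n.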

This method was invented and used back when re-iterating large datasets
(on cards or tape) was possible but very slow (the 1970s or before).
(Restacking or rewinding and rereading might triple the expensive run time.)

-- 
Terry Jan Reedy