[Python-ideas] Introduce collections.Reiterable
Terry Reedy
tjreedy at udel.edu
Thu Sep 19 23:25:25 CEST 2013
On 9/19/2013 8:28 AM, Steven D'Aprano wrote:
> On Thu, Sep 19, 2013 at 06:31:12AM -0400, Terry Reedy wrote:
>> On 9/19/2013 4:59 AM, Neil Girdhar wrote:
>>> Well, generators are iterable, but if you write a function like:
>>>
>>> def f(s):
>>>     for x in s:
>>>         do_something(x)
>>>     for x in s:
>>>         do_something_else(x)
>>
>> This strikes me as bad design. It should perhaps a) be two functions or
>> b) take two iterable arguments or c) jam the two loops together.
>
> Perhaps, but sometimes there are hidden loops. Here's an example near
> and dear to my heart... *wink*
>
> def variance(data):
>     # Don't do this.
>     sumx = sum(data)
>     sumx2 = sum(x**2 for x in data)
>     ss = sumx2 - (sumx**2)/n
>     return ss/(n-1)
>
>
> Ignore the fact that this algorithm is numerically unstable.
Let's not ;-)
> It fails
> for iterator arguments, because sum(data) consumes the iterator and
> leaves sumx2 always equal to zero.
This is doubly bad design because the two 'hidden' loops are trivially
jammed together in one explicit loop, while use of Reiterable would not
remove the numerical instability. While it may seem that a numerically
stable solution needs two loops (the second to sum (x-mean)**2), the two
loops can still be jammed together with the Method of Provisional Means.
http://www.stat.wisc.edu/~larget/math496/mean-var.html
http://www.statistical-solutions-software.com/BMDP-documents/BMDP-Formula1.pdf
Also called 'online algorithm' and 'Weighted incremental algorithm' in
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
This was invented and used back when re-iterating over large datasets (on
cards or tape) was possible but very slow (1970s or before): restacking,
or rewinding and rereading, might triple the (expensive) run time.
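
A minimal sketch of the single-pass update described above, in the form the
Wikipedia page lists as the online algorithm. The names n, mean, and m2 are
illustrative choices of mine, and raising ValueError for fewer than two items
is an assumption rather than anything settled in this thread.

```python
def variance(data):
    """Sample variance computed in one pass over any iterable."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n         # update the provisional mean
        m2 += delta * (x - mean)  # old deviation times new deviation
    if n < 2:
        # Assumed behaviour: refuse to divide by zero for short input.
        raise ValueError("variance requires at least two data points")
    return m2 / (n - 1)
```

Because data is traversed exactly once, this accepts a list, a generator, or
any other iterator, so the function itself has no need for Reiterable.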
--
Terry Jan Reedy