[Python-ideas] Pre-PEP: adding a statistics module to Python

David Mertz mertz at gnosis.cx
Thu Aug 8 00:20:13 CEST 2013


I think having a one-pass option is useful, even at loss of precision.  I
do not believe that a warning should be a run-time thing though, but simply
a matter of documentation.  I guess the optional flag (with list-conversion
as default) is the best approach; although a function with a whole
different name doesn't seem implausible either... but indeed, the final
goal really should be to allow accumulation of many stats in the same
single-pass in this mode (not sure what that API would be).

The case of really long numeric streams feels like like it is common enough
to warrant this capability.  We might well have a billion numbers in a file
on a disk... or they might trickle in slowly from an actual instrument.
 That either uses too much memory or requires too much delay when
intermediate results should be possible.

Here's a question for the actual statisticians on the list (I'm not close
to this).  Would having a look-ahead window of moderate size (probably
configurable) do enough good in numeric accuracy to be worthwhile?
 Obviously, creating pathological cases is still possible, but in the
"normal" situation, does this matter enough?  I.e. if the function were to
read 100 numbers from an iterator, perform some manipulation on their
ordering or scaling, produce that better intermediate result, then do the
same with the next chunk of 100 numbers, is this enough of a win to have as
an option?



On Wed, Aug 7, 2013 at 3:08 PM, Brandon W Maister <bwmaister at gmail.com>wrote:

>
>
>
> On Wed, Aug 7, 2013 at 4:57 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>
>> On Aug 7, 2013, at 12:22, Brandon W Maister <bwmaister at gmail.com> wrote:
>>
>> I agree that the error is the way to go, but what about
>> `variance(iterable, one_pass=True)` defaulting to False, with an exception
>> that says "Warning: passing an iterable and using the one pass algorithm
>> can lead to a slight loss in accuracy" if an iterable is passed in?
>>
>>
>> First, I think you meant "iterator" there, not "iterable". (A list is an
>> iterable, but not an iterator.)
>>
>
> Oops, you're completely correct.
>
> Second, I'd expect that onepass=True would lead to a loss in accuracy no
>> matter what the other argument was.
>>
>> Finally, if passing onepass=True causes it to raise an exception, why
>> even have the option?
>>
>
> Sorry, I was unclear: I should have written something more like
> `allow_one_pass=True` and defaulting to False. Passing in True would
> silence the error, and force users to mean what they say.
>
> I don't feel strongly that it's a _good_ idea: I have never written code
> that would need to take advantage of a one pass algorithm, and I have no
> idea how common such code would be. If someone is at the point where
> they're trying to optimize their variance function they are probably at the
> point where they're moving beyond the stdlib anyway?
>
> brandon
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130807/e5feed81/attachment-0001.html>


More information about the Python-ideas mailing list