[Numpy-discussion] Does np.std() make two passes through the data?
Keith Goodman
kwgoodman at gmail.com
Sun Nov 21 21:33:57 EST 2010
On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Sun, Nov 21, 2010 at 19:49, Keith Goodman <kwgoodman at gmail.com> wrote:
>
>> But this sample gives a difference:
>>
>>>> a = np.random.rand(100)
>>>> a.var()
>> 0.080232196646619805
>>>> var(a)
>> 0.080232196646619791
>>
>> As you know, I'm trying to make a drop-in replacement for
>> scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in
>> part. Either that, or suck it up and store the damn mean.
>
> The difference is less than eps. Quite possibly, the one-pass version
> is even closer to the true value than the two-pass version.
Good, it passes the Kern test.
Here's an even more robust estimate:
>> var(a - a.mean())
0.080232196646619819
Which is better, numpy's two pass or the one pass on-line method?
>> test()
NumPy error: 9.31135e-18
Nanny error: 6.5745e-18 <-- One pass wins!
def test(n=100000):
numpy = 0
nanny = 0
for i in range(n):
a = np.random.rand(10)
truth = var(a - a.mean())
numpy += np.absolute(truth - a.var())
nanny += np.absolute(truth - var(a))
print 'NumPy error: %g' % (numpy / n)
print 'Nanny error: %g' % (nanny / n)
print
More information about the NumPy-Discussion
mailing list