[Numpy-discussion] Does np.std() make two passes through the data?

Benjamin Root ben.root at ou.edu
Mon Nov 22 12:07:56 EST 2010


On Mon, Nov 22, 2010 at 11:03 AM, Keith Goodman <kwgoodman at gmail.com> wrote:

> On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern <robert.kern at gmail.com>
> wrote:
> > On Sun, Nov 21, 2010 at 19:49, Keith Goodman <kwgoodman at gmail.com>
> wrote:
> >
> >> But this sample gives a difference:
> >>
> >>>> a = np.random.rand(100)
> >>>> a.var()
> >>   0.080232196646619805
> >>>> var(a)
> >>   0.080232196646619791
> >>
> >> As you know, I'm trying to make a drop-in replacement for
> >> scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in
> >> part. Either that, or suck it up and store the damn mean.
> >
> > The difference is less than eps. Quite possibly, the one-pass version
> > is even closer to the true value than the two-pass version.
>
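> Right, the two printed values differ by ~1.4e-17, which is about one
> ulp at 0.08, while float64 eps is an order of magnitude bigger:
>
> >> np.finfo(np.float64).eps
>   2.2204460492503131e-16
>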
> I wrote 3 cython prototype implementations of nanstd for 1d float64 arrays:
>
> >> a = np.random.rand(1000000)
>
> # numpy; doesn't take care of NaNs
> >> a.std()
>   0.28852169850186793
>
> # cython of np.sqrt(((arr - arr.mean())**2).mean())
> >> nanstd_twopass(a, ddof=0)
>   0.28852169850186798
>
> # cython of np.sqrt((arr*arr).mean() - arr.mean()**2)
> >> nanstd_simple(a, ddof=0)
>   0.28852169850187437
>
> # http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm
> >> nanstd_online(a, ddof=0)
>   0.28852169850187243
>
> # My target, scipy version
> >> from scipy.stats import nanstd
> >> nanstd(a, bias=True)
>   0.28852169850186798
>
> Timing:
>
> >> timeit nanstd(a, bias=True)
> 10 loops, best of 3: 27.8 ms per loop
> >> timeit a.std()
> 100 loops, best of 3: 11.5 ms per loop
> >> timeit nanstd_twopass(a, ddof=0)
> 100 loops, best of 3: 3.24 ms per loop
> >> timeit nanstd_simple(a, ddof=0)
> 1000 loops, best of 3: 1.6 ms per loop
> >> timeit nanstd_online(a, ddof=0)
> 100 loops, best of 3: 10.8 ms per loop
>
> nanstd_simple is the fastest but I assume the algorithm is no good for
> general use?
>
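> The problem with nanstd_simple's formula is catastrophic cancellation
> once the mean is large relative to the spread. A rough illustration
> (the 1e8 shift is made up just for the demo):
>
> >> b = np.random.rand(1000) + 1e8  # same spread as before, huge mean
> >> np.sqrt(((b - b.mean())**2).mean())  # two-pass: stable, still ~0.29
> >> (b*b).mean() - b.mean()**2
>
> The last subtraction needs ~17 significant digits, more than float64
> carries, so the result is pure rounding noise and can even come out
> negative under the sqrt.
>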
> I think I'll go with nanstd_twopass. It will most closely match
> numpy/scipy, is more robust than nanstd_simple, and is the second
> fastest.
>
> Here's the code. Improvements welcomed.
>
> cimport cython
> cimport numpy as np
> import numpy as np
> from libc.math cimport sqrt, NAN
>
> @cython.boundscheck(False)
> @cython.wraparound(False)
> def nanstd_simple(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>    cdef Py_ssize_t i
>    cdef int a0 = a.shape[0], count = 0
>    cdef np.float64_t asum = 0, a2sum=0, ai
>    for i in range(a0):
>        ai = a[i]
>        if ai == ai:  # comparison is False only when ai is NaN
>            asum += ai
>            a2sum += ai * ai
>            count += 1
>    if count > 0:
>        asum = asum * asum  # now (sum of x)**2
>        return sqrt((a2sum - asum / count) / (count - ddof))
>    else:
>        return np.float64(NAN)
>
> @cython.boundscheck(False)
> @cython.wraparound(False)
> def nanstd_online(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>    cdef Py_ssize_t i
>    cdef int a0 = a.shape[0], n = 0
>    cdef np.float64_t mean = 0, M2 = 0, delta, x
>    for i in range(a0):
>        x = a[i]
>        if x == x:  # skip NaNs
>            n += 1
>            delta = x - mean
>            mean = mean + delta / n
>            M2 = M2 + delta * (x - mean)
>    if n > 0:
>        return np.float64(sqrt(M2 / (n - ddof)))
>    else:
>        return np.float64(NAN)
>
> @cython.boundscheck(False)
> @cython.wraparound(False)
> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>    cdef Py_ssize_t i
>    cdef int a0 = a.shape[0], count = 0
>    cdef np.float64_t asum = 0, a2sum=0, amean, ai, da
>    for i in range(a0):
>        ai = a[i]
>        if ai == ai:
>            asum += ai
>            count += 1
>    if count > 0:
>        amean = asum / count
>        asum = 0
>        for i in range(a0):
>            ai = a[i]
>            if ai == ai:
>                da = ai - amean
>                asum += da
>                a2sum += (da * da)
>        # (sum of deviations)**2 / count corrects for rounding error
>        # left in amean (the "corrected two-pass" algorithm)
>        asum = asum * asum
>        return sqrt((a2sum - asum / count) / (count - ddof))
>    else:
>        return np.float64(NAN)
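>
> For checking against, here's a plain-numpy sketch of the same
> corrected two-pass formula (the name and the masking are just for
> the test, not part of the proposal):
>
> def nanstd_twopass_ref(a, ddof=0):
>     "Reference nanstd via the corrected two-pass formula."
>     x = a[~np.isnan(a)]
>     if x.size == 0:
>         return np.float64(np.nan)
>     d = x - x.mean()
>     return np.sqrt((np.dot(d, d) - d.sum()**2 / x.size) / (x.size - ddof))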
>

I wonder how the results would change if the size of the array were larger
than the processor cache?  I still can't seem to wrap my head around the
idea that a two-pass algorithm would be faster than a single-pass one.  Is
this one of those cases where, even with the same big-O complexity, which
algorithm is faster depends on the size of the problem?
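
Something like this might show whether memory traffic changes the
ranking (the size is arbitrary, just big enough to blow past any cache):

>> big = np.random.rand(100000000)  # ~800 MB of float64
>> timeit nanstd_twopass(big, ddof=0)
>> timeit nanstd_online(big, ddof=0)
>> timeit nanstd_simple(big, ddof=0)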

Ben Root