[Numpy-discussion] MaskedArray __setitem__ Performance

Alexander Michael lxander.m at gmail.com
Fri Feb 15 23:12:37 EST 2008


While rewriting some code to use MaskedArray instead of carrying around
separate data and mask arrays, I read data from an input stream into an
array. By its nature this is a "one at a time" process: it is basically
a loop that assigns single elements (in no predetermined order) of
already allocated arrays. Unfortunately, using MaskedArray this way is
significantly slower. The sample code below shows that, for this
particular procedure, filling the MaskedArray is about 32x slower than
filling the two arrays I had been carrying around. It appears that I can
regain the fill performance by working on _data and _mask directly. I
can guarantee that the MaskedArrays I'm working with have been created
with a dense (full boolean) mask, as I've done below; there are always
some masked elements, so there is nothing to gain from shrinking to
nomask. Is this safe? If not, can I make it safe for this particular
performance-critical section? I'm assuming that whole-array operations
won't incur this sort of penalty when I get further into my translation.
Some overhead is acceptable for the convenience of not dragging the mask
around and thinking about it all the time, but hopefully it stays under
2x.
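A minimal sketch of the fill loop described above, writing through references to the MaskedArray's data and mask instead of calling `__setitem__` once per element. It assumes the mask is a dense boolean array (not `nomask`) and that nothing replaces the MaskedArray's mask while the references are held; `fill_from_stream` and the `(row, col, value)` record format are hypothetical stand-ins for the real input stream, and this is not a statement that numpy.ma guarantees the approach is safe in general.

```python
import numpy
import numpy.ma


def fill_from_stream(a, records):
    """Hypothetical helper: copy (row, col, value) records into `a`.

    Writes go to the underlying arrays instead of calling
    MaskedArray.__setitem__ once per element.
    """
    # Assumption: `a` was created with a dense boolean mask. With nomask,
    # getmaskarray() would return a new array and the mask writes would be lost.
    if numpy.ma.getmask(a) is numpy.ma.nomask:
        raise ValueError("expected a MaskedArray with a dense mask")
    data = a.data                    # ndarray view sharing a's values
    mask = numpy.ma.getmaskarray(a)  # a's own boolean mask array
    for row, col, value in records:
        data[row, col] = value
        mask[row, col] = False       # the element now holds a valid value
    return a


a = numpy.ma.MaskedArray(numpy.zeros((5, 3)), numpy.ones((5, 3), dtype=bool))
fill_from_stream(a, [(0, 0, 1.0), (4, 2, 2.5)])
print(a.count())  # 2 unmasked elements
```

The check up front matters because the shortcut only works while the MaskedArray owns a real mask array that the loop can write into.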

Thanks!
Alex

import numpy

def get_ndarrays():
    return (numpy.zeros((5000,500), dtype=float),
            numpy.ones((5000,500), dtype=bool))

import timeit

# Baseline: assign one element each of the plain data and mask arrays.
t_base = timeit.Timer(
    'a[0,0] = 1.0; m[0,0] = False',
    'from __main__ import get_ndarrays; a,m = get_ndarrays()'
).timeit(1000)/1000
print(t_base)

# 6.97574691756e-007

import numpy.ma

def get_maskedarray():
    return numpy.ma.MaskedArray(
        numpy.zeros((5000,500), dtype=float),
        numpy.ones((5000,500), dtype=bool)
    )

# Same single-element assignment, through MaskedArray.__setitem__.
t_ma = timeit.Timer(
    'a[0,0] = 1.0',
    'from __main__ import get_maskedarray; a = get_maskedarray()'
).timeit(1000)/1000
print(t_ma, t_ma/t_base)

# 2.26880790715e-005 32.5242290749

# Bypass __setitem__ by writing to the underlying _data and _mask arrays.
t_ma_com = timeit.Timer(
    'd[0,0] = 1.0; m[0,0] = False',
    'from __main__ import get_maskedarray; '
    'a = get_maskedarray(); d,m = a._data,a._mask'
).timeit(1000)/1000
print(t_ma_com, t_ma_com/t_base)

# 7.34450886914e-007 1.05286343612
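If the MaskedArray is only needed once the fill is finished, another option (a sketch, not something established in this thread) is to keep the plain two-array loop from the baseline timing and wrap the arrays a single time at the end. `build_masked` and the `(row, col, value)` record format are hypothetical; `copy=False` is a documented parameter of `numpy.ma.MaskedArray` that asks it to reference the input data rather than copy it.

```python
import numpy
import numpy.ma


def build_masked(shape, records):
    """Hypothetical helper: fill plain arrays, then wrap them once."""
    data = numpy.zeros(shape, dtype=float)
    mask = numpy.ones(shape, dtype=bool)  # everything masked until it is read
    for row, col, value in records:       # per-element writes on plain ndarrays
        data[row, col] = value
        mask[row, col] = False
    # One MaskedArray construction at the end; copy=False asks numpy.ma
    # to reference the data rather than copy it.
    return numpy.ma.MaskedArray(data, mask=mask, copy=False)


m = build_masked((5, 3), [(0, 0, 1.0), (4, 2, 2.5)])
print(m.count(), m.sum())
```

The per-element loop here is the same kind of plain-ndarray assignment timed in t_base, so the MaskedArray overhead should be paid once rather than per element; the trade-off is that no MaskedArray exists while the fill is running.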


