[SciPy-user] Ols for np.arrays and masked arrays
Pierre GM
pgmdevlist at gmail.com
Fri Jan 16 17:13:21 EST 2009
Josef,
I'm rewriting your module, expect some update in the next few hours
(days...).
Still, I have some generic comments:
* If you need a function that supports both ndarrays and masked
arrays, force the inputs to be masked arrays, that'll be easier.
* Use ma.fix_invalid to transform a ndarray w/ or w/o NaNs into a
masked array: the NaNs will automatically be masked, and the
underlying data fixed (a copy is made, no worry).
* if you need to mask an element, just mask it directly: you don't
have to set it to NaN and then use np.isnan for the mask. So, instead
of:
x_0 = x[:,0].copy()
x_0[0] = np.nan
x_0 = ma.masked_array(x_0, np.isnan(x_0))
just do:
x_0 = ma.array(x[:,0])
x_0[0] = ma.masked
* When mask is a boolean ndarray, just use x[~mask] instead of
x[mask==False].
* To get rid of the missing data in x, use x.compressed() or emulate
it with x.data[~ma.getmaskarray(x)]. ma.getmaskarray(x) always returns
a ndarray with the same length as x, whereas ma.getmask(x) can return
nomask.
* when manipulating masked arrays, if performance is an issue,
decompose the process in manipulating the data and the mask
separately. The easiest is to use .filled to get a pure ndarray for
the data. The choice of the fill_value depends on the application. In
ma.polyfit, we fill y with 0, which doesn't really matter as the
corresponding coefficients of x will be 0 (through vander).
> Also this is the first time, that
> I use masked arrays, and I'm not sure I found the best way.
Don't worry, practice makes perfect.
>
> I treat masked arrays or nans by removing all observations with masked
> or nan values for the internal calculation, but keep the mask around,
> for the case when it is needed for some output.
You keep the *common* mask, which sounds OK. Removing the missing
observations seems the way to go
> I looked at
> np.ma.polyfit, which uses a dummy fill value (0) before calling least
> squares, but in general it will be difficult to find "neutral" fill
> values..
cf explanation above.
>
More information about the SciPy-User
mailing list