[Numpy-discussion] ANN: MaskedArray as a subclass of ndarray - followup
Pierre GM
pgmdevlist at gmail.com
Fri Jan 19 17:28:51 EST 2007
Eric, Travis,
Thanks for the words of encouragement :)
I'm all in favor of having maskedarray ported to C, but I won't be able to do
it myself anytime soon, and I would have to learn C beforehand. Francesc's
suggestion of using Pyrex sounds nice; I'll try and see what I can do with
that.
> Moving the implementation to the C-level would be awesome. In particular,
> __getitem__ and __setitem__ are incredibly slow with masked arrays compared
> to ndarrays, so using those inside python loops is basically a really bad
> idea currently. You always have to work with the _data and _mask attributes
> directly if you are concerned about performance.
Well, yeah, that's expected: __getitem__ tests whether the mask is defined
(not nomask) before trying to access the item. If you're using it in a loop,
the test runs on every iteration, which is a bad idea. It's indeed far better
to run the test once beforehand and process _data and _mask separately.
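
To make the difference concrete, here is a minimal sketch of the two loop styles. It assumes the present-day numpy.ma module (with ma.getdata, ma.getmask and ma.nomask) instead of the 2007 maskedarray package, and the function names masked_sum_slow and masked_sum_fast are made up for illustration.

```python
import numpy.ma as ma

def masked_sum_slow(x):
    """Sum unmasked items by indexing the masked array inside the loop."""
    total = 0.0
    for i in range(x.size):
        item = x[i]                 # goes through __getitem__ on every pass
        if item is not ma.masked:
            total += item
    return total

def masked_sum_fast(x):
    """Test the mask once, then loop over the raw data and mask."""
    data = ma.getdata(x)            # plain ndarray holding the values
    mask = ma.getmask(x)            # either ma.nomask or a boolean array
    total = 0.0
    if mask is ma.nomask:
        # No mask at all: no per-item test is needed.
        for value in data:
            total += value
    else:
        for value, hidden in zip(data, mask):
            if not hidden:
                total += value
    return total

x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
y = ma.array([1.0, 2.0, 3.0])       # no mask given, so the mask stays nomask
```

Both functions give the same total; only the second one avoids repeating the nomask test for each item.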
A fix would be to force the mask to be an array of booleans all the time, but
that would slow things down elsewhere, as a lot of functions are artificially
accelerated with the nomask trick. A C implementation may render that trick
obsolete...
Another possibility would be to force the mask to be a bool array, and keep an
extra flag on top, like hasmask. hasmask would be False by default, and set
to True only if the mask is not full of False. That'd require a mask.any() in
__array_finalize__, which might still slow things down.
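
A rough sketch of that idea, outside of any ndarray subclass. It reads the proposal as "hasmask is True only when at least one item is masked", which is what mask.any() computes. The FlaggedMask class and the count_valid helper are hypothetical names, not part of maskedarray.

```python
import numpy as np

class FlaggedMask:
    """Hypothetical container: a full boolean mask plus a cached hasmask flag."""

    def __init__(self, shape, mask=None):
        if mask is None:
            self.mask = np.zeros(shape, dtype=bool)
        else:
            self.mask = np.array(mask, dtype=bool).reshape(shape)
        self.refresh()

    def refresh(self):
        # This is the mask.any() call the flag costs. It has to be repeated
        # whenever the mask may have changed.
        self.hasmask = bool(self.mask.any())

def count_valid(data, flagged):
    """Count unmasked items, skipping the mask entirely when hasmask is False."""
    if not flagged.hasmask:
        return data.size
    return int((~flagged.mask).sum())

data = np.arange(3)
empty = FlaggedMask(data.shape)
partial = FlaggedMask(data.shape, [False, True, False])
```

The flag gives the same shortcut as the nomask test, at the price of keeping it in sync with the mask.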
> Also, there is a "bug" in Pierre's current implementation I spoke with him
> about, but currently have no solution for. numpy.add.accumulate doesn't
> work on arrays from the new maskedarray implementation, but does with the
> old one.
The fact that it works with 'old' masked arrays doesn't count: they're not
real ndarrays. They use the __array__ method to communicate with the rest of
numpy, which we shouldn't need.
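
For illustration, a minimal sketch of that mechanism. OldStyleMasked is a hypothetical stand-in, not the old MA class, and simply dropping the mask in __array__ is an assumption made to keep the example short.

```python
import numpy as np

class OldStyleMasked:
    """Hypothetical masked container that is NOT an ndarray subclass."""

    def __init__(self, data, mask):
        self._data = np.asarray(data)
        self._mask = np.asarray(mask, dtype=bool)

    def __array__(self, dtype=None, copy=None):
        # numpy calls this to obtain a plain ndarray from the object.
        # The mask is not carried over (an assumption of this sketch).
        return np.array(self._data, dtype=dtype)

old = OldStyleMasked([1, 2, 3], [False, True, False])
result = np.add.accumulate(old)   # numpy converts `old` through __array__ first
```

The call works because numpy only ever sees the converted ndarray, so the result is a plain ndarray as well.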
> The problem seems to arise when you over-ride __getitem__ in an
> ndarray sub-class. See the code below for a demonstration:
I'm not sure that's actually the source of the problem.
ufuncs use the __array_wrap__ method to communicate with subclasses. ufunc
methods (such as accumulate) seem to bypass that. In the meantime, the methods
of the MA ufuncs work as expected.
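
One way to check this on a given numpy version is to trace __array_wrap__ on a trivial subclass. This is only a sketch: Tagged is a hypothetical class, the three-argument signature is the one recent numpy releases expect (older ones pass fewer arguments), and what accumulate reports may differ between versions.

```python
import numpy as np

calls = []  # one entry per __array_wrap__ invocation

class Tagged(np.ndarray):
    """Hypothetical minimal ndarray subclass, used only to trace __array_wrap__."""

    def __array_wrap__(self, obj, context=None, return_scalar=False):
        # Record whether numpy supplied a (ufunc, args, index) context.
        calls.append(context is not None)
        return np.asarray(obj).view(type(self))

a = np.arange(1, 5).view(Tagged)

calls.clear()
doubled = np.add(a, a)              # plain ufunc call
wrap_calls_ufunc = len(calls)

calls.clear()
running = np.add.accumulate(a)      # ufunc method
wrap_calls_accumulate = len(calls)

print("np.add:", type(doubled).__name__, wrap_calls_ufunc)
print("np.add.accumulate:", type(running).__name__, wrap_calls_accumulate)
```

Comparing the two printed lines shows whether the ufunc method goes through __array_wrap__ the same way the plain ufunc call does.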
Could somebody give me a simple explanation of the behaviour of ufunc
methods on the Python side? I'm obviously missing something here...