[Numpy-discussion] optimizing ndarray.__setitem__
Robert Kern
robert.kern at gmail.com
Thu May 5 12:54:40 EDT 2011
On Thu, May 5, 2011 at 02:29, Christoph Groth <cwg at falma.de> wrote:
>> On Wed, May 4, 2011 at 6:19 AM, Christoph Groth <cwg at falma.de> wrote:
>>>
>>> Dear numpy experts,
>>>
>>> I have noticed that with Numpy 1.5.1 the operation
>>>
>>> m[::2] += 1.0
>>>
>>> takes twice as long as
>>>
>>> t = m[::2]
>>> t += 1.0
>
> Mark Wiebe <mwwiebe at gmail.com> writes:
>
>> You'd better time this in 1.6 too. ;)
>>
>> https://github.com/numpy/numpy/commit/f60797ba64ccf33597225d23b893b6eb11149860
>
> This seems to be exactly what I had in mind. Thanks for finding this.
>
>> The case of boolean mask indexing can't benefit so easily from this
>> optimization, but I think could see a big performance benefit if
>> combined __index__ + __i<op>__ operators were added to
>> Python. Something to consider, anyway.
>
> Has something like __index_iadd__ ever been considered seriously? Not
> to my (limited) knowledge.

Only on this list, I think. :-)

I don't think it will ever happen. Only numpy really cares about it,
and adding another __special__ method for each __iop__ is a lot of
additional methods that need to be supported.

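
For reference, here is a minimal sketch of why an indexed in-place
operation currently costs a separate get and set. Python expands
obj[key] += value into a __getitem__ call, an in-place (or plain) add on
the result, and a __setitem__ call. The Traced class below is
hypothetical and exists only to log which special methods get called; it
is not part of numpy.

```python
class Traced:
    """Hypothetical container that records which special methods run."""

    def __init__(self, data):
        self.data = list(data)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return self.data[key]

    def __setitem__(self, key, value):
        self.calls.append("__setitem__")
        self.data[key] = value


t = Traced([1, 2, 3])
t[0] += 10          # roughly: tmp = t[0]; tmp += 10; t[0] = tmp

print(t.calls)      # ['__getitem__', '__setitem__']
print(t.data)       # [11, 2, 3]
```

A combined method along the lines of __index_iadd__ would have to
replace this two-call sequence for every in-place operator, which is
where the extra special methods come from.
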
> Indeed, the second loop executes twice as fast as the first in the
> following example (again with Numpy 1.5.1).
>
> import numpy
> m = numpy.zeros((1000, 1000))
> mask = numpy.arange(0, 1000, 2, dtype=int)
>
> for i in xrange(40):
>     m[mask] += 1.0
>
> for i in xrange(40):
>     t = m[mask]
>     t += 1.0
>
> But wouldn't it be easy to optimize this as well, by not executing
> assignments where the source and the destination are indexed by the
> same mask object?
No. These two are not semantically equivalent. Your second example
does not actually modify m. For integer and bool mask arrays, m[mask]
necessarily makes a copy, so when you modify t via inplace addition,
you have only modified t and not m. The assignment back to m[mask] is
necessary.
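
A small sketch of the difference, assuming a reasonably recent NumPy
(np.shares_memory does not exist in 1.5.1) and a 4x4 array chosen only
to keep the numbers easy to follow:

```python
import numpy as np

m = np.zeros((4, 4))
mask = np.arange(0, 4, 2)      # integer index array selecting rows 0 and 2

t = m[mask]                    # indexing with an integer array returns a copy
t += 1.0                       # modifies only the copy
sum_after_copy = m.sum()
print(sum_after_copy)          # 0.0, m is untouched

m[mask] += 1.0                 # get a copy, add, then assign back into m
sum_after_setitem = m.sum()
print(sum_after_setitem)       # 8.0, rows 0 and 2 were updated

v = m[::2]                     # basic slicing returns a view
v += 1.0                       # modifies m through the view
sum_after_view = m.sum()
print(sum_after_view)          # 16.0

print(np.shares_memory(t, m))  # False
print(np.shares_memory(v, m))  # True
```

This is why skipping the assignment is safe for the slice case but not
for the index-array case.
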
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco