[Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

josef.pktd at gmail.com
Mon Dec 3 08:56:55 EST 2012


On Mon, Dec 3, 2012 at 6:14 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Mon, Dec 3, 2012 at 1:28 AM, Raul Cota <raul at virtualmaterials.com> wrote:
>> I finally decided to track down the problem and I started by getting
>> Python 2.6 from source and profiling it in one of my cases. By far the
>> biggest bottleneck came out to be PyString_FromFormatV, which is a
>> function to assemble a string for a Python error caused by a failure to
>> find an attribute when "multiarray" calls PyObject_GetAttrString. This
>> function seems to get called way too often from NumPy. The real
>> bottleneck of trying to find the attribute when it does not exist is not
>> that it fails to find it, but that it builds a string to set a Python
>> error. In other words, something as simple as "a[0] < 3.5" internally
>> results in a call to set a Python error.
>>
>> I downloaded NumPy code (for Python 2.6) and tracked down all the calls
>> like this,
>>
>>   ret = PyObject_GetAttrString(obj, "__array_priority__");
>>
>> and changed to
>>      if (PyList_CheckExact(obj) ||  (Py_None == obj) ||
>>          PyTuple_CheckExact(obj) ||
>>          PyFloat_CheckExact(obj) ||
>>          PyInt_CheckExact(obj) ||
>>          PyString_CheckExact(obj) ||
>>          PyUnicode_CheckExact(obj)){
>>          //Avoid expensive calls when I am sure the attribute
>>          //does not exist
>>          ret = NULL;
>>      }
>>      else {
>>          ret = PyObject_GetAttrString(obj, "__array_priority__");
>>      }
>>
>> ( I think I found about 7 spots )
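
(Side note on why short-circuiting on the exact builtin types is safe: plain
lists, tuples, floats, ints and strings never define __array_priority__; only
ndarray and classes that opt into the protocol do. A quick illustrative check
from the interpreter:

>>> import numpy as np
>>> hasattr([1., 2., 3.], '__array_priority__')
False
>>> hasattr((1., 2.), '__array_priority__')
False
>>> hasattr(np.array([1., 2., 3.]), '__array_priority__')
True
>>> hasattr(np.matrix([[1.]]), '__array_priority__')
True

The CheckExact calls above deliberately exclude subclasses of list, float,
etc., since a subclass could define __array_priority__ itself.)
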
>
> If the problem is the exception construction, then maybe this would
> work about as well?
>
> if (PyObject_HasAttrString(obj, "__array_priority__")) {
>     ret = PyObject_GetAttrString(obj, "__array_priority__");
> } else {
>     ret = NULL;
> }
>
> If so then it would be an easier and more reliable way to accomplish this.
>
>> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
>> also resulted in Python errors being set, unnecessarily slowing the code down.
>>
>> With this change, something like this,
>>      for i in xrange(1000000):
>>          if a[1] < 35.0:
>>              pass
>>
>> went down from 0.8 seconds to 0.38 seconds.
>
> Huh, why is PyObject_GetBuffer even getting called in this case?
>
>> A bogus test like this,
>>      for i in xrange(1000000):
>>          a = array([1., 2., 3.])
>>
>> went down from 8.5 seconds to 2.5 seconds.
>
> I can see why we'd call PyObject_GetBuffer in this case, but not why
> it would take 2/3rds of the total run-time...
>
>> - The core of my problems, I think, boils down to things like
>> s = a[0]
>> assigning a float64 to s as opposed to a native float.
>> Is there any way to hack the code to extract a native float
>> instead? (Probably crazy talk, but I thought I'd ask :) ).
>> I'd prefer not to use s = a.item(0) because I would have to change too
>> much code and it is not even that much faster. For example,
>>      for i in xrange(1000000):
>>          if a.item(1) < 35.0:
>>              pass
>> is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes)
>
> I'm confused here -- first you say that your problems would be fixed
> if a[0] gave you a native float, but then you say that a.item(0)
> (which is basically a[0] that gives a native float) is still too slow?
> (OTOH a 40% speedup is pretty good, even if it is just a
> microbenchmark :-).) Array scalars are definitely pretty slow:
>
> In [9]: timeit a[0]
> 1000000 loops, best of 3: 151 ns per loop
>
> In [10]: timeit a.item(0)
> 10000000 loops, best of 3: 169 ns per loop
>
> In [11]: timeit a[0] < 35.0
> 1000000 loops, best of 3: 989 ns per loop
>
> In [12]: timeit a.item(0) < 35.0
> 1000000 loops, best of 3: 233 ns per loop
>
> It is probably possible to make numpy scalars faster... I'm not even
> sure why they go through the ufunc machinery, like Travis said, since
> they don't even follow the ufunc rules:
>
> In [3]: np.array(2) * [1, 2, 3]  # 0-dim array coerces and broadcasts
> Out[3]: array([2, 4, 6])
>
> In [4]: np.array(2)[()] * [1, 2, 3]  # scalar acts like python integer
> Out[4]: [1, 2, 3, 1, 2, 3]

I thought it still behaves like a numpy "animal":

>>> np.array(-2)[()] ** [1, 2, 3]
array([-2,  4, -8])
>>> np.array(-2)[()] ** 0.5
nan

>>> np.array(-2).item() ** [1, 2, 3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ** or pow(): 'int' and 'list'
>>> np.array(-2).item() ** 0.5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative number cannot be raised to a fractional power


>>> np.array(0)[()] ** (-1)
inf
>>> np.array(0).item() ** (-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: 0.0 cannot be raised to a negative power

and similar
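
(Just to make the type difference behind these examples explicit -- it is also
what Raul's a[0] versus a.item(0) question comes down to; a minimal check:

>>> a = np.array([1., 2., 3.])
>>> type(a[0])
<type 'numpy.float64'>
>>> type(a.item(0))
<type 'float'>

a[0] returns a numpy scalar, which keeps the numpy casting and error handling
shown above, while a.item(0) returns a plain Python float.)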

I often try to avoid Python scalars because of this kind of "surprising"
behavior, and try to work defensively or fix bugs by switching to
np.power(...) (for example in the distributions).
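
As an illustration of what I mean by switching to np.power -- not the actual
distributions code, just a minimal sketch of the defensive version:

>>> np.power(-2.0, [1, 2, 3])
array([-2.,  4., -8.])
>>> np.power(-2.0, 0.5)   # may also emit a RuntimeWarning, depending on np.seterr
nan
>>> (-2.0) ** 0.5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative number cannot be raised to a fractional power

np.power stays inside numpy's error handling (nan, warning, or exception
according to np.seterr) instead of letting a scalar special case raise.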

Josef

>
> But you may want to experiment a bit more to make sure this is
> actually the problem. IME guesses about speed problems are almost
> always wrong (even when I take this rule into account and only guess
> when I'm *really* sure).
>
> -n
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


