[Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

Sun Dec 2 21:33:33 EST 2012

On 12/2/2012 5:28 PM, Raul Cota wrote:
> Hello,
>
> First, a quick summary of my problem; at the end I include the basic
> changes I am suggesting to the source (they may benefit others).
>
> I am ages behind the times and I am still using Numeric in Python 2.2.3.
> The main reason why it has taken so long to upgrade is because NumPy
> kills performance on several of my tests.
>
> I am sorry if this topic has been discussed before. I tried searching the
> mailing list and also Google, and all I found were comments to the effect
> that such is life when you use NumPy for small arrays.
>
> In my case I have several thousands of lines of code where data
> structures rely heavily on Numeric arrays but it is unpredictable if the
> problem at hand will result in large or small arrays. Furthermore, once
> the vectorized operations complete, the values could be assigned into
> scalars and just do simple math or loops. I am fairly sure the core of
> my problems is that the 'float64' objects start propagating all over the
> program data structures (not in arrays) and they are considerably slower
> for just about everything when compared to the native python float.
>
> In conclusion, it is not practical for me to do a massive restructuring of
> code to improve speed on simple things like "a[0] < 4" (assuming "a" is
> an array), which is about 10 times slower than "b < 4" (assuming "b" is a
> float).
>
>
> I finally decided to track down the problem and I started by getting
> Python 2.6 from source and profiling it in one of my cases. By far the
> biggest bottleneck turned out to be PyString_FromFormatV, which is the
> function that assembles the message string for the Python error raised
> when "multiarray" calls PyObject_GetAttrString and the attribute is not
> found. This function seems to get called far too often from NumPy. The
> real cost of looking up an attribute that does not exist is not the
> failed lookup itself, but building the string to set a Python error. In
> other words, something as simple as "a[0] < 3.5" internally results in a
> call to set a Python error.
>
> I downloaded NumPy code (for Python 2.6) and tracked down all the calls
> like this,
>
>    ret = PyObject_GetAttrString(obj, "__array_priority__");
>
> and changed them to
>       if (PyList_CheckExact(obj) ||  (Py_None == obj) ||
>           PyTuple_CheckExact(obj) ||
>           PyFloat_CheckExact(obj) ||
>           PyInt_CheckExact(obj) ||
>           PyString_CheckExact(obj) ||
>           PyUnicode_CheckExact(obj)){
>           //Avoid expensive calls when I am sure the attribute
>           //does not exist
>           ret = NULL;
>       }
>       else{
>           ret = PyObject_GetAttrString(obj, "__array_priority__");
>       }
>
>
> ( I think I found about 7 spots )
>
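The fast path described above can be illustrated at the Python level. This is only a sketch of the idea, not the C change itself: the helper name `get_array_priority`, the `_NO_PRIORITY_TYPES` set, and the `Weighted` class are hypothetical, and the type list is a Python 3 adaptation of the C checks (there is no separate `PyInt`/`PyString` type, so `int`, `str`, and `bytes` stand in).

```python
# Exact builtin types assumed never to define __array_priority__.
# (Hypothetical Python 3 counterpart of the *_CheckExact tests in the C code.)
_NO_PRIORITY_TYPES = frozenset({list, tuple, float, int, str, bytes, type(None)})


def get_array_priority(obj, default=0.0):
    """Return obj.__array_priority__, or `default` when it is absent."""
    # `type(obj) in ...` is an exact-type test, so subclasses of these
    # builtins still go through the normal attribute lookup below.
    if type(obj) in _NO_PRIORITY_TYPES:
        return default
    return getattr(obj, "__array_priority__", default)


class Weighted(float):
    """A float subclass that does define the attribute."""
    __array_priority__ = 15.0
```

Because the test is on the exact type, `get_array_priority(3.5)` skips the lookup entirely, while `get_array_priority(Weighted(1.0))` still finds the attribute on the subclass.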
>
> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
> also resulted in Python errors being set, and thus unnecessarily slower code.
>
>
> With this change, something like this,
>       for i in xrange(1000000):
>           if a[1] < 35.0:
>               pass
>
> went down from 0.8 seconds to 0.38 seconds.
>
> A bogus test like this,
>       for i in xrange(1000000):
>           a = array([1., 2., 3.])
>
> went down from 8.5 seconds to 2.5 seconds.
>
>
>
> Altogether, these simple changes got me halfway to the speed I used to
> get in Numeric, and I could not see any slowdown in any of my cases that
> benefit from heavy array manipulation. I am out of ideas on how to
> improve further, though.
>
> A few questions:
> - Is there any interest in me providing the exact details of the code
> I changed?
>
> - I managed to compile NumPy through setup.py but I am not sure how to
> force it to generate pdb files from my Visual Studio compiler. I need
> the pdb files so that I can run my profiler on NumPy. Does anybody have
> any experience with this? (Visual Studio)

Change the compiler and linker flags in 
Python\Lib\distutils\msvc9compiler.py to:

self.compile_options = ['/nologo', '/Ox', '/MD', '/W3', '/DNDEBUG', '/Zi']
self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:YES', '/DEBUG']

Then rebuild numpy.

Christoph

>
> - The core of my problems, I think, boils down to things like this
> s = a[0]
> assigning a float64 into s as opposed to a native float.
> Is there any way to hack the code so that it extracts a native float
> instead? (probably crazy talk, but I thought I'd ask :) ).
> I'd prefer to not use s = a.item(0) because I would have to change too
> much code and it is not even that much faster. For example,
>       for i in xrange(1000000):
>           if a.item(1) < 35.0:
>               pass
> takes 0.23 seconds (as opposed to 0.38 seconds with my suggested changes).
>
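The loop timings quoted in this message can be reproduced with the standard `timeit` module. The following is a minimal sketch, not the harness used in the original measurements: it is written for Python 3, the helper names `best_time` and `compare_scalar_access` are hypothetical, and `compare_scalar_access` assumes NumPy is installed. Absolute numbers will differ from those quoted, which came from Python 2.6 on the poster's machine.

```python
import timeit


def best_time(stmt, setup="pass", number=1_000_000, repeat=3):
    """Return the fastest of `repeat` runs, each executing `stmt` `number` times."""
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))


def compare_scalar_access(number=1_000_000):
    """Time the three comparisons discussed above (requires NumPy)."""
    setup = "import numpy as np; a = np.array([1., 2., 3.]); b = 2.0"
    return {
        # Indexing yields a NumPy float64 scalar before comparing.
        "a[1] < 35.0": best_time("a[1] < 35.0", setup, number),
        # .item() yields a native Python float before comparing.
        "a.item(1) < 35.0": best_time("a.item(1) < 35.0", setup, number),
        # Baseline: comparison between two native Python floats.
        "b < 35.0": best_time("b < 35.0", setup, number),
    }
```

Taking the minimum over several repeats reduces the influence of unrelated system activity, which matters when the per-iteration cost is well under a microsecond.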
>
> I apologize again if this topic has already been discussed.
>
>
> Regards,
>
> Raul
>
>