[SciPy-dev] Re: fblas tests
Pearu Peterson
peterson at math.utwente.nl
Sun Apr 14 01:39:46 EDT 2002
Hi Eric,
On Sat, 13 Apr 2002, eric wrote:
> I was looking over the behavior of the new fblas wrappers and think we should
> make some changes back to how the previous version worked (for reasons explained
> below). The old test suite is pretty extensive and covers many of the various
> calling approaches that can occur. We should definitely try to get all of its
> tests to pass, as they exercise the interface fairly well.
I agree that fblas needs more extensive tests (as it had in linalg1).
> The current blas wrappers sacrifice performance in an effort to make the
> functions more Pythonic in their behavior and also reduce the chance of operator
> error. However, I consider the fblas and cblas very low level and aimed at
> maximum performance. If people are using them, they should have access to the
> maximum speed that the underlying library can provide. This means there
> shouldn't be any extra copying if we can help it. The result may be a less
> "Pythonic" interface, but that is OK. Pythonic approaches already exist for
> most of the problems. Consider scopy. Someone wanting to use Pythonic
> approaches for making strided copies from one array to another will use:
>
> >>> a[:8:4] = b[:6:3]
>
> instead of:
>
> >>> fblas.scopy(a,b,n=2,incx=4,incy=3).
>
> The person choosing the second would do so only for speed. Also, they would
> expect the values of b to get changed in the operation since scopy moves
> portions of a into portions of b. The current wrapper doesn't do this. It
> makes a copy of b, copies the values of a into it, and then returns this array.
> It leaves b unaltered.
So you should use
fblas.scopy(a,b,n=2,incx=4,incy=3,overwrite_y=1)
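For reference, the offset/stride semantics of BLAS scopy amount to the following (a pure-Python sketch of the semantics only, not the actual Fortran implementation; the defaults mirror the wrapper's signature, and negative increments are omitted for brevity):

```python
def scopy_semantics(x, y, n=None, offx=0, incx=1, offy=0, incy=1):
    """Sketch of BLAS scopy: copy n strided elements of x into y, in place."""
    if n is None:
        # default from the wrapper's docstring: n = (len(x)-offx)/abs(incx)
        n = (len(x) - offx) // abs(incx)
    for i in range(n):
        y[offy + i * incy] = x[offx + i * incx]
    return y

x = [0., 10., 20., 30., 40., 50., 60., 70.]
y = [0., 1., 2., 3., 4., 5.]
scopy_semantics(x, y, n=2, incx=4, incy=3)
# y is now [0., 1., 2., 40., 4., 5.] -- modified in place
```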
> >>> x = arange(8.,typecode=Float32)*10
> >>> y = arange(6.,typecode=Float32)
> >>> x
> array([ 0., 10., 20., 30., 40., 50., 60., 70.])
> >>> y
> array([ 0., 1., 2., 3., 4., 5.],'f')
> # An output array is created with the changes to y instead of changing y in
> place.
> >>> fblas.scopy(x,y,n=2,incx=4,incy=3)
> array([ 0., 1., 2., 40., 4., 5.],'f')
> # This guy should have been changed inplace
> >>> y
> array([ 0., 1., 2., 3., 4., 5.],'f')
With the overwrite_y option set, you will have:
>>> from scipy import *
>>> x = arange(8.,typecode=Float32)*10
>>> y = arange(6.,typecode=Float32)
>>> x
array([ 0., 10., 20., 30., 40., 50., 60., 70.])
>>> y
array([ 0., 1., 2., 3., 4., 5.],'f')
>>> linalg.fblas.scopy(x,y,n=2,incx=4,incy=3,overwrite_y=1)
array([ 0., 1., 2., 40., 4., 5.],'f')
>>> y
array([ 0., 1., 2., 40., 4., 5.],'f')
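The overwrite_y=1 call above has the same effect as the slice assignment Eric mentions. In modern NumPy spelling (a sketch using today's dtype names rather than the Numeric typecode form):

```python
import numpy as np

x = np.arange(8., dtype=np.float32) * 10   # [0., 10., ..., 70.]
y = np.arange(6., dtype=np.float32)        # [0., 1., 2., 3., 4., 5.]

# Same effect as scopy(x, y, n=2, incx=4, incy=3, overwrite_y=1):
# copy x[0], x[4] into y[0], y[3], modifying y in place.
y[:6:3] = x[:8:4]
print(y)   # [ 0.  1.  2. 40.  4.  5.]
```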
> I did a comparison of this function with the Pythonic approach for very large
> arrays and it was actually slower. I removed the "copy" from the
> "intent(in,copy,out) y" portion of the interface definition, and scopy becomes
No need for that. Use overwrite_y.
>>> print linalg.fblas.scopy.__doc__
scopy - Function signature:
y = scopy(x,y,[n,offx,incx,offy,incy,overwrite_y])
Required arguments:
x : input rank-1 array('f') with bounds (*)
y : input rank-1 array('f') with bounds (*)
Optional arguments:
n := (len(x)-offx)/abs(incx) input int
offx := 0 input int
incx := 1 input int
overwrite_y := 0 input int
offy := 0 input int
incy := 1 input int
Return objects:
y : rank-1 array('f') with bounds (*)
Note that using intent(copy) makes default overwrite_y = 0 while
using intent(overwrite) sets default overwrite_y = 1.
I would prefer intent(copy) and using overwrite_y = 1 explicitly.
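The difference between the two intents shows up in the f2py signature file. A sketch of the two alternatives for the y declaration (illustrative only, not scipy's exact .pyf source):

```fortran
! intent(copy): wrapper grows an optional overwrite_y with default 0,
! so y is copied first and the caller's array is left untouched
real dimension(*), intent(in,copy,out) :: y

! intent(overwrite): same optional argument, but with default 1,
! so the wrapper works on y in place unless the caller asks otherwise
real dimension(*), intent(in,overwrite,out) :: y
```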
> noticeably faster (a 1.5x or greater speedup) over the indexed copy approach. It also
> injects its values directly into b, so it removes the extra memory allocation
> and copying. Here is the output from my test:
>
> C:\home\ej\wrk\scipy\linalg\build\lib.win32-2.1\linalg>python copy_test.py
> python: 0.0879670460911
> scopy -- without copy to output: 0.0521141653478
> scopy -- with copy to output: 0.157154051702
So, with various overwrite_y option values I get
python: 0.12
scopy -- with default overwrite_y=0: 0.21
scopy -- with overwrite_y=1: 0.07
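The general shape of such a timing comparison, for the "python" baseline, is roughly the following (a sketch only, not Eric's copy_test.py, and in modern NumPy spelling; the array size here is illustrative):

```python
import timeit
import numpy as np

n = 500_000
x = np.arange(2 * n, dtype=np.float32)
y = np.empty(n, dtype=np.float32)

def python_copy():
    # the "Pythonic" strided copy: equivalent of scopy with incx=2, incy=1
    y[:] = x[::2]

t = timeit.timeit(python_copy, number=100)
print("python (slice assignment): %.4f s" % t)
```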
> I've included the script below.
>
> Using this new approach, non-contiguous arrays passed into y will result in
> exceptions. That is OK though. I think the experts that will use these
> functions would rather force these functions to require contiguous input than to
> have them handle the non-contiguous case at the expense of optimal performance.
>
> So, I'm gonna work my way through the interface and try to switch the behavior
> back to the old approach. I think it only requires removing the "copy" argument
> from most of the arguments. Let me know if you think of other things I should
> change. Also, let me know if there are other pitfalls I'm not thinking of here.
Please, don't remove "copy" arguments. Use overwrite_y=1 instead.
But if you insist on raising an exception for non-contiguous input, then
let's change intent(copy) to intent(overwrite), that is, switch the
defaults of the overwrite_* options.
That is fine with me.
Pearu