[SciPy-dev] Re: fblas tests

Pearu Peterson peterson at math.utwente.nl
Sun Apr 14 01:39:46 EDT 2002


Hi Eric,

On Sat, 13 Apr 2002, eric wrote:

> I was looking over the behavior of the new fblas wrappers and think we should
> make some changes back to how the previous version worked (for reasons explained
> below).  The old test suite is pretty extensive and covers many of the various
> calling approaches that can occur.  We should definitely try to get all of its
> tests to pass, as they exercise the interface fairly well.

I agree that the fblas wrappers need more extensive tests (as they had in linalg1).

> The current blas wrappers sacrifice performance in an effort to make the
> functions more Pythonic in their behavior and also reduce the chance of operator
> error.  However, I consider the fblas and cblas very low level and aimed at
> maximum performance.  If people are using them, they should have access to the
> maximum speed that the underlying library can provide.  This means there
> shouldn't be any extra copying if we can help it.  The result may be a less
> "Pythonic" interface, but that is OK.  Pythonic approaches already exist for
> most of the problems.  Consider scopy.  Someone wanting to use Pythonic
> approaches for making strided copies from one array to another will use:
>
>     >>> a[:8:4] = b[:6:3]
>
> instead of:
>
>     >>> fblas.scopy(a,b,n=2,incx=4,incy=3).
>
> The person choosing the second would do so only for speed.  Also, they would
> expect the values of b to get changed in the operation since scopy moves
> portions of a into portions of b.  The current wrapper doesn't do this.  It
> makes a copy of b, copies the values of a into it, and then returns this array.
> It leaves b unaltered.

So you should use

    fblas.scopy(a,b,n=2,incx=4,incy=3,overwrite_y=1)
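For reference, the Pythonic strided assignment Eric mentions already mutates y in place, which is exactly the behavior overwrite_y=1 restores. A minimal sketch using modern numpy names (np.float32 rather than the old Numeric Float32 typecode), reproducing the x/y values from the session below:

```python
import numpy as np

x = np.arange(8, dtype=np.float32) * 10   # [0., 10., 20., ..., 70.]
y = np.arange(6, dtype=np.float32)        # [0., 1., 2., 3., 4., 5.]

# Strided slice assignment: copy 2 elements of x (stride 4) into
# 2 slots of y (stride 3).  y is mutated in place, no copy is made.
y[:6:3] = x[:8:4]

# y is now [0., 1., 2., 40., 4., 5.] -- the same result as
# scopy(x, y, n=2, incx=4, incy=3) with overwrite_y=1.
```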

>     >>> x = arange(8.,typecode=Float32)*10
>     >>> y = arange(6.,typecode=Float32)
>     >>> x
>     array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.])
>     >>> y
>     array([ 0.,  1.,  2.,  3.,  4.,  5.],'f')
>     # An output array is created with the changes to y instead of changing y in place.
>     >>> fblas.scopy(x,y,n=2,incx=4,incy=3)
>     array([  0.,   1.,   2.,  40.,   4.,   5.],'f')
>     # This guy should have been changed inplace
>     >>> y
>     array([ 0.,  1.,  2.,  3.,  4.,  5.],'f')

With the overwrite_y option set, you get:

>>> from scipy import *
>>> x = arange(8.,typecode=Float32)*10
>>> y = arange(6.,typecode=Float32)
>>> x
array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.])
>>> y
array([ 0.,  1.,  2.,  3.,  4.,  5.],'f')
>>> linalg.fblas.scopy(x,y,n=2,incx=4,incy=3,overwrite_y=1)
array([  0.,   1.,   2.,  40.,   4.,   5.],'f')
>>> y
array([  0.,   1.,   2.,  40.,   4.,   5.],'f')


> I did a comparison of this function with the Pythonic approach for very large
> arrays and it was actually slower.  I removed the "copy" from the
> "intent(in,copy,out) y" portion of the interface definition, and scopy becomes

No need for that. Use overwrite_y.

>>> print linalg.fblas.scopy.__doc__
scopy - Function signature:
  y = scopy(x,y,[n,offx,incx,offy,incy,overwrite_y])
Required arguments:
  x : input rank-1 array('f') with bounds (*)
  y : input rank-1 array('f') with bounds (*)
Optional arguments:
  n := (len(x)-offx)/abs(incx) input int
  offx := 0 input int
  incx := 1 input int
  overwrite_y := 0 input int
  offy := 0 input int
  incy := 1 input int
Return objects:
  y : rank-1 array('f') with bounds (*)

Note that using intent(copy) makes the default overwrite_y = 0, while
using intent(overwrite) makes the default overwrite_y = 1.
I would prefer intent(copy), with callers passing overwrite_y = 1 explicitly.
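To illustrate the two choices, here is a hypothetical .pyf signature fragment (illustrative only, not the actual scipy interface file, which also carries offsets and call statements):

```fortran
! intent(in,copy,out) gives the wrapper an optional overwrite_y
! argument defaulting to 0: y is copied unless the caller passes
! overwrite_y=1.  Writing intent(in,overwrite,out) instead would
! flip that default to 1, so y is clobbered unless the caller
! asks for a copy.
subroutine scopy(n,x,incx,y,incy)
  integer intent(in) :: n, incx, incy
  real dimension(*), intent(in) :: x
  real dimension(*), intent(in,copy,out) :: y
end subroutine scopy
```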

> noticeably faster (a 1.5x or more speedup) over the indexed copy approach.  It also
> injects its values directly into b, so it removes the extra memory allocation
> and copying.  Here is the output from my test:
>
> C:\home\ej\wrk\scipy\linalg\build\lib.win32-2.1\linalg>python copy_test.py
> python:  0.0879670460911
> scopy -- without copy to output:  0.0521141653478
> scopy -- with copy to output:  0.157154051702

So, with the various overwrite_y values I get:

python:  0.12
scopy -- with default overwrite_y=0:  0.21
scopy -- with overwrite_y=1:  0.07
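A timing harness along these lines is easy to sketch for the Pythonic baseline alone (the 2002 fblas wrappers are not assumed available here; the names n and strided_copy are illustrative, and the absolute numbers will of course differ from the figures above):

```python
import timeit
import numpy as np

n = 1_000_000
x = np.arange(4 * n, dtype=np.float32)
y = np.zeros(3 * n, dtype=np.float32)

def strided_copy():
    # NumPy equivalent of scopy(x, y, n=n, incx=4, incy=3):
    # copy every 4th element of x into every 3rd slot of y, in place.
    y[::3] = x[::4]

t = timeit.timeit(strided_copy, number=10)
print("strided slice assignment x10: %.3fs" % t)
```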

> I've included the script below.
>
> Using this new approach, non-contiguous arrays passed into y will result in
> exceptions.  That is OK though.  I think the experts that will use these
> functions would rather force these functions to require contiguous input than to
> have them handle the non-contiguous case at the expense of optimal performance.
>
> So, I'm gonna work my way through the interface and try to switch the behavior
> back to the old approach.  I think it only requires removing the "copy" argument
> from most of the arguments.  Let me know if you think of other things I should
> change.  Also, let me know if there are other pitfalls I'm not thinking of here.

Please don't remove the "copy" arguments. Use overwrite_y=1 instead.

But if you insist on the exception behavior, then let's change intent(copy)
to intent(overwrite), that is, switch the defaults of the overwrite_*
options. That is fine with me.

Pearu



