[SciPy-User] Using GEMM with specified output-array

Tue Mar 11 14:23:27 EDT 2014

Thomas Unterthiner <thomas_unterthiner at web.de> wrote:
> Hi there!
> 
> I would like to use *GEMM without allocating an output-array. So I used 
> the functions from scipy.linalg.blas, however it seems as if the 
> 'overwrite_c' argument gets completely ignored:
> 
>      from scipy.linalg.blas import sgemm
>      X = np.random.randn(30, 50).astype(np.float32)
>      Y = np.random.randn(50, 30).astype(np.float32)
>      Z = np.zeros((X.shape[0], Y.shape[1]), dtype=np.float32)
>      res = f(1.0, X, Y, c=Z, trans_a=0, trans_b=0, overwrite_c=1)
>      assert res == np.dot(X, Y).all()
>      print res is Z
> 
> prints "False", meaning a new array is allocated for the result. 
> However, I want my result to be written into 'Z'. Before anyone asks: I 
> can't use np.dot's 'out' argument since I also want to specify 'alpha'.
> 
> Is there anything I'm missing here?  (I'm using scipy 0.13.3 / numpy 
> 1.8.0 with OpenBLAS on Ubuntu 13.10).

You are passing C contiguous arrays to Fortran. f2py will copy and
transpose them. If you want to avoid this, always pass Fortran order arrays
to scipy.linalg.* functions. You create a Fortran order view of a C array
by taking .T (transpose is O(1) in NumPy) or create the array with keyword
argument order='F'.  Check that .flags['F_CONTIGUOUS'] is True for all the
arrays you pass to scipy.linalg. 

The "overwrite_c" option in GEMM is just a suggestion. It is not enforced.
If f2py has to copy and transpose your C input array it does nothing.

Sturla