[Numpy-discussion] Matlab page on scipy wiki

Fri Feb 10 09:15:02 EST 2006

Sven Schreiber wrote:
> Fernando Perez schrieb:
> 
>>Bill Baxter wrote:
>>
>>>For what it's worth, matlab's rank function just calls svd, and
>>>returns the
>>>number singular values greater than a tolerance.  The implementation is a
>>>whopping 5 lines long.
>>
>>Yup, and it would be pretty much the same 5 lines in numpy, with the
>>same semantics.
>>
>>Here's a quick and dirty implementation for old-scipy (I don't have
>>new-scipy on this box):
>>
> 
> 
> Is there any reason not to use the algorithm implicit in lstsq, as in:
> rk = linalg.lstsq(M, ones(p))[2]

Simplicity?  lstsq goes through a lot of contortions (needed for other 
reasons), and uses lapack's *gelss.  If you read its man page:

PURPOSE
        DGELSS computes the minimum  norm  solution  to  a  real  linear  least
        squares problem: Minimize 2-norm(| b - A*x |).

        using  the  singular  value  decomposition  (SVD)  of A. A is an M-by-N
        matrix which may be rank-deficient.

        Several right hand side vectors b and solution vectors x can be handled
        in a single call; they are stored as the columns of the M-by-NRHS right
        hand side matrix B and the N-by-NRHS solution matrix X.

        The effective rank of A is determined by treating as zero those  singu-
        lar  values which are less than RCOND times the largest singular value.

So you've gone through all that extra complexity, to get back what a direct 
call to svd would give you (caveat: the quick version I posted used absolute 
tolerance, while this one is relative; that can be trivially fixed).

Given that a direct SVD call fits the definition of what we are computing (a 
numerical estimation of a matrix rank), I completely fail to see the point of 
going through several additional layers of unnecessary complexity, which both 
add cost and obscure the intent of the calculation.

But perhaps I'm missing something...

Cheers,

f