[Numpy-discussion] Matrix rank default tolerance - is it too low?

Tue Jun 26 19:39:40 EDT 2012

On Tuesday, June 26, 2012, Charles R Harris wrote:

>
>
> On Tue, Jun 26, 2012 at 3:42 PM, Matthew Brett <matthew.brett at gmail.com>wrote:
>
> Hi,
>
> On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett <matthew.brett at gmail.com>
> wrote:
> > Hi,
> >
> > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >>
> >>
> >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett <matthew.brett at gmail.com
> >
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett <
> matthew.brett at gmail.com>
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith <njs at pobox.com>
> wrote:
> >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris
> >>> >> <charlesr.harris at gmail.com> wrote:
> >>> >>>
> >>> >>>
> >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett
> >>> >>> <matthew.brett at gmail.com>
> >>> >>> wrote:
> >>> >>>>
> >>> >>>> Hi,
> >>> >>>>
> >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank
> for
> >>> >>>> matrices that are numerically rank deficient:
> >>> >>>>
> >>> >>>> If I repeatedly make random matrices, then set the first column
> to be
> >>> >>>> equal to the sum of the second and third columns:
> >>> >>>>
> >>> >>>> def make_deficient():
> >>> >>>>    X = np.random.normal(size=(40, 10))
> >>> >>>>    deficient_X = X.copy()
> >>> >>>>    deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2]
> >>> >>>>    return deficient_X
> >>> >>>>
> >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns full
> rank
> >>> >>>> (10) in about 8 percent of cases (see appended script).
> >>> >>>>
> >>> >>>> I think this is a tolerance problem.  The ``matrix_rank``
> algorithm
> >>> >>>> does this by default:
> >>> >>>>
> >>> >>>> S = spl.svd(M, compute_uv=False)
> >>> >>>> tol = S.max() * np.finfo(S.dtype).eps
> >>> >>>> return np.sum(S > tol)
> >>> >>>>
> >>> >>>> I guess we'd we want the lowest tolerance that nearly always or
> >>> >>>> always
> >>> >>>> identifies numerically rank deficient matrices.  I suppose one
> way of
> >>> >>>> looking at whether the tolerance is in the right range is to
> compare
> >>> >>>> the calculated tolerance (``tol``) to the minimum singular value
> >>> >>>> (``S.min()``) because S.min() in our case should be very small and
> >>> >>>> indicate the rank deficiency. The mean value of tol / S.min() for
> the
> >>> >>>> current algorithm, across many iterations, is about 2.8.  We might
> >>> >>>> hope this value would be higher than 1, but not much higher,
> >>> >>>> otherwise
> >>> >>>> we might be rejecting too many columns.
> >>> >>>>
> >>> >>>> Our current algorithm for tolerance is the same as the 2-norm of
> M *
> >>> >>>> eps.  We're citing Golub and Van Loan for this, but now I look at
> our
> >>> >>>> copy (p 261, last para) - they seem to be suggesting using u * |M|
> >>> >>>> where u = (p 61, section 2.4.2) eps /  2. (see [1]). I think the
> >>> >>>> Golub
>
>
> I'm fine with that, and agree that it is likely to lead to fewer folks
> wondering why Matlab and numpy are different. A good explanation in the
> function documentation would be useful.
>
> Chuck
>
>
One potential problem is that it implies that it will always be the same as
any version of matlab's tolerance.  What if they change it in a future
release? How likely are we to even notice?

Cheers!
Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120626/892d1efc/attachment.html>