[SciPy-User] Cholesky problem (I need dtrtrs, not dpotrs)
Sturla Molden
sturla at molden.no
Fri Jul 9 00:33:28 EDT 2010
Sturla Molden skrev:
> Is there any way of getting access to Lapack function dtrtrs or BLAS
> function dtrsm from SciPy? cho_solve does not really do what I want (it
> calls dpotrs). Which by the way is extremely annoying, since 99% of use
> cases for Cholesky (at least in statistics) require solving U'X = Y, not
> U'UX = Y as cho_solve do. Goloub & van Loan does not even bother to
> mention the dpotrs algorithm.
>
Just to elaborate on this:
Say we want to calculate the Mahalanobis distance from somw points X to
a distribution N(m,S). With cho_factor and cho_solve, that would be
cx = X - m
sqmahal = (cx*cho_solve(cho_factor(S),cx.T).T).sum(axis=1)
whereas a similar routine "tri_solve" using dtrtrs would be
cx = X - m
sqmahal = (tri_solve(cho_factor(S),cx.T).T**2).sum(axis=1)
This looks almost the same in Python, but the solution with tri_solve (dtrtrs) requires only half as many flops as cho_solve (dpotrs) does.
In many statistical applications requiring substantial amount of computation (EM algorithms, MCMC simulation, and the like), computing Mahalanobis distances can be the biggest bottleneck.
So that is one thing I really miss in
http://svn.scipy.org/svn/scipy/trunk/scipy/linalg/decomp_cholesky.py
P.S. In Fortran or C we would rather use a tight loop on dtrsv instead, computing sqmahal point by point, as it is more friendly to cache than dtrtrs on the whole block. Computing mahalanobis distances efficiently is a so common use case for Cholesky that I (almost) suggest this be added to SciPy as well.
<http://svn.scipy.org/svn/scipy/trunk/scipy/linalg/decomp_cholesky.py>
P.P.S. Those transpositions are actually required to make it run fast; in Matlab they would slow things down terribly. NumPy and Matlab is very different in this respect. A transpose does not create a new array in NumPy, it just switches the order flag between C and Fortran. C order is NumPy's native, but we must have Fortran order before calling BLAS or LAPACK. If we don't, f2py will make a copy with a transpose. So we avoid a transpose by taking a transpose. It might seem a bit paradoxical.
Sturla
More information about the SciPy-User
mailing list