[SciPy-dev] Best interface for computing the logarithm of the determinant?

Wed Feb 17 09:16:31 EST 2010

On Wed, Feb 17, 2010 at 2:49 AM, Dag Sverre Seljebotn
<dagss at student.matnat.uio.no> wrote:
> josef.pktd at gmail.com wrote:
>> On Tue, Feb 16, 2010 at 4:29 PM,  <josef.pktd at gmail.com> wrote:
>>
>>> On Tue, Feb 16, 2010 at 4:19 PM, Dag Sverre Seljebotn
>>> <dagss at student.matnat.uio.no> wrote:
>>>
>>>> Nathaniel Smith wrote:
>>>>
>>>>> So when you have a matrix whose determinant you want, it's often wise
>>>>> to compute the logarithm of the determinant instead of the determinant
>>>>> itself, because determinants involve lots and lots of multiplications
>>>>> and the result might otherwise underflow/overflow. Therefore, in
>>>>> scikits.sparse, I'd like to provide an API for doing this (and this is
>>>>> well-supported by the underlying libraries).
>>>>>
>>>>> But the problem is that for a general matrix, the determinant may be
>>>>> zero or negative. Obviously we can deal with this, but what's the best
>>>>> API? I'd like to use one consistently across the different
>>>>> factorizations in scikits.sparse, and perhaps eventually in numpy as
>>>>> well.
>>>>>
>>>>> Some options:
>>>>>
>>>>> 1) Split off the sign into a separate return value ('sign' may be 1, -1, 0):
>>>>> sign, value = logdet(A)
>>>>> actual_determinant = sign * exp(value)
>>>>>
>>>>> 2) Allow complex/infinite return values, even when A is a real matrix:
>>>>> logdet(eye(3)) == pi*1j
>>>>> logdet(zeros((3, 3))) == -Inf
>>>>>
>>>> I'm +1 for this one. It is easy to interpret, and if one knows that the
>>>> result is positive (like after a successful CHOLMOD LL^T) is is easy
>>>> enough to take the real part by appending .real.
>>>>
>>>> Also it makes more sense for code that may be dealing with both complex
>>>> and real matrices compared to 1).
>>>>
>>> I'm also in favor of this, but would prefer if promotion to complex
>>> only occurs if the result requires it.
>>> np.real_if_close  might be safer than .real
>>>
>>> Josef
>>>
>>>
>>>>> 3) "Scientific notation" (This is what UMFPACK's API does): return a
>>>>> mantissa and base-10 exponent:
>>>>> mantissa, exponent = logdet(A)
>>>>> actual_determinant = mantissa * 10 ** exponent
>>>>>
>>>> This is my least preferred option, because "10" has nothing to do with
>>>> floats.
>>>>
>>>>
>>>>> 4) Have separate functions for computing the sign, and the log of the
>>>>> absolute value (This is what GSL does, though it seems pointlessly
>>>>> inefficient):
>>>>> sign = sgndet(A)
>>>>> value = logdet(A)
>>>>> actual_determinant = sign * exp(value)
>>>>>
>>>> Well, it doesn't have to be inefficient, you use cache the return value
>>>> internally on id(A) and possibly weak reference callbacks to remove the
>>>> cached values...but pretty ugly, yeah.
>>>>
>>
>> An alternative would be to provide 2 public functions where logdet()
>> would have the same behavior as np.log(np.det())  and any convenient
>> representation for an internal functions.
>>
>> for multivariate normal distribution (loglikelihood) logdet would be
>> very useful. Is it possible to get an efficient implementation in
>> scipy or are the functions only available in CHOLMOD ?
>>
> You mean like this?
>
> 2 * np.sum(np.log(np.diagonal(np.linalg.cholesky(A))))

Thanks, I didn't know it is as simple as this.

>
> (Using scipy.linalg.cho_solve would be very slightly faster I guess.)
>
> Of course, in practice you usually get the Cholesky factor once and then
> reuse it both for solving and for getting the determinant. So what you
> want is a symbolic "solution" object which allows solving, getting
> determinant etc. scikits.sparse has this for Cholesky factors but SciPy
> doesn't.
>
> I don't think SciPy should get it though -- SciPy usually has a
> "lower-level" approach. Therefore I've written "oomatrix.py"
> (polymorphic matrices) which does this in a higher-level way for my own
> purposes (not available online yet...will have to see when I get time to
> polish it...):

I like to keep some one-liner functions around just to tell or remind
me how to do it.
Scipy.stats currently doesn't need it, (maybe when a multivariate
normal class is included, and I just saw that it might be possible to
improve stats.kde by storing the determinant instead of calculating it
repeatedly.)

In statsmodels, we store currently the cholesky of the pinv of the
covariance matrix, but statsmodels is still mostly least squares based
where we don't need the determinant, and we are still expanding MLE.

So, these linear algebra tricks will come in handy.

Josef

>
> A, B = ...numpy arrays...
> M = oomatrix.matrix(A) # Create immutable matrix object from array
> C M.solve_right(B, algorithm='cholesky') # does cholesky and caches it
> M.log_determinant() # from cached decomposition
> P, L = M.cholesky()
>
> Alternatively, constructing oomatrix.matrix(A, sparse=True) will
> transparently switch to using CHOLMOD as backend.
>
> Dag Sverre
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>