[SciPy-Dev] Adding Normal-Inverse Wishart Distribution in scipy.stats module

Sat May 16 22:18:56 EDT 2020

Hello everyone. I have been trying to implement the NIW (Normal-Inverse
Wishart) distribution in the scipy.stats module and was wondering whether
it would make a nice addition. A special case of this multivariate
distribution, the Normal-Inverse Gamma distribution, has been proposed
before, but that PR has been stalled for a long time
(#6739 <https://github.com/scipy/scipy/pull/6739>).

It is often used as a prior in multivariate Bayesian linear regression, and
its density is just the product of a multivariate normal density (for the
mean, given the covariance) and an inverse Wishart density (for the
covariance). You can find more about the distribution on its Wikipedia page
<https://en.wikipedia.org/wiki/Normal-inverse-Wishart_distribution>.
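
To make the "product of two densities" point concrete, here is a minimal
sketch of a log-density built from distributions that already exist in
scipy.stats. The function name niw_logpdf and the parameter names (mu0,
lmbda, psi, nu) are hypothetical and follow the Wikipedia parameterization;
this is not a proposed API, and it handles a single (mu, sigma) pair only.

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal


def niw_logpdf(mu, sigma, mu0, lmbda, psi, nu):
    """Log-density of NIW(mu0, lmbda, psi, nu) at one (mu, sigma) pair.

    Hypothetical helper: mu and mu0 are 1-D arrays of length k,
    sigma and psi are k x k positive definite matrices.
    """
    # Conditional part: mu | sigma ~ N(mu0, sigma / lmbda)
    log_normal = multivariate_normal.logpdf(mu, mean=mu0, cov=sigma / lmbda)
    # Marginal part: sigma ~ InverseWishart(df=nu, scale=psi)
    log_inv_wishart = invwishart.logpdf(sigma, df=nu, scale=psi)
    # The joint log-density is the sum of the two log-densities
    return log_normal + log_inv_wishart
```

For example, niw_logpdf(np.zeros(2), np.eye(2), np.zeros(2), 1.0, np.eye(2), 4)
evaluates the density at the prior mean with an identity covariance.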

It should not be “difficult” to implement with the existing codebase; the
challenge is deciding what the rvs method should return and what arguments
the pdf method should take. There are two ways to approach this:

   1. Pass a single quantile formed by stacking mu (shape k x 1) and cov
   (shape k x k) horizontally into one array of shape k x (k+1), and pass
   that to the pdf method. The rvs method returns variates in the same
   layout. If multiple quantiles are passed to pdf, the last two
   dimensions are used to infer the dimensionality.
      - Pros: Consistent with the existing API.
      - Cons: Poor user experience, since the user has to stack the arrays
      before calling pdf and split them again after calling rvs.
   2. Pass two quantiles, mu (shape k x 1) and cov (shape k x k), to the
   pdf method. The rvs method returns a tuple of arrays of shape
   size x k x 1 and size x k x k holding the mu and cov variates
   respectively. To evaluate multiple quantiles, the user passes arrays
   of shape size x k x 1 and size x k x k to pdf.
      - Pros: The user doesn’t have to stack the arrays before calling pdf
      or split them after calling rvs.
      - Cons: Not fully consistent with the API (though I doubt this is the
      case…).
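
The two conventions above can be compared with a rough sketch. All names
here (niw_rvs_tuple, stack, unstack) are hypothetical and only illustrate
the array shapes involved; the sampler assumes size >= 1 and k >= 2 and
draws sigma from the existing invwishart, then mu given sigma.

```python
import numpy as np
from scipy.stats import invwishart


def niw_rvs_tuple(mu0, lmbda, psi, nu, size=1, seed=None):
    """Option 2: return (mus, sigmas) with shapes (size, k, 1), (size, k, k)."""
    rng = np.random.default_rng(seed)
    k = len(mu0)
    # sigma_i ~ InverseWishart(df=nu, scale=psi); force a leading size axis
    sigmas = invwishart.rvs(df=nu, scale=psi, size=size, random_state=rng)
    sigmas = np.reshape(sigmas, (size, k, k))
    # mu_i | sigma_i ~ N(mu0, sigma_i / lmbda), via a batched Cholesky factor
    chol = np.linalg.cholesky(sigmas / lmbda)
    z = rng.standard_normal((size, k, 1))
    mus = np.asarray(mu0, dtype=float).reshape(1, k, 1) + chol @ z
    return mus, sigmas


def stack(mus, sigmas):
    """Option 1 layout: one array of shape (..., k, k + 1)."""
    return np.concatenate([mus, sigmas], axis=-1)


def unstack(x):
    """Split an option 1 array back into (mus, sigmas)."""
    return x[..., :1], x[..., 1:]
```

With option 1 the user would call stack before pdf and unstack after rvs;
with option 2 the tuple from niw_rvs_tuple could be passed along directly.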

Slower methods like cdf and logcdf could perhaps be cythonized, though I
haven’t given it much thought. Approximation methods for special cases
exist and can be found in this paper <https://arxiv.org/pdf/1605.01019.pdf>.

Kind Regards,
Tirth Patel