[Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

Mathew S. Madhavacheril mathewsyriac at gmail.com
Wed Oct 26 16:12:05 EDT 2016


On Wed, Oct 26, 2016 at 3:20 PM, <josef.pktd at gmail.com> wrote:

>
>
> On Wed, Oct 26, 2016 at 3:11 PM, Mathew S. Madhavacheril <
> mathewsyriac at gmail.com> wrote:
>
>>
>>
>> On Wed, Oct 26, 2016 at 2:56 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>
>>> On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer <shoyer at gmail.com>
>>> wrote:
>>> > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
>>> > <mathewsyriac at gmail.com> wrote:
>>> >>
>>> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer <shoyer at gmail.com>
>>> wrote:
>>> >>>
>>> >>> I wonder if the goals of this addition could be achieved by simply
>>> adding
>>> >>> an optional `cov` argument
>>> >>>
>>> >>> to np.corr, which would provide a pre-computed covariance.
>>> >>
>>> >>
>>> >> That's a fair suggestion which I'm happy to switch to. This
>>> eliminates the
>>> >> need for two new functions.
>>> >> I'll add an optional `cov = False` argument to numpy.corrcoef that
>>> returns
>>> >> a tuple (corr, cov) instead.
>>> >>
>>> >>>
>>> >>>
>>> >>> Either way, `covcorr` feels like a helper function that could exist
>>> in
>>> >>> user code rather than numpy proper.
>>> >>
>>> >>
>>> >> The user would have to re-implement the part that converts the
>>> covariance
>>> >> matrix to a correlation
>>> >> coefficient. I made this PR to avoid that code duplication.
>>> >
>>> >
>>> > With the API I was envisioning (or even your proposed API, for that
>>> matter),
>>> > this function would only be a few lines, e.g.,
>>> >
>>> > def covcorr(x):
>>> >     cov = np.cov(x)
>>> >     corr = np.corrcoef(x, cov=cov)
>>>
>>> IIUC, if you have a covariance matrix then you can compute the
>>> correlation matrix directly, without looking at 'x', so corrcoef(x,
>>> cov=cov) is a bit odd-looking. I think probably the API that makes the
>>> most sense is just to expose something like the covtocorr function
>>> (maybe it could have a less telegraphic name?)? And then, yeah, users
>>> can use that to build their own covcorr or whatever if they want it.
>>>
>>
>> Right, agreed, this is why I said `x` becomes redundant when `cov` is
>> specified
>> when calling `numpy.corrcoef`.  So we have two alternatives:
>>
>> 1) Have `np.corrcoef` accept a boolean optional argument `covmat = False`
>> that lets
>> one obtain a tuple containing the covariance and the correlation matrices
>> in the same call
>> 2) Modify my original PR so that `np.covtocorr` remains (with possibly a
>> better
>> name) but remove `np.covcorr` since this is easy for the user to add.
>>
>> My preference is option 2.
>>
>
> cov2corr is a useful function
> http://www.statsmodels.org/dev/generated/statsmodels.stats.
> moment_helpers.cov2corr.html
> I also wrote the inverse function corr2cov, but AFAIR use it only in some
> test cases.
>
>
> I don't think adding any of the options to corrcoef or covcor is useful
> since there is no computational advantage to it.
>

I'm not sure I agree with that statement. If a user wants to calculate both
a covariance and correlation matrix,
they currently have two options:
A) Call np.cov and np.corrcoef separately, which takes at least twice as
long as one call to np.cov. For data-sets that
I am used to, a np.cov call takes 5-10 seconds.
B) Call np.cov and then separately implement their own correlation matrix
code, which means the user
isn't able to fully take advantage of code that is already in numpy.

In any case, I've updated the PR:
https://github.com/numpy/numpy/pull/8211

Relative to my original PR, it:
a) removes the numpy.covcorr function which the user can easily implement
b) have numpy.cov2corr be the function exposed in the API (previously
called numpy.covtocorr in the PR), which accepts a pre-calculated covariance
matrix
c) have numpy.corrcoef call numpy.cov2corr




>
>
>
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20161026/19664fb3/attachment.html>


More information about the NumPy-Discussion mailing list