[SciPy-User] Correlation coefficient of large arrays

Vincent Davis vincent at vincentdavis.net
Tue Mar 16 09:53:43 EDT 2010


@Josef

> I would loop by variable not by observations
>
> example in attachment


Thanks for that example. My wording was poor (rows and columns are equivalent
in a symmetric matrix), but your example is what I was thinking of and was not
sure how to do.

Thanks again


  *Vincent Davis
720-301-3003 *
vincent at vincentdavis.net
 my blog <http://vincentdavis.net> |
LinkedIn<http://www.linkedin.com/in/vincentdavis>


On Tue, Mar 16, 2010 at 7:38 AM, <josef.pktd at gmail.com> wrote:

>
>
> On Tue, Mar 16, 2010 at 9:21 AM, Vincent Davis <vincent at vincentdavis.net> wrote:
>
>> Is there a way to calculate a column or row of the correlation matrix one
>> at a time?  I am looking at how including an additional set of observations
>> affects the correlation. For example, if I have variables a, b, c, d, ... and a
>> set of observations 1-10, and the correlation is calculated for obs 1-5, I then
>> add observations 6-10 and want to know the average effect of this on the
>> correlation of c with (a, b, d, e, ...).
>> So I only need a column or a row at a time.
>> It is just not clear to me how I would do this. I guess I just need to DO
>> IT.
>>
>
> I would loop by variable not by observations
>
> example in attachment
>
> Josef
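Josef's attached example is not included in this message. The following is a
separate minimal sketch, not his code, of computing a single row of the
correlation matrix and comparing it before and after adding observations. It
assumes data is laid out with one row per variable (the layout numpy.corrcoef
uses by default); the function name corr_with_all, the random data, and the
index c are made up for illustration.

```python
import numpy as np

def corr_with_all(data, idx):
    """Pearson correlation of variable `idx` with every variable.

    data has shape (n_vars, n_obs): one row per variable.
    """
    centered = data - data.mean(axis=1, keepdims=True)
    norms = np.sqrt((centered ** 2).sum(axis=1))
    # One matrix-vector product gives a full row of the correlation matrix.
    return centered @ centered[idx] / (norms * norms[idx])

rng = np.random.default_rng(0)
data = rng.normal(size=(6, 10))   # 6 made-up variables, 10 observations
c = 2                              # hypothetical index of variable "c"

before = corr_with_all(data[:, :5], c)   # observations 1-5 only
after = corr_with_all(data, c)           # observations 1-10

# Average change in the correlation of c with the other variables
# (the self-correlation, which is always 1, is excluded).
others = np.arange(data.shape[0]) != c
mean_change = (after - before)[others].mean()
```

Only one vector of length n_vars is held per call, so the full matrix is never
built.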
>
>
>
>>
>>
>>
>> On Mon, Mar 15, 2010 at 11:56 PM, <josef.pktd at gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Mar 16, 2010 at 1:39 AM, Vincent Davis <vincent at vincentdavis.net
>>> > wrote:
>>>
>>>>  @Josef
>>>>
>>>> how much memory does a
>>>>
>>>> >>> 230000**2
>>>> 52900000000L
>>>>
>>>> float (double) array take ?
>>>>
>>>>
>>>>
>>>> I guess I don't have a real appreciation for how large this is. I can do
>>>> numpy.ones((100000,50000),dtype=np.float64) and it uses about 85% of
>>>> the memory I have available. But that's a long way from 230,000 x 230,000. Of
>>>> course the array is symmetric.
>>>>
>>>> Is it feasible to write it to disk?
>>>> The end goal is to find the difference between two correlation arrays
>>>> and then calculate the mean of each column, which then leaves me with a
>>>> 1 x 230,000 array.
>>>>
>>>
>>> If you don't really care about the correlation matrix itself and only
>>> need the column (or row) sum then I would just loop over it in batches and
>>> never construct the full matrix.
>>> e.g. take the first 1000 variables and calculate the correlation with all
>>> variables (1000 * 230000 -> 1000 for sum)
>>> and loop.
>>> Not using np.corrcoef would avoid some duplicate calculations, but there
>>> are still several intermediate arrays necessary. So maybe using pytables or
>>> similar would still be better to avoid duplicate calculations.
>>>
>>> Josef
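The batching idea above could be sketched roughly as follows. This is one
possible reading of it, not code from the thread: it assumes data has one row
per variable, that no variable is constant, and the name corr_column_means is
hypothetical. Each variable is standardized once, so a block of correlations
becomes a plain matrix product and np.corrcoef is not called at all.

```python
import numpy as np

def corr_column_means(data, batch_size=1000):
    """Mean of each column of the correlation matrix, computed in batches.

    data has shape (n_vars, n_obs): one row per variable.
    Constant variables would produce NaN (division by zero).
    """
    n_vars = data.shape[0]
    # Center and scale each variable to unit length, so that the dot
    # product of two rows is their Pearson correlation.
    z = data - data.mean(axis=1, keepdims=True)
    z /= np.sqrt((z ** 2).sum(axis=1, keepdims=True))

    means = np.empty(n_vars)
    for start in range(0, n_vars, batch_size):
        stop = min(start + batch_size, n_vars)
        # (batch, n_vars) slice of the correlation matrix; only this
        # block is ever held in memory.
        block = z[start:stop] @ z.T
        means[start:stop] = block.mean(axis=1)
    return means
```

With batch_size=1000 and 230,000 variables, each block holds 1000 * 230000
doubles, about 1.8 GB, so a smaller batch size may be needed on a modest
machine. Because the matrix is symmetric, row means and column means are the
same, and the column means of a difference of two correlation matrices are the
difference of their column means.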
>>>
>>>
>>>
>>>>
>>>>
>>>>
>>>> On Mon, Mar 15, 2010 at 11:16 PM, <josef.pktd at gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 16, 2010 at 1:04 AM, Vincent Davis <
>>>>> vincent at vincentdavis.net> wrote:
>>>>>
>>>>>> I have an array of 10 observations of 230,000 variables and want to find
>>>>>> the correlation coefficient between each pair of variables.
>>>>>> numpy.corrcoef(data) works, except I can only do it with about 30,000
>>>>>> variables at a time: numpy.corrcoef(data[:30000]). It uses up a lot of
>>>>>> memory.
>>>>>> Is there a better way?
>>>>>>
>>>>>
>>>>>
>>>>> how much memory does a
>>>>> >>> 230000**2
>>>>> 52900000000L
>>>>>
>>>>> float (double) array take ?
>>>>>
>>>>> Josef
>>>>> (I'm not going to try)
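For reference, the arithmetic behind that question can be worked out without
allocating anything. A small sketch, assuming 8 bytes per float64 element and a
dense matrix; the variable names are illustrative only:

```python
# Back-of-the-envelope memory estimate for a dense float64 correlation matrix.
n_vars = 230_000
bytes_per_float64 = 8

full_bytes = n_vars ** 2 * bytes_per_float64
# Exploiting symmetry: store only the upper triangle, diagonal included.
triangle_bytes = n_vars * (n_vars + 1) // 2 * bytes_per_float64

full_gib = full_bytes / 2 ** 30
triangle_gib = triangle_bytes / 2 ** 30
print(f"full matrix:    {full_gib:.1f} GiB")
print(f"upper triangle: {triangle_gib:.1f} GiB")
```

The full matrix comes to roughly 394 GiB, and even the symmetric half is close
to 200 GiB, which is why holding it in memory is not practical.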
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> SciPy-User mailing list
>>>>>> SciPy-User at scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>>>
>>>>>>