[Numpy-discussion] Numpy Memory Error with corrcoef

Tue Mar 27 14:22:12 EDT 2012

Hi,

On Tue, Mar 27, 2012 at 12:12 PM, Nicole Stoffels <
nicole.stoffels at forwind.de> wrote:

> Dear all,
>
> I get the following memory error while running my program:
>
> Traceback (most recent call last):
>   File "/home/nistl/Software/Netzbetreiber/FLOW/src/MemoryError_Debug.py", line 9, in <module>
>     correlation = corrcoef(data_records)
>   File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1992, in corrcoef
>     c = cov(x, y, rowvar, bias)
>   File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1973, in cov
>     return (dot(X, X.T.conj()) / fact).squeeze()
> MemoryError
>
> Here is a simple example that reproduces the error:
>
> #!/usr/bin/env python2.7
>
> from pybinsel import open
> from numpy import *
>
> if __name__ == '__main__':
>
>     data_records = random.random((459375, 24))
>     correlation = corrcoef(data_records)
>
> My real data has the same dimensions. Is this a size problem of the array,
> or did I simply make a mistake in how I applied corrcoef?
>
> I hope that you can help me! Thanks!
>
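For scale: corrcoef treats each row as a variable by default, so a
(459375, 24) input asks for a 459375 x 459375 result. Below is a rough
sketch of that arithmetic, plus a small stand-in array (50 rows, my own
choice, only to show the shapes). The rowvar=False call is relevant only
under the assumption that your 24 columns are the variables; I can't tell
that from your post.

```python
import numpy as np

n_rows, n_cols = 459375, 24

# By default corrcoef treats rows as variables, so the result is
# n_rows x n_rows float64 values.
full_bytes = n_rows * n_rows * np.dtype(np.float64).itemsize
print("full matrix: %.2f TB" % (full_bytes / 1e12))

# Small stand-in with the same layout (50 rows instead of 459375).
small = np.random.random((50, n_cols))
by_rows = np.corrcoef(small)                # one variable per row
by_cols = np.corrcoef(small, rowvar=False)  # one variable per column
print(by_rows.shape, by_cols.shape)
```

That is about 1.7 TB for the result alone, before any temporaries.
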
As others have explained, this approach yields an enormous matrix.
However, if I have understood your problem correctly, you could implement a
helper class that iterates over your observations one at a time. Something
along these lines (although with your data size it will probably take hours
to iterate over all the correlations):

"""A helper class for correlations between observations."""

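import numpy as np


class Correlations(object):

    def __init__(self, data):
        # number of observations (rows of data)
        self.m = data.shape[0]
        # same normalisation as np.cov (ddof=1); this factor cancels
        # out of the correlation coefficients in any case
        self.scale = data.shape[1] - 1
        # centre each observation on its own mean, as corrcoef does
        self.data = data - np.mean(data, 1)[:, None]
        # but you may actually need to scale and translate the data
        # in a more application-specific manner
        self.var = (self.data ** 2.).sum(1) / self.scale

    def obs_kth(self, k):
        """Correlations of observation k with all observations."""
        c = np.dot(self.data, self.data[k]) / self.scale
        return c / (self.var[k] * self.var) ** .5

    def obs_iterate(self):
        """Yield one row of the correlation matrix at a time."""
        for k in range(self.m):
            yield self.obs_kth(k)


if __name__ == '__main__':
    data = np.random.randn(5, 3)
    # reference result from numpy
    print(np.corrcoef(data).round(3))
    print('')
    # same matrix, assembled row by row
    c = Correlations(data)
    print(np.array([p for p in c.obs_iterate()]).round(3))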

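If one row at a time turns out to be too slow, a variant of the same idea
is to compute a block of rows per step, so that most of the work happens
inside a single np.dot call. This is only a sketch: corr_blocks and
block_size are names I made up here, and I have not timed it on data of
your size. With 459375 observations, a block of 128 rows is about 470 MB
of float64 values, so choose block_size to fit your memory.

```python
import numpy as np


def corr_blocks(data, block_size=128):
    """Yield (start, block) pairs, where block[i, j] is the correlation
    between observation start + i and observation j."""
    # centre every row and scale it to unit length; the correlation of
    # two rows is then simply their dot product
    centered = data - data.mean(axis=1, keepdims=True)
    norms = np.sqrt((centered ** 2).sum(axis=1))
    unit = centered / norms[:, None]
    for start in range(0, unit.shape[0], block_size):
        yield start, np.dot(unit[start:start + block_size], unit.T)


# small check against np.corrcoef
demo = np.random.randn(7, 3)
stacked = np.vstack([block for _, block in corr_blocks(demo, block_size=3)])
print(np.allclose(stacked, np.corrcoef(demo)))
```

Each block should be reduced or written to disk before the next one is
requested, otherwise you end up holding the full matrix again.
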
My 2 cents,
-eat

>
> Best regards,
>
> Nicole Stoffels
>
> --
>
> Dipl.-Met. Nicole Stoffels
>
> Wind Power Forecasting and Simulation
>
> ForWind - Center for Wind Energy Research
> Institute of Physics
> Carl von Ossietzky University Oldenburg
>
> Ammerländer Heerstr. 136
> D-26129 Oldenburg
>
> Tel: +49(0)441 798 - 5079
> Fax: +49(0)441 798 - 5099
>
> Web  : www.ForWind.de
> Email: nicole.stoffels at forwind.de
>