[Numpy-discussion] Numpy Memory Error with corrcoef
eat
e.antero.tammi at gmail.com
Tue Mar 27 14:22:12 EDT 2012
Hi,
On Tue, Mar 27, 2012 at 12:12 PM, Nicole Stoffels <
nicole.stoffels at forwind.de> wrote:
> Dear all,
>
> I get the following memory error while running my program:
>
> Traceback (most recent call last):
>   File "/home/nistl/Software/Netzbetreiber/FLOW/src/MemoryError_Debug.py", line 9, in <module>
>     correlation = corrcoef(data_records)
>   File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1992, in corrcoef
>     c = cov(x, y, rowvar, bias)
>   File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1973, in cov
>     return (dot(X, X.T.conj()) / fact).squeeze()
> MemoryError
>
> Here is an easy example of how to reproduce the error:
>
> #!/usr/bin/env python2.7
>
> from pybinsel import open
> from numpy import *
>
> if __name__ == '__main__':
>
>     data_records = random.random((459375, 24))
>     correlation = corrcoef(data_records)
>
> My real data has the same dimension. Is this a size problem of the array
> or did I simply make a mistake in the application of corrcoef?
>
> I hope that you can help me! Thanks!
>
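For a sense of scale, here is a rough back-of-the-envelope sketch (my own arithmetic, assuming float64 output and corrcoef's default rowvar=True, where every row is treated as a variable). It also shows the other orientation, which only applies if your 24 columns are meant to be the variables; that is a guess about your data, not something I can tell from the post:

```python
import numpy as np

n_rows, n_cols = 459375, 24                # shape from the example above
itemsize = np.dtype(np.float64).itemsize   # 8 bytes per element

# By default corrcoef treats each row as a variable (rowvar=True),
# so the result would have n_rows x n_rows entries.
full_bytes = n_rows * n_rows * itemsize
print("rows as variables: %.2f TiB" % (full_bytes / 2.0 ** 40))

# If the columns were the variables instead, the result would be tiny.
small_bytes = n_cols * n_cols * itemsize
print("columns as variables: %d bytes" % small_bytes)

# Small demonstration of the column orientation on stand-in data.
demo = np.random.random((1000, n_cols))
by_columns = np.corrcoef(demo, rowvar=False)
print(by_columns.shape)
```

So with rows as variables the result alone needs roughly 1.5 TiB, which no ordinary machine will allocate in one piece.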
As others have explained, this approach yields an enormous matrix
(459375 x 459375 entries). However, if I have understood your problem
correctly, you could implement a helper class that iterates over your
observations and computes one row of the correlation matrix at a time.
Something along these lines (although it may take hours with your data
size):
"""A helper class for correlations between observations."""
import numpy as np


class Correlations(object):
    def __init__(self, data):
        self.m = data.shape[0]
        # Normalization compatible with corrcoef (values per row minus one);
        # it cancels out in the correlation itself.
        self.scale = data.shape[1] - 1
        # Center each row; you may actually need to scale and translate
        # the data in a more application-specific manner.
        self.data = data - np.mean(data, 1)[:, None]
        self.var = (self.data ** 2.).sum(1) / self.scale

    def obs_kth(self, k):
        """Return the correlations of row k with every row."""
        c = np.dot(self.data, self.data[k]) / self.scale
        return c / (self.var[k] * self.var) ** .5

    def obs_iterate(self):
        """Yield the rows of the correlation matrix one at a time."""
        for k in range(self.m):
            yield self.obs_kth(k)


if __name__ == '__main__':
    data = np.random.randn(5, 3)
    print(np.corrcoef(data).round(3))
    print('')
    c = Correlations(data)
    print(np.array([p for p in c.obs_iterate()]).round(3))
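A possible variation on the same idea, in case one row at a time turns out to be too slow: compute a block of rows per step, so that most of the work happens inside a single np.dot call. This is only a sketch; the function name corr_blocks and the block parameter are made up for illustration, and it assumes no row is constant (a constant row would give a zero norm and a division by zero):

```python
import numpy as np


def corr_blocks(data, block=100):
    """Yield (start, rows) pairs, where rows holds the correlations of
    data[start:start + block] with every row of data."""
    centered = data - data.mean(axis=1, keepdims=True)
    norms = np.sqrt((centered ** 2).sum(axis=1))
    # Assumption: no row is constant, so every norm is non-zero.
    unit = centered / norms[:, None]
    for start in range(0, unit.shape[0], block):
        # Result shape is (rows in this block, total number of rows).
        yield start, np.dot(unit[start:start + block], unit.T)


# Check against corrcoef on a small array.
small = np.random.randn(7, 5)
stacked = np.vstack([rows for _, rows in corr_blocks(small, block=3)])
print(np.allclose(stacked, np.corrcoef(small)))
```

With 459375 rows, a block of 100 rows is about 0.37 GB of float64, so the block size has to be chosen to fit your memory, and each block should be reduced or written out before the next one is computed.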
My 2 cents,
-eat
>
> Best regards,
>
> Nicole Stoffels
>
> --
>
> Dipl.-Met. Nicole Stoffels
>
> Wind Power Forecasting and Simulation
>
> ForWind - Center for Wind Energy Research
> Institute of Physics
> Carl von Ossietzky University Oldenburg
>
> Ammerländer Heerstr. 136
> D-26129 Oldenburg
>
> Tel: +49(0)441 798 - 5079
> Fax: +49(0)441 798 - 5099
>
> Web : www.ForWind.de
> Email: nicole.stoffels at forwind.de
>