Large data arrays?

Ole Streicher ole-usenet-spam at gmx.net
Thu Apr 23 06:22:42 EDT 2009


Hi,

for my application, I need to work with quite large data arrays
(100,000 x 4000 floating point values) that require fast row-wise and
column-wise access (the main case: return a column with the sum over a
number of selected rows, and vice versa).
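To illustrate the access pattern (toy-sized example, of course; the
real array would be much larger):

    import numpy as np

    # small stand-in for the real 100,000 x 4,000 array
    data = np.arange(20, dtype=np.float32).reshape(5, 4)

    selected_rows = [0, 2, 4]
    col_sums = data[selected_rows, :].sum(axis=0)   # one value per column

    selected_cols = [1, 3]
    row_sums = data[:, selected_cols].sum(axis=1)   # one value per row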

I would use numpy arrays for that, but they seem to be entirely
memory-resident. One of these arrays would take about 1.6 GB of
memory, which is far too much. So I was thinking about a memory-mapped
file for the data. As far as I understand, numpy provides one.
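Something along these lines is what I had in mind (untested sketch;
the file name is made up):

    import numpy as np

    # back the array with a file on disk instead of keeping it in RAM
    data = np.memmap('data.bin', dtype=np.float32, mode='w+',
                     shape=(100000, 4000))

    data[0, :] = 1.0        # should behave more or less like a normal array
    print(data[0, :10])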

For this, I have two questions: 

1. Are "numpy.memmap" arrays unlimited in size (i.e. limited only by
the maximum file size)? And do they count against the process's memory
limit (~3 GB on 32-bit systems)?

2. Since I need row-wise as well as column-wise access, simply using a
big array as a memory-mapped file will probably give very poor
performance, because one of the two access patterns would have to read
values scattered across the whole file. Are there any "plug and play"
solutions for that? If not: what would be the best way to solve this
problem? Probably one needs to use something like a "Morton layout"
for the data (see the rough sketch below). Would one then build a
subclass of memmap (or ndarray?) that implements this specific layout?
How would one do that? (Sorry, I am still a beginner with respect to
Python.)
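As a rough sketch of what I mean (certainly naive; the tile size B,
the file name and the helper names are just made up, and it assumes
both dimensions are multiples of B): store the data in square tiles,
so that a row or a column only touches a limited number of tiles, each
of which is contiguous on disk.

    import numpy as np

    B = 1000                       # tile size, would need tuning
    nrows, ncols = 100000, 4000    # assumed to be multiples of B

    tiles = np.memmap('tiles.bin', dtype=np.float32, mode='w+',
                      shape=(nrows // B, ncols // B, B, B))

    def get_row(i):
        # row i is spread over ncols // B tiles, each piece contiguous
        return tiles[i // B, :, i % B, :].ravel()

    def get_col(j):
        # column j is spread over nrows // B tiles
        return tiles[:, j // B, :, j % B].ravel()

With B x B tiles, a full row reads ncols // B contiguous pieces of B
values each, and a full column reads nrows // B such pieces, instead
of one value every ncols values as in a plain row-major file.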

Best regards

Ole
