Large data arrays?

John Machin sjmachin at lexicon.net
Thu Apr 23 19:53:36 EDT 2009


On Apr 23, 8:22 pm, Ole Streicher <ole-usenet-s... at gmx.net> wrote:
> Hi,
>
> For my application, I need to use quite large data arrays
> (100,000 x 4,000 values) of floating point numbers, where I need fast
> row-wise and column-wise access (main case: return a column with the sum
> over a number of selected rows, and vice versa).
>
> I would use a numpy array for that, but they seem to be
> memory-resident. So one of these arrays would use about 1.6 GB of
> memory, which is far too much. So I was thinking about a memory mapped
> file for that. As far as I understand, there is one in numpy.
>
> For this, I have two questions:
>
> 1. Are "numpy.memmap" arrays unlimited in size (or rather, limited only
> by the maximum file size)? And do they count towards the system's memory
> limit (~3 GB on 32-bit systems)?
>
> 2. Since I need row-wise as well as column-wise access, simply using
> a big array as a memory mapped file will probably lead to very poor
> performance, since one of the two access patterns would need to read
> values scattered around the whole file. Are there any "plug and play"
> solutions for that? If not, what would be the best way to solve this
> problem? Probably one needs to use something like the "Morton layout"
> for the data. Would one then build a subclass of memmap (or ndarray?)
> that implements this specific layout? How would one do that? (Sorry, I
> am still a beginner with respect to Python.)
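
For reference, here is a minimal sketch of the straightforward memmapped
layout described above (the file name is made up; float32 matches the
~1.6 GB figure quoted above). Row access reads one contiguous stretch of
the file, while column access touches values scattered across the whole
file, which is exactly the performance concern in question 2:

    import numpy as np

    # Hypothetical file holding the full matrix in C (row-major) order.
    N_ROWS, N_COLS = 100000, 4000
    data = np.memmap("data.dat", dtype=np.float32, mode="r",
                     shape=(N_ROWS, N_COLS))

    # Row-wise: one contiguous read -- fast.
    row_sum = data[42].sum()

    # Column-wise: one value out of every 4000 -- scattered reads, slow.
    col_sum = data[:, 7].sum()

    # Main use case: element-wise sum over a set of selected rows,
    # giving one sum per column.
    selected = [3, 17, 99000]
    sums_over_rows = data[selected, :].sum(axis=0)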

The Morton layout wastes space if the matrix is not square: the Z-order
index effectively addresses a square region with power-of-two sides, so a
very oblong matrix has to be padded out. Your 100K x 4K matrix is very
non-square. It looks like you might want to use e.g. 25 Morton arrays,
each 4K x 4K.
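
A rough sketch of that blocked idea, with one memmapped 4K x 4K block per
file (the file names are made up, and plain row-major blocks stand in for
true Morton/Z-order blocks to keep the example short):

    import numpy as np

    BLOCK = 4000                      # 4K x 4K blocks, 25 of them
    N_ROWS, N_COLS = 100000, 4000
    n_blocks = N_ROWS // BLOCK

    # One memmapped file per block of 4000 rows (hypothetical names).
    blocks = [np.memmap("block_%02d.dat" % b, dtype=np.float32,
                        mode="r", shape=(BLOCK, BLOCK))
              for b in range(n_blocks)]

    def get_row(i):
        # Row i of the full matrix lives entirely inside one block.
        return blocks[i // BLOCK][i % BLOCK]

    def column_sum(j, selected_rows):
        # Sum column j over the selected rows, one block at a time,
        # so each block's file is visited in a single pass.
        total = 0.0
        for b, blk in enumerate(blocks):
            local = [i - b * BLOCK for i in selected_rows
                     if b * BLOCK <= i < (b + 1) * BLOCK]
            if local:
                total += blk[local, j].sum()
        return total

With a real Morton (Z-order) index inside each block, both row and column
reads within a block stay reasonably local on disk; the row-major stand-in
above only gives the coarser block-level locality.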

Cheers,
John


