[Numpy-discussion] Huge arrays

Chad Netzer chad.netzer at gmail.com
Fri Sep 11 03:07:13 EDT 2009


On Tue, Sep 8, 2009 at 6:41 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>

> More precisely, 2GB for windows and 3GB for (non-PAE enabled) linux.

And just to further clarify, even with PAE enabled on Linux, any
individual process still has roughly a 3 GB address limit (there are
hacks to raise that to 3.5 or 4 GB, but with a performance penalty).
But 4 GB is the absolute maximum addressable by a single 32-bit
process, even if the kernel itself can use up to 64 GB of physical
RAM with PAE.
For gory details on Windows address space limits:

http://msdn.microsoft.com/en-us/library/bb613473%28VS.85%29.aspx

If running 64-bit is not an option, I'd consider the "compress in RAM"
technique.  Delta compression of most sampled signals should be quite
doable.  Heck, here's some untested pseudo-code:

import numpy
import zlib

data_row = numpy.zeros(2000000, dtype=numpy.int16)
# Fill up data_row

compressed_row_strings = []
data_row[1:] = data_row[1:] - data_row[:-1]    # quick n dirty delta encoding (current minus previous)

compressed_row_strings.append(zlib.compress(data_row.tostring()))

# Put a loop in there, reuse the row array, and you are almost all set.
# The delta encoding is optional, but probably useful for most "real world"
# 1d signals.  If you don't have the time between samples to compress the
# whole row, break it into smaller chunks (see zlib.compressobj()).
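
For completeness, here's an equally untested sketch of the reverse
direction (the decompress_row name is just for illustration; it assumes
the same int16 dtype and the delta encoding above):

def decompress_row(compressed):
    # Inflate back to raw bytes, then reinterpret them as int16 deltas
    deltas = numpy.frombuffer(zlib.decompress(compressed), dtype=numpy.int16)
    # A running int16 sum (wrapping, like the subtraction) undoes the deltas
    return numpy.cumsum(deltas, dtype=numpy.int16)

row = decompress_row(compressed_row_strings[0])

And since the comment above mentions it, chunked compression with
zlib.compressobj() might look roughly like this (chunk size picked out
of thin air):

chunk_size = 100000
compressor = zlib.compressobj()
pieces = []
for start in range(0, len(data_row), chunk_size):
    # Feed the row to the compressor one slice at a time
    pieces.append(compressor.compress(data_row[start:start + chunk_size].tostring()))
pieces.append(compressor.flush())    # don't forget the trailing bytes
compressed_row_strings.append(b''.join(pieces))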

-C
