[Numpy-discussion] Huge arrays
Chad Netzer
chad.netzer at gmail.com
Fri Sep 11 03:07:13 EDT 2009
On Tue, Sep 8, 2009 at 6:41 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
> More precisely, 2GB for windows and 3GB for (non-PAE enabled) linux.
And just to further clarify, even with PAE enabled on linux, any
individual process has about a 3 GB address limit (there are hacks to
raise that to 3.5 or 4GB, but with a performance penalty). But 4 GB
is the absolute max addressable RAM for a single 32 bit process (even
if the kernel itself can use up to 64GB of physical RAM with PAE).
For gory details on Windows address space limits:
http://msdn.microsoft.com/en-us/library/bb613473%28VS.85%29.aspx
If running 64bit is not an option, I'd consider the "compress in RAM"
technique. Delta-compression for most sampled signals should be quite
doable. Heck, here's some untested pseudo-code:
import numpy
import zlib
data_row = numpy.zeros(2000000, dtype=numpy.int16)
# Fill up data_row
compressed_row_strings = []
data_row[1:] = data_row[1:] - data_row[:-1]  # quick n dirty delta encoding
compressed_row_strings.append(zlib.compress(data_row.tostring()))
# Put a loop in there, reuse the row array, and you are almost all set.
# The delta encoding is optional, but probably useful for most
# "real world" 1d signals.
# If you don't have the time between samples to compress the whole row, break
# it into smaller chunks (see zlib.compressobj())
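To make the sketch concrete, here's an untested-in-anger but complete round trip, with the delta step factored into helper functions (the names compress_row / decompress_row are just mine, not any NumPy API). It uses tobytes()/frombuffer() rather than the older tostring() spelling. Note the deltas wrap around on int16 overflow, which is fine: decoding with a modular cumulative sum undoes a modular difference exactly.

```python
import zlib
import numpy as np

def compress_row(row):
    """Delta-encode an int16 row, then zlib-compress the raw bytes."""
    delta = row.copy()
    delta[1:] = row[1:] - row[:-1]  # RHS is evaluated into a temp first,
                                    # so the overlapping views are safe
    return zlib.compress(delta.tobytes())

def decompress_row(blob, dtype=np.int16):
    """Invert compress_row: decompress, then cumsum to undo the deltas."""
    delta = np.frombuffer(zlib.decompress(blob), dtype=dtype)
    return np.cumsum(delta, dtype=dtype)  # modular cumsum inverts modular diff

# A slowly varying "signal" compresses far better after delta encoding.
t = np.arange(2_000_000)
row = (1000 * np.sin(t / 5000.0)).astype(np.int16)
blob = compress_row(row)
restored = decompress_row(blob)
assert np.array_equal(row, restored)  # lossless round trip
```

In a real acquisition loop you'd call compress_row once per filled row and append the returned bytes to a list; swapping zlib.compress for a zlib.compressobj lets you feed smaller chunks as they arrive.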
-C