numpy.memmap advice?

Lionel lionel.keene at gmail.com
Tue Feb 17 18:08:31 EST 2009


Hello all,

On a previous thread
(http://groups.google.com/group/comp.lang.python/browse_thread/thread/64da35b811e8f69d/67fa3185798ddd12?hl=en&lnk=gst&q=keene#67fa3185798ddd12)
I was asking about reading in
binary data. Briefly, my data consists of complex numbers, 32-bit
floats for real and imaginary parts. The data is stored as 4 bytes
Real1, 4 bytes Imaginary1, 4 bytes Real2, 4 bytes Imaginary2, etc. in
row-major format. I needed to read the data in as two separate numpy
arrays, one for real values and one for imaginary values.
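To make the layout concrete, here is a minimal sketch that writes a tiny file in this format and reads it back into two separate arrays. The 2 x 2 sample values, the temporary file name and the variable names are made up purely for illustration, and this version reads the whole file at once with numpy.fromfile, so it is not a solution for large files.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: 4 complex values arranged as 2 rows x 2 columns.
real_in = np.array([[1.0, 2.0], [3.0, 4.0]], dtype="<f4")
imag_in = np.array([[10.0, 20.0], [30.0, 40.0]], dtype="<f4")

# Interleave as Real1, Imaginary1, Real2, Imaginary2, ... in row-major order.
interleaved = np.empty(real_in.size * 2, dtype="<f4")
interleaved[0::2] = real_in.ravel()
interleaved[1::2] = imag_in.ravel()

# Made-up file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
interleaved.tofile(path)

# Read the whole file back and split it into two separate arrays.
raw = np.fromfile(path, dtype="<f4")
real_out = raw[0::2].reshape(2, 2)
imag_out = raw[1::2].reshape(2, 2)

print(real_out)
print(imag_out)
```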

There were several very helpful performance tips offered, and one in
particular I've started looking into. The author suggested a
"numpy.memmap" object may be beneficial. It was suggested I use it as
follows:


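from numpy import dtype, memmap, recarray

# filename: path to the binary file of interleaved float32 (real, imaginary) pairs
descriptor = dtype([("r", "<f4"), ("i", "<f4")])
data = memmap(filename, dtype=descriptor, mode='r').view(recarray)
print("First 100 real values:", data.r[:100])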


I have two questions:
1) What is "recarray"?
2) The documentation for numpy.memmap says that it is meant to be
used in situations where it is beneficial to load only segments of a
file into memory, not the whole thing. This is definitely something
I'd like to be able to do, as my files are frequently >1 GB. I don't
really see in the documentation how portions are loaded, however.
The examples seem to create small arrays and then assign the entire
array (i.e. file) to the memmap object. Let's assume I have a binary
data file of complex numbers in the format described above, and let's
assume that the size of the complex data array (that is, the entire
file) is 100x100 (rows x columns). Could someone please post a few
lines showing how to load the top-left 50 x 50 quadrant and the
lower-right 50 x 50 quadrant into memmap objects? Thank you very much
in advance!
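To pin down what I am after, here is a rough sketch of the result I would like. It builds a hypothetical 100 x 100 test file (the file name and values are made up) and assumes that passing a shape argument to numpy.memmap and then slicing is the way to select a quadrant. The slicing does give the right values, but whether this only pulls the touched portions of the file into memory is exactly the part I am unsure about.

```python
import os
import tempfile

import numpy as np

descriptor = np.dtype([("r", "<f4"), ("i", "<f4")])
rows, cols = 100, 100

# Build a hypothetical 100x100 test file: real part counts upward in
# row-major order, imaginary part is its negative.
sample = np.zeros((rows, cols), dtype=descriptor)
sample["r"] = np.arange(rows * cols, dtype="<f4").reshape(rows, cols)
sample["i"] = -sample["r"]
path = os.path.join(tempfile.mkdtemp(), "complex_100x100.bin")  # made-up name
sample.tofile(path)

# Map the file as a 2-D array of (real, imaginary) records.
data = np.memmap(path, dtype=descriptor, mode="r", shape=(rows, cols))

# Slice out the two quadrants; these are views onto the mapped file.
top_left = data[:50, :50]
lower_right = data[50:, 50:]

# Copy the fields of interest into ordinary in-memory arrays.
real_tl = np.array(top_left["r"])
imag_lr = np.array(lower_right["i"])

print(real_tl.shape, imag_lr.shape)
print(real_tl[0, 0], imag_lr[0, 0])
```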

-L


