[SciPy-user] Reading in data as arrays, quickly and easily?

Todd Miller jmiller at stsci.edu
Mon Jul 12 14:51:57 EDT 2004


On Mon, 2004-07-12 at 13:19, Sebastian Haase wrote:
> On Saturday 10 July 2004 10:29 am, Eric Jonas wrote:
> > > I assume you talk about Numeric, but in case you are open for numarray I
> > > use numarray's memmap quite successfully on files even larger than 1 GB
> > > (Linux; I think the effective limit for Windows might be lower). It
> > > works for all datatypes and for byteswapped data too. You can skip any
> > > amount of bytes by having your mem-"slice" start at any offset you want.
> > > I actually map the first part into a record-array so that I can read the
> > > parts of the "header"-information I'm interested in.
> >
> > Well, I had been focusing on numarray, because everything I read seems
> > to suggest that it's the wave of the future, although at the same time
> > no one really seems to be using it much yet. May I ask how much larger
> > than 1 GB?  I'm dealing with between 1-20 GB EEG files, and for some
> > reason I don't think I'll be able to afford 64-bit hardware in the near
> > future : )
> 
> I'm also interested in image data (3D plus time) of maybe 20 GB. But that will
> require us to buy 64-bit hardware, I think.
> E.g. on (32-bit) Linux the maximum address space for a user application is 2 GB,
> and I have been told not to expect much more than half of that to be
> available for memmap. In other words, the largest file I was able to read was
> about 1.3 GB.
> 
> >
> > What I really want is to read in some fairly complex records, do endian
> > swapping, alignment, etc. all in C. I'm mostly interested in spectral
> > analysis, so the hope was that I'd be able to read in 32kB chunks at a
> > time for my periodograms.
> >
> > Also, I looked through the numarray docs again, and still couldn't find
> > anything about memory mapping -- any pointers? What command(s) have you
> > been using to pull this off?
> 
> The source code file '/numarray/Lib/memmap.py' contains 200 lines of
> comments/documentation right at the beginning of the file. I don't know why
> that didn't make it into the official documentation.

For me, memory mapping is a little esoteric, so I wanted to observe some
successful applications of memmap in-house at STScI before documenting
it and casting it in stone.

We've learned a lot trying to apply memmap, and I believe that memmap
basically works, so now is a reasonable time to document what we've got.
One should be mindful, however, that even with a working memmap design
there are limitations that are easy to overlook: total virtual address
space, swap space, and platform-specific limits on mappable space.
32 bits just ain't what it used to be.
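
One way to stay inside those limits is to map a small window of the file at
a time rather than the whole thing. What follows is only a rough sketch, and
it uses Python's standard mmap module, not numarray's memmap (whose interface
isn't shown in this thread). The names read_window, iter_chunks and
decode_be_int16 are made up for illustration, and the fixed-size header and
big-endian 16-bit samples are assumptions about the file layout, not anything
taken from a real EEG or image format.

```python
import mmap
import os
import struct


def read_window(path, offset, length):
    """Return `length` bytes starting at byte `offset`, mapping only a small window."""
    gran = mmap.ALLOCATIONGRANULARITY
    # The mmap offset must be a multiple of the allocation granularity,
    # so round down and remember how far into the window the data starts.
    base = (offset // gran) * gran
    delta = offset - base
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base) as m:
            # Slicing copies the bytes out, so the mapping can be released.
            return m[delta:delta + length]


def iter_chunks(path, header_bytes, chunk_bytes):
    """Yield successive chunks of the data that follows a header of assumed size."""
    size = os.path.getsize(path)
    pos = header_bytes
    while pos < size:
        n = min(chunk_bytes, size - pos)
        yield read_window(path, pos, n)
        pos += n


def decode_be_int16(chunk):
    """Decode a chunk as big-endian signed 16-bit samples (assumed layout)."""
    count = len(chunk) // 2
    return struct.unpack(">%dh" % count, chunk[:count * 2])
```

With something like this, only one window at a time (32 kB, say) occupies
address space, so the file size itself stops being the limit. The cost is a
new mapping per chunk, and whatever limits the platform puts on file offsets
still apply.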

Seeing this thread, Perry asked that I add a section on memmap to the
manual so I'll do it as soon as I can.

Regards,
Todd Miller




