[AstroPy] PyFITS and mmap

James Turner jturner at gemini.edu
Thu Sep 22 22:39:28 EDT 2011


Hi Erik,

This probably depends on the details, but if data arrays are mapped
fairly transparently and operations are just a "little bit slower",
without the danger of exhausting memory and/or making the OS swap,
that certainly sounds like a net gain to me.
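
To make the idea concrete, here is a minimal sketch of reading a small piece of a file through a memory map, so that only the pages actually touched are brought into RAM. It uses Python's standard mmap module on a flat file of 64-bit floats rather than PyFITS itself; the helper names write_demo_file and read_slice_mapped and the file layout are assumptions made for illustration, not anything PyFITS does internally.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout for this illustration: a flat file of little-endian doubles.
ITEM = struct.Struct("<d")


def write_demo_file(path, n):
    """Write the values 0.0 .. n-1 as consecutive 64-bit floats (hypothetical helper)."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(ITEM.pack(float(i)))


def read_slice_mapped(path, start, count):
    """Return `count` values beginning at element `start`, read via a memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS loads pages lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies only the requested bytes out of the mapping.
            chunk = mm[start * ITEM.size:(start + count) * ITEM.size]
    return [value[0] for value in ITEM.iter_unpack(chunk)]


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_demo_file(demo_path, 1000)
        print(read_slice_mapped(demo_path, 500, 3))
    finally:
        os.remove(demo_path)
```

In PyFITS the equivalent is the memmap=True option to pyfits.open() mentioned in the quoted message below; the sketch only shows the underlying mechanism.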

I assume there will be cases where it's not quite so simple and
things have to be kept in memory for specific performance reasons
or the working directory isn't writeable or whatever, but it seems
like a reasonable default. I don't have enough practical experience
with memory mapping to answer your question about downsides you
haven't thought of, but since you're testing the waters (and no-one
has commented yet) I thought I'd throw out my initial user reaction.
For what it's worth, we HAVE recently run into situations at Gemini
where we have exhausted 4 GB of RAM, typical of an end-user machine,
and started discussing memory mapping. We're also not dealing with
files larger than 200 MB or so.

AFAICT, PyFITS doesn't do this by default just because not that
long ago it was running mainly on 32-bit systems (I remember
discussing it at the time and was told it would be more useful in
future, which is now).
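
As a rough illustration of why pointer width matters here, the sketch below checks whether a file of a given size could even in principle be mapped in one piece. The function names are hypothetical, and the bound is optimistic: on a real 32-bit system the OS reserves part of the address space, so the usable limit is noticeably below 4 GiB.

```python
import struct


def pointer_bits():
    """Width of a native pointer in bits for the running interpreter."""
    return struct.calcsize("P") * 8


def fits_in_address_space(file_size_bytes, bits=None):
    """Optimistic check: is the file smaller than the full virtual address space?

    This is an upper bound only; the OS, the interpreter and shared
    libraries all occupy part of the address space as well.
    """
    if bits is None:
        bits = pointer_bits()
    return file_size_bytes < 2 ** bits


if __name__ == "__main__":
    fifty_gb = 50 * 10 ** 9  # the largest file size mentioned in the thread
    print("32-bit:", fits_in_address_space(fifty_gb, bits=32))
    print("64-bit:", fits_in_address_space(fifty_gb, bits=64))
```

A 200 MB file passes the check at either width, which is consistent with mmap being usable, if less compelling, on the older 32-bit machines.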

Seems like some limited user testing would be in order first?

Cheers,

James.


> Hi all,
>
> Every now and then PyFITS gets support requests from people trying to
> work with very large FITS files (>4GB; I've seen as high as 50 GB) and
> having trouble when they run out of memory.
>
> Normally I point them to the memmap=True option to pyfits.open(), and
> that works for them.  On 64-bit systems in particular there's more than
> enough virtual address space to mmap very large files.
>
> And I got to thinking that while most FITS files I encounter are not
> many gigabytes in size, they are still over 100 MB.  And there are only
> so many operations that actually require having an entire array in
> memory at once.  So maybe it would make sense to have PyFITS use mmap by
> default.
>
> There could be some slight performance implications here: For example,
> when reading the data a little bit at a time, mmap is a little bit slower,
> unsurprisingly.  But in practice I don't think it's a very noticeable
> difference, and the benefits--far less memory usage and more transparent
> support for large files--outweigh any drawbacks I can think of.
>
> I'm just putting this out there because I wonder if there are any other
> downsides to this that I'm not thinking of.
>
> Thanks,
> Erik
> _______________________________________________
> AstroPy mailing list
> AstroPy at scipy.org
> http://mail.scipy.org/mailman/listinfo/astropy


