[AstroPy] PyFITS and mmap
Tom Aldcroft
aldcroft at head.cfa.harvard.edu
Fri Sep 23 11:08:16 EDT 2011
On Fri, Sep 23, 2011 at 8:37 AM, Paul Barrett <pebarrett at gmail.com> wrote:
> Erik,
>
> The performance impact can be greater than you might think. As an
> example, I have some Python code that uses subprocesses to divide the
> processing among eight or more processors. The data is shared between
> the parent and child processes using memory-mapping. The calculations
> take about 5 minutes per subprocess and then another 7 minutes or so
> to write the data to disk before the subprocess ends. I would
> therefore prefer that memory-mapped files be an option instead of the
> default to avoid such a possible performance hit. If it is the
> default, there may be situations where the performance is poor and the
> novice user would not know why PyFITS is performing so poorly. This
> adverse behavior may discourage users from using FITS files and push
> them toward HDF5 files (i.e., the tables package), which, when I think
> about it, would be a good thing.
I'm not sure many novice users will be knowingly creating subprocesses
in their Python scripts. I would say the case of a novice user
deciding to open a 20 GB FITS file (and complaining about performance)
is more likely. But I agree that you need to be pretty careful about
making a default change like this and consider (and test) a wide
variety of use cases. +1 on HDF5 for big datasets.
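
For anyone who wants to see the trade-off concretely, here is a rough
sketch of the two access patterns being compared. It uses only the
standard-library mmap module, not PyFITS itself, so it says nothing
about how pyfits.open(memmap=True) is implemented. The helper names
(write_demo_file, read_block_mmap, read_block_eager) and the demo file
are made up for illustration; the 2880-byte block size is the FITS
block size.

```python
import mmap
import os
import tempfile

BLOCK = 2880  # FITS files are organized in 2880-byte blocks


def write_demo_file(path, n_blocks):
    """Write n_blocks blocks, each filled with its own index byte."""
    with open(path, "wb") as f:
        for i in range(n_blocks):
            f.write(bytes([i % 256]) * BLOCK)


def read_block_mmap(path, index):
    """Return one block through a read-only memory map.

    The OS pages in only the part of the file that is touched.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = index * BLOCK
            return mm[start:start + BLOCK]


def read_block_eager(path, index):
    """Return one block after reading the whole file into memory."""
    with open(path, "rb") as f:
        data = f.read()
    start = index * BLOCK
    return data[start:start + BLOCK]


# Hypothetical demo: a small stand-in file instead of a real FITS file.
fd, demo_path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
write_demo_file(demo_path, 16)
block_mmap = read_block_mmap(demo_path, 5)
block_eager = read_block_eager(demo_path, 5)
os.remove(demo_path)
```

Both functions return the same bytes; the difference is how much of the
file has to be resident in memory to get them, which is what matters
for the multi-gigabyte case.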
- Tom A
> On Thu, Sep 22, 2011 at 12:21 PM, Erik Bray <embray at stsci.edu> wrote:
>> Hi all,
>>
>> Every now and then PyFITS gets support requests from people trying to
>> work with very large FITS files (>4GB; I've seen as high as 50 GB) and
>> having trouble when they run out of memory.
>>
>> Normally I point them to the memmap=True option to pyfits.open(), and
>> that works for them. On 64-bit systems in particular there's more than
>> enough virtual address space to mmap very large files.
>>
>> And I got to thinking that while most FITS files I encounter are not
>> many gigabytes in size, they are still over 100 MB. And there are only
>> so many operations that actually require having an entire array in
>> memory at once. So maybe it would make sense to have PyFITS use mmap by
>> default.
>>
>> There could be some slight performance implications here: for example,
>> when reading the data a little bit at a time, mmap is a little bit slower,
>> unsurprisingly. But in practice I don't think it's a very noticeable
>> difference, and the benefits--far less memory usage and more transparent
>> support for large files--outweigh any drawbacks I can think of.
>>
>> I'm just putting this out there because I wonder if there are any other
>> downsides to this that I'm not thinking of.
>>
>> Thanks,
>> Erik
>> _______________________________________________
>> AstroPy mailing list
>> AstroPy at scipy.org
>> http://mail.scipy.org/mailman/listinfo/astropy
>>