[AstroPy] Advice on reading FITS file with many HDUs

Sat Jan 17 23:21:30 EST 2015

Hi David,

Erin Sheldon’s fitsio <https://github.com/esheldon/fitsio/> seems to do a
bit better:

In [1]: import fitsio

In [2]: f = fitsio.FITS("LSST_i_descwl.fits");

In [3]: f[0]
Out[3]:

  file: LSST_i_descwl.fits
  extension: 0
  type: IMAGE_HDU
  image info:
    data type: f4
    dims: [4096,4096]

In [2] (opening the file) is nearly instantaneous; In [3] (getting the info
for the first HDU) takes maybe a second. Other HDU access after that (even
near the end of the file) seems nearly instantaneous as well.
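
For illustration only (this is not fitsio's or CFITSIO's actual code), here
is a minimal pure-Python sketch of why one pass over the file can be cheap:
it reads only the 2880-byte header blocks and seeks past the data, recording
where each HDU starts. The names read_header, data_nbytes, index_hdus and
make_hdu are hypothetical helpers, and the parser assumes plain image HDUs
with integer keyword values (no random groups, no '/' inside values).

```python
import io
import math

BLOCK = 2880  # FITS files are a sequence of 2880-byte blocks
CARD = 80     # each header card is 80 ASCII characters


def read_header(f):
    """Read one header up to its END card; return {keyword: raw value}, or None at EOF.

    Simplified: splitting on '/' is only safe for numeric and logical values.
    """
    cards = {}
    while True:
        block = f.read(BLOCK)
        if len(block) < BLOCK:
            return None  # end of file (or a truncated block)
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii")
            key = card[:8].strip()
            if key == "END":
                return cards
            if card[8:10] == "= ":
                cards[key] = card[10:].split("/")[0].strip()


def data_nbytes(h):
    """Unpadded size of the data unit: |BITPIX|/8 * GCOUNT * (PCOUNT + NAXIS1*...*NAXISn)."""
    naxis = int(h["NAXIS"])
    if naxis == 0:
        return 0
    npix = math.prod(int(h[f"NAXIS{n}"]) for n in range(1, naxis + 1))
    gcount = int(h.get("GCOUNT", 1))
    pcount = int(h.get("PCOUNT", 0))
    return abs(int(h["BITPIX"])) // 8 * gcount * (pcount + npix)


def index_hdus(f):
    """Return [(header_offset, data_offset, data_nbytes), ...] without reading any data."""
    index = []
    while True:
        start = f.tell()
        header = read_header(f)
        if header is None:
            return index
        nbytes = data_nbytes(header)
        index.append((start, f.tell(), nbytes))
        # Skip the data unit, which is padded up to a whole number of blocks.
        f.seek(-(-nbytes // BLOCK) * BLOCK, 1)


def make_hdu(primary, bitpix, shape):
    """Build a zero-filled image HDU in memory (demo helper, not a general FITS writer)."""
    def card(key, value):
        return f"{key:<8}= {value:>20}"

    cards = [card("SIMPLE", "T") if primary else "XTENSION= 'IMAGE   '"]
    cards += [card("BITPIX", bitpix), card("NAXIS", len(shape))]
    cards += [card(f"NAXIS{i + 1}", n) for i, n in enumerate(shape)]
    if not primary:
        cards += [card("PCOUNT", 0), card("GCOUNT", 1)]
    cards.append("END")
    header = "".join(c.ljust(CARD) for c in cards).encode("ascii")
    header += b" " * (-len(header) % BLOCK)
    nbytes = abs(bitpix) // 8 * math.prod(shape) if shape else 0
    return header + bytes(nbytes) + bytes(-nbytes % BLOCK)


# Three small synthetic HDUs stand in for the real file.
blob = make_hdu(True, -32, (4, 4)) + make_hdu(False, 16, (10,)) + make_hdu(False, 8, (3000,))
index = index_hdus(io.BytesIO(blob))
for number, (header_offset, data_offset, nbytes) in enumerate(index):
    print(number, header_offset, data_offset, nbytes)
```

With an index like this in hand, reading HDU number k is a single seek to
its recorded offset, which would be consistent with later accesses being
fast once the first lookup has paid for the scan. The scan itself still has
to touch every header, so its cost grows with the number of HDUs.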

Best,
Kyle

On Sat, Jan 17, 2015 at 7:45 PM, David Kirkby <dkirkby at uci.edu> wrote:

> Thanks for the feedback, Perry. Yes, point taken about asking too much
> from FITS.  I was hoping there might be a simple fix, but my plan B is to
> save the 45K-2 HDUs in an HDF5 file instead. That's a bit more hassle but
> probably worth it.
>
> David
>
> On Sat Jan 17 2015 at 6:33:32 PM Perry Greenfield <stsci.perry at gmail.com>
> wrote:
>
>> My first reaction is that you really are asking a lot of the fits module
>> (45K headers!). If it was only to read the first two, we could consider
>> some option to that module not to locate all headers in the file, but just
>> the first N, but asking for random access basically requires reading most
>> of them anyway.
>>
>> My second reaction is this is a poor use of FITS. Isn't there something
>> more efficient you could do with storing the information in a FITS file? I
>> know, I know, someone else decided this (usually) and you don't have any
>> control over that. Still, it makes me wonder seeing things like FITS taken
>> to this extreme level.
>>
>> Really, this seems a better job for CFITSIO (I'd be curious to see how
>> much faster it is though since it does have to run through most of the file
>> to find the header locations as well, but will avoid the overhead of
>> creating 45K objects).
>>
>> Cheers, Perry
>>
>> On Jan 17, 2015, at 8:29 PM, David Kirkby wrote:
>>
>> > I am reading a ~700 MB file containing ~45K HDUs and looking for advice
>> > on how to speed things up. I typically only want to read the first 2 HDUs
>> > right after opening the file, and then a small number of randomly selected
>> > HDUs while my program runs. The first 2 HDUs are the largest, but still
>> > only represent about 10% of the total file size.
>> >
>> > The following command takes about 42 seconds:
>> >
>> >   fits.open('LSST_i.fits',mode='readonly',memmap=False)
>> >
>> > Changing to memmap=True makes no difference but leads to "Too many open
>> > files" if I try to read too many HDUs after opening the file.
>> >
>> > Is this what I should expect? Any suggestions for opening the file and
>> > reading a small number of HDUs faster? If necessary, I can change the
>> > format of the file I am reading.
>> >
>> > In case it helps, the file I am testing with can be downloaded from:
>> >
>> >   http://srs.slac.stanford.edu/DataCatalog/?experiment=LSST-DESC&folder=12915647
>> >
>> > thanks,
>> > David
>> > _______________________________________________
>> > AstroPy mailing list
>> > AstroPy at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/astropy