[AstroPy] super-high-speed parsing of large FITS files

Derek Homeier derek at astro.physik.uni-goettingen.de
Thu Mar 9 12:25:56 EST 2017


On 3 Mar 2017, at 1:50 am, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
> 
> See https://github.com/h5py/h5py/issues/611
> 
Thanks for the link, that’s very useful information! I had been looking into the loadable-filter
functionality of HDF5, but never pursued it for lack of time. It would be great if Blosc support
could finally be implemented at the h5py level after all.
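
For what it’s worth, here is a minimal sketch of how Blosc-compressed datasets could look from
the h5py side once such support is in place, assuming the third-party hdf5plugin helper package
(which registers Blosc as a dynamically loadable HDF5 filter) is available; the file name, dataset
layout and compression settings below are purely illustrative:

import numpy as np
import h5py
import hdf5plugin  # assumed helper package: registers Blosc (and other) HDF5 filters on import

# Fake data standing in for a stack of detector frames
data = np.random.normal(size=(1000, 512)).astype('float32')

with h5py.File('frames.h5', 'w') as f:
    # hdf5plugin.Blosc(...) expands into the compression/compression_opts
    # keyword arguments expected by create_dataset()
    f.create_dataset('frames', data=data, chunks=(10, 512),
                     **hdf5plugin.Blosc(cname='lz4', clevel=5,
                                        shuffle=hdf5plugin.Blosc.SHUFFLE))

with h5py.File('frames.h5', 'r') as f:
    frame = f['frames'][42]  # decompression happens transparently, chunk by chunk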

					Derek
> 
> On Thu, Mar 2, 2017 at 5:59 PM Derek Homeier <derek at astro.physik.uni-goettingen.de> wrote:
> Hi Stuart,
> >
> > We are choosing a file format for a high speed photometer. One proposal is a MEF file. Frame rates are kHz, and there are five detectors so each FITS file may have many thousands of HDUs.
> >
> > One of our requirements is to be able to reduce the data in real time, whilst the data is being written. This means retrieving a given HDU from such a file in a msec or less. It would be OK if finding an initial HDU corresponding to a given frame number is slower than this, if we can grab *subsequent* blocks of 5 HDUs (1 per detector) in < 1 msec.
> >
> > So I am wondering:
> >
> >  a) is this even possible in principle, with low-level C-code?
> >  b) Can it be done via some un-sanctioned use of the private functions in the astropy FITS library?
> >
> if I understood correctly that you are not yet fully settled on using FITS as a format, it might be
> worth looking into HDF5 instead. You may find some ideas about potential performance and optimisation
> possibilities in the PyTables documentation:
> http://www.pytables.org/usersguide/optimization.html
> 
> Unfortunately not all of these (in particular the Blosc compression library, afaik) are available with the
> h5py library used in astropy, but perhaps writing a direct interface to PyTables could still be an option.
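
To make the “direct interface to PyTables” idea above a bit more concrete, below is a minimal
sketch of how such a frame stream could be stored as a Blosc-compressed, chunked, extendable
array with PyTables; the file name, array geometry and compressor choice are just placeholders,
not a recommendation:

import numpy as np
import tables as tb

# Blosc-compressed, chunked, extendable array: one chunk per frame (all five
# detectors), so reading a single frame back touches exactly one chunk.
filters = tb.Filters(complib='blosc:lz4', complevel=5, shuffle=True)

with tb.open_file('photometer.h5', mode='w') as h5:
    frames = h5.create_earray(h5.root, 'frames',
                              atom=tb.Float32Atom(),
                              shape=(0, 5, 64, 64),       # frames x detectors x ny x nx (made-up geometry)
                              chunkshape=(1, 5, 64, 64),  # one frame per chunk
                              filters=filters,
                              expectedrows=1000000)
    for i in range(1000):                                 # stand-in for the kHz acquisition loop
        frames.append(np.zeros((1, 5, 64, 64), dtype='float32'))

# Reading one frame (all five detectors) is then a single chunk read:
with tb.open_file('photometer.h5', mode='r') as h5:
    frame_block = h5.root.frames[500]                     # shape (5, 64, 64)

Whether this actually reaches the sub-millisecond requirement would of course have to be
benchmarked on the real hardware and data rates.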



