[AstroPy] reading one line from many small fits files

John K. Parejko john.parejko at yale.edu
Mon Jul 30 19:40:47 EDT 2012


Hey all,

This is really more of a pyfits question, but I've upgraded to pyfits 3.1 (SVN), which is the version in astropy.

I have data stored in thousands of .fits files of a few MB each (photoObj files from SDSS), totaling a few TB, and I know the single row I want to extract from each file in some known subset of them. But pyfits takes more than a second per file to extract the fields I want. That seems very slow, especially with memmapped access, since pyfits should then only have to read the header plus that single row from each file.
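
To make the "header plus one row" expectation concrete, here is a minimal sketch that bypasses pyfits entirely: it parses only the FITS headers, seeks past the HDUs before the target extension, then jumps straight to byte `row * NAXIS1` of the table data and decodes that one row with NumPy. It assumes a plain binary-table extension with fixed-width columns (no variable-length `P`/`Q` arrays, no complex or bit columns), ignores any TSCALn/TZEROn scaling (such columns come back unscaled), and takes a 0-based row number. `read_fits_row` and its helpers are hypothetical names, not part of pyfits or astropy. Timing it against the pyfits loop may help show whether the per-file cost is disk I/O or library overhead.

```python
import numpy as np

BLOCK = 2880  # FITS files are laid out in 2880-byte blocks
CARD = 80     # each header card is 80 ASCII characters

# TFORM type code -> big-endian NumPy type. Only fixed-width codes are covered
# (assumption: no variable-length 'P'/'Q' columns, no complex or bit columns).
_TFORM_TYPES = {"L": "S1", "B": "u1", "I": ">i2", "J": ">i4",
                "K": ">i8", "E": ">f4", "D": ">f8"}


def _padded(nbytes):
    """Round a byte count up to a whole number of FITS blocks."""
    return -(-nbytes // BLOCK) * BLOCK


def _read_header(f):
    """Parse one header from the current file position into a {keyword: str} dict.

    Leaves the file positioned at the first byte of that HDU's data.
    """
    cards = {}
    while True:
        block = f.read(BLOCK)
        if len(block) < BLOCK:
            raise ValueError("truncated FITS header")
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii")
            key = card[:8].strip()
            if key == "END":
                return cards
            if card[8:10] == "= ":
                raw = card[10:].strip()
                if raw.startswith("'"):
                    # String value: keep the text between the quotes
                    cards[key] = raw[1:].split("'")[0].rstrip()
                else:
                    # Numeric/logical value: drop any trailing /comment
                    cards[key] = raw.split("/")[0].strip()


def _data_size(cards):
    """Unpadded size in bytes of the data unit described by a header."""
    naxis = int(cards.get("NAXIS", 0))
    if naxis == 0:
        return 0
    n = 1
    for k in range(1, naxis + 1):
        n *= int(cards["NAXIS%d" % k])
    bytes_per_value = abs(int(cards["BITPIX"])) // 8
    return bytes_per_value * int(cards.get("GCOUNT", 1)) * (int(cards.get("PCOUNT", 0)) + n)


def _row_dtype(cards):
    """Build a NumPy structured dtype for one binary-table row from TFORMn/TTYPEn."""
    fields = []
    for k in range(1, int(cards["TFIELDS"]) + 1):
        tform = cards["TFORM%d" % k]
        code = tform.lstrip("0123456789")
        repeat = int(tform[:len(tform) - len(code)] or 1)
        name = cards.get("TTYPE%d" % k, "col%d" % k)
        if code[0] == "A":
            fields.append((name, "S%d" % repeat))
        elif repeat == 1:
            fields.append((name, _TFORM_TYPES[code[0]]))
        else:
            fields.append((name, _TFORM_TYPES[code[0]], (repeat,)))
    return np.dtype(fields)


def read_fits_row(path, row, ext=1):
    """Return a single 0-based row of binary-table extension `ext`, read by seeking."""
    with open(path, "rb") as f:
        # Skip the headers and data of all HDUs before the one we want
        for _ in range(ext):
            f.seek(_padded(_data_size(_read_header(f))), 1)
        cards = _read_header(f)
        dtype = _row_dtype(cards)
        width = int(cards["NAXIS1"])  # bytes per table row
        if dtype.itemsize != width:
            raise ValueError("unsupported TFORM layout")
        if not 0 <= row < int(cards["NAXIS2"]):
            raise IndexError(row)
        f.seek(row * width, 1)        # jump straight to the requested row
        return np.frombuffer(f.read(width), dtype=dtype)[0]
```

In a loop like mine this would be called as `read_fits_row(filename, x[otherfield] - 1)`, and fields are then read by column name from the returned record.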

I'm doing something like this:

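    import numpy as np
    import pyfits

    result = np.empty(len(data), dtype=dtype)
    for i, x in enumerate(data):
        filename = getfilename(x[somefield])  # my own helper: maps a row to its photoObj path
        photo = pyfits.open(filename, memmap=True)
        result[i] = photo[1].data[x[otherfield] - 1]  # stored row numbers are 1-based
        photo.close()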

Is there a better way to go about this? Is pyfits known to be quite slow when reading a single row from a lot of different files? Anyone have suggestions on how to speed this up?

Thanks,
John

