[AstroPy] writing 2D string column to fits binary table

Erin Sheldon erin.sheldon at gmail.com
Tue Jan 27 13:28:14 EST 2015


Hi Stephen -

This is a difficult problem I think.

In FITS, the strings must get padded with spaces when written.  That is my
understanding of the format.

There is no way to know the intention of the user, whether spaces in the
original data are "significant" or not.  This information is not available in
a FITS file, only the size of the field in bytes.
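
To make the ambiguity concrete, here is a minimal sketch in plain Python. It
assumes space padding as I described above; pad_field is a made-up helper for
illustration and is not part of fitsio or astropy.

```python
def pad_field(value: bytes, width: int) -> bytes:
    """Pad a byte string with ASCII spaces to a fixed field width.

    Hypothetical helper: it only mimics what a writer that pads with
    spaces would store in a fixed-width field.
    """
    return value.ljust(width, b" ")


# Two different user inputs: one without and one with trailing spaces.
no_spaces = pad_field(b"a", 5)
with_spaces = pad_field(b"a  ", 5)

# Both end up as the same 5 bytes, so a reader cannot tell them apart.
print(no_spaces, with_spaces, no_spaces == with_spaces)
```

Once the field is stored, only its width (5 bytes here) is known, not how many
of the trailing spaces the user actually supplied.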

The convention in many codes is to strip all trailing whitespace on reading.
If I recall correctly, IDL mrdfits does not strip.

If a user thinks the whitespace is significant, they will be surprised on
reading to find that the spaces are gone.  If the user thinks the whitespace is
not significant, they may be surprised if it is there.

I took what I think is probably a controversial stance:  I will not lose user
data.  So I always read the full field and retain spaces.  It is up to the
user to strip them if that matters.
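
For what it is worth, stripping on the user side is short with numpy. A
minimal sketch, using a hand-built array as a stand-in for what a reader that
keeps the padding would return (no FITS file is involved, and the array
contents are made up for illustration):

```python
import numpy as np

dtype = [('ABC', 'S5', (3,)), ('X', 'i8')]

# What the user originally had in memory.
original = np.zeros(2, dtype=dtype)
original['ABC'][0] = [b'x', b'y', b'z']
original['ABC'][1] = [b'a', b'b', b'c']

# Stand-in for a read that keeps the full space-padded field.
padded = original.copy()
padded['ABC'][0] = [b'x    ', b'y    ', b'z    ']
padded['ABC'][1] = [b'a    ', b'b    ', b'c    ']

# Strip trailing whitespace from the string column only.
stripped = padded.copy()
stripped['ABC'] = np.char.rstrip(padded['ABC'])

print(stripped['ABC'].tolist() == original['ABC'].tolist())
```

Note that this also removes trailing spaces the user meant to keep, which is
exactly why I leave the choice to the caller.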

I am willing to reconsider based on good arguments.

An idea comes to mind: maybe support a strip_strings= keyword for the FITS
object and the reader routines.
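
Roughly what I have in mind, as a sketch only: the function below is a
hypothetical post-read step written with plain numpy. Neither the function nor
the strip_strings keyword exists in fitsio today; the names are assumptions for
the sake of discussion.

```python
import numpy as np


def finish_read(rows, strip_strings=False):
    """Hypothetical post-read hook, not an existing fitsio function.

    With strip_strings=False (the default) the data are returned untouched,
    so no user data is lost.  With strip_strings=True, trailing whitespace
    is removed from every string field, including subarray string fields.
    """
    if not strip_strings:
        return rows
    out = rows.copy()
    for name in out.dtype.names:
        # For a subarray field such as ('S5', (3,)), .base is the 'S5' dtype.
        if out.dtype[name].base.kind in ('S', 'U'):
            out[name] = np.char.rstrip(out[name])
    return out


# Made-up rows standing in for a read that keeps space padding.
rows = np.zeros(2, dtype=[('ABC', 'S5', (3,)), ('NAME', 'S4'), ('X', 'i8')])
rows['ABC'][:] = [b'a    ', b'b    ', b'c    ']
rows['NAME'] = [b'n1  ', b'n2  ']

kept = finish_read(rows)
clean = finish_read(rows, strip_strings=True)
print(kept['NAME'].tolist(), clean['NAME'].tolist())
```

Keeping the default at False preserves the current behavior, so only users who
ask for stripping would see their strings change.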

thanks for the discussion,
-e

On 1/26/15, Stephen Bailey <stephenbailey at lbl.gov> wrote:
> astropy.io.fits and fitsio aficionados,
>
> I'm trying to write a FITS binary table that includes a column that is a 2D
> array of strings.  Curiously, only files written by astropy.io.fits and
> read by fitsio pass the test of actually reconstructing the input numpy
> array.
>
> astropy.io.fits loses the 2D information when reading back in and ends up
> with a 1D array of null separated characters per row.
>
> fitsio.fits pads its file with spaces when reading back in.
>
> But the file written by astropy.io.fits and then read by fitsio gets me
> back what I put in.
>
> Details:
>
> import numpy as np
> from astropy.io import fits
> import fitsio
>
> n = 10
> dtype = ([('ABC', 'S5', (3,)), ('X', int), ('Y', float)])
> data = np.zeros(n, dtype=dtype)
> data['X'] = np.arange(n)
> data['Y'] = np.arange(n)
> data['ABC'][:, 0] = 'a'
> data['ABC'][:, 1] = 'b'
> data['ABC'][:, 2] = 'c'
> data['ABC'][0] = ['x', 'y', 'z']
>
> fits.writeto('apio.fits', data)
> fitsio.write('fio.fits', data)
>
> astropy.io.fits complains:
>
> /Users/sbailey/anaconda/lib/python2.7/site-packages/astropy/io/fits/fitsrec.py:782:
> UserWarning: TDIM1 value (5,3) does not fit with the size of the array
> items (5).  TDIM1 will be ignored.
>
>   actual_nitems, indx + 1))
>
> fitsio seems happy.  Reading it back in:
>
> In [15]: np.array(fits.getdata('apio.fits'))
> Out[15]:
> array([('x\x00\x00\x00\x00y\x00\x00\x00\x00z', 0, 0.0),
>        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 1, 1.0),
>        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 2, 2.0),
>        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 3, 3.0),
>        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 4, 4.0),
>        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 5, 5.0),
>        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 6, 6.0),
>        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 7, 7.0),
>        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 8, 8.0),
>        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 9, 9.0)],
>       dtype=[('ABC', 'S15'), ('X', '>i8'), ('Y', '>f8')])
>
>
> In [16]: fitsio.read('fio.fits')
> Out[16]:
> array([(['x    ', 'y    ', 'z    '], 0, 0.0),
>        (['a    ', 'b    ', 'c    '], 1, 1.0),
>        (['a    ', 'b    ', 'c    '], 2, 2.0),
>        (['a    ', 'b    ', 'c    '], 3, 3.0),
>        (['a    ', 'b    ', 'c    '], 4, 4.0),
>        (['a    ', 'b    ', 'c    '], 5, 5.0),
>        (['a    ', 'b    ', 'c    '], 6, 6.0),
>        (['a    ', 'b    ', 'c    '], 7, 7.0),
>        (['a    ', 'b    ', 'c    '], 8, 8.0),
>        (['a    ', 'b    ', 'c    '], 9, 9.0)],
>       dtype=[('ABC', 'S5', (3,)), ('X', '>i8'), ('Y', '>f8')])
>
>
> In [17]: data
> Out[17]:
> array([(['x', 'y', 'z'], 0, 0.0), (['a', 'b', 'c'], 1, 1.0),
>        (['a', 'b', 'c'], 2, 2.0), (['a', 'b', 'c'], 3, 3.0),
>        (['a', 'b', 'c'], 4, 4.0), (['a', 'b', 'c'], 5, 5.0),
>        (['a', 'b', 'c'], 6, 6.0), (['a', 'b', 'c'], 7, 7.0),
>        (['a', 'b', 'c'], 8, 8.0), (['a', 'b', 'c'], 9, 9.0)],
>       dtype=[('ABC', 'S5', (3,)), ('X', '<i8'), ('Y', '<f8')])
>
> But fitsio reading the file that astropy.io.fits wrote is good (despite the
> warning that astropy.io.fits gave when writing the file):
>
> In [18]: fitsio.read('apio.fits')
> Out[18]:
> array([(['x', 'y', 'z'], 0, 0.0), (['a', 'b', 'c'], 1, 1.0),
>        (['a', 'b', 'c'], 2, 2.0), (['a', 'b', 'c'], 3, 3.0),
>        (['a', 'b', 'c'], 4, 4.0), (['a', 'b', 'c'], 5, 5.0),
>        (['a', 'b', 'c'], 6, 6.0), (['a', 'b', 'c'], 7, 7.0),
>        (['a', 'b', 'c'], 8, 8.0), (['a', 'b', 'c'], 9, 9.0)],
>       dtype=[('ABC', 'S5', (3,)), ('X', '>i8'), ('Y', '>f8')])
>
> Final check of combinations:
>
> In [19]: np.all(data == np.array(fits.getdata('apio.fits')))
> Out[19]: False
>
> In [20]: np.all(data == np.array(fits.getdata('fio.fits')))
> Out[20]: False
>
> In [21]: np.all(data == fitsio.read('apio.fits'))
> Out[21]: True
>
> In [22]: np.all(data == fitsio.read('fio.fits'))
> Out[22]: False
>
> Feature?  Bug?  User error?  Other?
>
> Thanks for the help,
>
> Stephen
>


-- 
Erin Scott Sheldon
Brookhaven National Laboratory erin dot sheldon at gmail dot com


