[AstroPy] Consider ASDF for hierarchical numpy data

Arnon Sela arnon.sela at gmail.com
Mon Dec 4 21:44:09 EST 2017


Thank you.  I do like that ASDF has a text header, as FITS does.

From the example you sent (thank you very much), ASDF handles hierarchical
dictionary datasets. But:

1. When I tried a numpy.recarray dataset (as in the original example), it
failed.  I understand it would be easy to convert recarrays to dictionaries,
but is there an option to store recarrays directly?

2. Also, elements (ndarrays) are loaded as
asdf.tags.core.ndarray.NDArrayType. Is there a way to tell asdf to load them
as numpy.ndarrays?
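
For reference, the conversion I mention in point 1 could look roughly like
the sketch below. It uses only numpy; recarray_to_dict and the small rec
array are hypothetical names made up for illustration, not part of asdf or
numpy. The result is a nested dict of plain ndarrays, the same kind of
structure that is passed to asdf.AsdfFile in the quoted example.

```python
import numpy as np


def recarray_to_dict(rec):
    """Recursively convert a nested structured array (or recarray) into a
    nested dict of plain numpy.ndarray objects."""
    if rec.dtype.names is None:
        # Leaf field: return an independent, plain ndarray copy.
        return np.array(rec)
    return {name: recarray_to_dict(rec[name]) for name in rec.dtype.names}


# Hypothetical small nested structured array standing in for data_for_fits.
inner = np.dtype([('D1', 'f8', (2, 3))])
outer = np.dtype([('D1', 'f8', (2, 2)), ('ND', inner)])
rec = np.zeros((), outer)
rec['D1'][:] = np.arange(4.0).reshape(2, 2)
rec['ND']['D1'][:] = np.arange(6.0).reshape(2, 3)

tree = recarray_to_dict(rec)
```

This is the inverse of dict2recarray from my original message, so it only
moves the conversion step to my side; it does not store the recarray itself.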

Thanks,


On Mon, Dec 4, 2017 at 5:00 PM, Daniel D'avella <ddavella at stsci.edu> wrote:

> There are some good suggestions in this thread. If you do in fact need to
> serialize your data to disk and if you're not tied to FITS for other
> reasons, you might consider using the Advanced Scientific Data Format
> (ASDF) which is designed specifically for this purpose. Here's an example
> of how to use ASDF to store the data set you described:
>
>
> >>> import asdf
> >>> import numpy as np
>
> >>> data = {'D1': np.linspace(0, 100, 8*4).reshape(8, 4),
> ...         'D2': np.linspace(0, 100, 10*5).reshape(10, 5),
> ...         'ND': {'D1': np.linspace(0, 100, 10*5).reshape(10, 5),
> ...                'D2': np.linspace(0, 100, 8*4).reshape(8, 4)}}
>
>
> # Writing data to file on disk
> >>> outfile = asdf.AsdfFile(data)
> >>> outfile.write_to('data.asdf')
>
> # Reading data from file on disk
> >>> infile = asdf.open('data.asdf')
> >>> infile.tree
> {'D1': <array (unloaded) shape: [8, 4] dtype: float64>,
>  'D2': <array (unloaded) shape: [10, 5] dtype: float64>,
>  'ND': {'D1': <array (unloaded) shape: [10, 5] dtype: float64>,
>         'D2': <array (unloaded) shape: [8, 4] dtype: float64>}}
>
> # Data arrays can be accessed hierarchically from the top-level tree:
> >>> infile.tree['D1']
> array([[   0.        ,    3.22580645,    6.4516129 ,    9.67741935],
>        [  12.90322581,   16.12903226,   19.35483871,   22.58064516],
>        [  25.80645161,   29.03225806,   32.25806452,   35.48387097],
>        [  38.70967742,   41.93548387,   45.16129032,   48.38709677],
>        [  51.61290323,   54.83870968,   58.06451613,   61.29032258],
>        [  64.51612903,   67.74193548,   70.96774194,   74.19354839],
>        [  77.41935484,   80.64516129,   83.87096774,   87.09677419],
>        [  90.32258065,   93.5483871 ,   96.77419355,  100.        ]])
> >>> infile.tree['ND']
> {'D1': <array (unloaded) shape: [10, 5] dtype: float64>,
>  'D2': <array (unloaded) shape: [8, 4] dtype: float64>}
>
> The metadata contents of the ASDF file are human-readable:
>
> #ASDF 1.0.0
> #ASDF_STANDARD 1.1.0
> %YAML 1.1
> %TAG ! tag:stsci.edu:asdf/
> --- !core/asdf-1.0.0
> D1: !core/ndarray-1.0.0
>   source: 0
>   datatype: float64
>   byteorder: little
>   shape: [8, 4]
> D2: !core/ndarray-1.0.0
>   source: 1
>   datatype: float64
>   byteorder: little
>   shape: [10, 5]
> ND:
>   D1: !core/ndarray-1.0.0
>     source: 2
>     datatype: float64
>     byteorder: little
>     shape: [10, 5]
>   D2: !core/ndarray-1.0.0
>     source: 3
>     datatype: float64
>     byteorder: little
>     shape: [8, 4]
> asdf_library: !core/software-1.0.0 {author: Space Telescope Science
> Institute, homepage: 'http://github.com/spacetelescope/asdf',
>   name: asdf, version: 1.3.2.dev1044}
>
> The data arrays themselves are stored efficiently, and can even be
> compressed.
>
> ASDF is also capable of serializing various types from Astropy including
> tables, Time objects, units and quantities, and some transforms and
> coordinates.
>
> ASDF can be installed using pip:
> $ pip install asdf
>
> Basic documentation can be found here:
>
> http://asdf.readthedocs.io/en/latest/
>
>
> If you have any questions, feel free to open an issue in our GitHub repo:
>
> https://github.com/spacetelescope/asdf
>
>
>
> ------------------------------
> *From:* AstroPy <astropy-bounces+ddavella=stsci.edu at python.org> on behalf
> of astropy-request at python.org <astropy-request at python.org>
> *Sent:* Monday, December 4, 2017 4:51 PM
> *To:* astropy at python.org
> *Subject:* AstroPy Digest, Vol 135, Issue 2
>
>
> Today's Topics:
>
>    1. Re: Nested recarrays in FITS (Paul Kuin)
>    2. Re: Nested recarrays in FITS (Peter Teuben)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 Dec 2017 21:08:04 +0000
> From: Paul Kuin <npkuin at gmail.com>
> To: Astronomical Python mailing list <astropy at python.org>
> Cc: Daniel Sela <danielsela42 at gmail.com>
> Subject: Re: [AstroPy] Nested recarrays in FITS
> Message-ID:
>         <CANoQ6N3gCT1Ek91-VMgzxYc4z4+UuiK3gMu1HwigpdnKn-oxBg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I think that HDF does that for you. FITS is more flexible, but you have to
> do your own writes and retrievals. In the end you will be reinventing the
> wheel unless you check out how HDF does it. That's my opinion.
>
> Cheers,
>
>    Paul
>
> On Mon, Dec 4, 2017 at 9:02 PM, Arnon Sela <arnon.sela at gmail.com> wrote:
>
> > Dear Whom that Can Help,
> >
> > I have a nested numpy recarray structure to be stored in FITS.
> > The following code is just a test I used to build a nested structure
> > (the data_for_fits variable in the last line of the code).
> >
> > Code start >>>>>>
> >
> > import numpy as np
> >
> > ''' The following two functions are adapted from:
> > https://stackoverflow.com/questions/32328889/numpy-structured-array-from-arbitrary-level-nested-dictionary
> > '''
> >
> > def mkdtype(d):
> >     ''' Creates dtype for nested dictionary with numpy based type objects
> >     '''
> >     result = []
> >     for k, v in d.items():
> >         if isinstance(v,np.ndarray):
> >             result.append((k, v.dtype, v.shape))
> >         else:
> >             result.append((k, mkdtype(v)))
> >     return np.dtype(result)
> >
> > def dict2recarray(data, rec=None):
> >     ''' Creates numpy.recarray from data (dict)
> >     '''
> >     def _dict2recarray(data, rec):
> >         if rec.dtype.names:
> >             for n in rec.dtype.names:
> >                 _dict2recarray(data[n], rec[n])
> >         else:
> >             rec[:] = data
> >         return rec
> >
> >     dtype = mkdtype(data)
> >     if rec is None:
> >         rec = np.zeros(dtype.shape, dtype)
> >
> >     return _dict2recarray(data, rec)
> >
> > datan_raw = {'DATA': {'D1': np.linspace(0, 100, 8*4).reshape(8, 4),
> >                       'D2': np.linspace(0, 100, 10*5).reshape(10, 5),
> >                       'ND': {'D1': np.linspace(0, 100, 10*5).reshape(10, 5),
> >                              'D2': np.linspace(0, 100, 8*4).reshape(8, 4)}}}
> >
> > dtype = mkdtype(datan_raw)
> > data_for_fits = dict2recarray(datan_raw)
> >
> >
> > >>>>>> Code ends
> >
> > I couldn't find documentation on how to build such a FITS structure
> > (nested recarrays).
> >
> > One option is to build sub-recarrays into different BIN tables with a
> > header that would correspond to a nested key in the recarray. But that
> > would require creating another function to reconstruct the recarray
> > structure after reading the BIN tables from the FITS file.
> >
> > The better option is to build the FITS file in such a manner that the
> > structure is retrieved correctly on FITS load().
> >
> > Thank you for your help,
> >
> > Best regards.
> >
> > _______________________________________________
> > AstroPy mailing list
> > AstroPy at python.org
> > https://mail.python.org/mailman/listinfo/astropy
> >
> >
>
>
> --
>
> * * * * * * * * http://www.mssl.ucl.ac.uk/~npmk/ * * * *
> Dr. N.P.M. Kuin      (n.kuin at ucl.ac.uk)
> phone +44-(0)1483 (prefix) -204111 (work)
> mobile +44(0)7908715953  skype ID: npkuin
> Mullard Space Science Laboratory, University College London,
> Holmbury St Mary, Dorking, Surrey RH5 6NT, U.K.
>
> ------------------------------
>
> Message: 2
> Date: Mon, 4 Dec 2017 22:46:27 +0100
> From: Peter Teuben <teuben at astro.umd.edu>
> To: astropy at python.org
> Subject: Re: [AstroPy] Nested recarrays in FITS
> Message-ID: <b1419060-493c-bf7e-0ff0-0d2bcd1d31a1 at astro.umd.edu>
> Content-Type: text/plain; charset="utf-8"
>
>
> Another thought on this:
>
> I think the original question was also limited in not explaining why FITS
> was needed. I could argue for pickle. Paul is right, HDF might be a better
> match, especially if you have to switch to another language, since HDF has
> a more native match to that. But does it have to be persistent data?
> Otherwise using a Python-C/Fortran interface is far more efficient. (I
> believe HDF is actually more flexible than the F in FITS.)
>
> You can't beat a native pickle:
>
>     import pickle
>     with open("test.dat", "wb") as f:
>         pickle.dump(datan_raw, f)
>     ..
>     with open("test.dat", "rb") as f:
>         new_raw = pickle.load(f)
>
> So perhaps we could turn the question around and ask in what situation you
> need this data structure.
>
> - peter
>
>
>
>