[AstroPy] new pyfits version deletes NP_pyfits, breaking pickle

Joe Harrington jh at physics.ucf.edu
Fri Nov 12 10:00:49 EST 2010


We use Python's OO capabilities extensively.  Our basic class has over
100 attributes, has a tree structure, and contains complex objects
within itself.  For example, it contains several FITS headers.  We
subclass it all the time.  Our analysis has a branch for each exoplanet
eclipse we analyze (we have over 100 of these now), and attributes get
added to each branch, so few of these objects even have the same
attributes.  We can't write a save/load routine for each one.

IDL and MATLAB both provide a robust save/load capability, and we now
have Python routines that can handle those formats.  I hesitate to use
them since I'm sure their object formats are different, but maybe they
capture what's needed?  Has anyone tried using them to save/restore
complex Python objects?

What we need is a general facility for saving and loading such
arbitrary objects.  While FITS (and better, HDF) might store all the
components of an object, you'd have to write something that would
disassemble and reassemble them with all their Python properties, such
as the names of the data types, the tree structure, and so forth.

That is what pickle does, and I think that its approach of importing
to get the types it uses is the obvious one.  The problems come when
the import changes, of course.
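
To make the failure mode concrete, here is a minimal sketch of why a
pickle breaks when a module goes away.  It assumes a throwaway module
named legacy_headers_demo (a hypothetical name, built in memory so the
example is self-contained) standing in for something like
pyfits.NP_pyfits.

```python
import pickle
import sys
import types

# Build a throwaway module in memory; the name is hypothetical.
old_mod = types.ModuleType("legacy_headers_demo")
exec("class Header(object):\n    pass", old_mod.__dict__)
sys.modules["legacy_headers_demo"] = old_mod

obj = old_mod.Header()
obj.keyword = "SIMPLE"

# Protocol 0 is the text format seen in the dump quoted in this thread.
data = pickle.dumps(obj, protocol=0)

# The pickle records the module and class *names*, not the class body.
names_recorded = b"legacy_headers_demo" in data and b"Header" in data

# Simulate a release in which the module has been removed.
del sys.modules["legacy_headers_demo"]

load_error = None
try:
    pickle.loads(data)
except ImportError as err:
    # Unpickling has to import the module by name, and that now fails.
    load_error = err
```

The attribute data is still in the file; only the lookup of the class by
its recorded name fails.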

From the pickle side, I think the only other alternative would be
something that recorded the structure and the names given to each
attribute, and any other internal properties, attempted an import, and
if they conflicted gave you what you saved and a warning that it is
out of sync with the import.  You don't want just to create the old
object and expect the user to figure it out when those objects are fed
to new software that's expecting the new object.  And saving old
*methods* could be disaster!
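
A rough sketch of that idea, assuming a hypothetical FallbackUnpickler
built on the standard pickle.Unpickler.find_class hook: it attempts the
import, and if that fails it warns and hands back a bare placeholder
class holding only the saved attributes (no old methods).  The module
name vanished_module_demo is made up for the demonstration, and this
only covers the missing-import case, not a class whose attributes have
drifted.

```python
import io
import pickle
import sys
import types
import warnings


class FallbackUnpickler(pickle.Unpickler):
    """Hypothetical unpickler: use a placeholder when a class cannot be imported."""

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError):
            warnings.warn("%s.%s not importable; restoring attributes only"
                          % (module, name))
            # Placeholder type with the saved name but none of the old methods.
            return type(name, (object,), {"_saved_module": module})


# Self-contained demonstration with a throwaway in-memory module.
mod = types.ModuleType("vanished_module_demo")
exec("class Header(object):\n    pass", mod.__dict__)
sys.modules["vanished_module_demo"] = mod

hdr = mod.Header()
hdr.naxis = 2
data = pickle.dumps(hdr)

# Simulate the module disappearing in a later release.
del sys.modules["vanished_module_demo"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    restored = FallbackUnpickler(io.BytesIO(data)).load()
```

The restored object carries the saved attributes and a warning is
raised, so the user knows it is out of sync with the current import.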

From the importee's side, including a version number in your object
and checking it would let you have backward compatibility, as would
providing unpicklers and converters when objects do change.
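
Both suggestions can be sketched with standard pickle hooks.  Everything
named here is an assumption for illustration: the Event class, its
VERSION attribute, the 'counts' to 'flux' rename, the old module name
oldpkg_demo, and the RELOCATED table.  The version check lives in
__setstate__, and the converter is an Unpickler subclass that redirects
old class locations to the current class.

```python
import io
import pickle
import sys
import types


class Event(object):
    """Current class; VERSION is bumped whenever the attribute layout changes."""
    VERSION = 2

    def __init__(self, name, flux):
        self.name = name
        self.flux = flux

    def __getstate__(self):
        state = dict(self.__dict__)
        state["_version"] = self.VERSION   # stored alongside the attributes
        return state

    def __setstate__(self, state):
        version = state.pop("_version", 1)  # unversioned pickles count as 1
        if version < 2:
            # Assumed change: version 1 called this attribute 'counts'.
            state["flux"] = state.pop("counts")
        self.__dict__.update(state)


# Hypothetical table of old (module, class) locations and their replacements.
RELOCATED = {("oldpkg_demo", "Event"): Event}


class RenamingUnpickler(pickle.Unpickler):
    """Converter: redirect classes that moved, import everything else as usual."""

    def find_class(self, module, name):
        replacement = RELOCATED.get((module, name))
        if replacement is not None:
            return replacement
        return super().find_class(module, name)


# Write a "legacy" pickle from a throwaway module, then remove the module.
old = types.ModuleType("oldpkg_demo")
exec("class Event(object):\n    pass", old.__dict__)
sys.modules["oldpkg_demo"] = old
legacy = old.Event()
legacy.name = "eclipse-1"
legacy.counts = 42.0
data = pickle.dumps(legacy)
del sys.modules["oldpkg_demo"]

restored = RenamingUnpickler(io.BytesIO(data)).load()
```

The old pickle loads as a current Event with its attributes upgraded,
without keeping the old module around.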

--jh--

Thomas Robitaille on Thu, 11 Nov 2010 18:25:28 -0500:

> > Also, if you know of *any* other way to save an object, please say.
> 
> If you don't have too many different object types, and if you really
> want long-term retrieval, then you may want to consider actually
> not using pickles, but for example in the case of FITS headers,
> you could really just use the toTxtFile and fromTxtFile methods to
> save and read from ASCII. I can see how pickling can be useful for
> some instances, but I think FITS headers are definitely one
> example where there is not much gain in using pickles over plain
> old ASCII. You can always write your own 'pickle' module which
> would decide how to deal with various datatypes, and use a plain
> ASCII write/read for pyfits.Header objects.
> 
> Cheers,
> 
> Tom
> 
> > It seems pretty clear to me that to load objects from any kind of save
> > file, you have to import the classes of the object and any objects it
> > contains that are not standard Python objects.  So even if we had
> > other methods for saving, they would have the same problem as pickle.
> > But we have to be able to save objects!  Perhaps saving the definitions
> > of the types rather than importing them would be the way to go?  I bet
> > there's a long thread about this somewhere.
> > 
> > --jh--
> > 
> > On Thu, 11 Nov 2010 14:38:38 -0500, Perry Greenfield
> > <perry at stsci.edu> wrote:
> > 
> >> We'll look into it. This is a general problem with pickles (and one  
> >> reason I've been hesitant to use them as save files). I  
> >> wonder if there is a better solution than that. In this case we had to  
> >> clean out the previous numarray interface.
> >> 
> >> Perry
> >> 
> >> On Nov 10, 2010, at 7:24 PM, Joe Harrington wrote:
> >> 
> >>> My research group uses Python pickles to save data as it goes through
> >>> our pipeline (.npy and .npz do not save objects, and neither does HDF,
> >>> etc.).  These need to be loadable forever, as we often compare work to
> >>> work done much earlier.  Some of the objects we save contain pyfits
> >>> header objects.  Pickles have to import all classes used in the
> >>> pickled objects before they load, and we are getting an ImportError
> >>> about NP_pyfits.  The file NP_pyfits.py existed in stsci_python 2.8
> >>> but is gone in 2.10.  The pickles refer to this object explicitly:
> >>> 
> >>> ....
> >>> sS'photchan'
> >>> p494
> >>> I3
> >>> sS'header'
> >>> p495
> >>> (ipyfits.NP_pyfits
> >>> Header
> >>> p496
> >>> (dp497
> >>> S'_hdutype'
> >>> p498
> >>> cpyfits.NP_pyfits
> >>> PrimaryHDU
> >>> p499
> >>> sS'ascard'
> >>> p500
> >>> ccopy_reg
> >>> _reconstructor
> >>> p501
> >>> (cpyfits.NP_pyfits
> >>> CardList
> >>> p502
> >>> c__builtin__
> >>> list
> >>> p503
> >>> (lp504
> >>> g501
> >>> (cpyfits.NP_pyfits
> >>> Card
> >>> p505
> >>> c__builtin__
> >>> object
> >>> p506
> >>> NtRp507
> >>> (dp508
> >>> S'_valuestring'
> >>> p509
> >>> S'T'
> >>> ....
> >>> 
> >>> Is there any way to make our pickles readable again, other than
> >>> running the old version of pyfits forever?  Can you provide a pickle
> >>> converter that replaces the old names in the file with whatever is
> >>> new?
> >>> 
> >>> Please (everyone, not just STScI) be aware of this issue going
> >>> forward.  Pickles are the only way we know of to save objects.  You
> >>> can add things to your classes, but if you change what they import (or
> >>> otherwise break pickle), nobody can restore your class across
> >>> releases.
> >>> 
> >>> Thanks,
> >>> 
> >>> --jh--
> > _______________________________________________
> > AstroPy mailing list
> > AstroPy at scipy.org
> > http://mail.scipy.org/mailman/listinfo/astropy


