[AstroPy] Python version of IDL Astron library

Joe Harrington jh at oobleck.astro.cornell.edu
Wed Sep 8 10:49:49 EDT 2004


Hi all,

A Python version of the IDLastro library is a great idea, and crucial
for the success of Python in the astronomy community.  The main issues
raised so far:

underlying packages (numeric vs. Numarray, plotting)
Python, C, C++, FORTRAN?
closeness of implementation to IDL library

to which I'll add (interspersed):

documentation
code conversion
coordination

UNDERLYING PACKAGES

There isn't much of a standard in the IDL library, except for
consistently following the IDL doc header format.  However, Python's
richness and the fact that this won't be a first implementation by a
few people open us up to greater risk of inconsistency than the IDL
developers faced.
I think that we should develop a set of standards.  At SciPy '04, we
came up with the following:

1. Numarray for all new code, and to teach to new users.  New docs
   teach only Numarray.  Soon Numarray vs. numeric will be a run-time
   choice, and hopefully in a year or two there won't be a compelling
   reason to use numeric over Numarray.  Right now the issues are
   mainly performance issues for small arrays (less than 1000
   elements), which matter more to mathematicians and biologists than
   to astronomers.

2. Matplotlib for all plotting.  Try it.  You'll like it!  It has a
   matlab-like interface, but that's just a front end to the full OO
   plotting functionality.  You don't have to learn the matlab-like
   interface if you don't want to.  It's interactive so you can adjust
   your axis scaling with the mouse, etc.

3. IPython.  This is a wonderful and 100% compatible improvement to
   the interactive interpreter.  It doesn't affect programming at all, so
   it's not very relevant to the current topic.

4. All of the above to be provided in binary packages as well as
   source packages and tarballs.  Each currently-popular platform must
   be supported by its native packager:
   Fedora    RPM/YUM
   Debian    Debian Package Manager/APT
   Sun       tar+checkinstall?  or is there something better now?
   PC        I forget the name, but there is one now
   MAC       I forget the name, but there is one now

5. A community effort to (re)write all needed documentation.

6. A community effort to create an excellent web site at scipy.org,
   with (links to) discipline-specific pages that collect, organize,
   rate, and distribute topical application software.  The goal here
   is to proactively take the user to where he or she needs to go, not
   just to throw a lot of information around and baffle people.

I have sent Perry a draft of a foundational document for the project,
which I will post here for comment when we've agreed on it among
ourselves.  However, the basic points above were consensus (or nearly
so) at the conference.

PYTHON/C/C++/FORTRAN

Here's how this affects AstroPy (or PyAstro, or whatever Paul wants to
call it).  First, the choice of what underlying code to use (pure
Python, C, C++, FORTRAN) should be the developer's, so long as that
person can get it working in scipy_distutils on all platforms.  I was
told at the conference that Python, C, and C++ are all easy to get
working, but that FORTRAN is problematic.  However, there was a
presentation at the conference of a new build utility (it doesn't have
a name, though I jokingly called it "makeover") that claims to have
cataloged all the options for every FORTRAN compiler in existence,
including Borland's older ones.  So, that may become the new standard,
or that code may be borrowed by distutils.  Stay tuned on that one.

There are strong benefits to writing in a compiled language, the main
one being portability to later interactive languages.  Python will not
last forever.  A good run for an interpreted language is 15 years.  In
contrast, FORTRAN and C are 2-3 times as old and going strong.  Python
may grow and last longer, or may not.  When we move on to the next
great thing, we will again go through an agonizing process of
rewriting and conversion.  The stuff that gets rewritten first will be
the stuff that's just wrappers around C, C++, and FORTRAN.  Wouldn't
it be great if that's nearly everything?  If you look at SciPy now,
the majority of it is wrapped compiled libraries that implement
practically all of the basic numerical routines.  Imagine where we'd
be if we had to *write* that stuff in Python, rather than wrap it.
Imagine how hard the next switch in interactive languages would be if
we did that.

There is also a strong argument against rewriting code that already
works.  We have a lot of work to do, so if you have working compiled
code, please just wrap it.  If you are writing new code, it might make
sense for large, monolithic algorithms to be written in C or C++ and
to be wrapped for Python.  Smaller routines (the majority, by number
at least) are probably best done in Python directly.  These won't be
hard to redo in the next language.  In any case, if you don't want to
learn distutils and won't be providing distutils-building code, please
stick to Python, or hook up with a distutils hacker.

DOCS

Documentation is a strong suit of IDL's and we need to be as good.  I
am not aware of a standard doc header for Python.  If there is one, we
should use it.  If not, I suggest we essentially copy IDL's, with some
modifications to rationalize it and to make it work for Python.  We
also need a tool that will extract the doc pages from a package,
collect them, turn them into HTML and PDF, and put them in a
searchable database.  This tool will be central to SciPy as a whole, so
upstream coordination with the soon-to-be-born doc effort would be
great.
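
To make the doc-header idea concrete, here is a minimal sketch of an
IDL-style header carried in a Python docstring, plus a tiny extractor
that turns it into HTML.  The section names (NAME, PURPOSE, INPUTS,
OUTPUTS), the example function airmass, and the helpers parse_header
and header_to_html are all invented for illustration; none of this is
an agreed standard, and PDF output and the searchable database are
left out.

```python
import html
import inspect
import math


def airmass(zenith_angle_deg):
    """
    NAME:
        airmass
    PURPOSE:
        Plane-parallel airmass estimate (illustrative only).
    INPUTS:
        zenith_angle_deg - zenith angle in degrees.
    OUTPUTS:
        Airmass as a float.
    """
    return 1.0 / math.cos(math.radians(zenith_angle_deg))


def parse_header(func):
    """Split an IDL-style docstring into a {SECTION: text} dictionary.

    A section starts at any line that is all upper case and ends with
    a colon (a convention assumed for this sketch).
    """
    sections = {}
    current = None
    for line in inspect.getdoc(func).splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and stripped[:-1].isupper():
            current = stripped[:-1]
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return {name: " ".join(body) for name, body in sections.items()}


def header_to_html(func):
    """Render the parsed header as a small HTML definition list."""
    rows = [
        "<dt>%s</dt><dd>%s</dd>" % (html.escape(name), html.escape(text))
        for name, text in parse_header(func).items()
    ]
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


print(header_to_html(airmass))
```

Because the header lives in the docstring, the same text is also
available interactively through help(airmass).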

The source language of docs is another issue.  It has to be open and
functional on all platforms.  It has to handle simple markup and
inline figures.  It has to produce PDF and HTML.  MSWord is out.
My votes are for reStructuredText or LaTeX.  Some like LyX and
TeXmacs.  We'll see what the community wants.

SYNTACTIC CLOSENESS

Syntactic closeness to the IDL library would be nice but isn't crucial
from my perspective.  Where things are reasonable, keep them.  Where
they are not (what does "sxpar" mean, and to whom?), make them so.  We
could provide a compatibility wrapper for fans of sxpar.
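
A compatibility wrapper of this kind could be as thin as the sketch
below.  The descriptive name get_header_value is hypothetical, the
header is assumed to be a plain dict rather than any real FITS header
type, and the argument order given to sxpar is chosen for
illustration only.

```python
def get_header_value(header, keyword, default=None):
    """Return the value stored under a FITS-style keyword.

    `header` is assumed to be a plain dict mapping upper-case keyword
    names to values; a real library would have its own header type.
    """
    return header.get(keyword.strip().upper(), default)


def sxpar(header, keyword):
    """Compatibility name for IDL users; delegates to get_header_value."""
    return get_header_value(header, keyword)


# Made-up header values, for demonstration only.
header = {"NAXIS": 2, "NAXIS1": 1024, "EXPTIME": 30.0}
print(get_header_value(header, "exptime"))
print(sxpar(header, "NAXIS1"))
```

New code would use the readable name, while converted IDL code could
keep calling sxpar until someone gets around to renaming it.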

CODE CONVERTER

One idea that has been tossed around is a code converter.  There are
two approaches to this, which I call the 80% solution and the 100%
solution.  The 80% solution converts all the procedural code on a
line-by-line basis.  It would do the gruntwork, and would ensure that
array access, etc., is converted correctly (otherwise subtle bugs will
creep in).  The developer would still have some gruntwork to do to get
it to actually work, but not a lot, particularly if the code was
simple and didn't depend on a lot of IDL library routines.  The 100%
solution would be 100 times harder to write.  It would convert all the
code, including the OO parts, and would also implement the IDL
library, generally in terms of a set of wrapper routines that called
SciPy routines.  It would actually be a re-implementation of IDL.  I
claim this is unnecessary and not even very desirable.  We want people
to switch to Python and to contribute new code to the community.  I
have no personal interest in providing them a free IDL.
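
As an example of the array-access pitfalls such a converter would
have to handle: an IDL subscript range a[2:5] includes element 5,
while a Python slice excludes its stop index, and IDL's first
subscript varies fastest, while in a C-ordered Python array the last
one does.  The sketch below shows both translations.  The helper
names are hypothetical, nested lists stand in for real arrays, and
only explicit non-negative bounds are handled.

```python
def idl_range_to_slice(first, last):
    """Convert an inclusive IDL subscript range first:last to a slice."""
    if first < 0 or last < first:
        raise ValueError("expected 0 <= first <= last")
    # Python excludes the stop index, so add one to keep element `last`.
    return slice(first, last + 1)


def idl_subscripts_to_python(*subscripts):
    """Reverse subscript order: IDL's [x, y] becomes Python's [y][x]."""
    return tuple(reversed(subscripts))


spectrum = list(range(10))
# IDL: spectrum[2:5] selects elements 2 through 5 inclusive.
print(spectrum[idl_range_to_slice(2, 5)])

# Two rows of three pixels, stored row by row.
image = [[10, 11, 12],
         [20, 21, 22]]
# IDL: image[2, 1] means column 2 of row 1.
row, col = idl_subscripts_to_python(2, 1)
print(image[row][col])
```

A line-by-line converter that copied a[2:5] through unchanged would
silently drop one element, which is exactly the kind of subtle bug
that is easy to miss in a hand conversion.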

Anyway, if we had a converter, this entire project, as well as the
project of getting astronomers to switch, would be *much* easier!  It
would be safest from a legal point of view for the converter authors
NEVER to have run IDL nor to have read RSI's manuals.  The IDL license
prohibits reverse-engineering, and most of us have agreed to it.  I
don't believe it's reverse engineering to write a code converter from
a language that you know into another language that you know that has
the same capability.  However, I'm not a lawyer, and I don't have the
means to fight that battle in court.  IDL is simple enough that
someone who doesn't know it can get a commercial book on IDL, such as
Gumley's Practical IDL Programming, that provides all the information
that's needed.  Any of us experienced IDL users can then publish a FAQ
on the web that answers any questions the programmers might have,
without our actually writing or looking at any code.  This should keep
the whole effort legally above reproach.

COORDINATION

Finally, to address coordination issues, I think community testing and
review is key.  Let's ask people to post their development plans in
advance for community design review and to update a public-read CVS
often.  Let's have public review and testing of all new code before
accepting it into the library.  That way we'll address
interoperability issues and catch bugs early.  If we're a successful
community project, Paul's greatest contribution will be the
coordination of the effort and the moderation of the review process,
rather than the bits of code he produces himself.

As always, I welcome your comments.

--jh--
Joe Harrington
326 Space Sciences Building
Cornell University
Ithaca, NY 14853-6801
(607) 254-8960 office
(607) 255-9002 fax
jhmail at oobleck.astro.cornell.edu
_________________________________________________
AstroPy mailing list     -      astropy at stsci.edu
http://www.astro.washington.edu/owen/AstroPy.html

