[SciPy-dev] Accessible SciPy (ASP) project

Joe Harrington jh at oobleck.astro.cornell.edu
Mon Oct 18 17:35:03 EDT 2004


			   Accessible SciPy

		       Request for Comments #1

			  Joseph Harrington
			   Perry Greenfield

At SciPy '04 we had an animated Birds-of-a-Feather session about
making SciPy more accessible to the general scientific user.  The
session and the many conversations that followed cleared up a lot of
issues and myths about SciPy.  Among a core group of about a dozen
participants, a consensus began to emerge about what needs to be done.
The purpose of this document is to lay out our understanding of that
consensus for your comment (whether or not you have been involved up
to now), to announce our intention to make this vision a reality, and
to solicit everyone's participation and support.  Many hands make
light work, and many brains, properly channeled, make great work.  We
thus introduce the Accessible SciPy (ASP) project.

Please take a moment to look at the BoF presentation, minutes, and
summary at http://www.scipy.org/wikis/scipy04/ConferenceSchedule.
With those documents as background, we lay out the problem, our dream,
and a path to get there.

THE PROBLEM

Python is wonderful for doing scientific computation.  Those who try
it are thrilled by the freedom from arbitrary limitations, legacy
syntaxes, and license managers.  Some of the applications and demos
are stunning.

At the same time, the experience of a new user who comes to SciPy
"cold" is awful.  There is little in the way of instructions for
installing it and running it, even to the "Hello, world!" stage, and
much of that information is outdated.  There is a lengthy description
of the build process, which most users could never follow even if they
wanted to.  It is not even clear what components are in SciPy, what
else is needed, and what the add-ons do.  It is to the millions of
such potential users (including every high-school science student
in the world) that the current effort is directed.

We need to say at this point that we are not picking on Enthought.  In
fact, what's on their servers works a great deal better than they
advertise.  One of us was able to do a "yum install SciPy" on a laptop
at the conference.  But, the word "yum" appears nowhere on the web
site, nor does the URL for the repository.  We need to take the
resources being made available by Enthought and create from them the
user experience SciPy needs to thrive.

WHAT'S EXPECTED BY NEW USERS

New users who are not "bit-heads" expect a first-time experience that
goes something like this:

1. They hear about SciPy and are somehow induced to find the site and
   visit.

2. They see a brief front page that says what SciPy is, lists some
   features, and points to a page with more info.  There is a
   reasonable number of relevant links, all clearly laid out on a page
   that is not too busy.  Words on the front page invite them to
   download and try the software.

3. They click "download", and go to a friendly page that offers a
   binary install package for the current version of the software for
   their machine (and others, but they don't care about that).  It
   also offers instructions for installing the package on their
   platform.  For platforms like Red Hat and Debian Linux, the
   instructions say what lines to add to yum.conf or its Debian
   equivalent, though the option to download the binary package
   directly also exists.  The instructions end with:
	how to configure (if necessary, hopefully not),
	how to test the install,
	where to go for help if the install failed,
	a pointer to a "Getting Started With SciPy" document.
   They follow the instructions and it installs fine.

4. They read the "Getting Started" document, which is about 20 pages
   and contains:
   a. A brief (1/2 page) description of SciPy's key characteristics
      from a new user's point of view
   b. A get-your-feet-wet tutorial, for example:
	make a sine wave and a parabola, add them, and plot,
	read some ASCII data and plot it,
	do a Fourier transform of the data and plot it,
	read an image and display it,
	make a second image, add it to the first, and display it, and
	extract an image section, manipulate it, and display it.
   c. The basic (non-programming) Python syntax, including
      array creation, access, and manipulation.
   d. The key elements of Python's interaction with the operating
      system (e.g., the most important environment variables, how to
      tell it where routines are stored, etc.).
   e. A statement on the current state of the software, that there is
      flux in graphics and the underlying array package, but that the
      functionality we teach in the docs will not (likely) go away.
   f. Sources of further information and support, both in the
      downloaded package and on the web.

5. Encouraged by the smoothness of the experience so far, the user
   dives right into some topical docs (for astronomy, biology, etc.),
   buys a book on Python (or even SciPy), subscribes to scipy-user and
   a topical list, downloads add-ons, and becomes productive almost
   immediately.
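
As a rough illustration of the tutorial in item 4b, a first session
might look something like the sketch below.  It is only a sketch: it
uses nothing but the Python standard library so that it runs without
any array or plotting package, the helper names (sample, dft) are
made up for this example, and the plotting steps are left out.  A
real "Getting Started" document would use array operations and plot
each result.

```python
import cmath
import math


def sample(func, start, stop, count):
    """Evaluate func at count evenly spaced points in [start, stop)."""
    step = (stop - start) / count
    return [func(start + i * step) for i in range(count)]


def dft(values):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(values))
        for k in range(n)
    ]


n = 64
# A sine wave with 4 full cycles over the interval, and a parabola.
sine = sample(lambda x: math.sin(2 * math.pi * 4 * x), 0.0, 1.0, n)
parabola = sample(lambda x: x * x, 0.0, 1.0, n)

# "Add them": element-by-element sum of the two curves.
total = [s + p for s, p in zip(sine, parabola)]

# "Do a Fourier transform": find the strongest component of the sine.
spectrum = dft(sine)
magnitudes = [abs(c) for c in spectrum[: n // 2]]
peak_bin = magnitudes.index(max(magnitudes))
print("peak frequency bin:", peak_bin)
```

With an array package the list comprehensions collapse into single
array expressions, which is the point such a tutorial would make.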

OUR SOLUTION TO THE CONUNDRUM OF OPEN DEVELOPMENT

SciPy both suffers and benefits from being Open Source Software.  On
the one hand, many developers work on it and it is quite powerful.  On
the other, there is no central organization, so packages can conflict,
there is duplicated work, there are holes, etc.

ASP proposes to provide the organization that is lacking.  Through a
community-driven process, we will solicit help in resolving the issues
and in implementing the solutions.  We will guide new users toward
what the community feels are the best packages for each basic task, so
that new users experience a unified software package with clear
documentation.  At the same time, we will continue to provide the
broadest array of application software possible, and will index all
the available packages and functionality so people can make informed
choices if they decide the defaults are not what they need.

Mainly, we will guide users through selective documentation.  For
example, once we select a more capable graphics package that supports
all platforms, the general user documentation will teach that package
rather than xplt or gplt.  The latter would continue to exist, would
receive whatever support their developers and fans chose to give them,
and would be listed along with all the other graphics options.  They
would merely not be emphasized in the tutorial documentation.  Thus,
nobody will lose functionality or choice.

It is possible that, in the future, another package may supplant one
that the documentation promotes.  In that case, after due
deliberation, we may decide to rewrite the documents to reflect that
change.  Of course, such a change would be undertaken only in
compelling circumstances.

WHAT'S NEEDED TO GET THERE

At the BoF, we identified three key areas that needed to be worked on
in order to realize the experience above: packaging, the web site, and
documentation.  We also identified two situations, graphics and the
Numeric/Numarray split, that will confuse (and therefore discourage)
new users until they are resolved.  We will address these first, as
they frame the packaging issues.

SOFTWARE ISSUES

On the graphics front, there are two included packages, gplt and xplt.
One works best on Unix, the other on Windows.  Neither is much to
write home about, and all this is acknowledged.  Considerable effort
has gone into Chaco, but the package has not matured as fast as some
had hoped, and it is still not ready (according to most).  A new
player, matplotlib, is gaining popularity.  With the exception of
contour plots, which are being actively worked on, it seems to
implement the 2D plotting capability of Matlab, with both a similar
user interface to Matlab and a more capable Python-like class library.
It is interactive (you can click on a screen plot and adjust it in
real time).  It works on all platforms.  The BoF consensus was to base
the new documentation on matplotlib and to de-emphasize gplt and xplt
by leaving them out of the tutorial docs.  They would remain in SciPy
indefinitely.  Supporters of Chaco may advocate for the selection of
that package instead.  Whether the community goes for this idea will
depend in large part on the state of Chaco (including package
documentation and maintainability by people other than the original
authors) when this decision is made.

On the Numeric/Numarray front, it is a problem for some that Numarray
performance is worse for small arrays than that of the older Numeric.
Other issues exist as well.  Perry Greenfield is working on getting
the two packages to co-exist such that which one to use can become a
run-time choice of the user, and so that package maintainers can
independently choose to use either one.  It is hoped that ultimately
Numarray will be superior to Numeric in all respects, so that Numeric
can be deprecated and then later removed.  Thus, the BoF consensus was
to document the use of Numarray for interactive work.
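
To make the idea of a run-time choice concrete, here is a minimal
sketch of one way a user-selectable array package could be loaded.
It is not a description of the coexistence work itself: the function
name load_array_package and the environment variable ARRAY_PACKAGE
are hypothetical, chosen only for this illustration.

```python
import importlib
import os


def load_array_package(candidates, env_var="ARRAY_PACKAGE"):
    """Return (name, module) for the first importable candidate.

    If the (hypothetical) environment variable env_var names one of
    the candidates, that candidate is tried first.
    """
    ordered = list(candidates)
    preferred = os.environ.get(env_var)
    if preferred in ordered:
        ordered.remove(preferred)
        ordered.insert(0, preferred)
    for name in ordered:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # not installed; try the next candidate
    raise ImportError(
        "none of these could be imported: %s" % ", ".join(ordered)
    )
```

A package maintainer might then write something like
name, array_pkg = load_array_package(["numarray", "Numeric"]) and use
array_pkg from there on, while a user overrides the order by setting
the environment variable before starting Python.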

For now, Numarray and matplotlib would not be in the SciPy core.
Numarray will enter the core when the coexistence issues are resolved
and the developers accept it.  matplotlib may enter the core when its
release schedule slows down, but at this point matplotlib releases
much more frequently than SciPy and there are substantial improvements
with each release.  If matplotlib were in the core, users would be
stuck with an incomplete and beta package until the next release of
SciPy, which could be many months away.

PACKAGING

New users generally will not build software from source code anymore.
The packaging goal is thus to produce binary packages for Linux under
both RPM and the Debian Package Manager, and for Mac, Windows, and Sun
using their native package managers.  Support for each of these (and
for other platforms) depends on interested users participating and
contributing good packages on a timely basis.  We appreciate any help
the software developers can provide in producing the packages, but
we expect that most of the packaging work will be done by volunteers
who specialize in this task.

The BoF consensus was to distribute SciPy, Numarray, and matplotlib as
separate binary packages, possibly tied together by a meta-package
that contained only dependencies (for YUM and APT users).  In
addition, the conference was sufficiently impressed with the
interactive experience offered by IPython that it would be packaged
alongside them until it entered the SciPy core.  That move into the
core was discussed positively
at the conference.  Optimized matrix routines may be included, or may
be available as an option, depending on our ability both to produce
them in a timely fashion and to provide a simple new-user experience.
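
Since the pieces would ship as separate packages, the "how to test
the install" step of the download instructions could be a short
script that reports which pieces are present.  The following is a
sketch under assumptions: the function names are invented for this
example, and the list of module names to check would have to match
whatever the packages actually install.

```python
import importlib


def check_install(module_names):
    """Try to import each module; return {name: version or None}.

    None means the import failed.  A module that imports but has no
    __version__ attribute is reported as "unknown".
    """
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = str(getattr(module, "__version__", "unknown"))
    return report


def print_report(report):
    """Print one line per module, flagging the missing ones."""
    for name in sorted(report):
        version = report[name]
        if version is None:
            status = "MISSING"
        else:
            status = "ok (version %s)" % version
        print("%-12s %s" % (name, status))
```

For example, print_report(check_install(["scipy", "matplotlib"]))
would tell a new user at a glance which package to reinstall or ask
for help about.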

DOCUMENTATION

Documentation is undoubtedly the area where the most work remains
to be done.  It needs rewriting, or writing for the first
time, at all levels visible to users.  Because of this and the beta
nature of some of the packages, it makes sense to focus on shorter
documents that address specific tasks, rather than something like a
paper book (though that might make sense two to three years from now).
It is important for docs to be well-written: clear, concise,
grammatical, current, complete, correct, and visible.  Thus, it makes
sense to coordinate this effort closely between the writers, relevant
developers, and those planning the overall effort.  The approach of
writing shorter docs will enable participation by a larger group of
writers, thus (we hope) completing the first draft of each document
sooner.  It will also make updates and rewrites easier.  Of course, it
will be important to provide good cross-references to other documents
and the web site, and to keep these up to date.  Needed documents:

What is SciPy? (listing of features and included packages)
Installing and Testing SciPy on Red Hat Linux
				Debian GNU/Linux
				Mac
				Windows
				Sun
Getting Started with SciPy
SciPy for IDL Users, with cheat sheet
SciPy for Matlab Users, with cheat sheet
SciPy User's Guide
SciPy Reference Manual
SciPy for Astronomy, Biology, Chemistry, etc.

It was noted at the BoF that Enthought has substantial documentation
in its Windows package in .chm format, and that there is now an X
reader for these files.  Those documents will be broken out for
non-Windows users and may form the bases for one or more of the
documents above, assuming cooperation from Enthought on permissions
and copyrights.

Translations are always a good idea, but can only be obtained in a
community project through the work of competent volunteers.  The
quality of our documentation will be vastly improved by user testing
and comment, and by the participation of more than one person in the
creation of each document.  However, content determines form, so we
should not be shy about making first drafts available, provided that
we continue to work on them.

We will need to agree on a set of standards for writing documents.
Certainly all docs should be available in PDF and HTML, and the source
should be modifiable by anyone on any platform.  The Windows .chm
format may also be a viable target if good tools exist to do it under
Linux and Mac as well as Windows.  The source form must be viable in
the long term and must not leave us in a bind if the text-processing
software ceases to be supported.  Structured Text and LaTeX have been
suggested.  MSWord and other proprietary forms are out.  The major
need for formulae in scientific documentation may push us to LaTeX,
perhaps with a standardized format, but this is open for discussion.

WEB SITE

Probably the easiest step is improving the web site.  Enthought has
provided a Plone-based site, which means that updates by many
contributors will be easy and many features are already there.  Much
of the content needs updating, and many new areas are needed.  For
example:

- The site says that unit tests are the main need from the community.
  Our needs are broader, as evidenced by the BoF conclusions.

- The documentation above needs to be placed on the site as it becomes
  available.

- Pages and mailing lists are needed for topical discussions, each
  managed by a field representative.

- The YUM archive and other download sites need to be linked to the
  main pages.

- The utilitarian, Plone-out-of-the-box appearance should be replaced
  by one with more legible buttons and some lightweight graphics that
  make the page attractive.

- A page with demos and screen shots should be added.

- Pages that present and possibly rate add-on packages are needed.

- Pages of user recipes and web links are needed.

- Throughout, it needs to be clear who is responsible for each page
  and how to reach that person.

- Overall, the site needs to be reorganized to give new users the
  experience outlined above, without making it much less useful for
  experienced users to navigate.

Essentially, scipy.org needs to become a one-stop-shopping web portal
for scientific computing in Python.  By providing current, complete,
and accessible information in an attractive format, the web site will
show potential users that this software is serious business, that it
will not be difficult to learn, and that it is actively maintained by
a large user community.

PRIORITIES

The three areas for improvement can proceed in parallel.  Since
packaging issues affect the documentation, those should be resolved
as soon as possible.

For packaging, we need to:

1. determine whether the idea of replacing xplt and gplt with
matplotlib in the docs is acceptable to the community,

2. determine whether IPython should go into the core,

3. determine whether the "coexistence" approach for resolving the
Numeric/Numarray split is a good idea.

Then, we need to roll and field-test the binary packages.  The latter
effort will require volunteers, so if you want your package or
platform to be supported, please sign up.

For documentation, we need to agree on a common source format, an open
copyright, and some stylistic guidelines.  Documents are needed in
roughly the order listed above, though if someone is very inspired to
start on a later one, he or she should proceed.  Suggestions for
additional documents are welcome, as are volunteers to write them.

Goals for the web site are less clear than for packaging and
documentation.  Some mailing list brainstorming on content and format
issues would be a good idea, starting with those above.  Then, we need
to identify page maintainers and get them access.  Several individuals
stepped forward at the BoF, and a web human interface designer has
volunteered to help.  Again, more volunteers are needed.

TOPICAL SOFTWARE

As a parallel effort, there is now critical mass in several
disciplines for the creation of topic-specific software.  For example,
in astronomy, most data come in Flexible Image Transport System (FITS)
format, and there is a package written by Paul Barrett that reads and
writes FITS files in Python.  We wish to encourage these
contributions, to provide guidance for a seamless integration with the
main packages, and to help new users find these resources.  We can
start by hosting topical mailing lists at scipy.org and by providing a
variety of web resources, including package or topic home pages, lists
of packages by topic, and lists of links to packages hosted elsewhere.
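
For readers outside astronomy, the sketch below gives a feel for the
kind of format such topical software handles.  A FITS header is a
series of 80-character ASCII "cards", padded out to a multiple of
2880 bytes and terminated by an END card.  This is not the API of
the package mentioned above; parse_fits_header and make_card are
names invented for this illustration, and the parser deliberately
ignores details such as a "/" inside a quoted string value.

```python
CARD_LENGTH = 80      # each header card is 80 ASCII characters
BLOCK_LENGTH = 2880   # headers are padded to a multiple of 2880 bytes


def parse_fits_header(raw):
    """Parse header cards from FITS bytes into a dict of strings.

    Values are returned as unconverted text and comments are dropped.
    """
    header = {}
    for start in range(0, len(raw), CARD_LENGTH):
        card = raw[start:start + CARD_LENGTH].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY, or blank card: no value
        value = card[10:].split("/", 1)[0].strip()
        header[keyword] = value
    return header


def make_card(keyword, value):
    """Build one 80-byte card with a right-justified value field."""
    text = "%-8s= %20s" % (keyword, value)
    return text.ljust(CARD_LENGTH).encode("ascii")


# A tiny hand-made header, padded with spaces to one full block.
cards = make_card("SIMPLE", "T") + make_card("BITPIX", "16")
cards += make_card("NAXIS", "0")
cards += "END".ljust(CARD_LENGTH).encode("ascii")
raw = cards.ljust(BLOCK_LENGTH)

print(parse_fits_header(raw))
```

A real FITS package also converts values to Python types and reads
the data that follows the header, which is exactly the work a new
user should not have to redo.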

ADVOCACY

Once the basics are in place, we can start advocating SciPy to the
broader scientific community.  With luck and a little effort, this
should expand the group of contributors so that the remaining work can
be done more quickly.  Suggestions from the BoF included a conference
kit so that someone can set up a booth that advocates SciPy simply by
printing some posters and handouts and setting up a computer.  Others
included lunchtime talks in departments, killer demos, comparison web
pages, and the best advocacy of all, being a good example to peers.
The BoF identified the need for a user survey.  Individuals can best
judge when and to whom they advocate SciPy.  However, it would make
sense to coordinate any major presentations with the SciPy community,
to ensure good synchronization with new releases, web site work, etc.

CONCLUSION

SciPy is a community project, and the community should coordinate and
review what goes into SciPy.  In that vein, this document is a genuine
request for your comments, and we hope to maintain this format in
the future so that all voices can be heard.  If you are willing to
work on any part of this effort, please make your intentions known by
adding your name and email (spam-obscured, if you like) to the ASP
Wiki at http://www.scipy.org/wikis/accessible_scipy/.  Since this is a
community project, please be prepared to accept edits, corrections,
and updates from the community, and to ask for help or to hand off
your work to someone else when you are no longer able or willing to
maintain it at the level or rate that is needed.

The vision above is a large one.  However, none of the steps is
particularly large.  Many projects can proceed in parallel and we gain
from the completion of each task.  Discussion will continue, for now,
on the scipy-dev at scipy.org mailing list, with major announcements
copied to scipy-user.  If this topic comes to dominate the list, it
may be split off into its own list, but for now we want to keep all
those working on SciPy apprised of our efforts.  We invite you to join
us in making SciPy the environment of choice for technical
computation, starting with your comments on the plan outlined above.

--jh--



