[SciPy-dev] Re: Future directions for SciPy in light of meeting at Berkeley

Tom Loredo loredo at astro.cornell.edu
Wed Mar 9 14:42:43 EST 2005


Hi folks-

Prabhu wrote:
> Anyway, Joe's post about ASP's role is spot on!

I agree with this sentiment.  I especially agree with Joe's 
prioritization, in particular, that work on his pt. 1 (resolving
the Numeric/numarray split) should be highest priority.

I don't agree that not many people know about scipy or, more generally,
Python as a tool for scientific computing.  My experience when I raise
the topic with colleagues is that they have heard of it, but when they
investigated, either they found the situation confusing (e.g., the
Numeric/numarray split; bewildering variety of platform-dependent plotting
options) or they found the software deficient (either hard to install
or buggy).  There has been great progress on the latter problems in the
last year especially, but the fact is that scipy has been around for a
*few* years, and many people have opinions of scipy/Python based on
year-old experiences.  (Also, I feel that those portraying the
installation issue as completely solved are likely not using scipy
and matplotlib across multiple platforms.)

I do not have the expertise to help with the #1 issue, and I'm sure
many scipy users feel similarly.  But as Joe emphasized, there are
other ways to contribute.  He mentioned documentation, and I agree
that's a weak spot that I suspect many folks could contribute to.

Another way is to develop scipy add-ons, but to not release them until
they are well-documented and reasonably mature (at least, not release
them as anything but beta software).  I realize this is counter to what
seems to have become the typical open source model of developing "out
in the open," releasing code, warts and all, and letting it evolve
entirely in public view.  But I think something can be said for trying
to avoid making a bad first impression.  This is the path I'm pursuing
myself, with work on a statistical inference package.  I have a body of
code written, but I'm not going to release it widely until it is
largely well-documented and well-exercised.  I've re-written a core
piece of it three times now, as my own experiments have revealed
inadequacies in the design.  I hate to think what potential users
would have thought if either they discovered the problems themselves in
their own usage, or found that the next version of the code changed
interfaces in order to be more general and robust.

My point here is that "bleeding edge" users have a tendency to portray
tools as more ready for public consumption than they may actually be.
When potential new users discover obstacles that aren't present with
other tools, a bad taste is left in their mouths and they are reluctant
to come back for a second try.  They almost feel lied to---"It
shouldn't be this hard; why didn't they warn me??!!"  I'm advocating
something like "truth in advertising."  We need to accurately describe
what the user experience will be (on various platforms), and let people
know up front what obstacles they may encounter, and that the obstacles
are being addressed.  Then they can say either, "Wow, all that
capability looks worth the possible trouble, let me dive in," or "Wow,
all that capability is appealing, but I'm not up for the possible
trouble; I'll check back in 6 months."  What we want to avoid is,
"Darn, this should be as easy as it is with Matlab, but I've just
wasted 2 hours trying to figure XXX out--I'm going back to Matlab!"

Finally, I think another way to contribute to adoption of scipy is to
take seriously Claerbout's notion of "really reproducible research."
As Buckheit and Donoho have described it,

  "When we publish articles containing figures which were generated by
  computer, we also publish the complete software environment which
  generates the figures....

  "An article about computational science in a scientific publication
  is not the scholarship itself, it is merely advertising of the
  scholarship. The actual scholarship is the complete software
  development environment and the complete set of instructions which
  generated the figures."

[See http://www.stat.washington.edu/jaw/jaw.research.reproducible.html
and http://sepwww.stanford.edu/research/redoc/
for more on this notion.]

I'm sure many of you have published papers with results computed with
Python/scipy.  Like me, you probably did most of these with one-off
scripts or modules that were not written for public consumption.  What
if each of us who publishes technical results using Python were to
neaten up and document the code for just a few nontrivial published
figures, and post the results online?  Perhaps scipy.org could itself
serve as a clearing house for at least a select set of such
"reproducible research documents."  What better advocacy for scipy as a
research tool could we offer than being able to say:  "Go to scipy.org,
and you'll find scripts for 100 published results in 10 different
disciplines---and you can pick up right where they left off."
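
To make the idea concrete, here is a rough sketch of what one such
"reproducible figure" script might look like.  The paper, figure, data,
and model below are invented placeholders, and it assumes scipy and
matplotlib are installed; the point is only the shape of the thing: a
single self-contained file that generates (or loads) the data, runs the
analysis, and writes out exactly the figure that appears in the paper.

  """fig3_decay.py -- regenerate "Figure 3" of a (made-up) paper from
  scratch: synthesize the data, fit the model, save the plot.  The model
  and numbers are placeholders, purely to illustrate the structure."""
  import numpy as np
  from scipy.optimize import leastsq
  import matplotlib.pyplot as plt

  # Fixed seed so every run reproduces the identical "published" figure.
  np.random.seed(42)
  t = np.linspace(0.0, 10.0, 50)
  y = 3.0 * np.exp(-t / 2.5) + np.random.normal(scale=0.1, size=t.size)

  # Exponential-decay model and its residuals, as "the paper" describes.
  def model(params, t):
      amp, tau = params
      return amp * np.exp(-t / tau)

  def residuals(params):
      return y - model(params, t)

  best, ier = leastsq(residuals, x0=[1.0, 1.0])

  # Exactly the plot that appears in the paper, written to disk.
  plt.errorbar(t, y, yerr=0.1, fmt='o', label='data')
  plt.plot(t, model(best, t), '-', label='best fit')
  plt.xlabel('time (s)')
  plt.ylabel('signal')
  plt.legend()
  plt.savefig('figure3.png')

Even something this small, posted alongside the paper it belongs to,
would let a reader rerun the published result and modify it from there.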

Anyway, that's one of my criteria for the public release of the package
I'm working on.  While I'll soon be releasing some of it to a group of
volunteer beta testers, I won't call public attention to it until its
main parts are well-documented and the package includes a sample of
scripts that reproduce calculations in at least a few
published, peer-reviewed research papers.  This needn't be a criterion
for every scipy tool or package, but I think a few packages with this
characteristic will make a very good impression.  I bet some of you
could achieve this with your code more quickly than I can with mine!
Go to it!

Cheers,
Tom Loredo



