Python advocacy in scientific computation

sturlamolden sturlamolden at yahoo.no
Fri Mar 3 20:33:31 EST 2006


Michael Tobis wrote:

Being a scientist, I can tell you that you're not getting it right. If
you speak computer science or business talk, no scientist is going to
listen. Let's just see how you argue:

> These include: source and version control and audit trails for runs,
> build system management, test specification, deployment testing (across
> multiple platforms), post-processing analysis, run-time and
> asynchronous visualization, distributed control and ensemble
> management.

At this point, no scientist will understand what the heck you are
talking about any longer. All have stopped reading and are busy doing
experiments in the laboratory instead. Perhaps it sounds good to a CS
geek, but not to a busy researcher.

Typically a scientist needs to:

1. do a lot of experiments

2. analyse the data from experiments

3. run a simulation now and then

Thus, we need something that is "easy to program" and "runs fast
enough" (and by fast enough we usually mean extremely fast). The tools
of choice seem to be Fortran for the older professors (you can't teach
old dogs new tricks) and MATLAB (perhaps combined with plain C) for the
younger ones (that would e.g. be yours truly). Hiring professional
programmers is usually futile, as they don't understand the problems
we are working with. They can't solve problems they don't understand.

What you really need to address is something very simple:


    Why is Python a better Matlab than Matlab?


The programs we need to write typically fall into one of three
categories:

1. simulations
2. data analysis
3. experiment control and data acquisition

(those are words that scientists do know)

In addition, there are 10 things you should know about scientific
programming:

1. Time is money. Time is the only thing that a scientist cannot afford
to lose. Licensing fees for Matlab are not an issue. If we can spend
$1,000,000 on specialised equipment we can pay whatever Mathworks or
Lahey charges as well. However, time spent programming is an issue.
(As is the time spent learning a new language.)

2. We don't need fancy GUIs. GUI coding is a waste of time we don't
have. We don't care if Python has fancy GUI frameworks or not.

3. We do need fancy data plotting and graphing. We do need fancy
plotting and graphing that is easy to use - as in Matlab or S-PLUS.

4. Anything that has to do with website development or enterprise class
production quality control is crap that we don't care about.

5. Version control? For each program there is only one developer and
a single user or a handful of users.

6. The prototype is the final version. We are not making software for a
living, we are doing research.

7. "My simulation is running too slowly" is the number ONE complaint.
Speed of execution is an issue, regardless of what computer science
folks try to tell you. That is why we spend a disproportionate amount
of time learning to vectorize Matlab code.

8. "My simulation is running out of memory" is the number TWO
complaint. Matlab is notorious for leaking memory and fragmenting the
heap.

9. What are algorithms and data structures? Very few of us know how to
use a data structure more complicated than an array. That is why we
like Matlab and Fortran so much.

10. We are novice programmers. We are not passionate programmers. We
take no pride in our work. The easier the hack, the better. We don't
care if we are doing OOP or not. However, we do hate complicated APIs
or APIs
that look funny. We are used to seeing sin(x) in our calculus textbooks
and because of that we don't find Math.Sin(x) particularly elegant --
even though Math.Sin(x) is more OOP and sin(x) clutters the global
namespace.
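
To make points 7 and 10 concrete, here is a minimal sketch of what
"vectorizing" and a textbook-looking sin(x) can look like in Python. It
assumes the NumPy package is installed (it is not part of the standard
library), and the function names damped_wave_loop and
damped_wave_vectorized are made up for illustration only.

```python
import math

import numpy as np
from numpy import exp, sin  # so that sin(x) reads as in a textbook


def damped_wave_loop(t, decay):
    """Hypothetical example: compute exp(-decay*t) * sin(t) one sample at a time."""
    out = []
    for ti in t:
        out.append(math.exp(-decay * ti) * math.sin(ti))
    return out


def damped_wave_vectorized(t, decay):
    """Same computation, applied to the whole array in one expression."""
    return exp(-decay * t) * sin(t)


t = np.linspace(0.0, 10.0, 1001)         # 1001 time points from 0 to 10
slow = damped_wave_loop(t, 0.5)          # explicit Python loop
fast = damped_wave_vectorized(t, 0.5)    # array expression, no loop
```

Both versions give the same numbers. The second is the form Matlab
users already know: one line, no loop, and the arithmetic is done in
compiled code rather than in the interpreter.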


Now please go ahead and tell me how Python can help me become a better
scientist. And try to steer clear of the computer science buzzwords
that don't mean anything to me.

Thanks!

Sturla Molden
(neuroscience PhD)


> The synergies among these programming modes are in some ways harder to
> explain than to experience. The Python novice may nevertheless observe
> that a single language can take the place of shell scripts, makefiles,
> desktop computation environments, compiled languages to build GUIs, and
> scripting languages to build web interfaces. In addition, Python is
> useful as a wrapper for Fortran modules, facilitating the
> implementation of true test-driven design processes in Fortran models.
>
> Another Python advocacy slogan is "batteries included". The point here
> is that (in part because Python is dramatically easier to write than
> other languages) there is a very broad range of very powerful standard
> libraries that make many tasks which are difficult in other languages
> astonishingly easy in Python. For instance, drawing upon the standard
> libraries (no additional download required) a portable webserver
> (runnable on both Microsoft and Unix-based platforms) can be
> implemented in seven lines of code. (See
> http://effbot.org/librarybook/simplehttpserver.htm ) Installation of
> pure python packages is also very easy, and installation of mixed
> language products with a Python component is generally not
> significantly harder than a comparable product with no Python
> component.
>
> Among the Python components and Python bindings of special interest to
> scientists are the elegant and powerful matplotlib plotting package,
> which began by emulating and now surpasses the plotting features of
> Matlab, SWIG, which allows for runtime interoperability with various
> languages, f2py which specifically interoperates with Fortran, NetCDF
> libraries (which cope with NetCDF files with dramatically less fuss
> than the standard C or Fortran bindings), statistics packages including
> bindings to the R language, linear algebra packages, various
> platform-specific and portable GUI libraries, genetic algorithms,
> optimization libraries, and bindings for high performance differential
> equation solvers (notably, using the Argonne National Laboratory
> package PETSc). An especially interesting Python trick for runtime
> visualization in models that were not designed to support it, pioneered
> by David Beazley's SWILL, embeds a web server in your model code.
>
> See especially http://starship.python.net/~hinsen/ScientificPython/ and
> http://scipy.org as good starting points to learn about scientific uses
> of Python.
> 
> mt



