[SciPy-User] peer review of scientific software

Tue May 28 17:52:39 EDT 2013

On 5/28/2013 4:58 PM, Matt Newville wrote:
> Hi,
>
> As others have said, I find the low average programming skill level
> among scientists frustrating,  but I also found this article quite
> frustrating.
>
> From my perspective, the authors' main complaint seems to be that there
> is not enough independent checking of specialized scientific software
> written by scientists.  They seem particularly unhappy about the
> tendency to use existing packages written by other scientists based on
> "trust", "reputation", "previous citations" and without independent
> checking.  They also say:
>
>        A "well-respected" end-user developer will almost certainly have earned that respect
>        through scientific breakthroughs, perhaps not for their software engineering skills
>        (although agreement on what constitutes "appropriate" scientific software engineering
>        standards is still under debate).
>
> On this point in particular, and indeed in this whole line of
> argument, I think the authors are misguided, perhaps even to the point
> of fatally damaging their whole argument.   I believe the much more
> common case is for the "well-respected" end-user developer to be known
> for the programs written and supported, and less so for the scientific
> breakthroughs (unless you count new programs as new instrumentation,
> and so, well, breakthroughs, but it's pretty clear that the authors
> are making a distinction).    It's too often the case that spending
> any significant time on such programs is career suicide, as it takes
> time and attention away from such breakthroughs.   It's perfectly
> believable that the programming skills of such a scientific developer
> may be incomplete, but I think it's fair to say that most supported
> and well-used programs are likely the effort of people with
> above-average programming skills and the interest and intent to
> support such programs.   Indeed, I would argue that instead of being
> unhappy about the reliance on trusted programs and developers, the
> authors would better serve the scientific community by arguing that
> the authors of such programs should be better supported, and given
> access to tools and resources (i.e., fund them) to improve their work
> rather than treating them as untrustworthy programmers.
>
> I should admit to being one such author of a "well-respected" and
> "trusted" package for a very small scientific discipline, and with the
> proverbial "many citations etc" because of this.  So I would admit to
> being just the sort of person the authors are unhappy about.  I
> suspect many people on this mailing list are in the same category.   I
> would like to think the trust and respect for certain packages have
> been earned, and that people use such packages because they are "known
> to work", both in the sense of actually having been tested on
> idealized cases, and in producing verifiable results in real cases
> (where "testing" would not always be possible).   Indeed, the small,
> decentralized group of scientific programmers that I work with (mostly
> trained as physicists, and learning to program in Fortran -- some of
> us still use mostly Fortran, in fact) do test and verify such codes,
> precisely because we know other people use them.   Of course errors
> occur, and of course testing is important.   Modern techniques like
> distributed version control and unit testing are very good tools to
> use.   I agree they should be used more thoroughly, and that one
> should always be willing to question the results of a computer
> program.
>
> Then again, when was the last time I tested the correctness of results
> from my handheld HP calculator?    Hmm, a very, very long time ago.
> That's software.  I tend to believe the messages I read in my inbox
> are actually the messages sent, and hardly ever run a checksum on them.
> But that's software.  Indeed, all science is a social enterprise and
> so "trust", "reputation", and reliance on the literature (aka "past
> experience") are not merely unfortunate outcomes of laziness, but an
> important part of the process.
>
> I certainly am happy to support the notion that "more scientists
> should be able to program better", so I am not going to say the
> entire article is wrong, and I don't disagree with their main
> conclusions.  But I think they have a fatal flaw in their assumptions
> and arguments.
>
> --Matt Newville
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-use

Exactly!   There is actually a question here that hasn't been made 
explicit.  For whom is this advice intended?  There are all levels of 
programming/programmers in STEM.  Some of my colleagues use Excel for 
everything.  (As in, EVERYTHING.)  Fewer use Matlab.  Still fewer 
use C/Fortran/Java/C#/whatever.  So far as I know, I'm the one lone 
Pythonista.  Each group uses programming differently.

I've been programming for more than 50 years.  I've taught programming 
to engineers in several contexts over the years.  For a time, I really 
wanted to 'do it right.'  (I even taught 'structured programming' and 
'Warnier-Orr' at one point, but realized that it was worse than useless 
for the particular audience.)  I've come to realize that most engineers 
just want an answer.  They are not interested in how gracefully the 
answer was arrived at.  MOST programs written by MOST engineers are 
small, short, simple, and intended to solve one problem one time.  (The 
deficiency I've most often seen is the lack of error checking for the 
answer, and better programming techniques would not generally help much.)

The problem is that nobody sets out to write a "well respected" 
program.  Someone sets out to scratch a particular itch ('one problem 
one time').  It expands.  Others find it useful.  It becomes widely 
used.  The original author, however, was solving his/her own particular 
problem, and was not at all interested in "proper" programming.  So, I 
guess my question is, how do we find that person who is going to write 
the "well respected" program and convince him/her to take time out and 
learn proper programming first? Because we are certainly not going to 
convince everybody to do it.

john