[SciPy-dev] Online SciPy Journal

Fernando Perez fperez.net at gmail.com
Tue Oct 3 02:59:33 EDT 2006


On 9/29/06, Travis Oliphant <oliphant at ee.byu.edu> wrote:
> As most of you know, I'm at an academic institution and have a strong
> belief that open source should mesh better with the academic world than
> it currently does.  One of the problems is that it is not easy to just
> point to open source software as a "publication", which is one of the
> areas that most tenure-granting boards look at when deciding on a
> candidate.
>
> To help with that a little bit, I'd really like to get a peer-reviewed
> on-line journal for code contributed to SciPy started.  Not to implicate
> him if he no longer thinks it's a good idea, but I first started
> thinking seriously about this at SciPy 2006 when Fernando Perez
> mentioned he had seen this model with a math-related software package
> and thought we could try something similar.

[...]

I think this is an important and worthwhile idea, and I'm willing to help.
However, I think it's worth clarifying the intent of this effort, to make sure
it really does something useful.

The GAP process (I can confirm that what Robert mentioned is what I had in
mind when I spoke with Travis; we can request further details from Steve
Linton at some point if we deem it necessary) specifically addresses the
issue of code contributions to GAP as peer-reviewed packages, without going
'all the way' to the creation of a Journal, which is an expensive
proposition (at the very least in time and effort).

There are basically, I think, two issues at play here:

1. How to ensure that time spent developing truly useful open source packages
is acknowledged by 'traditional' academia, for things like hiring and tenure
reviews, promotions, grant applications (which means that any changes we
target need to make their way into the funding agencies as well), etc.

2. Whether to have a Journal covering modern scientific computing
development, practices and algorithms, with a more or less narrow Python
focus.


I am convinced that #1 is critically important: I believe strongly that the
open source model produces /better/ tools, more flexibly, and in a vastly more
pleasant fashion than mailing a big check every year to your favorite vendor.
But if many of us want a professional research future that involves these
ideas, we're going to need them to be acknowledged by the proverbial bean
counters.  Else, committing significant amounts of time to open source
developments will look an awful lot like professional suicide.

However, I am not yet convinced that #2 is the way to achieve #1.  It may
be a worthy goal in and of itself, and that's certainly a valid question to
discuss.  But I think it's really worth sorting out whether #2 is the only
way, or even a good one, to accomplish #1.  The editing and publication of
a journal requires a vast amount of work, and for a journal to be really
worth anything towards goal #1, it /must/ achieve a level of respectability
that takes a lot of time and effort to build.

The GAP developers seem to have found that a clearly defined process of review
is enough to satisfy #1 without having to create a journal.  It may be worth
at least considering this option before going full bore with the journal idea.

If, however, a Journal is deemed necessary, then we need to make sure that the
ideas behind it give it a solid chance of long-term success as a genuine
contribution to the scientific computing literature.  We all know there are
already far too many obscure journals nobody reads; we shouldn't be adding
to their number.

I'm listing below a few references and ideas I've collected over time on this
topic, hoping they may be of value in guiding the discussion:

- One of the central points of the open source model is that it provides for
true reproducibility of computational results, since users can rebuild
everything down to the operating system itself if need be.  Reproducible
research has long been championed by Stanford's Jon Claerbout, and there is a
well-known paper by Buckheit and Donoho which summarizes these ideas in the
context of WaveLab, a Matlab-based wavelet toolkit.  These are the basic
references:

http://sepwww.stanford.edu/research/redoc/
http://www-stat.stanford.edu/~donoho/Reports/1995/wavelab.pdf

I think it would be great if a new journal emphasized these ideas as a
founding principle, by making full reproducibility (which obviously requires
access to source code and data) a condition for publication.  I am convinced
that this alone would drastically improve the S/N ratio of the journal, by
eliminating all the plotting-by-photoshop papers from the get-go.
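
To make that concrete, here is a toy sketch of what a 'reproducible
submission' could look like (the script name and structure are my own
invention, not a proposed format): a single self-contained script that pins
its random seed, reports the versions of everything it depends on, and
regenerates the paper's numbers from scratch:

    # make_figure1.py -- hypothetical companion script for a submitted paper.
    # Anyone running the same package versions should get identical output.
    import sys
    import numpy as np

    def report_environment():
        # Record exactly what produced the result, for the reviewers' benefit.
        print("python : %s" % sys.version.split()[0])
        print("numpy  : %s" % np.__version__)

    def main():
        rng = np.random.RandomState(42)    # fixed seed: no hidden randomness
        data = rng.normal(0.0, 1.0, 1000)
        print("mean   = %.6f" % data.mean())
        print("stddev = %.6f" % data.std())

    if __name__ == '__main__':
        report_environment()
        main()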


- There is an interesting note in a recent LWN issue:

http://lwn.net/Articles/199427/

If you scroll down to the section titled "WOS4: Quality management in free
content", you'll find a description of the journal Atmospheric Chemistry and
Physics (http://www.copernicus.org/EGU/acp/acp.html).  They use an interesting
combination of traditional peer review and a 'publish early, discuss often'
approach which has apparently produced good results.  Quoting from the above:

    When a paper is submitted, as long as it's not complete junk, it will be
    immediately published as a "discussion paper" on the journal's web
    site. It is clearly marked as an unreviewed paper, not to be taken as
    definitive results at that time. While the referees are reviewing the
    paper, others can post comments and questions as well. These others are
    limited to "registered scientists," since the desire is to keep the
    conversation at a high level. The comments become part of the permanent
    record stored with the paper, and they can, at times, be cited by others
    in their own right. The editor will consider outside comments when
    deciding whether the paper is to be accepted and what revisions are to be
    required.

    After using this process for five years, Atmospheric Chemistry and Physics
    has the highest level of citations in the field. Citations are important
    in the scientific world: they are an indication that a given set of
    research results has helped and inspired discoveries elsewhere. The high
    level of citations here indicates that this publication process is
    succeeding in attracting high-level papers and filtering out the less
    useful submissions.


- It's always worth having a look at the PLoS process (http://www.plos.org),
which has for a few years been trying to move the publication model in the
biomedical community towards a more open one.  I'm not sure what actual impact
their journals currently have, though.


- In the high energy physics community, the de facto mechanism for real
communication has for a long time been the arXiv (http://arxiv.org).  I think
it's fair to say that in several subfields of HEP, people push for publication
in 'real journals' mostly for career reasons; by the time a paper makes its
way to Physical Review or Nuclear Physics, it has long since been digested,
discussed and commented on, possibly in a round of short comment papers on the
arXiv itself.  While the arXiv is NOT peer-reviewed, and hence the need for
'real' publications for bean-counting purposes remains, crackpots and other
poor-quality work never seem to be a major problem: people tend to just
silently ignore them, while good work is quickly picked up and generates
responses.  Over time, the arXiv has grown subfields well beyond HEP, though I
have no idea how successful these have been in their respective communities.


- I also think we should target a higher, long-term goal: improving the
standards of software development in scientific work, by 'showing the way'
with an emphasis on documentation, testing (unit and otherwise), clean APIs,
etc.  Hopefully this will be a beneficial side effect of this effort, whether
done via a journal or some other process.
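
As a small illustration of the kind of standard I mean (a made-up example,
not an actual SciPy routine): every contributed function could be required
to ship with a real docstring and a test that the review process actually
runs, e.g. via doctest:

    import numpy as np

    def rms(x):
        """Return the root-mean-square of a 1-d sequence.

        Examples
        --------
        >>> print("%.1f" % rms([3.0, 4.0]))
        3.5
        """
        x = np.asarray(x, dtype=float)
        return np.sqrt(np.mean(x * x))

    if __name__ == '__main__':
        import doctest
        doctest.testmod()   # the unit test travels with the documentation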

In any case, I'm very interested in this and willing to help; none of the
above is meant to undermine the enthusiasm displayed so far.

Regards,

f


