[SciPy-dev] 2-review system on doc wiki

Sun Feb 14 14:07:59 EST 2010

On 14 February 2010 17:12, Bruce Southey <bsouthey at gmail.com> wrote:
> I just think that the 'bar' here is set too high for a volunteer
> project. Also I think that this 'new version' is asking too much
> especially when people have been working under a rather different
> approach. Also there is no conflict resolution between all the steps
> involved.

Sorry for the length here.  Hopefully this clarifies a lot of
questions.  See, in particular, example 3 if you're not convinced we
need this.

I agree with Stefan that this really isn't that complicated.  David
and I have discussed the two-review system here, in doc telecons, at
the SciPy09 conference, and in its proceedings; this is nothing new.
The motivation is simple: I read a number of the reviewed pages and
found problems that should not have passed review.  The plan is a
slight modification of our one-review plan.  David pointed to that
already (thanks, David).

There are no differences of approach other than the change in the
review system.  Since only a tiny fraction (8%) of the pages has
undergone any level of review, and only 4% have passed review, the
change will not cause a major upset to what we are doing.

As always, we resolve conflicts by discussion and use of the comment
field on each page.

We are aiming at a product whose quality equals or exceeds that of
manuals for comparable software such as IDL or Matlab.  Whether this can all be
done by volunteers is an irrelevant question.  I expect that the
number of reviewers will be much smaller than the number of writers.
We will identify and vet technical and presentation reviewers, and if
necessary we can seek funds to pay them.  Of course, we'll try the
volunteer way first.  I hope that we can find volunteer technical
reviewers from among the developers.  Presentation reviewers will
likely have substantial technical writing experience; we have a list
of a few potentials already.  A professional copy editor will proof
the doc the first time we have fully-reviewed pages and hopefully for
each major release thereafter, but that's a future problem.

I give some examples and clarification on the review roles below.

EXAMPLES

1. numpy.core.umath.sqrt does not document the "out" argument (technical
omission) and uses terms ("branch cut", "continuous from above on
it") that will confuse the majority of readers who have not taken a
course in complex variables, such as high-school students and perhaps
many of their teachers (presentation review).  This could be solved
with an external reference, which is missing, or even just a rewording
of the sentence, like:

In the terminology of complex-variable calculus (ref), sqrt has a
branch cut [-inf, 0) and is continuous from above on it.

This is what I call "introducing an expert section".  It signifies to
our target audience (one level below the likely users of a function)
that we're about to go over their heads, where to go to come up to
speed, and otherwise not to sweat it if they don't get it.  (Actually,
in this particular case, it's not clear to me why we need to document
the analytic properties of taking roots.  There's *lots* more one
could say about roots, and trig functions, and....  We should leave
that to the textbooks.)

2. Most routines are missing pointers to relevant pages of the
numpy.doc package that discuss things like "along an axis" or "out".
In many cases, that's because these pages didn't exist when the
function docstrings were written.

3. From scipy, some of the ready-for-review pages in scipy.stats are
likely technically good, but are totally impenetrable to anyone
without several semesters' equivalent college education in statistics.
While you may need that level of description to use all the tests to
their fullest, a beginner should be able to do things like plot,
evaluate, and integrate standard PDFs within a few minutes of starting
to read the docs there.  If two stats experts wrote all the pages and
reviewed each other's writing, such improvements would never be
suggested.  Yet, a single presentation-oriented reviewer might not
catch technical errors.  That's why we need two types of reviewers.
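
To make example 3 concrete, here is a rough sketch of the kind of
first-few-minutes task I have in mind: evaluating and integrating a
standard PDF.  It is only an illustration; the choice of the standard
normal distribution, the evaluation points, and the integration limits
are my assumptions, not text from any existing page.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Evaluate the standard normal PDF at a few points (points chosen arbitrarily).
x = np.array([-1.0, 0.0, 1.0])
pdf_values = stats.norm.pdf(x)

# Integrate the PDF over [-1, 1] numerically ...
area, abserr = quad(stats.norm.pdf, -1.0, 1.0)

# ... and cross-check against the difference of the CDF at the endpoints.
area_from_cdf = stats.norm.cdf(1.0) - stats.norm.cdf(-1.0)

print(pdf_values)
print(area, area_from_cdf)
```

Plotting would be one more line with a plotting package (for instance,
plotting stats.norm.pdf over a grid from np.linspace), which I leave out
so the sketch runs without a display.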

TECHNICAL REVIEW

A technical review ensures that all the features, API points,
underlying methods that affect the results, and limitations of the
item are noted properly in the docstring.  It implies familiarity with
(or at least a good, hard look at) the source code and the general
topic (e.g., fitting, stats, etc.).  In the ideal case, an expert
should be able to take the doc and write a more-or-less equivalent
routine.  This review also should check that internal cross-references
are complete and that external references are sufficient (and
long-lived).
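
As an illustration of what a technical reviewer might exercise for
example 1, the sketch below uses the "out" argument of sqrt and probes
the behavior on and near the branch cut.  The input values and the
small offset 1e-12 are arbitrary choices of mine, and this is a quick
check under those assumptions, not a proposed test suite.

```python
import numpy as np

# The "out" argument: results are written into a preallocated array,
# and the function returns that same array object.
x = np.array([1.0, 4.0, 9.0])
result = np.empty_like(x)
returned = np.sqrt(x, out=result)
print(returned is result, result)

# Branch cut along the negative real axis, probed with complex input.
on_cut = np.sqrt(complex(-4.0, 0.0))         # a point on the cut
just_above = np.sqrt(complex(-4.0, 1e-12))   # slightly above the cut
just_below = np.sqrt(complex(-4.0, -1e-12))  # slightly below the cut

# "Continuous from above": the value on the cut matches the limit taken
# from above, while the value just below has the opposite imaginary sign.
print(on_cut, just_above, just_below)
```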

PRESENTATION REVIEW

A presentation review ensures that our target audience (which we long
ago defined as one level *below* that of a likely user of a given
routine) can read and understand all but the expert parts of the
document.  It also checks that the doc follows the docstring format,
that it is as clear as reasonably possible, that any expert sections
are properly introduced as such, and that the examples are the right
ones to have and that they work.
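
Checking that the examples work can be partly mechanical.  As a
minimal sketch, Python's standard doctest module can run the examples
in a docstring and report failures.  The function "double" below is a
hypothetical stand-in of mine, not a real NumPy or SciPy routine, and
this is not a description of how the doc wiki itself checks examples.

```python
import doctest


def double(x):
    """
    Return twice the input.

    Examples
    --------
    >>> double(2)
    4
    >>> double(1.5)
    3.0
    """
    return 2 * x


# Parse the examples out of the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(double.__doc__, {"double": double}, "double", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

# results.failed counts examples whose output differed from the docstring.
print(results.attempted, results.failed)
```

Whether the examples are the right ones to have still takes a human
reader; a tool like this only confirms that they run as written.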

--jh--