[SciPy-dev] Re: Accessible SciPy (ASP) project

Joe Harrington jh at oobleck.astro.cornell.edu
Mon Nov 1 18:19:57 EST 2004


Stealing others' ideas with reckless abandon...

1. Content is king.  We need at least a few competent volunteers to
   write each document.  This will shake out differently for reference
   pages and books.  For reference pages, there are tons of docs to
   write, an average of many docs per competent volunteer, and few
   competent volunteers for any given page.  Make it hard to write
   pages and there will be few pages.

   For books and tutorials, on the other hand, there are many
   competent contributors and each individual item will be important
   enough that it will attract an editing team, not just one author.

2. On books and large tutorials, overhead becomes queen to content's
   king.  If it takes four times as long to assemble and edit a book
   in ReST than in LaTeX, it will be more efficient to convert to
   LaTeX on the editorial side even if contributors send ReST.

I'm therefore going to endorse/modify Arnd's suggestion that we go
initially with LaTeX for books and longer tutorials, and that we allow
authors to choose how to produce reference page documentation, within
some parameters.  Before everyone jumps down my throat, I'll point out
that:

3. All doc systems have both things they do very well and serious
   problems.  The problems you see depend on which calluses you've
   developed.  We now know the strengths and weaknesses of the various
   systems.  I suggest that further discussion may degenerate into
   religion unless we restrict ourselves to posting new, factual
   information about the systems rather than arguments designed simply
   to persuade others, whose opinions are set.

In that vein, I'd like to suggest the following approach for reference
page formats: If you like a format, make an example reference page
that includes the formula for a Fourier transform or something else
non-trivial.  Also generate PDF, HTML, and a legal Docstring version
and post those.  The text should include instructions so that a
non-user can do the same (you don't have to describe the format, just
the commands to process it).  Please post the result under
Documentation Issues on the ASP wiki, and notify us of your
contribution on this list.  Please be sure the format follows the
criteria in the ASP RFC (e.g., don't post any Word files).
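
As a starting point for such an example page, the Fourier transform
could be written in LaTeX roughly as follows.  This is only a sketch:
the sign and normalization conventions, the symbols f, F, t, and
\omega, and the label name are assumptions made for illustration, not
anything ASP has agreed on.

```latex
\documentclass{article}
\begin{document}
% Forward transform (assumed convention: no prefactor, negative exponent)
The forward transform of $f(t)$ and its inverse are
\begin{equation}
  F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i \omega t}\, dt ,
  \label{eq:ft}
\end{equation}
% Inverse transform carries the 1/(2 pi) factor under this convention
\begin{equation}
  f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\,
         e^{i \omega t}\, d\omega .
\end{equation}
\end{document}
```

Running pdflatex on a file like this should give the PDF; the HTML and
Docstring versions would need whatever converter the poster prefers,
which is exactly the part worth documenting.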

Once an example is posted, we can discuss whether the math looks good
enough, whether the Docstring works, whether the format is available
on all platforms, whether there are social, political, or technical
ramifications, etc.  Then we can choose one or more formats to
approve.  This is the approach taken in the early days of the
Internet: if you like something, make it work, then show us, then we
talk.

Along those lines, I'll report on a recent effort to produce a book based
on chapter submissions from about 30 authors and something like
100-200 co-authors.  The effort used LaTeX very successfully, even
though more than half the authors submitted their stuff in Word.
While it is possible that other systems could do the job just as
easily, given editors sufficiently savvy in those systems, I doubt
that any other open-source tools can come close.  Here is the story.

The book is "Jupiter: the Planet, Satellites and Magnetosphere",
edited by Bagenal, Dowling, and McKinnon and published by Cambridge
U. Press.  You can search "Bagenal" on Amazon (don't believe the
editors beyond McKinnon; Amazon or Cambridge screwed up the listing).
It has about 700 pages in 30 chapters and has an index, color plates,
etc.  I am a chapter author.  I also took care of the bibliography
format and some other LaTeX issues, though I'm no LaTeX guru.
Cambridge strongly wanted us to use LaTeX, and provided a style file
for the book.  Here's the zinger: they provided no editor!  We were
responsible for hiring one if we wanted to, or doing it on our own.
Lacking a budget, we hired the head editor's husband, who had some
experience in copy editing but was not a full-time book editor.  Now
all the costs in time and aggravation were ours, not some faceless
publisher's.  This will also be the case with ASP efforts.

The chapter authors were pretty evenly divided between LaTeX and Word,
with the edge perhaps given to Word.  After discussing the issue with
several prior Cambridge authors, we went with LaTeX, and found a
software system that converted Word to LaTeX.  It worked quite well.
Those chapter contributions came in as Word, were converted to LaTeX,
had all the appropriate cross-reference and bibliographic tags
inserted, and went back to the chapter authors as LaTeX and PDF after
reviewing and again after copy editing.  They either made edits in pen
and faxed them in, or edited the LaTeX.

I was surprised that the Word people didn't complain much about
getting LaTeX back, but the fact is that they had little trouble
making their edits.  It's easy to *edit* LaTeX even if you don't know
anything about it, since it's just ASCII text.  It's harder to
*originate* it as a novice.  One of the reasons I favor LaTeX is that
it is among the friendlier formats for a non-user to edit.  Back at
the editorial office, diffs were done by hand, but wdiff would have
been a good tool (and I've used it before for this).

The reason we chose LaTeX was simple: there is no such thing as a
simple book.  To do a book well, you have to have consistent numbering
and indexing, and if you reorder chapters or remove figures, all their
numbering and therefore all the cross-references have to change.  You
don't want to do that by hand.  You have to have perfectly consistent
typesetting of figure captions, chapter titles, tables, lists,
citations, references, etc.  Figure placement rules have to be
consistent.  It needs an index and a table of contents.  While Word
can do these things, it does not make it easy if you are collecting
chapters from many sources, who will be updating and editing their
contributions while the book is coming together.  OOo is substantially
more primitive than even Word in these regards.  Yes, we struggled
with LaTeX.  Anyone writing a book using any system struggles.  But
we saved a great deal of time with it, as the authors who had gone
before us had indicated.  The editorial team spent weeks rather than
many months on the technical composition of the manuscript.  We credit
LaTeX with imposing order on our contributors' chaos.
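
The renumbering problem described above is what LaTeX's label and
reference mechanism handles.  A minimal sketch, assuming the standard
book class rather than the Cambridge style file, with chapter titles
and label names invented for illustration:

```latex
\documentclass{book}
\begin{document}
\tableofcontents  % rebuilt automatically from the \chapter commands

\chapter{Atmosphere}\label{ch:atmosphere}
% References are symbolic, so no number is typed by hand.
See Figure~\ref{fig:winds} and Chapter~\ref{ch:magnetosphere}.

\begin{figure}
  \centering
  \fbox{placeholder for the plot}
  \caption{Zonal wind profile.}\label{fig:winds}
\end{figure}

\chapter{Magnetosphere}\label{ch:magnetosphere}
Winds are discussed in Chapter~\ref{ch:atmosphere}.
\end{document}
```

Swapping the two chapters or deleting the figure changes the numbers,
but every \ref picks up the new values once LaTeX has been run twice.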

While many have written theses and monographs using any number of
editing systems, the collaborative nature of our documents puts us in
a completely different category of organizational difficulty.  I'm
offering this model from the experience of just having implemented it.
It's how I would do it if I were heading the editing team of one of
our docs.  It may be that someone will be able to provide a similar
formula using different tools that still fit the requirements in the
ASP RFC, but I cannot think of what that might be at this time.  I
think OOo and ReST fall far short, for books.  However, if someone can
post a similar description of a large effort to produce a book using
different open tools, I would be interested to read it (the
description, not the book!).

--jh--
Joe Harrington
326 Space Sciences Building
Cornell University
Ithaca, NY 14853-6801
(607) 254-8960 office
(607) 255-9002 fax
jh at oobleck.astro.cornell.edu



