OpenSource documentation problems

Fri Sep 2 02:45:35 EDT 2005

Michael Sparks <ms at cerenity.org> writes:
> > I've submitted a number of doc bugs to sourceforge and the ones that
> > are simple errors and omissions do get fixed.  
> 
> Cool. 

Better than nothing, but it's only one class of problem, and maybe the
easiest kind to report.  

There's another type of doc problem, though: the doc simply doesn't
explain how to use something.  It's not easy to submit a bug report
that says more than "this doc is no good".  Example: I want to use
urllib2 to scrape a web page, which involves logging into a site,
then going to the page I want to scrape and presenting the cookie
that came from the login page.  OK, urllib2 recently got a feature
for that, thanks to John J. Lee:

   http://www.python.org/doc/2.4.1/lib/http-cookie-processor.html

But reading that page, or the parent page for urllib2 itself, how the
heck am I supposed to use the feature?  Answering that would probably
take an hour or two of pondering the source code.  I found another way
to solve my problem instead.  And without spending that pondering
time, I can't submit a better bug report than "this doc page doesn't
explain how to use the feature".
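
For reference, here is a minimal sketch of the usage pattern that page leaves out. It is a sketch under stated assumptions, not text from the urllib2 docs. It is written against Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar. The helper names build_cookie_opener and fetch_after_login, the URLs, and the form field names are all hypothetical placeholders; a real site will have its own login form.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_cookie_opener():
    """Return an opener that remembers cookies between requests, plus its jar."""
    jar = CookieJar()
    # HTTPCookieProcessor stores Set-Cookie headers from responses in the jar
    # and adds matching Cookie headers to later requests made via this opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_after_login(login_url, form_fields, page_url):
    """Log in by POSTing a form, then fetch a page using the session cookie.

    login_url, form_fields and page_url are hypothetical; real sites use
    their own URLs and field names.
    """
    opener, _jar = build_cookie_opener()
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    # Passing data makes this a POST; any cookies the response sets go into the jar.
    with opener.open(login_url, data) as resp:
        resp.read()
    # Reusing the same opener sends the stored cookie back automatically.
    with opener.open(page_url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A call would look like fetch_after_login("http://example.com/login", {"user": "me", "password": "secret"}, "http://example.com/members"), with all three arguments being made-up values. The key point is that both requests go through the same opener, so the cookie from the first response is presented with the second.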

I think PHP's online doc system is way superior for this kind of issue,
since it's easy to post questions directly on the doc page and for
other people to answer them.  Later, the doc editors can read over
the comments and merge the good info from them into the doc source.

> OK, it's not the best solution in the world I'll agree, but my point was in
> general very few people like writing docs even when paid, let alone when
> not. As a result it's not that surprising IMO that if paid you end up with
> more docs. 

I think people like doing good work whether they're paid or not
(assuming it's work that they want to do at all).  Therefore, if
someone goes in with the view that software isn't good unless it's
well-documented, then good documentation will result.  So, GNU has
good docs because RMS was very emphatic about that view and got other
people involved in the project to share the view.  See for example:

http://www.gnu.org/software/emacs/manual/html_node/Bug-Criteria.html

    "If after careful rereading of the manual you still do not understand
    what the command should do, that indicates a bug in the manual, which
    you should report. The manual's job is to make everything clear to
    people who are not Emacs experts--including you. It is just as
    important to report documentation bugs as program bugs.

    "If the on-line documentation string of a function or variable
    disagrees with the manual, one of them must be wrong; that is a bug."

With Python the docs seem like much more of an afterthought.

> > I usually do report doc bugs, but my frustration (and I think Bryan's)
> > is that there are so many bugs in the first place.  It means that the
> > authors are not applying high enough quality standards to their own
> > work before releasing it.  That applies to Python's code as well as
> > its docs.  It's not crap,
> 
> Or maybe they're just doing their best? 

I don't think so.  How could you consider the example above, about the
HTTP cookie processor class, to be an example of someone doing their
best?  The feature is mentioned tantalizingly but there's flat-out
zero description of how to use it.  Good documentation was simply not
a priority when that patch got added.

Another example from last week (now fixed): the shelve module doc
didn't mention the sync() operation, without which there's no way to
flush updates to disc short of closing the shelf.  One thing I
remember about RMS preparing Emacs releases is that whenever he did
one, he'd go over the code change log starting from the previous
release and make sure that every code change had a corresponding doc
update if it needed one.  From the number of little doc omissions I
see like the shelve.sync operation (maybe not that specific one),
it's pretty clear to me that nobody is doing anything like that
cross-checking for Python releases.
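
As a sketch of where sync() matters, assuming Python 3's shelve module: with writeback=True, in-place changes to stored mutable objects sit in an in-memory cache, and sync() writes that cache back to the underlying dbm file (and asks the dbm to flush, where it supports that) without closing the shelf. The function name record_events, the "events" key, and the flush interval are illustrative choices, not anything from the shelve docs.

```python
import shelve


def record_events(path, events, flush_every=2):
    """Append events to a list kept in a shelf, flushing every few items.

    record_events, the "events" key and flush_every are hypothetical.
    """
    with shelve.open(path, writeback=True) as db:
        if "events" not in db:
            db["events"] = []
        for count, event in enumerate(events, start=1):
            # Index db every time: sync() empties the writeback cache, so a
            # list reference held across a sync() would go stale and later
            # appends to it would never be written out.
            db["events"].append(event)
            if count % flush_every == 0:
                # Write cached entries back to the dbm file without closing.
                db.sync()
    # Leaving the with-block calls close(), which flushes anything left over.
```

The comment about re-indexing is the kind of detail a doc page would need to spell out: after sync(), the next db["events"] lookup unpickles a fresh list from disk, so only mutations made through that fresh object get written back.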

> It might seem silly but when I do write docs, personally it takes me
> around 4-10 times longer to write the same number of lines as
> documentation than as code, whilst trying to maintain a similar
> quality. I think that's on a good day.

Well, docs have more characters per line than code, so that ratio
isn't surprising.  On the other hand, docs usually don't need
intricate design or debugging the way code does.  I could usually slog
out docs while too tired to code accurately.  Another RMS doc recipe
was: after writing each sentence, re-read it and ask yourself what
someone reading that sentence will want to know next.  That tells you
what the next sentence should be.  Following that formula until
everything was covered, printing out the result and circulating it to
a few people for review, and incorporating the comments, generally
resulted in a usable doc.

The main difference between writing code and writing docs, and maybe
the one that bothers programmers, is that docs don't interact with you
during an edit-compile-test cycle.  You don't get the periodic squirt of
stimulation from doc writing that you get from coding a new function
and watching it do something when you test it.  It's just keystrokes
that sit there on the screen or page, and as you write more and more
of them, you feel tired instead of stimulated.  But anyone who has
gone to high school and had to write class papers has learned to deal
with this.  It's grunt work, but it's not terribly difficult, so if
you treat poor docs like any other bug that needs to be tediously
fixed, getting it done is just a matter of determination.  But first
you have to accept the notion that good docs are a necessity and poor
docs MUST be fixed.  That is what I see missing from Python.

> > I think there is an attitude problem in the central Python development
> > community, which is to expect external volunteers to do stuff with no
> > cajoling and no guidance.  That just doesn't work very well.  
> 
> I have no idea what to do about that. Giving T-Shirts to people who
> write a decent tutorial that covers an entire library module?
> (Semi-serious if there's a way of making that work)

No, what I mean is I keep hearing from Python honchos stuff like
"submit a proposal", "come up with an idea, write a PEP for it,
implement it completely and get user feedback and convince us that
it's useful and promise to maintain it for two years and THEN
python.org will consider turning it into a standard feature".  The FSF
was always willing to identify open tasks and say "the GNU project
needs X, Y, and Z; will anyone interested in working on any of them
please contact us", and it was willing to provide coordination and
guidance to people doing those tasks.  The PSF wants any contribution
to be finished and packaged (but apparently not documented) before
they even think about it.  As far as I can tell from discussions here
a few months ago, these different approaches seem to be deliberate
policy, not happenstance.  I like the FSF's approach better.

> [FSF]
> This might sound really dumb, but I'd be genuinely interested to hear more
> about how you organised this, how you managed to motivate people to do
> these things, what "rewards" the people got for doing this. (Working on the
> assumption that rewards range from simple thanks upwards!)

I'm not sure what you mean.  The GNU project was and is tremendously
exciting and lots of people have always wanted to work on it, even now
when the FSF isn't that deeply involved in the code any more (look at
all the GNU/Linux distros and packages being done).  Wikipedia is
another example of something that's absolutely booming.  Python isn't
doing terribly badly, but I think it generates less enthusiasm than
GNU or Wikipedia because of its relative insularity and lower emphasis
on community spirit (Python's success metric is number of users while
GNU's is more about community contribution, hence the difference in
licensing schemes among other things).  I don't want to get into a
flame war about that last part since that's already been debated
endlessly, but most agree that it's a significant factor for some
people and not for others.

> > Python has been shipping Tkinter as part of its stdlib for something
> > like ten years now without any docs.  That doesn't take much of a
> > "finding" to notice.  And it's not simply a gap needing filling.  That
> > such a huge hole could exist for that long shows a systemic problem.
> 
> Maybe there is a systemic problem. Do you have a suggestion as to how to
> correct it?

The first thing to do is update the Zen list to emphasize that good
programs should have good docs ;-).  More seriously, for smaller stuff
anyway, one obvious observation is that every code patch in Python
gets reviewed by an expert before being checked in.  Part of the
review process should be making sure the docs are good.  In cases
where the docs aren't complete and no one is available to fix them and
there's a strong desire to check the code in anyway, at minimum the
missing parts of the docs should be flagged as unfinished tasks.  So
that cookie processor doc should at least get a sentence saying "this
doc needs to be finished and examples added", along with a link to
where the info is, if available (in this example, John J. Lee's older
ClientCookie package).
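
Part of that check could even be mechanical.  As a rough illustration, here is a hypothetical helper (not an existing tool or part of any Python release process) that lists a module's public functions and classes with no docstring at all.  It only catches completely missing docs; it can't tell that a page like the cookie processor one is present but unusable.

```python
import inspect


def undocumented_names(module):
    """Return sorted public functions/classes defined in module with no docstring.

    undocumented_names is a hypothetical review helper.
    """
    missing = []
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue  # skip private helpers
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue  # only check functions and classes
        if getattr(obj, "__module__", None) != module.__name__:
            continue  # skip names imported from other modules
        doc = obj.__doc__  # read directly, so base-class docs don't count
        if not doc or not doc.strip():
            missing.append(name)
    return sorted(missing)
```

A reviewer could run this on the module a patch touches and treat each name it returns as either a doc to request from the contributor or an unfinished task to flag.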

If you imagine that being done everywhere in the existing doc where
stuff is missing, you'll see that a lot of stuff needs to be added.
However, in cases where it's reasonable to do so, rather than flag
docs as incomplete, the reviewers should ask the contributors to fix
the docs.  The people with commit privileges to the Python CVS are
extremely good programmers and they command a lot of respect.  So if
one of them writes to a contributor "this patch is good but its docs
are unclear; could you please explain this or that better before we
merge it", in most cases I think the contributor would be happy to fix
the doc.  So I think many of the shortcomings in the Python docs exist
because there simply wasn't interaction at that level.