[Doc-SIG] Documentation markup & processing

Edward D. Loper edloper@gradient.cis.upenn.edu
Mon, 04 Jun 2001 13:35:07 EDT


Hi all.

Sorry I've been pretty quiet over the last month or so.  Real life
got the better of me.  I do want to get back into actively working on
docstrings, now that the semester is over.  I'm at a conference on
computational linguistics right now, so I won't have a lot of time to
look over everything David said.  At first glance, it looks pretty
good.  The one thing that worries me somewhat is that David threw a
lot of stuff at the group all at once.  I think we should try to
address one concern at a time, to the extent possible.

In that spirit, I think David's idea of creating an overall framework,
that divides the problem into formatters, processing tools, etc., is a
very good idea.  And I think that we can informally "ratify" such a
framework without a PEP, to help guide the efforts of those active on
the doc-sig.  Of course, we might then turn it into a PEP, and get the
official stamp of approval, etc.  But I think that one major problem
with the doc-sig newsgroup in the past has been that we try to solve
everything at once, and don't concentrate enough on individual
problems.

So, I think that in the near-term, we should try to focus on coming to
a consensus on what the DPS should look like.  I won't really have
time to look over that PEP and its DTDs for the next few days, but
I'll try to get to it by the weekend.  Some open questions, in my
mind, are:

  - What pieces should we split the problem into?  The most obvious
    pieces are parsers and outputters.  Are there other subproblems
    that can be well defined?  (We would need to be able to define
    precise interfaces for them.)

  - What should be/needs to be specified by the DPS beyond the
    interfaces?  For example, it looked like David's PEPs specified
    that the DPS should never parse private member docstrings.  But
    this might be very useful to do sometimes.  I agree that this
    would be a reasonable default, but I don't think it should be a
    requirement.  Put a different way, where do we want to draw the
    line between "API issues" and "tool issues"?  I would argue that
    *what* gets documented is a tool issue.

  - What's the best way to encode the APIs?  My first instincts were
    to use XML and DOM.  But I'm not sure that that's the best way to
    go.  The reason I say that is because I've implemented my doc
    system using DOM for some intermediate representations, and it can
    be very inconvenient.  I think that the only reason to use DOM is
    if we expect the interfaces to change, either during our
    discussions, or down the road.  Once (if?) we want to fix the
    interfaces, I think we should just make a module/package that
    encodes them with classes.  That module/package could optionally
    be capable of reading/writing XML.

  - It seems like we should be paying a lot of attention to the two
    DTDs that David has on sf, because those will place strong
    constraints on what parsers *can* do, and on what outputters *have*
    to handle.  I think that both DTDs have to be very well documented
    before we can accept them.  At the very least, we need
    definitions of the semantics of each element.  I'm not sure that
    everyone would be happy with the DTDs in their current state.

  - How stable do we intend to make the DTDs/interface?  Once more
    than one or two tools use them, it becomes pretty expensive to
    change them -- every tool/parser must be modified.
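
To make the "encode them with classes" option a little more concrete,
here is a minimal sketch of what such a module might look like.  All
of the names in it (Text, Element, to_xml, from_xml, and the tag names
"docstring", "para", and "emph") are hypothetical and are not taken
from David's DTDs.  It only illustrates the idea that the interface
can live in plain classes, with XML reading/writing as an optional
extra:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    """A run of plain text inside a docstring (hypothetical node type)."""
    content: str


@dataclass
class Element:
    """A generic tree node; the tag names used here are made up."""
    tag: str
    children: List[Union["Element", Text]] = field(default_factory=list)

    def to_xml(self) -> ET.Element:
        """Optional XML export: convert this subtree to an ElementTree node."""
        node = ET.Element(self.tag)
        last = None  # most recently appended child element, if any
        for child in self.children:
            if isinstance(child, Text):
                # ElementTree stores text either on the parent (before the
                # first child element) or as the tail of the previous child.
                if last is None:
                    node.text = (node.text or "") + child.content
                else:
                    last.tail = (last.tail or "") + child.content
            else:
                last = child.to_xml()
                node.append(last)
        return node


def from_xml(node: ET.Element) -> Element:
    """Optional XML import: rebuild the class-based tree from XML."""
    result = Element(node.tag)
    if node.text:
        result.children.append(Text(node.text))
    for sub in node:
        result.children.append(from_xml(sub))
        if sub.tail:
            result.children.append(Text(sub.tail))
    return result


# Tools work with the classes directly...
doc = Element("docstring", [
    Element("para", [
        Text("Return the "),
        Element("emph", [Text("first")]),
        Text(" item."),
    ]),
])

# ...and only touch XML when they want to exchange data.
xml_text = ET.tostring(doc.to_xml(), encoding="unicode")
```

With something like this, parsers and outputters would only depend on
the classes, and the XML form would be one serialization among
possibly several.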

When I get a chance, I'll try to print out the DTDs and go over them
carefully.  I would really appreciate it if David (or anyone else)
would try to add some documentation directly discussing the DTDs (or
maybe it's there and I just missed it -- I only skimmed the PEPs).  

I think that optimally, the sig should address the following issues,
roughly in order:

  1. Define exactly what the DPS does.  This needs to include what
     interfaces exist, but does not need to give full specifications
     of the interfaces.  It should include things like where
     docstrings can appear, whether we really want to require parsing
     the source file (that excludes C modules), etc.

  2. Define the interfaces, one at a time.  Currently, there's really
     just one interface: between parsers and outputters.  If we decide
     that we want more, define those too.

  2a. Agree, at least on the sig, that we like the DPS, and we intend
      to work within its framework.

  3. People can independently work on their favorite parser, with
     active discussion on the sig about what they should look like.  

  4. People can trade their parsers, we can decide what we like or
     don't about them, and try to come to some consensus on what we
     want to keep and what we don't.

  5. Same for outputters, or any other units we define (can occur in
     parallel with 3-4).

But I think it's important to make sure that we don't get too
distracted by 3-5 while working on 1-2.  I think that getting 1-2 done
well will have a pretty big impact on whether we eventually come up
with results or not.
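
As a rough illustration of what "one interface, between parsers and
outputters" could mean in code, here is a small sketch.  Everything in
it is an assumption made for the example: the Parser and Outputter
base classes, the Paragraph type that passes between them, and the two
toy implementations.  A real interface would carry a much richer
structure than a list of paragraphs.

```python
import html
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Paragraph:
    """Hypothetical unit of parsed content shared by both sides."""
    text: str


class Parser(ABC):
    """Anything that turns a raw docstring into the shared structure."""

    @abstractmethod
    def parse(self, docstring: str) -> List[Paragraph]:
        ...


class Outputter(ABC):
    """Anything that turns the shared structure into some output format."""

    @abstractmethod
    def render(self, paragraphs: List[Paragraph]) -> str:
        ...


class BlankLineParser(Parser):
    """Toy parser: paragraphs are separated by blank lines."""

    def parse(self, docstring: str) -> List[Paragraph]:
        blocks = re.split(r"\n\s*\n", docstring.strip())
        # Collapse internal whitespace so line wrapping does not matter.
        return [Paragraph(" ".join(b.split())) for b in blocks if b.strip()]


class HtmlOutputter(Outputter):
    """Toy outputter: one <p> element per paragraph."""

    def render(self, paragraphs: List[Paragraph]) -> str:
        return "\n".join(f"<p>{html.escape(p.text)}</p>" for p in paragraphs)


def process(docstring: str, parser: Parser, outputter: Outputter) -> str:
    """Glue code: any parser can be paired with any outputter."""
    return outputter.render(parser.parse(docstring))


result = process(
    "First line\ncontinued.\n\nSecond & last.",
    BlankLineParser(),
    HtmlOutputter(),
)
```

The point of the sketch is only that steps 3-5 become independent once
the shared structure is pinned down: people can swap in a different
Parser or Outputter without touching the other side.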

-Edward

p.s., David, I'll send you URLs by the end of the week, so you can
include some of my work in your PEPs. :)