[DOC-SIG] Re: [PSA MEMBERS] [XML] Notes on the Tutorial's markup

Paul Prescod papresco@technologist.com
Tue, 11 Nov 1997 16:40:14 -0500


Andrew Kuchling wrote:
> No markup is proposed for these features; that's been left for people
> such as Paul Prescod who actually know XML.  Probably HTML could be
> followed for these.

I agree. Or perhaps DocBook, which is a DTD designed for software
documentation and used by O'Reilly and Associates for their books.
DocBook is nice because there are already tools to convert it to HTML
and LaTeX, and of course because it makes it really easy to turn around
and hand ORA an SGML file for printing.

Now the next question is, XML or SGML. I am a long-time SGML user, but
also a member of the advisory group for XML development. Despite my
attachment to XML, it isn't clear that XML is better than Full SGML in
this context. Here are the major issues in my mind:

FULL SGML
=========
 + Minimizes typing (and "escaping") through a "tag minimization
feature"
 + (Non-Python) Tools already support it (primarily Emacs and the
commercial editors).
 + DocBook already exists
 + When XML takes over the world (in reality, not just rhetoric) we can
easily "convert to XML"
 - We depend on James Clark's C++ parsing engine "SP" to do the parsing
for us when we want to process the document using Python
 - DocBook is too complicated -- we probably want to make a subset
anyway

XML
===
 + Python parsers already exist, and they would be easy to write if they
didn't. We don't depend on anyone else (C++) to give us access to our
data.
 + We will be 100% buzz-word compliant
 - Maximizes typing (and escaping) :)
 - We must make our own DTD, or an XML-compatible variant of DocBook (or
wait for the DocBook maintainers to do so)
 
I think that when push comes to shove, whoever has to type this stuff
should vote in favour of SGML. XML means a <EMPH>lot</EMPH> of extra
typing, and SGML offers a <EMPH/lot/ of <>short cuts</>. The only real
question is whether we intend to spend a lot of time processing the
reference manual in Python and if so, whether it would be a big problem
to use a C++ program to help us. Python is a glue language after all.
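
To make the processing side of that trade-off concrete, here is a minimal
sketch, assuming the xml.etree.ElementTree parser from a much later Python
standard library (it did not exist when this was written). PARA is a made-up
element name; EMPH is the one used above. The fully tagged form parses with
no outside help, while the minimized SGML form is rejected by an XML parser
and would need an SGML-aware tool such as SP.

```python
import xml.etree.ElementTree as ET

# Fully tagged XML: every element has an explicit start and end tag.
# PARA is a hypothetical wrapper element used only for this sketch.
XML_FORM = "<PARA>XML means a <EMPH>lot</EMPH> of extra typing.</PARA>"

# The same kind of sentence using SGML tag minimization (not well-formed XML).
SGML_FORM = "<PARA>SGML offers a <EMPH/lot/ of <>short cuts</>.</PARA>"


def emphasized_text(source):
    """Return the text of each EMPH element, or None if source is not XML."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError:
        # An XML parser cannot read minimized SGML markup.
        return None
    return [elem.text for elem in root.iter("EMPH")]


print(emphasized_text(XML_FORM))
print(emphasized_text(SGML_FORM))
```

The sketch only shows which form a pure-Python XML parser accepts; it says
nothing about which form is nicer to type.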

Your list of features is a good start. I think that we should make a
DocBook subset that includes just those features. As we want to do more
sophisticated things, we may let it grow towards full DocBook, and
perhaps also extend it in Python-specific ways: "subclass" it (in a rough
SGML-warped sense).

There also seem to be two other interesting documentation issues. 

--

The simpler one is the library reference: we may have to massively
extend DocBook for that. We also may have to do some custom programming
rather than relying on the existing tools. Neither is a big deal -- they
just mean more work for someone.

--

The more difficult issue (conceptually) is what to do about
"docstrings". 

 * Will we make them structured or leave them unstructured? 
 * If structured, structured in
	* SGML/XML?
	* some ad hoc language that is more "pretty"?
	* Python data structures?

 * How much access should the Python runtime have to that structure? 
 * How do we associate more than one "string attribute" (e.g. name vs.
description vs. see also) with each function/method/class? Maybe we need
a list of docstrings, or a dictionary.
 * How do we express emphasis, hypertext links, etc.?
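
The "dictionary" option in the list above could look something like the
following. This is only a sketch of one possibility, not an existing Python
convention: the __docfields__ attribute name, the field names, and the
doc_field helper are all made up for illustration.

```python
def split_path(path):
    """Split a path into (directory, filename)."""
    head, _sep, tail = path.rpartition("/")
    return head, tail


# Hypothetical convention: a dictionary of named documentation fields
# stored on the function, next to its ordinary docstring.
split_path.__docfields__ = {
    "name": "split_path",
    "description": "Split a path into (directory, filename).",
    "see also": ["os.path.split"],
}


def doc_field(obj, field, default=None):
    """Look up one named documentation field on obj at runtime."""
    return getattr(obj, "__docfields__", {}).get(field, default)


print(doc_field(split_path, "description"))
print(doc_field(split_path, "see also"))
```

Keeping the fields in an ordinary Python data structure gives the runtime
full access to them, but leaves the questions of inline emphasis and
hypertext links unanswered.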

A related issue is whether we intend to have both a library reference
and structured docstrings. Or is the library reference just what you get
by concatenating the docstrings from the various modules? Are people
willing to make the program source the documentation source too?
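
As a rough sketch of the "concatenate the docstrings" option, assuming the
inspect and types modules from a later Python standard library: the
reference_for function and the demo module below are hypothetical, and the
output is plain text rather than SGML or XML.

```python
import inspect
import types


def reference_for(module):
    """Build a plain-text reference section by concatenating docstrings."""
    title = module.__name__
    parts = [title + "\n" + "=" * len(title)]
    module_doc = inspect.getdoc(module)
    if module_doc:
        parts.append(module_doc)
    # Visit public functions and classes in alphabetical order.
    for name, obj in sorted(vars(module).items()):
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        doc = inspect.getdoc(obj)
        if doc:
            parts.append(name + "\n" + "-" * len(name) + "\n" + doc)
    return "\n\n".join(parts)


# A made-up module, built in memory so the sketch is self-contained.
demo = types.ModuleType("demo", "A made-up module used only for this sketch.")


def greet(name):
    """Return a greeting for name."""
    return "Hello, " + name


demo.greet = greet

print(reference_for(demo))
```

A reference produced this way is only as structured as the docstrings
themselves, which is why the structured-versus-unstructured question comes
first.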

 Paul Prescod

_______________
DOC-SIG  - SIG for the Python Documentation Project

send messages to: doc-sig@python.org
administrivia to: doc-sig-request@python.org
_______________