[Doc-SIG] Conversion of documentation to SGML

Fred L. Drake, Jr. <fdrake@acm.org>
Thu, 14 Jan 1999 17:34:33 -0500 (EST)


  As has been promised for quite a long time now, I've been working on 
tools to convert the current LaTeX markup of the standard
documentation to SGML.  During 1998, a lot of effort went into making
the LaTeX markup more rational, and I think the effort has paid off
twofold: 1) it's easier to make intelligent markup decisions for the
current LaTeX documents, and make the HTML output more reasonable, and 
2) it's easier to parse with Python!
  In fact, the current documents are sufficiently easy to parse with
regular expressions that I've written a script that reads the LaTeX
documents and produces a stream of ESIS events similar to what's
produced by powerful SGML parsers.  With this, the documents can be
loaded into a DOM tree using the XML-SIG's package.  Another script does 
this and performs a number of transformations, producing another ESIS
event stream which can then be converted to either SGML or XML.
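
  To give a flavor of the pipeline described above, here is a minimal
sketch in Python.  It is not the actual conversion script: the regular
expression, the function names, and the "document" root element are
all hypothetical, it only handles flat \macro{text} constructs with no
nesting or environments, and it uses the standard library's
xml.dom.minidom as a stand-in for the XML-SIG package.  The event
lines follow the nsgmls-style convention of "(" for a start tag, ")"
for an end tag, and "-" for character data.

```python
import re
from xml.dom.minidom import getDOMImplementation

# Hypothetical pattern: matches \macro{text} where text has no nested braces.
MACRO_RE = re.compile(r"\\([a-zA-Z]+)\{([^{}]*)\}")


def escape(data):
    """Escape backslashes and newlines so each event fits on one line."""
    return data.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(data):
    """Reverse escape()."""
    return re.sub(r"\\(.)",
                  lambda m: "\n" if m.group(1) == "n" else m.group(1),
                  data)


def latex_to_esis(text):
    """Yield ESIS-style event lines for a flat LaTeX fragment."""
    pos = 0
    for m in MACRO_RE.finditer(text):
        if m.start() > pos:
            # Plain text between macros becomes a data event.
            yield "-" + escape(text[pos:m.start()])
        name, body = m.groups()
        yield "(" + name
        if body:
            yield "-" + escape(body)
        yield ")" + name
        pos = m.end()
    if pos < len(text):
        yield "-" + escape(text[pos:])


def esis_to_dom(events, root="document"):
    """Build a DOM tree from ESIS-style events under an assumed root element."""
    doc = getDOMImplementation().createDocument(None, root, None)
    stack = [doc.documentElement]
    for event in events:
        kind, rest = event[0], event[1:]
        if kind == "(":
            elem = doc.createElement(rest)
            stack[-1].appendChild(elem)
            stack.append(elem)
        elif kind == ")":
            stack.pop()
        elif kind == "-":
            stack[-1].appendChild(doc.createTextNode(unescape(rest)))
    return doc


sample = r"Use \module{re} with \function{compile}."
events = list(latex_to_esis(sample))
print("\n".join(events))
print(esis_to_dom(events).documentElement.toxml())
# Last line printed:
# <document>Use <module>re</module> with <function>compile</function>.</document>
```

  Keeping the event stream as an intermediate form means the parsing
step and the tree transformations stay independent, so either one can
be replaced (for example, by a real SGML parser) without touching the
other.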
  The conversion produces something not substantially different from
the current structure, but with a few changes.  Tables are almost
compliant with the OASIS Exchange model; the differences can be
addressed with a little more work.  There are probably a few other
things that should be changed before freezing the structure enough to
actually write down the DTD.  The index entries are something I
haven't addressed at all, and will need a bit more work, but I'm going 
to hold off on that for now.
  My intention is that the LaTeX-->SGML conversion will be automated
at least to the 95% level, to allow further changes to the DTD until
we're ready to go into production with it, as well as to ease the
transition for authors of other Python manuals and HOWTOs.
  For the moment, I'm interested in determining what structural
changes should be made to the SGML, whether there's enough information 
encoded in the markup, what allowance needs to be made for enhanced
versions of the DTD (once defined) to avoid having to actually change
the document files, and what it will take to produce formatted
output.
  In general, I'd like people who might either author documentation in 
the future or help with developing the formatting processes to take a
look at it and provide input.
  At this point, there is no process that *uses* the documents in
their SGML form, and this is where I most need help.  I think there
must be at least a PDF generator and an HTML generator before this can
even go into a "Beta" stage (providing it on python.org, etc.).
Building those is my next task.  I don't know DSSSL yet, so learning
it comes first.  If anyone would like to work on either the print or
online stylesheet, I'd really appreciate the help.
  A package containing the SGML and updated tools to do the conversion 
(substantially updated since the recent documentation release) is
available at ftp://ftp.python.org/pub/python/doc/1.5.2/test.  Please
take a look at it and let me know what you think.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191