From Moshe Zadka Fri Aug 20 20:59:24 1999
From: Moshe Zadka (Moshe Zadka)
Date: Fri, 20 Aug 1999 22:59:24 +0300 (GMT+0300)
Subject: [Doc-SIG] Python PODs?
Message-ID:

Hi.

The recent flamewar on c.l.py about lack of man pages got me thinking
about a new "backend" for Python documentation, which would reflect
Python's nature, while being amenable to auto-processing.

The basic format would be POD (Perl's plain old documentation format),
which is quite easy to parse. The Pythonic namespace hierarchy
(package->module->object->methods) could be reflected in directory
structure, i.e.,

urllib/
    ---.pod (general information about the module)
    urlopen.pod
    URLOpener/
        ---.pod (general information about the class)
        __init__.pod (description of init method)

Then, use of pod2man/pod2text with appropriate wrapping would mean I
could have

pythondoc urllib (show urllib/---.pod)
pythondoc urllib.URLOpener.__init__ (show urllib/URLOpener/__init__.pod)

I would be all-too-happy in helping to write such a back-end, we just
need to make sure

1. The front-end is stable, and easily parsable (SGML ain't, XML is)
2. The front-end has enough information to do that easily (not really
   sure)

What do you think?

A preliminary module to make generating PODs look just like generating
XML (so the converter would be written the way XML->XML translators are
written) is at

http://starship.python.net/crew/moshez/POD.py

(It's not thoroughly debugged, but it passed the simple _test() well
enough)
--
Moshe Zadka .
I'm not anti-Microsoft -- Microsoft is anti-me
(Anonymous Coward on /.)

From skip@mojam.com (Skip Montanaro) Fri Aug 20 21:21:03 1999
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Fri, 20 Aug 1999 15:21:03 -0500 (CDT)
Subject: [Doc-SIG] Python PODs?
In-Reply-To:
References:
Message-ID: <14269.47037.766307.636936@dolphin.mojam.com>

Moshe> The basic format would be POD (Perl's plain old documentation
Moshe> format), which is quite easy to parse.
Moshe> The Pythonic namespace hierarchy
Moshe> (package->module->object->methods) could be reflected
Moshe> in directory structure
...
Moshe> I would be all-too-happy in helping to write such a back-end,
...
...
Moshe> What do you think?

In theory, it's a reasonable proposal; however, I worry about having yet
another place to write documentation. For module docs we currently have
the library ref manual and the inline doc strings. For your proposal to
fly, not only would you have to be a strong motivating force in getting
it implemented, it would have to replace one of the other current
documentation formats somehow.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098

From Moshe Zadka Sat Aug 21 04:48:13 1999
From: Moshe Zadka (Moshe Zadka)
Date: Sat, 21 Aug 1999 06:48:13 +0300 (GMT+0300)
Subject: [Doc-SIG] Python PODs?
In-Reply-To: <14269.47037.766307.636936@dolphin.mojam.com>
Message-ID:

On Fri, 20 Aug 1999, Skip Montanaro wrote:

> In theory, it's a reasonable proposal, however I worry about having yet
> another place to write documentation. For module docs we currently have the
> library ref manual and the inline doc strings. For your proposal to fly,
> not only would you have to be a strong motivating force in getting it
> implemented, it would have to replace one of the other current documentation
> formats somehow.

Reread the proposal. I'm talking about automatic conversion from the
native Python format. My proposal is horrible for native writing of
documentation, just like info.
--
Moshe Zadka .
I'm not anti-Microsoft -- Microsoft is anti-me
(Anonymous Coward on /.)

From: Fred L. Drake, Jr.

Just as a discussion item, I've tossed out a copy of the XML produced
by tools currently checked into CVS (Doc/tools/sgmlconv/).
The file is:

ftp://ftp.python.org/pub/python/doc/1.5.2p1/test/xml-1.5.2p1+.tgz

That might be easier than pulling the stuff out of CVS and generating
it, especially if you have a slow connection or a slow machine. ;-)

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From: Fred L. Drake, Jr.

Last week I promised on the Python list to describe the current status
of the conversion to SGML/XML. Here it is!

I'm currently able to parse all the LaTeX markup and generate either
XML or SGML. The structure of the output is very similar to the input
structure, but a number of minor improvements are made. The
improvements are mostly very localized and have more to do with keeping
the markup from getting too complex and disjointed, and eliminating
some bogosities.

I am not at all decided on a DTD to use. I see three options:

1. DocBook -- this has been developed and heavily use-tested by a
   number of corporate users, and is known to be good for technical
   documentation. There are tools and stylesheets available to convert
   from DocBook to HTML and printed formats. We'd probably need to
   specialize it, but it's designed for that. Konrad Hinsen has already
   developed one customization that he's using to document Python
   modules, and there's an initiative to create a common extension for
   documenting OO constructs. I've asked Konrad for some sample
   documentation so I can see how it actually works out. My concern
   with DocBook is that the markup may be a bit on the "heavy" side; I
   don't want the document source to be so markup-heavy that I'm the
   only one to work on them.

2. Create something similar to what we had in LaTeX, but with fewer
   warts. This is appealing because the conversion would be done
   sooner. However, new stylesheets would be needed, slowing down the
   usefulness of the result. It would also be the easiest to adopt for
   people already familiar with the current markup.

3. Create something entirely new and specific to Python.
   Clearly, this offers a lot of work to all the volunteers. We'd need
   requirements analysis, DTD design, stylesheets, and probably lots of
   things I haven't thought of. However, it also means we can limit the
   weight of the markup in the source, which might be a major advantage
   in getting people to use it. But *everyone* would have to learn it
   (well, everyone that writes documentation at any rate). This offers
   a great deal of opportunity to "get it right" for Python, but also a
   lot of rope. (You know what rope is used for, right?)

I'd like to see some discussion on what should be done and what needs
to be done. From where I sit, the most important thing is to make sure
we can maintain a high level of semantic markup (hopefully making
further improvements over what we already have), with generation of
hypertext (HTML, info, whatever) being the next most important thing.
Typeset documents are a requirement, but aren't as high up the list.

I'm not terribly concerned about how XML/SGML-->foo conversion
processes are implemented, with the caveat being that I need to be able
to understand them without a massive learning curve. Clearly, Python
code is a major option for tools (surprised?), but I can easily deal
with using Java tools (with or without JPython), DSSSL processors (just
don't expect me to maintain Jade/OpenJade!), XSL, CSS, and whatnot. I'd
like to get away from having any Perl scripts involved, not because I
think Perl is Evil, but because I'm not a Perl hacker. (Don't get me
wrong; I make no claim that Perl is not Evil! ;)

Comments, suggestions, volunteers?

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From scott@chronis.pobox.com Thu Aug 26 22:55:23 1999
From: scott@chronis.pobox.com (Scott Cotton)
Date: Thu, 26 Aug 1999 17:55:23 -0400
Subject: [Doc-SIG] XML Conversion Update
In-Reply-To: <14277.44597.761409.555365@weyr.cnri.reston.va.us>; from Fred L. Drake, Jr.
on Thu, Aug 26, 1999 at 05:14:29PM -0400
References: <14277.44597.761409.555365@weyr.cnri.reston.va.us>
Message-ID: <19990826175523.B15392@chronis.pobox.com>

On Thu, Aug 26, 1999 at 05:14:29PM -0400, Fred L. Drake, Jr. wrote:
>
> Last week I promised on the Python list to describe the current
> status of the conversion to SGML/XML. Here it is!
>
> I'm currently able to parse all the LaTeX markup and generate either
> XML or SGML. The structure of the output is very similar to the input
> structure, but a number of minor improvements are made. The
> improvements are mostly very localized and have more to do with
> keeping the markup from getting too complex and disjointed, and
> eliminating some bogosities.

Excellent!

> I am not at all decided on a DTD to use. I see three options:
>
> 1. DocBook -- this has been developed and heavily use-tested by a
>    number of corporate users, and is known to be good for technical
>    documentation. There are tools and stylesheets available to
>    convert from DocBook to HTML and printed formats. We'd probably
>    need to specialize it, but it's designed for that. Konrad
>    Hinsen has already developed one customization that he's using
>    to document Python modules, and there's an initiative to create
>    a common extension for documenting OO constructs. I've asked
>    Konrad for some sample documentation so I can see how it
>    actually works out. My concern with DocBook is that the markup
>    may be a bit on the "heavy" side; I don't want the document
>    source to be so markup-heavy that I'm the only one to work on
>    them.

I personally am not a fan of this, since it seems like it could limit
the contributors to those willing to learn DocBook, which, at a glance,
looks much more complicated than learning a standard way to produce
python docs.

> 2. Create something similar to what we had in LaTeX, but with fewer
>    warts. This is appealing because the conversion would be done
>    sooner.
>    However, new stylesheets would be needed, slowing down
>    the usefulness of the result. It would also be the easiest to
>    adopt for people already familiar with the current markup.

This sounds appealing.

[...]
>
> I'd like to see some discussion on what should be done and what
> needs to be done. From where I sit, the most important thing is to
> make sure we can maintain a high level of semantic markup (hopefully
> making further improvements over what we already have), with
> generation of hypertext (HTML, info, whatever) being the next most
> important thing. Typeset documents are a requirement, but aren't as
> high up the list.

From my perspective, what's most important is a *simple*,
well-documented and authoritative documentation markup. The more people
who can easily produce docs for new code, the more documentation there
will be, and a standard would facilitate sharing more documentation in
everyone's favorite formats.

With some kind of flexible-but-not-too-complex dtd, I'd probably work
on producing python docs in all the formats that I'd like to see, such
as vim tags and man pages (not that I liked the recent rant about the
latter on c.l.p, but I would like and use and produce or help produce
these formats if the dtd structure is simple and the authoritative text
easy to parse)

scott

From mhammond@skippinet.com.au Fri Aug 27 02:18:22 1999
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Fri, 27 Aug 1999 11:18:22 +1000
Subject: [Doc-SIG] XML Conversion Update
In-Reply-To: <14277.44597.761409.555365@weyr.cnri.reston.va.us>
Message-ID: <003f01bef02a$0ec2c250$0801a8c0@bobcat>

> I'm currently able to parse all the LaTeX markup and generate either
> XML or SGML. The structure of the output is very similar to

Excellent!

> I am not at all decided on a DTD to use. I see three options:

I'm pretty ignorant about this. I did look into the docbook DTD, and it
is indeed complex. However, it also appears that much of it is optional.

It seems that if a reasonable subset of the docbook features could be
used and documented for use in Python it would be simpler to use, and
save reinventing the wheel. A big benefit of DocBook that I see is that
it may be possible to get professional printers to print hard-copies.

However, our own custom DTD would also be fine, as in reality it is
only the Python community that will use it, and also provide the tools.

An interesting possibility is to use the new PDF routines developed by
Andy et al. In conjunction with the XML tools, I believe it would be
fairly simple to generate a very pretty PDF version of the docs - which
would be very cool.

Further, a standard DTD would definitely encourage me (and hopefully
others) to use it. I have been thinking about this for some time, and I
feel confident that I could massage my documentation tools to generate
whatever DTD we decide. This would provide advantages to all users, as
a single suite of tools could be used to provide consistent and
professional documentation for many extensions...

I just realised I haven't said much at all - really just offering
encouragement that this is great news, definitely the right direction,
and I will definitely utilize this for my own stuff.

Mark.

From fredrik@pythonware.com Fri Aug 27 09:45:32 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Fri, 27 Aug 1999 10:45:32 +0200
Subject: [Doc-SIG] XML Conversion Update
References: <003f01bef02a$0ec2c250$0801a8c0@bobcat>
Message-ID: <022301bef068$86c931b0$f29b12c2@secret.pythonware.com>

Mark Hammond wrote:
> > I am not at all decided on a DTD to use. I see three options:
>
> I'm pretty ignorant about this. I did look into the docbook DTD, and it is
> indeed complex. However, it also appears that much of it is optional.
>
> It seems that if a reasonable subset of the docbook features could be used
> and documented for use in Python it would be simpler to use, and save
> reinventing the wheel.

me too!

From Moshe Zadka Fri Aug 27 10:21:15 1999
From: Moshe Zadka (Moshe Zadka)
Date: Fri, 27 Aug 1999 12:21:15 +0300 (GMT+0300)
Subject: [Doc-SIG] XML Conversion Update
In-Reply-To: <14277.44597.761409.555365@weyr.cnri.reston.va.us>
Message-ID:

On Thu, 26 Aug 1999, Fred L. Drake, Jr. wrote:

> I am not at all decided on a DTD to use. I see three options:

I want to suggest a thesis that the markup used (Language+ XML
DTD/LaTeX style) has little effect on the ease, as long as

a. There are few optional features
b. There are good templates ready

Personally, when I started to write Python docs, I knew LaTeX but not
the specific Python style. I started from the templates, and looked for
similar things in other docs. My LaTeX knowledge confused me, actually:
I used math too heavy for the HTML conversion to work well.

This shows that DocBook is a bad idea /because/ people know it, and
would have /too much/ freedom for any hope of uniformity.

I vote for a roll-our-own style. As soon as we can get a conversion
ready, there will be plenty of templates ready. More, a roll-our-own,
as opposed to the LaTeX style, could reflect the structure of a Python
source file more easily (for example, not separating the __init__
method from the rest of the methods, and putting the generic class
description in it).

This is also a bit of an egoism, since it would make the vapourware
PythonML->POD easier.

Just my 0.02$
--
Moshe Zadka .
INTERNET: Learn what you know.
Share what you don't.

From Edward Welbourne Fri Aug 27 11:01:50 1999
From: Edward Welbourne (Edward Welbourne)
Date: Fri, 27 Aug 1999 11:01:50 +0100
Subject: [Doc-SIG] XML Conversion Update
In-Reply-To:
References: <14277.44597.761409.555365@weyr.cnri.reston.va.us>
Message-ID:

> ... offering encouragement that this is great news, definitely the
> right direction, and I will definitely utilize this for my own stuff.

Three Cheers for Fred !

As a matter of clarification ...
supposing we take any one of the three options (DocBook, inertia,
revolution), they'll all be parsed down to XML, so: to what extent can
we rely on being able to generate DocBook *from* the XML we've produced
using either of the other options ? After all, XML to XML translations
are supposed to all be natural and easy.

If that'll be easy, we can get all of the benefits of DocBook out of
either of the others. Me, I'm naturally predisposed towards interesting
times, so I go for revolution every time: and I trust this community to
produce a nice simple way of marking up text that I'll be happy to use
as soon as I have some documentation to write.

KISS,

Eddy.
--
Keep It Straightforward, Simpleton.

From skip@mojam.com (Skip Montanaro) Fri Aug 27 14:23:40 1999
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Fri, 27 Aug 1999 08:23:40 -0500 (CDT)
Subject: [Doc-SIG] XML Conversion Update
In-Reply-To:
References: <14277.44597.761409.555365@weyr.cnri.reston.va.us>
Message-ID: <14278.36935.322206.110605@dolphin.mojam.com>

Moshe> This shows that DocBook is a bad idea /because/ people know it,
Moshe> and would have /too much/ freedom for any hope of uniformity.

Moshe> I vote for a roll-our-own style.

Well, how about we roll our own and it just happens to be a strict
subset of docbook? You document it as the Pythonic Way To Write
Documentation, and buried deep in some appendix it says (in six-point
font):

    This DTD is a subset of DocBook.

That said, can you just start whacking useless appendages off of
DocBook?

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098   | Python: Programming the way Guido indented...
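
Skip's "strict subset" idea can be made concrete with a small checker
that reports any element a document uses outside an agreed list. This
is only a minimal sketch in present-day Python: the ALLOWED set and the
function name elements_outside_subset are assumptions made up for
illustration, not anything the list agreed on, and the sample document
is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical "Pythonic subset" of DocBook element names. The real list
# would have to come out of the DTD discussion, not from this sketch.
ALLOWED = {"article", "title", "section", "para", "programlisting", "emphasis"}


def elements_outside_subset(xml_text, allowed=ALLOWED):
    """Return the sorted names of elements that are not in the allowed subset."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and every descendant element.
    return sorted({el.tag for el in root.iter() if el.tag not in allowed})


# Invented sample: valid DocBook-style markup that strays outside the subset.
sample = """<article>
  <title>urllib</title>
  <section>
    <title>urlopen</title>
    <para>Open a URL; see <emphasis>URLopener</emphasis>.</para>
    <sidebar><para>Not in the subset.</para></sidebar>
  </section>
</article>"""

print(elements_outside_subset(sample))
```

A check like this only covers element names; content models and
attributes would still need the trimmed DTD itself.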

From hinsen@cnrs-orleans.fr Fri Aug 27 16:36:25 1999
From: hinsen@cnrs-orleans.fr (hinsen@cnrs-orleans.fr)
Date: Fri, 27 Aug 1999 17:36:25 +0200
Subject: [Doc-SIG] XML Conversion Update
In-Reply-To: <199908270502.BAA09197@python.org> (doc-sig-admin@python.org)
References: <199908270502.BAA09197@python.org>
Message-ID: <199908271536.RAA02570@chinon.cnrs-orleans.fr>

> > I am not at all decided on a DTD to use. I see three options:
>
> I'm pretty ignorant about this. I did look into the docbook DTD, and it is
> indeed complex. However, it also appears that much of it is optional.
>
> It seems that if a reasonable subset of the docbook features could be used
> and documented for use in Python it would be simpler to use, and save
> reinventing the wheel.

I did use DocBook, and I certainly had lots of problems with it.
However, the real problem is not the complexity (for any given project,
most of DocBook markup is irrelevant), but the lack of decent
documentation. This might change when the O'Reilly book by Norman Walsh
is out.

The second problem I wasted a lot of time with is Jade plus JadeTeX; I
just couldn't get decent paper output with it. And I didn't really want
to learn DSSSL either. In the end I wrote my own special-purpose
converter using PyDOM, which was finished in a day!

So my suggestions are:

- Use DocBook-XML, or perhaps Norman Walsh's simplified DocBook, or
  something intermediate. Maybe some modifications must be made, but
  that is relatively simple.

- Use Python tools for conversion, that's what we are all most familiar
  with.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From Moshe Zadka Fri Aug 27 18:14:40 1999
From: Moshe Zadka (Moshe Zadka)
Date: Fri, 27 Aug 1999 20:14:40 +0300 (GMT+0300)
Subject: [Doc-SIG] XML Conversion Update
In-Reply-To: <14278.36935.322206.110605@dolphin.mojam.com>
Message-ID:

On Fri, 27 Aug 1999, Skip Montanaro wrote:

> Moshe> I vote for a roll-our-own style.
>
> Well, how about we roll our own and it just happens to be a strict subset of
> docbook? You document it as the Pythonic Way To Write Documentation, and
> buried deep in some appendix it says (in six-point font):
>
> This DTD is a subset of DocBook.

Not good, for exactly the reason I outlined earlier: some bozo will try
to write DocBook, just like this bozo tried to write LaTeX.

We'll need to extend DocBook anyway, for primitives like , , etc.
Personally, I do not want anything like ,
, or any such cruft in /library docs/ (obviously, these are needed for
other kinds of docs: more on this later). So, the only thing we will be
left with from DocBook will be things like (don't know the exact names,
guessing...) , etc.

So, it's better to roll our own, stealing from DocBook whatever we can.
Thus, we get (as much as possible) easy conversion for both old
Python-doccers, and old DocBook-heads.

That said, I think we need a completely different system for the rest
of the docs:

1. The tutorial is simply a book about Python, and as such should be
   written like every other technical book. Moreover, Guido is
   (currently) the sole maintainer so he has last say.

2. The extending/embedding manual is similar.

3. The Python/C API needs a much better solution anyway: while the
   basic API is good, the documentation is pretty horrible. I do think
   we might need a specific XML DTD for that document, but once again,
   Guido has final say because he'll (probably) be writing it.

However, most module documentations will /not/ be written by Guido. In
fact, the main goal should be that a module and the documentation are
written by the same guy at the same time. Hence, the tool to write the
library reference has the following design goals:

1. Low barrier for entry: not higher than for writing Python modules.

2. Tools to help with it: syntax checkers, and maybe even creators. I
   dream of a program which will turn the following code

    class MyClass:
        def __init__(self, n):
            self.n = n
        def foo(self):
            print self.n

   Into the following document

    XXX Describe class here!
    XXX Insert description here!
    XXX Insert description here!
    .
    .

3. A formidable array of 2XXX converters: 2html, 2txt, 2man,
   2windowshelp, 2info, 2docbook<0.5 wink>

I think a new Pythonic-one-way-to-do-it minimalistic DTD is the way to
go.

> That said, can you just start whacking useless appendages off of DocBook?

Where can I get the DTD? Only heard about it, never saw it...
--
Moshe Zadka .
INTERNET: Learn what you know.
Share what you don't.

From paul@prescod.net Fri Aug 27 17:49:11 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 27 Aug 1999 12:49:11 -0400
Subject: [Doc-SIG] XML Conversion Update
References: <14277.44597.761409.555365@weyr.cnri.reston.va.us>
Message-ID: <37C6C187.AC3265F@prescod.net>

Fred L. Drake, Jr. wrote:
>
> My concern with DocBook is that the markup
> may be a bit on the "heavy" side; I don't want the document
> source to be so markup-heavy that I'm the only one to work on
> them.

I think that we should use a variant of Docbook and use some SGML
minimizations supported by sgmllib (or that COULD be supported by
sgmllib). We can trivially use sgmllib+sax to normalize minimized SGML
to XML.

> 3. Create something entirely new and specific to Python.

How is this different from porting over what we have? Hasn't it evolved
to be pretty Python specific?

Paul Prescod

From: Fred L. Drake, Jr.
References: <14278.36935.322206.110605@dolphin.mojam.com>
Message-ID: <14278.54448.54685.956645@weyr.cnri.reston.va.us>

Moshe Zadka writes:
> Not good, for exactly the reason I outlined earlier: some bozo will try to
> write DocBook, just like this bozo tried to write LaTeX.

It's hard to predict what's needed for good documentation; I am *not*
of a mind to avoid having support for very general documentation
constructs.

We want to have a single DTD to keep the learning curve and tool
support under control, so we can't really be too stingy in designing
the markup.

> We'll need to extend DocBook anyway, for primitives like ,
> , etc. Personally, I do not want anything like ,
>
> , or any such cruft in /library docs/ (obviously, these are

There's more in the library documentation than module sections; this
even gets me in trouble sometimes. But it is *very* important to keep
in mind that library documentation can and should contain much more
than basic reference material.

> So, it's better to roll our own, stealing from DocBook whatever we can.
> Thus, we get (as much as possible) easy conversion for both old
> Python-doccers, and old DocBook-heads.
>
> That said, I think we need a completely different system for the rest of
> the docs:
>
> 1. The tutorial is simply a book about Python, and as such should be
> written like every other technical book. Moreover, Guido is (currently)
> the sole maintainer so he has last say.

Guido has the last say about everything he does, of course. On the
other hand, he's not the only person who maintains the documentation.
He's certainly not the one who does the most of the work on it.

This makes it sound like a DocBook project.

> 2. The extending/embedding manual is similar.

DocBook, with appropriate OO extensions, would be a very good match for
the extending & embedding manual as well.

> 3. The Python/C API needs a much better solution anyway: while the basic
> API is good, the documentation is pretty horrible. I do think we might
> need a specific XML DTD for that document, but once again, Guido has final
> say because he'll (probably) be writing it.

Guido is the author of the original version of the document, but he is
not the maintainer. That seems to be my job (which I consider a good
thing ;).

This is very much a kind of document that DocBook was designed to
handle. The OO support needs to be present, but that should be doable
as a normal DocBook extension.

The organizational and completeness problems with the API reference are
orthogonal to the DTD issue; we just haven't had the time. I try to add
to and enhance the document as specific questions come up, but can't
seem to find enough time.
(Things should get better once the conversion is done, but not by a
whole lot!)

> However, most module documentations will /not/ be written by Guido. In
> fact, the main goal should be that a module and the documentation are
> written by the same guy at the same time. Hence, the tool to write the
> library reference has the following design goals:

Yes; this is one of the two most important issues. The other (which is
somewhat at odds with this) is that whatever DTD we select be usable
for very high grade documentation that's much more elaborate than basic
module documentation.

> 1. Low barrier for entry: not higher than for writing Python modules.

This is unattainable. The biggest barriers to entry for documentation
writing are motivation and natural language. Few people are really good
with their own native language, esp. in its written form. Explaining
things to others through the written word is very difficult.

Python is much easier to learn!

> 2. Tools to help with it: syntax checkers, and maybe even creators. I
> dream of a program which will turn the following code

This is relatively easy once you have a format, and I fully intend to
do something like this. Konrad Hinsen has done some work with Daniel
Larson's pythondoc to generate DocBook with his own Python extension;
I'm sure something similar could be done with whatever form we choose.

> 3. A formidable array of 2XXX converters: 2html, 2txt, 2man,
> 2windowshelp, 2info, 2docbook<0.5 wink>

Yes. Again, this is relatively easy. I'd like to point out that to make
a switch, the only output we need to care about is HTML. All others
will follow as they are needed, so a handful should be available
quickly. To call the XML version of the documentation the reference
version, I will require an HTML conversion and one typeset version.

> I think a new Pythonic-one-way-to-do-it minimalistic DTD is the way
> to go.

A DTD that's too minimal will not be strong enough for writing the
documentation.
A good DTD that's workable for all the documents is my personal
requirement: only one DTD. More than one increases the learning curve
for all authors and maintainers.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From: Fred L. Drake, Jr.
References: <14277.44597.761409.555365@weyr.cnri.reston.va.us> <37C6C187.AC3265F@prescod.net>
Message-ID: <14278.54862.399167.712458@weyr.cnri.reston.va.us>

Paul Prescod writes:
> I think that we should use a variant of Docbook and use some SGML
> minimizations supported by sgmllib (or that COULD be supported by
> sgmllib). We can trivially use sgmllib+sax to normalize minimized SGML

So you favor SGML over XML? That had been my original thought, but I
shifted as more & better XML tools became available. I am not tied to
XML, however.

I said:
> 3. Create something entirely new and specific to Python.

Which Paul questioned:
> How is this different from porting over what we have? Hasn't it evolved
> to be pretty Python specific?

What we have is fairly Python-specific, but there's still a lot of
legacy which I'd love to get rid of. I'm not at all convinced that it's
terribly *good*. This conversion effort would be an excellent time to
use a better-designed structure.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From: Fred L. Drake, Jr.
References: <14277.44597.761409.555365@weyr.cnri.reston.va.us>
Message-ID: <14278.56573.880083.581360@weyr.cnri.reston.va.us>

Edward Welbourne writes:
> As a matter of clarification ... supposing we take any one of the three
> options (DocBook, inertia, revolution), they'll all be parsed down to
> XML, so: to what extent can we rely on being able to generate DocBook
> *from* the XML we've produced using either of the other options ? After
> all, XML to XML translations are supposed to all be natural and easy.

Edward,

I don't see any technical limitations, but I'd be very wary of assuming
it would be "easy" or that *I* would implement the transformations.
Writing good XSL won't be any easier than writing good LaTeX styles,
and you'd have to write it in XML to boot. (Ever played with XSL? It's
powerful... but tedious.)

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From paul@prescod.net Fri Aug 27 22:41:27 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 27 Aug 1999 17:41:27 -0400
Subject: [Doc-SIG] XML Conversion Update
References: <14277.44597.761409.555365@weyr.cnri.reston.va.us> <37C6C187.AC3265F@prescod.net> <14278.54862.399167.712458@weyr.cnri.reston.va.us>
Message-ID: <37C70607.607DBF75@prescod.net>

Fred L. Drake, Jr. wrote:
>
> Paul Prescod writes:
> > I think that we should use a variant of Docbook and use some SGML
> > minimizations supported by sgmllib (or that COULD be supported by
> > sgmllib). We can trivially use sgmllib+sax to normalize minimized SGML
>
> So you favor SGML over XML?

I favor a well-defined subset of SGML using basically the XML features
plus end-tag minimization. That can massively cut down on the annoyance
factor. If we ship a simple PyML to DBXML script with Python nobody
will complain that we are doing something "non-standard".

> That had been my original thought, but
> I shifted as more & better XML tools became available.

Well in the Python universe any XML tool that uses SAX or the DOM *is*
an SGML tool. Admittedly the Java world is not as open-minded as we are
but that's their problem. Jade certainly has no problem with either XML
or SGML.

I think that trying to stick carefully to DocBook is probably too much
work. We should design a Python-ic variant -- just like we do with
APIs.
We can use a transformation to "get to" the standard version (just as
we use classes/functions for abstraction in APIs)

Paul Prescod

From mhammond@skippinet.com.au Sat Aug 28 00:24:04 1999
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Sat, 28 Aug 1999 09:24:04 +1000
Subject: [Doc-SIG] XML Conversion Update
In-Reply-To: <37C70607.607DBF75@prescod.net>
Message-ID: <001201bef0e3$4284bb70$0801a8c0@bobcat>

[Paul writes]
> I favor a well-defined subset of SGML using basically the XML features
> plus end-tag minimization. That can massively cut down on the
> annoyance factor.
> ...
> I think that trying to stick carefully to DocBook is probably too much
> work. We should design a Python-ic variant -- just like we do
> with APIs.

This seems to be converging on agreement.

Fred - what is the next step - assuming Paul's statement (or a slight
refinement thereof) is accepted, how do we move forward? How do we
design the DTD? Does anyone have enough experience with this stuff that
they could make a first pass?

Mark.

From Moshe Zadka Sat Aug 28 06:58:54 1999
From: Moshe Zadka (Moshe Zadka)
Date: Sat, 28 Aug 1999 08:58:54 +0300 (GMT+0300)
Subject: [Doc-SIG] XML Conversion Update
In-Reply-To: <14278.54448.54685.956645@weyr.cnri.reston.va.us>
Message-ID:

On Fri, 27 Aug 1999, Fred L. Drake, Jr. wrote:

> It's hard to predict what's needed for good documentation; I am
> *not* of a mind to avoid having support for very general documentation
> constructs.
> We want to have a single DTD to keep the learning curve and tool
> support under control, so we can't really be too stingy in designing
> the markup.

I don't think I agree here. Look at POD: it's a wonderful form of
documentation for the CPAN modules, and it's a very minimalistic
markup. (Of course, it will never be the case that all modules have
high quality documentation: we can only solve the technical problems.)

> There's more in the library documentation than module sections; this
> even gets me in trouble sometimes.
But it is *very* important to keep > in mind that library documentation can and should contain much more > than basic reference material. OK, you're right here. So let me restate my original point: A /module/ documentation should have a simple DTD...etc. The library reference should have a more general DTD, which will include a element. Thus we get the best of all worlds: a general documentation format of the sections of the library reference, and a simple format for the everyday module documentation. Of course, we'd need and elements, but we'll get to that when we design the DTD. > > 1. The tutorial is simply a book about Python, and as such should be > > written like every other technical book. Moreover, Guido is (currently) > > the sole maintainer so he has last say. > > Guido has the last say about everything he does, of course. On the > other hand, he's not the only person who maintains the documentation. > He's certainly not the one who does the most of the work on it. > This makes it sound like a DocBook project. Of course, that should be re.sub("Guido", "Guido/Fred"), but this doesn't detract from my point: while many people will offer suggestions, having a central maintainer (and a high barrier for entry) is /not/ a bottleneck. I have no problem with that being a DocBook project, especially as I will never have to write anything there <0.5 wink>. Ditto for the next 2 [about the bad Python/C API docs] > The organizational and completeness problems with the API reference > are orthogonal to the DTD issue; Of course: they would have been better as text. But I do think that part of the problem is organizational, so it's not completely orthogonal to the DTD issue. > we just haven't had the time. I try > to add to and enhance the document as specific questions come up, but > can't seem to find enough time. (Things should get better once the > conversion is done, but not by a whole lot!) We all appreciate your work, of course. > > 1.
Low barrier for entry: not higher than for writing Python modules. > > This is unattainable. The biggest barriers to entry for > documentation writing are motivation and natural language. Few people > are really good with their own native language, esp. in its written > form. Explaining things to others through the written word is very > difficult. > Python is much easier to learn! Well, you totally missed me here: of course we can't teach people English or good style. What I meant was that learning the DTD should be no harder than Python, so just in case we have a super-Python programmer who also won the Pulitzer but doesn't have time to read a 500 page "How to Document in Python for Idiots", he'll be able to write documentation for his modules. > ... To call the XML version of the documentation the > reference version, I will require an HTML conversion and one typeset > version. Which will probably mean 2python-latex, because that's the easiest way. > A DTD that's too minimal will not be strong enough for writing the > documentation. A good DTD that's workable for all the documents > is my personal requirement: only one DTD. More than one increases the > learning curve for all authors and maintainers. I disagree: 90% of the authors will only write library reference for their (or other people's) modules. We need, first and foremost, to cater to them. And besides, I do not believe that a single DTD which "does everything" is better than a small set of synchronised DTDs (I definitely do /not/ want to remember that emphasis is in the module docs and in the tutorial DTD, but we can easily keep that from happening). -- Moshe Zadka . INTERNET: Learn what you know. Share what you don't.
From sean@digitome.com Sat Aug 28 12:00:56 1999 From: sean@digitome.com (Sean Mc Grath) Date: Sat, 28 Aug 1999 12:00:56 +0100 Subject: [Doc-SIG] XML Conversion Update In-Reply-To: <14278.54448.54685.956645@weyr.cnri.reston.va.us> References: <14278.36935.322206.110605@dolphin.mojam.com> Message-ID: <3.0.6.32.19990828120056.00972440@gpo.iol.ie> [Fred Drake] > > A DTD that's too minimal will not be strong enough for writing the >documentation. A good DTD that's workable for all the documents >is my personal requirement: only one DTD. More than one increases the >learning curve for all authors and maintainers. > I want to apologise in advance because I have not got the time right now to fully justify what I am about to say. Please forgive this transgression -- I will find time and post a justification as soon as I can! DocBook is not the answer. If anything, DocBook is the question. I am a strong believer in micro-document SGML/XML architectures. I believe a micro-document approach better suits the Python doc project. It has advantages on many fronts - authoring, production, maintenance, content re-use. Here is what I suggest: We need N *small* DTDs where N is the number of different *types* of information that make up the Python docs. e.g. ModuleDoc, HowToDoc and so on. Each one of these is an "information object" and parses to the DTD for that class of object. Using a simple "collection" DTD, information objects are assembled into hierarchical structures for management and publishing purposes:- Library String Services Bottom line: One big DTD is not the way to go in my opinion. We need N tiny DTDs - one for each class of information. We then use a simple assembly DTD such as above to gather together information objects for publishing purposes. I cannot close without pointing out that this microdocument architecture approach is very well suited to processing with Python. I have built Python based publishing systems using it.
Whilst down-translating to, say, HTML only two small documents need to be loaded into Python -- the collection file and the information object being rendered. Also, this architecture supports semantic naming of information objects which is very, very useful for cross-reference creation and management. Also, it is a no-brainer Python script to convert from a micro-document collection to a monolithic DTD such as docbook so that we can piggy-back on the existing docbook downtranslates:-) yours-in-an-awful-rush-because-I-am-supposed-to-finish-"XML- processing-with-Python"-for-Prentice-Hall-this-weekend-ly, Sean P.S. The Pyxie Open Source project that I will be kicking off with this book will have Python software that can be used right away to prototype a micro-document based Python Doc architecture. Developers Day co-Chair WWW9, April 2000, Amsterdam http://www.www9.org From jack@oratrix.nl Sun Aug 29 22:16:22 1999 From: jack@oratrix.nl (Jack Jansen) Date: Sun, 29 Aug 1999 23:16:22 +0200 Subject: [Doc-SIG] XML Conversion Update In-Reply-To: Message by Sean Mc Grath , Sat, 28 Aug 1999 12:00:56 +0100 , <3.0.6.32.19990828120056.00972440@gpo.iol.ie> Message-ID: <19990829211627.04674CF320@oratrix.oratrix.nl> While I agree with Sean (and others) that small DTDs are a lot better suited to documenting Python modules there's various standard-formatting things that you'd like to borrow from existing DTDs (emphasis, references to other manuals/sections, footnotes, etc). Is there a way that that could be done, without dragging in the whole of the (apparently huge and hairy, from the reports here) docbook DTD? 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From digitome@iol.ie Mon Aug 30 10:48:30 1999 From: digitome@iol.ie (Sean Mc Grath) Date: Mon, 30 Aug 1999 10:48:30 +0100 Subject: [Doc-SIG] XML Conversion Update In-Reply-To: <19990829211627.04674CF320@oratrix.oratrix.nl> References: <3.0.6.32.19990828120056.00972440@gpo.iol.ie> Message-ID: <3.0.6.32.19990830104830.00973a60@gpo.iol.ie> At 23:16 29/08/99 +0200, you wrote: >While I agree with Sean (and others) that small DTDs are a lot better >suited to documenting Python modules there's various >standard-formatting things that you'd like to borrow from existing >DTDs (emphasis, references to other manuals/sections, footnotes, etc). > >Is there a way that that could be done, without dragging in the whole >of the (apparently huge and hairy, from the reports here) docbook DTD? I think we should grab some of the formatting things from the HTML tag soup - including a really simple table model. A key question, I believe, is the naming convention issue. This is key to document management and key to cross references. I believe we should strive for a semantic naming scheme for information objects. I propose a naming scheme based on what I dub "fully qualified information object identifiers". The idea is to use the hierarchical location of an information object in a document assembly to arrive at meaningful and unique names, e.g.: Library_Reference-Python_Services-UserList.xml API-Abstract_Objects_Layer-Mapping_Protocol.xml As well as acting as names for information object storage these are also names for xref purposes e.g.: See the mapping protocol for more information. I suggest we go with XML rather than SGML in the sense that anything checked in/out of the system is XML.
People who know SGML will probably want to pepper in some tag minimization for their emacs setup:-) They can then use James Clark's SX, for example, to convert to XML. regards, Developers Day Co-Chair, 9th International World Wide Web Conference 16-19, May, 2000, Amsterdam, The Netherlands http://www9.org From guido@CNRI.Reston.VA.US Mon Aug 30 16:40:39 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 11:40:39 -0400 Subject: [Doc-SIG] XML Conversion Update In-Reply-To: Your message of "Mon, 30 Aug 1999 10:48:30 BST." <3.0.6.32.19990830104830.00973a60@gpo.iol.ie> References: <3.0.6.32.19990828120056.00972440@gpo.iol.ie> <3.0.6.32.19990830104830.00973a60@gpo.iol.ie> Message-ID: <199908301540.LAA07355@eric.cnri.reston.va.us> > The idea is to use the hierarchical location of an information > object in a document assembly to arrive at meaningful and unique > names, e.g.: > > Library_Reference-Python_Services-UserList.xml > API-Abstract_Objects_Layer-Mapping_Protocol.xml > > As well as acting as names for information object storage > these are also names for xref purposes e.g.: > > See > the mapping protocol for more information. Without any context, this looks like a horrible idea in one detail: the mixing of underscores and hyphens that you propose. Anything, but not that! Make it CamelCase if you have to: LibraryReference-PythonServices-UserList.xml API-AbstractObjectsLayer-MappingProtocol.xml --Guido van Rossum (home page: http://www.python.org/~guido/) From paul@prescod.net Mon Aug 30 15:48:58 1999 From: paul@prescod.net (Paul Prescod) Date: Mon, 30 Aug 1999 10:48:58 -0400 Subject: [Doc-SIG] XML Conversion Update References: <3.0.6.32.19990828120056.00972440@gpo.iol.ie> <3.0.6.32.19990830104830.00973a60@gpo.iol.ie> Message-ID: <37CA99DA.159B0DDF@prescod.net> Sean Mc Grath wrote: > > I believe we should strive for a semantic naming scheme for > information objects.
I propose a naming scheme based > on what I dub "fully qualified information object identifiers". > The idea is to use the hierarchical location of an information > object in a document assembly to arrive at meaningful and unique > names, e.g.: > > Library_Reference-Python_Services-UserList.xml > API-Abstract_Objects_Layer-Mapping_Protocol.xml Great, but what about when UserList.xml moves -- all links break. Global names are more robust. > I suggest we go with XML rather than SGML in the sense > that anything checked in/out of the system is XML. > People who know SGML will probably want to pepper > in some tag minimization for their emacs setup:-) > They can then use James Clark's SX, for example, > to convert to XML. This presumes that the character representation of the text is irrelevant. This is emphatically NOT the case for the same reasons that it is not the case with Python. The first problem is that I will be very pissed off if I write in a particular style and then check my document in and get it back in a very different style. The second problem is that "diff" will report that every line has changed. That in turn messes up CVS. I prefer to operate on a hands-off basis. What you edit is what you check in is what is stored is what gets checked out is what you edit. The first time some SGML user messes this up I expect everyone will be rightly pissed off. This means that we need to make the simplified SGML vs. XML choice for real. We can't presume that everyone will do what they like. I could live with XML but I think that the cost of allowing shortened end tags is pretty minor and can make a huge difference in type-ability. Con: this will break compatibility with some XML editors -- do we expect Python hackers to use sissified GUI editors??
:) Paul Prescod From digitome@iol.ie Mon Aug 30 18:06:17 1999 From: digitome@iol.ie (Sean Mc Grath) Date: Mon, 30 Aug 1999 18:06:17 +0100 Subject: [Doc-SIG] XML Conversion Update In-Reply-To: <199908301540.LAA07355@eric.cnri.reston.va.us> References: <3.0.6.32.19990828120056.00972440@gpo.iol.ie> <3.0.6.32.19990830104830.00973a60@gpo.iol.ie> Message-ID: <3.0.6.32.19990830180617.009cabc0@gpo.iol.ie> >> As well as acting as names for information object storage >> these are also names for xref purposes e.g.: >> >> See >> the mapping protocol for more information. > [Guido] >Without any context, this looks like a horrible idea in one detail: >the mixing of underscores and hyphens that you propose. Anything, but >not that! Make it CamelCase if you have to: > > LibraryReference-PythonServices-UserList.xml > API-AbstractObjectsLayer-MappingProtocol.xml > Yes, the dash/underscore soup is awful. CamelCasing is better. If the information objects live on a flat filesystem then we need to restrict object names to a filesystem-friendly subset. We probably want to eschew the likes of "&" for example. We can be more uninhibited if the information objects live in something like MySQL. The benefits of storage in a relational database probably only outweigh the drawbacks once the number of information objects gets very large though. regards, Developers Day Co-Chair, 9th International World Wide Web Conference 16-19, May, 2000, Amsterdam, The Netherlands http://www9.org From digitome@iol.ie Mon Aug 30 18:00:10 1999 From: digitome@iol.ie (Sean Mc Grath) Date: Mon, 30 Aug 1999 18:00:10 +0100 Subject: [Doc-SIG] XML Conversion Update In-Reply-To: <37CA99DA.159B0DDF@prescod.net> References: <3.0.6.32.19990828120056.00972440@gpo.iol.ie> <3.0.6.32.19990830104830.00973a60@gpo.iol.ie> Message-ID: <3.0.6.32.19990830180010.009aacc0@gpo.iol.ie> >Sean Mc Grath wrote: >> >> I believe we should strive for a semantic naming scheme for >> information objects.
I propose a naming scheme based >> on what I dub "fully qualified information object identifiers". >> The idea is to use the hierarchical location of an information >> object in a document assembly to arrive at meaningful and unique >> names, e.g.: >> >> Library_Reference-Python_Services-UserList.xml >> API-Abstract_Objects_Layer-Mapping_Protocol.xml > [Paul Prescod] >Great but what about when UserList.xml moves -- all links break. Global >names are more robust. Sorry, a case of a very important detail that I did not flesh out owing to my time crunch! I mentioned in the first post that this micro-document architecture supports link management. My proposal is that when UserList.xml moves, a redirect stub is left behind. I.e. the file (using Guido's suggested CamelCasing) LibraryReference-PythonServices-UserList.xml is not deleted, but its contents are just something like: Where, blah.xml is the new location for the UserList material. (Periodically, all redirects can then be expunged). > >> I suggest we go with XML rather than SGML in the sense >> that anything checked in/out of the system is XML. >> People who know SGML will probably want to pepper >> in some tag minimization for their emacs setup:-) >> They can then use James Clark's SX, for example, >> to convert to XML. > [Paul Prescod] >This presumes that the character representation of the text is >irrelevant. This is emphatically NOT the case for the same reasons that >it is not the case with Python. The first problem is that I will be very >pissed off if I write in a particular style and then check my document >in and get it back in a very different style. The second problem is that >"diff" will report that every line has changed. That in turn messes up >CVS. I understand your points here but I still think we should go with plain vanilla XML as the storage notation. Even if we went with SGML, most SGML tools put inferred tags into your documents for you whether you like it or not!
>I prefer to operate on a hands-off basis. What you edit is what you >check in is what is stored is what gets checked out is what you edit. The only SGML editor I know that allows you to work on a hands-off basis is emacs! Fully blown SGML editors like Adept, Author/Editor, Frame etc. all canonicalize the SGML as part of the read/edit/save round trip. >The first time some SGML user messes this up I expect everyone will be >rightly pissed off. This means that we need to make the simplified SGML >vs. XML choice for real. We can't presume that everyone will do what >they like. I could live with XML but I think that the cost of allowing >shortened end tags is pretty minor and can make a huge >difference in type-ability. > >Con: this will break compatibility with some XML editors -- do we expect >Python hackers to use sissified GUI editors?? :) Frankly, yes. There are some cool XML editing tools beginning to appear. As part of the Pyxie project I have developed a serviceable XML editor with wxPython. With a bit of work, it could be tailored to the documentation project to produce easy to use, fully Python based tools for editing/maintaining the Python docs. IBM have made available a Java app. which, given a DTD, will spit out a validating, Java based XML editing app tailored to that DTD. Henry Thompson's XED is Python/Tk based and is getting very usable in my opinion. Corel's WordPerfect has a ridiculously good XML editing capability for such a cheap office suite product! Even if we went with SGML and people used Adept, Author/Editor, FrameMaker+SGML, whatever, the situation would be the same - tag minimization would be removed by the check-out/edit/check-in round trip.
regards, Developers Day Co-Chair, 9th International World Wide Web Conference 16-19, May, 2000, Amsterdam, The Netherlands http://www9.org From da@ski.org Mon Aug 30 18:20:22 1999 From: da@ski.org (David Ascher) Date: Mon, 30 Aug 1999 10:20:22 -0700 (Pacific Daylight Time) Subject: [Doc-SIG] XML Conversion Update In-Reply-To: <3.0.6.32.19990830180010.009aacc0@gpo.iol.ie> Message-ID: [Paul Prescod] > >Con: this will break compatibility with some XML editors -- do we expect > >Python hackers to use sissified GUI editors?? :) [Sean McGrath] > Frankly, yes. There are some cool XML editing tools beginning > to appear. As part of the Pyxie project I have developed a > serviceable XML editor with wxPython. With a bit of work, it could > be tailored to the documentation project to produce easy to > use, fully Python based tools for editing/maintaining the > Python docs. Let me second that. I refuse to write XML until there are real tools. And by that I mean that I don't want to type or , &stuff; etc. Emacs is a good code editor. It's a terrible document editor. I'll write code that *generates* and *processes* XML, of course, but I hate writing 'text' in emacs. IMHO, of course. --david From paul@prescod.net Sun Aug 29 17:17:59 1999 From: paul@prescod.net (Paul Prescod) Date: Sun, 29 Aug 1999 12:17:59 -0400 Subject: [Doc-SIG] XML Conversion Update References: <3.0.6.32.19990828120056.00972440@gpo.iol.ie> <3.0.6.32.19990830104830.00973a60@gpo.iol.ie> <3.0.6.32.19990830180010.009aacc0@gpo.iol.ie> Message-ID: <37C95D37.1E9C1B11@prescod.net> If I honestly believed that most of us were going to end up using XML editors, I would support using regular XML as a no-brainer. But I think that the average Python hacker is no more likely to download a specific, customized XML editor than they are to download and use IDLE in preference to their favorite text editor. I wrote my last book in vi(1) (admittedly an extreme choice) and the one before in Emacs (a little more reasonable).
I expect this to be the norm but neither of us has a crystal ball. And if we DO use XML editors then we run into the "diff/CVS" problem. This is a MAJOR problem for an open source effort. Maybe we can find/create an XML-smart diff and integrate it with CVS. In this case I wouldn't be so concerned... I would just unnormalize data I checked out and re-normalize it when I checked in. > I understand your points here but I still think we should go with > plain vanilla XML as the storage notation. Even if we went with > SGML, most SGML tools put inferred tags into your documents for > you whether you like it or not! That's why I don't use them. > The only SGML editor I know that allows you to work on a hands-off basis > is emacs! Fully blown SGML editors like Adept, Author/Editor, > Frame etc. all canonicalize the SGML as part of the read/edit/save > round trip. I think that XMetaL comes pretty close. It has a "raw text" mode that you can switch back and forth to. Some HTML editors (e.g. DreamWeaver) also have this concept so maybe hands-off editing will be a standard feature of XML editors in a few years. In the meantime I use whatever text editor I happen to have installed. Yes, my knuckles also drag on the ground. Paul Prescod
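
The "diff/CVS" problem raised above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two-line module document and its element names are invented for the example (no DTD in this thread defines them), and minidom's pretty-printer stands in for whatever re-serialization a normalizing XML editor might perform.

```python
import difflib
from xml.dom import minidom

# Hypothetical hand-written module documentation. The element names are
# invented for illustration; they come from no DTD proposed in the thread.
HAND_WRITTEN = """<module name="UserList"><summary>A list wrapper.</summary>
<para>Subclass it to add behaviour.</para></module>
"""


def round_trip(text):
    """Re-serialize XML roughly as a normalizing editor might (an assumption)."""
    return minidom.parseString(text).toprettyxml(indent="  ")


def changed_lines(before, after):
    """Count the lines a line-based diff reports as removed or added."""
    delta = difflib.ndiff(before.splitlines(), after.splitlines())
    return sum(1 for line in delta if line.startswith(("- ", "+ ")))


normalized = round_trip(HAND_WRITTEN)
print("lines reported as changed:", changed_lines(HAND_WRITTEN, normalized))
```

The content of the document is the same before and after the round trip, yet a line-based diff still flags the hand-written lines as changed, which is the behaviour that would upset a CVS history.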