From fdrake@users.sourceforge.net Thu Mar 1 05:20:23 2001 From: fdrake@users.sourceforge.net (Fred L. Drake) Date: Wed, 28 Feb 2001 21:20:23 -0800 Subject: [Doc-SIG] [development doc updates] Message-ID: The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ From tony@lsl.co.uk Thu Mar 1 16:57:05 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 1 Mar 2001 16:57:05 -0000 Subject: [Doc-SIG] docutils 0.0.4 "do you really expect me to believe that?" In-Reply-To: <003301c0a0c9$5c577500$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <006001c0a270$a4f83260$f05aa8c0@lslp7o.int.lsl.co.uk> The first nearly useful release of docutils (0.0.4 - the next release might have a somewhat shorter number!) is now to be found described at: http://www.tibsnjoan.co.uk/docutils/status.html which is much the same as: http://homepage.ntlworld.com/tibsnjoan/docutils/status.html Please note that I have *not* updated the Demon site. New with this release: * stpy.py is back, with its oh-so-attractive command line interface * Python modules and StructuredText files can be read in * A DOM tree can be generated * XML can be output (thanks to minidom's writexml method) * The RE used for descriptive lists has been fixed (again) - hopefully it's all better now * Some refactoring of DocNodes.py was done - it may not be finished yet * The doctests in DocNodes.py and TextNodes.py still work (well, that's not *new*) My next task is to add HTML output, which will make it easier for me to decide if I've done the DOM trees right, and to document the DTD I think I'm using for the DOM tree (so other people have some hope of interfacing to it). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Abstraction is one of those notions that Python tosses out the window, yet expresses very well. - Gordon McMillan My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From fdrake@acm.org Thu Mar 1 20:17:18 2001 From: fdrake@acm.org (Fred L. 
Drake, Jr.) Date: Thu, 1 Mar 2001 15:17:18 -0500 (EST) Subject: [Doc-SIG] Doc build question In-Reply-To: <005401c09d38$f9685e70$c603a8c0@activestate.ca> References: <005401c09d38$f9685e70$c603a8c0@activestate.ca> Message-ID: <15006.44622.122765.728808@localhost.localdomain> Dan Milgram writes: > I'm having some difficulty simply building the html docs, and was hoping > someone could help me out as I am not familiar with LaTeX. All the docs > build ok, with the exception of the modindex.html files (they simply contain > the header and footer info but no actual index entries). I am using the > following relevant packages on a RedHat box: > latex2html99.2beta8 > ActivePerl5.6 > tetex-1.0.6-11 You don't say what version of Python you're using. Are you working with the CVS version, or a packaged 2.0 version? There was a problem building the modindex.html files, but I thought that was fixed before the 2.0 release. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Fri Mar 2 20:32:27 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 2 Mar 2001 15:32:27 -0500 (EST) Subject: [Doc-SIG] doc tree frozen for 2.1b1 Message-ID: <15008.859.4988.155789@localhost.localdomain> The documentation is frozen until the 2.1b1 announcement goes out. I have a couple of checkins to make, but the formatted HTML for the Windows installer has already been cut & shipped. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Fri Mar 2 20:49:09 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 2 Mar 2001 15:49:09 -0500 (EST) Subject: [Doc-SIG] Python 2.1 beta 1 documentation online Message-ID: <15008.1861.84677.687041@localhost.localdomain> The documentation for Python 2.1 beta 1 is now online: http://python.sourceforge.net/devel-docs/ This is the same as the documentation that will ship with the Windows installer. This is the online location of the development version of the documentation.
As I make updates to the documentation, this will be updated periodically; the "front page" will indicate the date of the most recent update. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@cj42289-a.reston1.va.home.com Sat Mar 3 19:47:49 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Sat, 3 Mar 2001 14:47:49 -0500 (EST) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010303194749.629AC28803@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Additional information on using non-Microsoft compilers on Windows when using the Distutils, contributed by Rene Liebscher. From edloper@gradient.cis.upenn.edu Sun Mar 4 23:00:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 04 Mar 2001 18:00:33 EST Subject: [Doc-SIG] Automated __doc__ string processing systems Message-ID: <200103042300.f24N0YD03695@gradient.cis.upenn.edu> I've recently been reading up on the status of automated documentation extraction from formatted inline comments in Python. It sounds like one day, a tool for extraction may become part of the standard library.. (PEP 224). I'm trying to find out more about what the current status of this is, who is actively working on such tools, and how I can help. I'm pretty new to the area, but as far as I can tell there are 3 such tools currently in active development: * Happydoc * Pydoc * Docutils Are all of these actually being actively developed? Is one more likely than the others to become the standard? I'd like to help work on this project, but I'm not sure who I should be contacting, and I'm pretty new to the area. Also, I wrote a short essay trying to list many of the issues that such doc tools must deal with in one place, and discuss those issues. I'd appreciate it if you could give me any feedback on it. It's available at: http://www.cis.upenn.edu/~edloper/pythondoc.html Thanks! 
-Edward From ping@lfw.org Mon Mar 5 00:10:02 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 4 Mar 2001 16:10:02 -0800 (PST) Subject: [Doc-SIG] Automated __doc__ string processing systems In-Reply-To: <200103042300.f24N0YD03695@gradient.cis.upenn.edu> Message-ID: Hi, Edward. On Sun, 4 Mar 2001, Edward D. Loper wrote: > I've recently been reading up on the status of automated documentation > extraction from formatted inline comments in Python. It sounds like > one day, a tool for extraction may become part of the standard library. Yes -- just this past week, pydoc became a part of the Python 2.1 beta distribution. Two new modules were added: inspect.py - Get useful information from live Python objects. pydoc.py - Generate Python documentation in HTML or text for interactive use. They are separately available at http://www.lfw.org/python/inspect.py http://www.lfw.org/python/pydoc.py and documented at http://www.lfw.org/python/inspect.html http://www.lfw.org/python/pydoc.html The Python modules are written to work with any version of Python from 1.5.2 up, so you should be able to just download the two files and run them. pydoc intentionally avoids the issues of syntax for formatting docstrings, as that has historically been a contentious issue. I looked at epydoc -- it's nice work, and your essay was a good survey of the issues and ideas. I hope that pydoc's inclusion doesn't disappoint you, as you have clearly done a lot of hard work on epydoc, and that pydoc proves worthy of its blessing. I hope to learn from what you've done as we take pydoc forward. -- ?!ng "The biggest cause of trouble in the world today is that the stupid people are so sure about things and the intelligent folk are so full of doubts." -- Bertrand Russell From edloper@gradient.cis.upenn.edu Mon Mar 5 02:22:29 2001 From: edloper@gradient.cis.upenn.edu (Edward D. 
Loper) Date: Sun, 04 Mar 2001 21:22:29 EST Subject: [Doc-SIG] Automated __doc__ string processing systems In-Reply-To: Your message of "Sun, 04 Mar 2001 16:10:02 PST." Message-ID: <200103050222.f252MTD18354@gradient.cis.upenn.edu> Ka-Ping, I'm glad to hear that something's getting added to the standard distribution. It seems like a really useful type of tool for keeping code interfaces well-documented. Do you think that it will be possible to eventually integrate the type of functionality I talked about in my essay into pydoc? (with some sort of special prefix or other mechanism, like separate formatted-doc strings) Or do you think that that type of functionality would be best put in a different tool? I guess that basically it doesn't seem reasonable for me to try to develop Epydoc into a full-fledged system, with at least 3 python-document-extraction systems already out there.. But I'd still like to work on the problem.. So I'm wondering where my efforts would be best spent. :) -Edward From tony@lsl.co.uk Mon Mar 5 10:30:06 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 5 Mar 2001 10:30:06 -0000 Subject: [Doc-SIG] Automated __doc__ string processing systems In-Reply-To: <200103042300.f24N0YD03695@gradient.cis.upenn.edu> Message-ID: <001601c0a55f$3f085240$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I've recently been reading up on the status of automated documentation > extraction from formatted inline comments in Python. It sounds like > one day, a tool for extraction may become part of the > standard library.. As Ka-Ping Yee says elsewhere, already a done deal. > (PEP 224). I'm trying to find out more about what the current status > of this is, who is actively working on such tools, and how I can > help. I'm pretty new to the area, but as far as I can tell there are > 3 such tools currently in active development: > * Happydoc > * Pydoc > * Docutils > > Are all of these actually being actively developed? 
Is one > more likely > than the others to become the standard? I'd like to help work on this > project, but I'm not sure who I should be contacting, and I'm pretty > new to the area. Well, all of the three are doing different things. Ka-Ping Yee's pydoc will be in the standard distribution. It provides a command line utility (similar to Unix man in functionality, but aimed at Python modules). It provides an interactive "help" command, which can be used from the Python prompt. And it provides the ability to generate HTML pages. It doesn't yet address the *formatting* of the insides of docstrings. Doug Hellman's HappyDoc is an independent effort. It does not provide the "interactive help" facility, and its aims are rather different in philosophy - I think it is rather more ambitious in some ways about what it wants to do. It already has an attempt at interpreting StructuredText within doc strings. It wants to support *lots* of output formats. It will look in comments as well as in docstrings. Whilst it isn't in the standard distribution, it has been going a while, and is being used by various people. docutils (by me) is an attempt to provide the interpretation of the *inside* of a docstring. It results from many discussions over the years on the Doc-SIG about what those insides should look like, and is basically an evolved form of StructuredText (similar to and hopefully compatible with StructuredTextNG, which is, maybe, being developed by the Zope people). Although it has a simple command line interface, and can produce HTML, its main aim is to be used by utilities like the other two. Oh, and it's not finished yet. There have been other players, as well - pythondoc, Marc Lemburg's doc.py, crystal, for instance - not all of whom have necessarily abandoned work just because of what we are doing. The innards of Zope use StructuredText, and they parse documentation strings as well. Personally, I don't see a problem with having multiple tools. 
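The pydoc behaviour described above (man-like text for a live object, plus HTML generation) can be exercised directly from the interpreter. A minimal sketch using the standard-library `pydoc` and `inspect` modules — with present-day signatures; the original 1.5.2-compatible versions may differ slightly:

```python
import inspect
import pydoc

def brief_api(obj):
    """Collect one-line summaries for the public functions of a module or class."""
    summaries = {}
    for name, member in inspect.getmembers(obj, inspect.isfunction):
        if name.startswith("_"):
            continue
        doc = inspect.getdoc(member) or ""
        # pydoc-style synopsis: keep only the first line of the docstring
        summaries[name] = doc.split("\n")[0]
    return summaries

# pydoc can render full plain-text documentation for any live object;
# pydoc.html.document(obj) is the HTML-producing counterpart.
text_docs = pydoc.render_doc(inspect, renderer=pydoc.plaintext)
```

Note that neither function looks at the *markup* inside the docstrings; as the message says, pydoc deliberately leaves docstring formatting alone.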
It is *essential* to have a tool in the standard distribution, and pydoc is just perfect for that purpose (it addresses so many of the needs all at once). It is slightly less essential, but still important, to have a common definition of how one *writes* (formats) a docstring, and some of the documentation for that has been written. Leveraging off something many people already (more or less) use is important here. My experiences of writing docutils will lead to a firmer explication of that, plus a tool that *works* with the explication. I'll be extremely happy if a second implementation of *that* appears, as well. > Also, I wrote a short essay trying to list many of the issues that > such doc tools must deal with in one place, and discuss those issues. > I'd appreciate it if you could give me any feedback on it. It's > available at: > http://www.cis.upenn.edu/~edloper/pythondoc.html It's Monday, I've just gotten into work, and I haven't had time to read your document properly yet. It looks at a brief scan as if it is working at a slight tangent to the ST initiative - not necessarily a bad thing. One of the things to bear in mind, though, is that experience shows that MOST PEOPLE will not use "heavy" markup in docstrings - so HTML, XML, TeX were right out - instead they want something easy to write, and easy to read *without processing* - this is why ST was adopted (I was initially a huge opponent of this, as I *like* markup, but I realised that practice wins over theory). I think that the javadoc type of information probably counts as "heavyweight" as well - it's certainly not readable. Hmm - looking at your example: """ @cvariable(v) The v field of the class @type(v) int @ivariable(i) The i field of the instance. Note that descriptions can continue onto the next line and can include *formatting*. 
@type(i) float @see(otherClass) that other class @author Edward Loper @author Another author @version $Id:$ """ I suspect that one would instead write it something like: """ Significant class values #v# - an integer, representing variability #i# - perhaps confusingly, a real value. (It's going to play merry hell if I ever want to search for this, with such a short name.) Also see ^#other_class# for similar ideas. Authors: * Edward Loper * Someone J. Else Version: """ (there are a couple of things in there that won't work in current STpy, at least one of which may be contentious, but the main point is that it is much more like a normal text than a piece of form-filling, which encourages explanation.) Sorry for being brief - I hope it doesn't come across as impolite. David Goodger also had a swathe of significant comments on the innards of docstrings, some while back, which one day I want to go through and comment on and maybe nick ideas from. I shall hang on to your document as well. (Hmm - that sounds a bit like I think *I'm* deciding how STpy works. I admit it! . Well, actually, when the first implementation gets a bit firmer, I'm hoping to prod people into commenting on some of the subtler minutiae of the way forwards - of course, if noone cares, *then* I get to rule the world... ) As to contributed work - playing with docutils as it advances, and more importantly comments on the "specification" at http://www.tibsnjoan.co.uk/docutils/STpy.html would be useful - unfortunately, the document is a little light on *why* features are as they are. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 5 15:01:57 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 5 Mar 2001 15:01:57 -0000 Subject: [Doc-SIG] docutils 0.0.5 "Earwig O!" 
In-Reply-To: <006001c0a270$a4f83260$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <001901c0a585$3966b680$f05aa8c0@lslp7o.int.lsl.co.uk> I have uploaded a new version of docutils, 0.0.5: http://www.tibsnjoan.co.uk/docutils/status.html which is much the same as: http://homepage.ntlworld.com/tibsnjoan/docutils/status.html Please note that I have *not* updated the Demon site. News for this release: * I have removed the sources for release 0.0.1 * I have provided .tgz and .zip files for this new release * HTML output is now provided - it's not very subtle (speaking as someone who *looks at* HTML code), but it is proving very useful in debugging/refining the code * the command line interface has changed (simplified) * various bugs have been fixed * literal quoting ('..') is partially broken, ho hum * on the other hand, indentation of lines in paragraph literals is now done correctly * it's a bit slow, ho hum (well, that's not new either) For a fun time (for some sense of fun) try running it on DocNodes.py So, this version of the utility is *actually* somewhere towards being useful. I would appreciate it if anyone out there would throw it at some python or stx files and see what they think. Comments are, thus, welcome. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Hmm - must get some new signatures... My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From gherman@darwin.in-berlin.de Mon Mar 5 19:41:20 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Mon, 05 Mar 2001 20:41:20 +0100 Subject: [Doc-SIG] Automated __doc__ string processing systems References: <200103042300.f24N0YD03695@gradient.cis.upenn.edu> Message-ID: <3AA3EBE0.D1376B5E@darwin.in-berlin.de> "Edward D. 
Loper" wrote: > > I'm pretty new to the area, but as far as I can tell there are > 3 such tools currently in active development: > * Happydoc > * Pydoc > * Docutils You may want to add one more effort to your list that, for the lack of a better name, so far is called "docpy". It is in something like an embryonal form (which is why most likely you don't know about it) and comes with the current release of the ReportLab package (inside the libs directory). See here if you want to give it a try (The current version is named docpy0.py. There is also one labelled docpy1.py, but this is *very* experimental and you probably don't want to touch that, yet.): http://www.reportlab.com/ ftp://ftp.reportlab.com/current.zip Naturally, all authors are handling partly different requirements. ReportLab's are to generate electronic/print documentation for modules/packages that is not necessarily interactive, but should serve to make customers understand what is being delivered and, ideally, how to use it. docpy builds on inspect and does generate several formats, like ASCII, HTML and PDF (with slightly varying quality). It seems to implement some of the functionality that pydoc now apparently has implemented as well, but in a rather different manner. It's too early to make grand announcements, but the focus of docpy is basically that of extensibility. A good example of what I mean is a module named graphicsdoc (also part of the current distribution in the lib directory) that is built on top of docpy and that serves as a tool to build a catalog of widgets and charts from the upcoming ReportLab charting subpackage. Well, this is just to name another effort, not to discourage you or anybody to continue working on her own. Ideally, one day some of these approaches will merge again where the philosophy and feature set allows to do that. Hopefully, there'll be some very informal meeting about this here in the conference hotel... Kind regards, Dinu -- Dinu C.
Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'insanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." (Kent Beck, "Extreme Programming Explained") From edloper@gradient.cis.upenn.edu Tue Mar 6 01:46:41 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 05 Mar 2001 20:46:41 EST Subject: [Doc-SIG] Structured Text Message-ID: <200103060146.f261kfD10944@gradient.cis.upenn.edu> I've been going over the definitions of structured text (and its various flavors), trying to see if I can formalize it even more than Tibs did (http://homepage.ntlworld.com/tibsnjoan/STNG-format.html and http://homepage.ntlworld.com/tibsnjoan/docutils/STpy.html)... And a number of questions came up. I'm not sure if this is the correct forum for such questions.. If not, I apologize, and would appreciate it if you can tell me who I should be asking. Anyway, my questions were: 1. Does every string value have an interpretation as a Structured Text? That seems to be the case. If so, is that a Good Thing? As an example of a string that we might not want to give a value, consider: || indent level 0 || || indent level 1 || || indent level 2 || || indent level ?? I'd really prefer not to have cases like this have "undefined semantics." It seems like we either need to specify what they mean, or say that they're illegal. 2. If it is true that every string value has an interpretation as a Structured Text, does it make sense to officially "discourage" certain types of strings, such as the example listed above? It might also make sense to discourage strings like: || this || is || one messed up || paragraph 3. Which types of "code coloring" (emph, inline, etc.) can "wrap" over lines, and which can't?
E.g., can I have an *emph statement that continues to the next line?* 4. Is there any official precedence ordering on the different types of "code coloring?" Will there be anytime soon? Any rules about what types of code coloring can be contained in what other types? 5. Does structural formatting or code coloring take precedence? For example, if a paragraph starts with "* foo *," will it be a normal paragraph with an emphasized first element, or a list item? (It'll be much easier for me to write formal rules if structure takes precedence. ;) ) 6. Among the list types, which take precedence? For example, if a paragraph starts with "1. foo -- bar", is it an ordered list item or a descriptive list item? 7. What is meant by saying that SGML text passes through? SGML isn't even a mark-up language, so I assume that the intent is something like "XML and HTML text passes through." But does that mean that in an expression like 'a*b*', the '*'s will be ignored? That seems unreasonably difficult to implement. What about an expression like ''? Does this mean I can't say things like if 'xz'? Is there strong support for the notion of letting "SGML" text pass through, or is it something that might be dropped? (I would certainly vote for dropping it. :) ) My eventual goal, to the extent that it's possible, is to write out a complete formal specification for StructuredText using something similar to BNF (Backus Naur Form). (I'm pretty sure that vanilla BNF is not powerful enough to capture StructuredText.) After I've done that, I'll start working on getting Emacs to colorize StructuredText strings. I'd also like to create a sort of test-suite set of strings to test how different implementations function on different "ambiguously defined" cases.. Any help and/or pointers are very much appreciated.
:) -Edward From gherman@darwin.in-berlin.de Tue Mar 6 05:49:18 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Tue, 06 Mar 2001 06:49:18 +0100 Subject: [Doc-SIG] Structured Text References: <200103060146.f261kfD10944@gradient.cis.upenn.edu> Message-ID: <3AA47A5E.54231ADB@darwin.in-berlin.de> "Edward D. Loper" wrote: > > [...] I'd also like to create a sort of test-suite set of > strings to test how different implementations function on > different "ambiguously defined" cases.. > > Any help and/or pointers are very much appreciated. :) You may want to try this for testing: http://pyunit.sourceforge.net/ Dinu -- Dinu C. Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'insanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." (Kent Beck, "Extreme Programming Explained") From tony@lsl.co.uk Tue Mar 6 10:05:01 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 6 Mar 2001 10:05:01 -0000 Subject: [Doc-SIG] Structured Text In-Reply-To: <200103060146.f261kfD10944@gradient.cis.upenn.edu> Message-ID: <001e01c0a624$e88e0970$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper came up with some useful questions about StructuredText, which are his and are delimited by ">"... > I've been going over the definitions of structured text (and its > various flavors), trying to see if I can formalize it even more than > Tibs did (http://homepage.ntlworld.com/tibsnjoan/STNG-format.html and > http://homepage.ntlworld.com/tibsnjoan/docutils/STpy.html)... And a > number of questions came up. I'm not sure if this is the correct > forum for such questions.. So far as I'm concerned, it is. (Historically, actual implementation of a tool used to founder on such discussion because it used to occur *before* implementation.
However, since the implementation is proceeding, questions are, by me, now a good thing. Also, in practice, certain points only become clear to one when one has to implement them!). BTW - if you can, look back in the list archive for comments by David Goodger, dated around November last year. They're on my "to comment on" list (actually, they're in the bundled TGZ or ZIP files of the docutils distro as well, now I think of it), and (from memory) had some interesting points to make. > 1. Does every string value have an interpretation as a Structured > Text? That seems to be the case. If so, is that a Good Thing? > As an example of a string that we might not want to give a value, > consider: > || indent level 0 > || > || indent level 1 > || > || indent level 2 > || > || indent level ?? > > I'd really prefer not to have cases like this have "undefined > semantics." It seems like we either need to specify what they > mean, or say that they're illegal. As with many things about ST, the answer is "that depends" - it's really partly an implementation issue, conditioned by the required audience. In the case of Zope's use of ST, HTML pages have to be generated on the fly from ST, and thus it is not acceptable to have unrecoverable errors which prevent that (it is better to produce a possibly-slightly-wrong rendering). In the case of something like pydoc, this is probably true as well. In the case of a developer "testing" their docstrings, they want to *know* about errors. So. In all cases the example given is illegal, and the result is formally undefined (both 'cos it's not written down anywhere, and 'cos it *should* be undefined). However, STClassic (the "original" Zope tool, also used by HappyDoc, for instance) will make a "best guess" about what is meant - I can't remember any details (I only skimmed that code), but I don't think it went to *too* great lengths. 
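One plausible form such a "best guess" could take — and this is a hypothetical sketch of the idea, not STClassic's actual code — is to snap an inconsistent indentation to the nearest level already seen on the open-paragraph stack:

```python
def snap_indent(indent, known_levels):
    """Map an unexpected indentation to the nearest previously seen level.

    known_levels is the stack of indentations of currently open
    paragraphs, e.g. [0, 4, 8]. An indent of 6 matches none of them;
    a lenient parser picks the closest level rather than giving up.
    """
    return min(known_levels, key=lambda level: abs(level - indent))

# snap_indent(6, [0, 4, 8]) picks 4: ties resolve to the earlier level
# because min() keeps the first minimum it encounters.
```

A "pedantic" mode would instead raise an error whenever `indent not in known_levels`.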
Currently, my tool will flag an error and either give up or ignore the text (I think the former) - this is clearly unacceptable in many cases, and is on the "fix" list. My feeling is that the user should be able to select "pedantic" mode, in which it complains and gives up, but in non-pedantic mode (the default) the "nearest" (in some sense) correct indentation should be selected. On the other hand, I don't actually believe that *every* string has an ST representation - just that this is not the example (I don't off-hand *have* an example, just a gut feeling). Hmm - there are, though, surprising things one cannot represent as ST - here's an awkward case:: 1. This is a list item:: This is some literal text (see the '::' above) I can't make this paragraph a "child" of the list item. My intent is to make the third para a child of the first (so it won't terminate the list). But I can't do that, because it can either be indented so that it becomes part of the literal text (not what I want), or it can be indented as it is. This is so regardless of how one ends the literal text (one can choose "when a paragraph at the parent paragraph's indentation is found" (in which case the third paragraph *has* to be at that indentation to be "seen" as a separate paragraph), or "when a paragraph with indentation less than the first line of literal text is found" (in which case the third paragraph *could* be indented more, but it would be illegal by the indentation rules)). There *is* a way round it, I've just realised, which *may* not work yet in my code, but will do so eventually: 1. This is a list item: :: This is some literal text (see the ':: above, which will be optimised away/become invisible) And now this paragraph *is* a child of the list item. (by following the ST rules blindly, a paragraph containing just '::' should be legal - one then just has to decide what it *means* and how it is rendered. 
I choose to take it as invisible - but I can't remember if I've implemented it yet - currently it probably renders as a single ':'). It's a hack, but it *does* follow the rules. (((hmm - time to refine terms. ST == StructuredText - I use this as a generic term for the "family" STClassic == StructuredText as implemented by Zope, and used by, for example, HappyDoc STNG == the new version of ST that Zope has been working on - I don't know its current status, as I haven't been following it recently. STpy == ST with extensions from STNG, added "Python" extensions, and possibly other extensions as well (as documented in the STpy.html document) There is a widely available module to implement STClassic. STNG was being worked on, and is available through CVS via the Zope website. STpy is currently partially implemented by my code.))) > 2. If it is true that every string value has an interpretation as a > Structured Text, does it make sense to officially "discourage" > certain types of strings, such as the example listed above? It > might also make sense to discourage strings like: > || this > || is > || one messed up > || paragraph Again, it depends if one is a document user or producer. For a document *producer*, we should discourage (1) aggressively. For a user, we can't. As to (2), I can't speak for the Zope codebase, but in mine it is only the first line of a (non-literal) paragraph that counts, and so the messed up paragraph will render predictably (and I choose to think that is the sensible choice). Note that it is a Good Thing to make only the first line count - it means that people who prefer paragraph styles like:: I start my paragraphs with an indentation 'cos my teachers told me to... can get away with it (well, they're mad, but they can get away with it). > 3. Which types of "code coloring" (emph, inline, etc.) can "wrap" over > lines, and which can't? E.g., can I have an *emph statement that > continues to the next line?* In STClassic, the '..'
markup (literal) cannot contain line boundaries. emphasis and strong can. Descriptive list item titles can't. In my code (at the moment) in non-literal paragraphs all line boundaries have disappeared before the colourising code gets at the text, so all markup can span line boundaries (descriptive list items still can't, but that's a different issue). It was trivial to choose either option in my code, and I prefer the ability to ignore line ends - I like to use (more or less) ST in my emails, but I'm using Outlook which doesn't make it easy to tell where it will wrap lines, which makes it hard to use STClassic type literals. Note that a newline-plus-leading-whitespace is exactly equivalent to a single space. > 4. Is there any official precedence ordering on the different > types of "code coloring?" Yes. It is determined by the innards of the software. In my code, there is a list that the user can get access to and modify, which can change the order of markup (actually, the order of markup and also the RE used to detect each markup, and the tags generated therefrom). But I wouldn't expect this to be documented as such for ST itself - it's an implementation detail of a particular tool. > Any rules about what > types of code coloring can be contained in what other types? In a system using REs to recognise markup it's, well, tricky to nest markup. I don't *think* STClassic does. My code deliberately refrains from doing it at the moment - it's something I intend to implement eventually, but since it's fiddly and non-essential I'm leaving it for now (*how* to do it is fairly obvious, but it's fiddly and the sort of thing I have to draw a lot of pictures for, so I'd rather do more important things first, even though I myself *like* to nest markup). > 5. Does structural formatting or code coloring take precedence? For > example, if a paragraph starts with "* foo *," will it be a normal > paragraph with an emphasized first element, or a list > item?
>    (It'll
>    be much easier for me to write formal rules if structure takes
>    precedence. ;) )

Hah - the devil is in the details. Which have never been formally documented. So the answer is, it depends. In STClassic, I can't remember, so you'll have to experiment. In my code, list item recognition is done in two places - first at paragraph splitting time, and secondly when the nodes are "combed" for various purposes. The same REs are used both times. Colourisation is done later (when the doctree is already in shape). Thus, a paragraph starting "* spam *" will be recognised as a list item, and the initial "*" will be removed well before colourising happens (mind, I'm saying that, but I haven't tried it!).

(Hmm - like much of ST, this just tends to "fall out" of the design, which is one of the reasons I have immense respect for the people who came up with the original ideas - it manages to retain great economy of markup use, whilst generally "doing the right thing" in most normal situations)

> 6. Among the list types, which take precedence? For example, if a
>    paragraph starts with "1. foo -- bar", is it an ordered list item
>    or a descriptive list item?

Depends on the implementation. I can't offhand remember (although it's another thing a user of the module can fiddle with in my version). The order *does* matter - OK, I've looked it up, and the relevant code is:

    # Note that we want the descriptive list type to come
    # before the others, since a title might be a valid
    # bullet or sequence
    RE_list = [(RE_DESCRIPTIVE, "ditem", 1),
               (RE_UNORDERED,   "uitem", 0),
               (RE_ORDERED,     "oitem", 0)]

I'd stand by that - to make sense, descriptive comes first, and the order of the other two doesn't matter. So your example should be descriptive (title == "1. foo", text == "bar").
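The "descriptive comes first" ordering can be sketched in miniature. The regular expressions below are hypothetical stand-ins of my own, not the actual docutils patterns:

```python
import re

# Hypothetical stand-in patterns -- loosely modelled on the RE_list
# idea quoted above, NOT the real docutils regular expressions.
RE_DESCRIPTIVE = re.compile(r"^(?P<title>.+?) -- (?P<text>.*)$")
RE_UNORDERED = re.compile(r"^[o*-] +(?P<text>.*)$")
RE_ORDERED = re.compile(r"^\d+\. +(?P<text>.*)$")

# Descriptive must come first: "1. foo -- bar" matches both the
# ordered and the descriptive pattern, and descriptive should win.
RE_list = [(RE_DESCRIPTIVE, "ditem"),
           (RE_UNORDERED, "uitem"),
           (RE_ORDERED, "oitem")]

def classify(first_line):
    """Classify a paragraph by its first line: first matching RE wins."""
    for regexp, kind in RE_list:
        match = regexp.match(first_line)
        if match:
            return kind, match.groupdict()
    return "para", {}
```

With this ordering, "1. foo -- bar" comes out as a descriptive item (title "1. foo", text "bar"), and a paragraph starting "* spam *" is picked up as an unordered list item before any colourising code could see the "*" - matching the behaviour described above.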
Do beware of the " -- " trap for descriptive list items - one wants to be able to do::

    A list of delimiters:

     'o'    -- starts an unordered list item
     ' -- ' -- separates the parts of a descriptive list item

and have it produce the "obvious" results (I keep hoping I've got that bit right!)

> 7. What is meant by saying that SGML text passes through? SGML isn't
>    even a mark-up language, so I assume that the intent is something
>    like "XML and HTML text passes through." But does that mean that
>    in an expression like 'a*b*', the '*'s will be ignored?
>    That seems unreasonably difficult to implement. What about an
>    expression like ''? Does this mean I can't say things
>    like if 'xz'? Is there strong support for the
>    notion of letting "SGML" text pass through, or is it something that
>    might be dropped? (I would certainly vote for dropping it. :) )

In STClassic, there was an embedded assumption that HTML was being produced (since it was designed mainly for support of Zope web pages). Thus what was meant was that anything that looked vaguely HTML-like would be passed through untouched (so I think it just triggered on the < and >, but I wouldn't swear to it).

In my code, and STpy, this assumption is thrown away - first of all because one cannot assume that HTML (or some other SGML child) is the target, and secondly because experience with LaTeX and other markup languages shows that allowing this sort of latitude to document writers is a Bad Thing. Of course, this does mean that one has to start to consider things like line breaks that STClassic would handle with an explicit '<br>
' - but that can wait... > My eventual goal, to the extend that it's possible, is to write out a > complete formal specification for StructuredText using something > similar to BNF (Backus Naur Form). (I'm pretty sure that vanilla BNF > is not powerful enough to capture StructuredText.) That would be good. If it helps, I should be working on a DTD for STpy in the next week or so, which should relate, and will define some names for things... And, of course, given a BNF, presumably we can get someone else to write another parser (I am resolutely sticking with REs for docutils because (a) they're always present with Python and (b) they provide Unicode support, but a "proper" parser driven approach would be a Nice Tool to have out there - of course, given my druthers, I'd be using mxTextTools, which is a Very Nice Tool, but that's *definitely* not going out in the standard Python package!) > After I've done > that, I'll start working on getting Emacs to colorize StructuredText > strings. Yeh! Now that would be a truly good thing to do. But not as important as: > I'd also like to create a sort of test-suite set of strings > to test how different implementations function on different > "ambiguously defined" cases.. Oh, yes please - that's even more useful - stress cases are quite difficult to come up with. Can I suggest searching through the docutils sources for occurrences of the string "hmm" (case insensitive), since that will show up some of my own comments on oddities of the form? (Also big on my list of future needs is a "howto" document that has a section on heffalump traps - things that are difficult to do in ST, and wayward things that produce unexpected results. Your list of test cases would also be a Big Help in such a thing.) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! 
(Unless Laser-Scan ask nicely to borrow them.) From dgoodger@atsautomation.com Tue Mar 6 19:39:40 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Tue, 6 Mar 2001 14:39:40 -0500 Subject: [Doc-SIG] Structured Text Message-ID: Hi Edward, As Tibbs mentioned (thanks for the plugs, Tony! ;-), I wrote some articles in November on the very subject you're researching. You can find the articles here: - A Plan for Structured Text http://mail.python.org/pipermail/doc-sig/2000-November/001239.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2000-November/001240.html - reStructuredText: Revised Structured Text Specification http://mail.python.org/pipermail/doc-sig/2000-November/001241.html Unfortunately, since then I haven't had much time to work on the issues. I have refined some of the issues in my head, but not on "paper". Right now my computer at home is still in a box a week after we moved, six weeks after we began a major renovation project in our new house. The renovations continue, and take priority, so the computer will remain boxed for a few more days at least. I am writing from work, therefore I must be brief. > I'm not sure if this is the correct > forum for such questions. I know of no better forum. Another place might be the StructuredTextNG ZWiki or mailing lists over on Zope's pages. In addition, I'd highly recommend scanning over the archives of Doc-SIG, since these issues have come up many times in the past. Most of the principles have given up and moved on to less controversial pastures. > 1. Does every string value have an interpretation as a Structured > Text? ... > 2. If it is true that every string value has an interpretation as a > Structued Text, does it make sense to officially "discourage" > certain types of strings [...] ? One way you can think of a StructuredText interpreter is like a computer language compiler/interpreter: illegal input generates warnings (if you're lucky :) and errors. 
I like Tibbs' reply to this one. Ideally, a tool would make a best guess and generate warnings, without crapping out (unless explicitly told to do so). > 3. Which types of "code coloring" (emph, inline, etc.) can "wrap" over > lines, and which can't? E.g., can I have an *emph statement that > continues to the next line?* I don't see why not. > 4. Is there any official precedance ordering on the different > types of > "code coloring?" Will there be anytime soon? Any rules > about what > types of code coloring can be contained in what other types? Nothing official other than the existing codebase, which is not complete. See my block diagram in "reStructuredText: Revised Structured Text Specification" for my take on the (high-level) hierarchy. > 5. Does structural formatting or code coloring take precedance? For > example, if a paragraph starts with "* foo *," will it be a normal > paragraph with an emphasized first element, or a list > item? (It'll > be much easier for me to write formal rules if structure takes > precedence. ;) ) I think structure has to take precedence. You've provided one of an infinite number of conceivable edge-cases where it'll be tricky to get a program to process its input correctly 100% of the time. > 6. Among the list types, which take precedence? For example, if a > paragraph starts with "1. foo -- bar", is it an ordered list item > or a descriptive list item? I would consider it an ordered list item *containing* a descriptive list item. But I'm certain others would differ. > 7. What is meant by saying that SGML text passes through? SGML isn't > even a mark-up language, so I assume that the intent is something > like "XML and HTML text passes through." But does that mean that > in an expression like 'a*b*', the '*'s will be ignored? > That seems unreasonably difficult to implement. What about an > expression like ''? Does this mean I can't say things > like if 'xz'? 
Is there strong support for the
>    notion of letting "SGML" text pass through, or is it something that
>    might be dropped? (I would certainly vote for dropping it. :) )

SGML *is* a markup language (that's what the ML stands for), but it's a meta-markup language. XML is too. Only HTML (of the three) is a specific markup language. What they meant was simply that "text like this" would pass through. This is one place where the original StructuredText definition is sorely lacking, IMHO.

Hopefully I can get the renovations done before too long and get back to more cerebral pursuits.

David Goodger
Systems Administrator & Programmer, Advanced Systems
Automation Tooling Systems Inc., Automation Systems Division
direct: (519) 653-4483 ext. 7121 fax: (519) 650-6695
e-mail: dgoodger@atsautomation.com

From dgoodger@atsautomation.com Tue Mar 6 20:11:45 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Tue, 6 Mar 2001 15:11:45 -0500 Subject: [Doc-SIG] Structured Text Message-ID:

> here's an awkward case::
>
>     1. This is a list item::
>
>          This is some literal text (see the '::' above)
>
>        I can't make this paragraph a "child" of the list item.
>
> My intent is to make the third para a child of the first (so it won't
> terminate the list). But I can't do that

My approach is to think about it two-dimensionally, in X-Y space:

    1. This is a list item::

         This is some literal text (see the '::' above)

       I can't make this paragraph a "child" of the list item.

The "I" lines up with the "T" in the first line. A block diagram helps:

    +----+------------------------------------+
    | 1. | (list item)                        |
    +----| +----------------------+           |
    |    | | (paragraph)          |           |
    |    | | This is a list item: |           |
    |    | +----------------------+           |
    |    | +------------------------------+   |
    |    | | (code block)                 |   |
    |    | | This is some literal text... |   |
    |    | +------------------------------+   |
    |    | +----------------------+           |
    |    | | (paragraph)          |           |
    |    | | I can't make this... |           |
    |    | +----------------------+           |
    +----+------------------------------------+

The indentation of the code block is just for emphasis (unless you want paragraphs to contain body elements, a subject for intense debate ;-).

I once wrote a (huge, ugly, Perl) program which parsed syntax diagrams drawn using ASCII with line-drawing extensions (similar to the diagram above), and translated them into valid SGML. StructuredText is similar: it's ordinary text with a horizontal dimension. Basically, it's graphical text. Like a bitmap, but characters instead of pixels. Thinking in terms of dissection into blocks is very useful.

HTH,
/DG

P.S. I haven't disappeared; just incredibly busy. I'm still lurking on the list.

From dgoodger@atsautomation.com Tue Mar 6 21:58:19 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Tue, 6 Mar 2001 16:58:19 -0500 Subject: [Doc-SIG] and a silent 'Q', the famous Dutch author Message-ID:

I misspoke,

> As Tibbs mentioned [...]

Sorry, that's 'Tibs' with one 'b' not two. My apologies.

From tony@lsl.co.uk Wed Mar 7 10:22:42 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 7 Mar 2001 10:22:42 -0000 Subject: [Doc-SIG] Structured Text In-Reply-To: Message-ID: <002901c0a6f0$8b4a2d30$f05aa8c0@lslp7o.int.lsl.co.uk>

Elsewhere David apologised for spelling my name Tibbs rather than Tibs - that's OK, I'm used to it, given my surname (the reason it has a single "b" is only that the original cat tag had "Tibs" and not "Tibbs").

David Goodger also wrested (wrost?) time from house moving recovery to say:

> My approach is to think about it two-dimensionally, in X-Y space:
>
>     1. This is a list item::
>
>          This is some literal text (see the '::' above)
>
>        I can't make this paragraph a "child" of the list item.
>
> The "I" lines up with the "T" in the first line. A block
> diagram helps:
>
>     +----+------------------------------------+
>     | 1. | (list item)                        |
>     +----| +----------------------+           |
>     |    | | (paragraph)          |           |
>     |    | | This is a list item: |           |
>     |    | +----------------------+           |
>     |    | +------------------------------+   |
>     |    | | (code block)                 |   |
>     |    | | This is some literal text... |   |
>     |    | +------------------------------+   |
>     |    | +----------------------+           |
>     |    | | (paragraph)          |           |
>     |    | | I can't make this... |           |
>     |    | +----------------------+           |
>     +----+------------------------------------+
>
> The indentation of the code block is just for emphasis
> (unless you want paragraphs to contain body elements,
> a subject for intense debate ;-).

Hmm. I think of the indentation as Python-esque, and just "pretend" I'm blind to anything but the first line (for non-literal paragraphs, anyway). I'm afraid I've been exposed to document tree structures for too long to want to think of it as block diagrams.

Anyway, I have very strong views on how correct markup should work (heh, I have a TeX background, and I'm a pedant). HTML contravenes some of them (sections in list items! - hah! "some" people may consider one shouldn't have an H4 immediately following an H2 - hah!), and StructuredText a whole different set. But that's *my* problem - ST is also intensely pragmatic, and cleverly designed so that many of the things people initially worry about never actually happen in practice (given a close reading of the spec, even the "informal" STClassic spec). Within the limitations of what one can do with a "nearly plain text" approach that *is* trying to mix markup and presentation, it does amazingly well (and I'm currently having a real problem with remembering to type <em>...</em> when I'm writing HTML, instead of *...*(!)).

Unfortunately, of course, the block diagram is wrong, for all forms of current ST - the correct diagram is::

    +-------------------------+
    | (list item)             |
    | 1. This is a list item: |
    +-------------------------+
    +------------------------------+
    | (literal block)              |
    | This is some literal text... |
    +------------------------------+
    +----------------------+
    | (paragraph)          |
    | I can't make this... |
    +----------------------+

since indentation starts at the start of a paragraph, which is calculated *before* the list sequence number is removed.

What I would call the "traditional" branch of ST (STClassic, STNG - the Zope strand) aims to separate paragraph generation from anything else - it aggressively regards paragraphs as separated by blank lines, and the document structure is built up only using paragraphs generated by such a method (this, I think, is even more so in STNG, where they are trying for a very "clean" structure of classes, with each phase of parsing separated strongly from each other phase - the aim being to allow subclassing for customisation.)

Anyone reading *my* code will see I have abandoned such an approach - basically because I wanted to allow list items to start paragraphs (so they need to be detected early), and (later on) because if one is going to handle literal "paragraphs" properly, one needs to handle that specially as well - a simple "detect paragraphs and then markup" won't do it, literalising of paragraphs needs to be an intermediate stage. Basically, in my view, processing ST-style texts well *requires* a hybrid approach - I would assert that the results provided by a more "theoretical" (in some sense) approach are not as satisfying. This is probably, of course, a "religious" disagreement, and I will be interested to see what Jim Fulton's people at Zope manage to do with STNG (I have a feeling that they didn't throw away enough code before starting the project, but that's a comment from fairly strong ignorance).

Back to the example. As I was saying, in both branches of ST development (NG and py), the *very start* of the paragraph determines its indentation.
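Tibs's point about where indentation is measured can be pinned down with a tiny sketch of my own (not docutils code): the indentation that drives nesting is the column of the paragraph's very first character, taken before any list marker is stripped.

```python
def para_indent(paragraph):
    """Indentation of a paragraph: the leading spaces on its *first*
    line, measured before any list marker ("1.", "o", "*") is removed."""
    first_line = paragraph.splitlines()[0]
    return len(first_line) - len(first_line.lstrip(" "))

# The thread's example: the list item's indent is 0 (the column of the
# "1", not of the word "This"), so a later paragraph indented to line
# up with "This" sits at indent 3, which lines up with nothing.
assert para_indent("1. This is a list item::") == 0
assert para_indent("   I can't make this paragraph a 'child'...") == 3
```

That is why David's block diagram doesn't describe current ST: the following paragraph lines up with the item's *text*, but the item itself starts at column 0.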
*Because* this is such a fundamental structuring decision, I would be very cautious about changing it (technically, in STpy/docutils, it wouldn't be too hard, off the top of my head, to do what you'd like). My experience, as I say further above, is that the basic ideas of ST are *very* solid in their pragmatic usefulness, and I would need to think about a lot of text examples before wanting to change something like that. Also, I am already worried a little about the incompatibilities between STpy and STNG (since STNG is happy to be somewhat incompatible with STClassic, I shall take the same stance on that). Luckily there aren't many (extra features such as '#...#' will ultimately be selectable anyway - this makes sense for processing .stx files that have nothing to do with Python) - I think the *main* one may well be the alllowance of list items starting new paragraphs, which was something STNG was also considering at one stage. (Of course, if the incompatibilities are few and abstruse, one *could* argue that STNG isn't taking such a wrong approach, after all, couldn't one!) > P.S. I haven't disappeared; just incredibly busy. I'm still > lurking on the list. And I still intend to comment on your documents at some stage. Tibs (hmm - the exclamation count in this email is too high) (there, I've removed some - perhaps I should insert some more parenthesised clauses to compensate...) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 8 08:08:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 08 Mar 2001 03:08:13 EST Subject: [Doc-SIG] Formalizing StructuredText Message-ID: <200103080808.f2888Dp27950@gradient.cis.upenn.edu> I've started working on a project to formalize StructuredText (in particular, STNG and STpy). 
I am using a slightly extended version of EBNF to write the formal descriptions. See the project proposal for more information: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html A preliminary formalization is available at: http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html I would appreciate any feedback. :) -Edward From tony@lsl.co.uk Thu Mar 8 10:37:16 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 8 Mar 2001 10:37:16 -0000 Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: <200103080808.f2888Dp27950@gradient.cis.upenn.edu> Message-ID: <003b01c0a7bb$be8712a0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I've started working on a project to formalize StructuredText (in > particular, STNG and STpy). I am using a slightly extended > version of EBNF to write the formal descriptions. See the project > proposal for more information: > http://www.cis.upenn.edu/~edloper/pydoc/stminus.html > > A preliminary formalization is available at: > http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html > > I would appreciate any feedback. :) Well, first off, I'm mega-impressed - I hadn't dreamed that someone would actually get round to doing something like this. This is truly neat. Secondly, I'll add a link from the next "status" document for docutils - do you prefer to be "Edward Loper" or "Edward D. Loper"? Thirdly, could I ask you to look at: http://www.zope.org/Members/jim/StructuredTextWiki/DocumentationStrings - I've added a reference to STminus at the end of the page. It might be an idea to get a Zope Wiki account (if you don't already have one) and add a reference somewhere further up the hierarchy as well - I don't know if Jim Fulton and co. have the time or inclination to watch the Doc-SIG... Fourthly, I haven't had time to read the whole STminus document (heh, I only just saw your email!), but I did note the large red box halfway down the actual STminus definition page. 
Just above it you say:: Note that the empty literal ('') is a valid literal. I have a sneaky feeling that this is not so in the current version of STpy (it *may* have been earlier - it's something I'm ambivalent about, since I can't see much *use* for an empty literal string). I'll have a look at what the current REs do (although I broke them yesterday whilst "tidying up", so that's not terribly reliable). Of course, if STNG allows an empty literal, that would be a case for STpy doing so as well (but what about an empty ##? Hmm. I'll think about this whole thing in more detail.) Oh - one last point - please, please, please put
<p>
at the start of paragraphs - HTML really does mandate it, despite the fact that IE seems not to care (I've come across browsers in the past that *did* treat the absence of
<p>
as meaning pure whitespace, causing all "paragraphs" to run together...). Personally, I recommend HTML Tidy as a tool for checking/reformatting HTML - not that I always do what it *says*, but at least I then know when I'm being naughty... Damn, now I'm going to have to learn EBNF. Hard when I've got my stupid-hat on - it took me quite a while to realise why "'" was called APOS... I do like the assumption of a dialogue between STNG and STpy - although at the moment you're it! (mind you, that's a damn good start, so far as I'm concerned). I'll try to go through the text and look for "intentional differences" that I know about. (oh - the link [5] at the bottom of the intro page, to http://www.cis.upenn.edu/~edloper/pydoc/ebnfla_proof.html gives a Not Found error) All the best, Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 8 20:12:58 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 08 Mar 2001 15:12:58 EST Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: Your message of "Thu, 08 Mar 2001 10:37:16 GMT." <003b01c0a7bb$be8712a0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103082012.f28KCwp15503@gradient.cis.upenn.edu> Tibs said: > Secondly, I'll add a link from the next "status" document for docutils - > do you prefer to be "Edward Loper" or "Edward D. Loper"? Thanks. Let's call me Edward Loper for now. I also have a page at: http://www.cis.upenn.edu/~edloper/pydoc that will contain pointers to all my essays/work on python documentation and StructuredText. > - I've added a reference to STminus at the end of the page. 
It might be
> an idea to get a Zope Wiki account (if you don't already have one) and
> add a reference somewhere further up the hierarchy as well - I don't
> know if Jim Fulton and co. have the time or inclination to watch the
> Doc-SIG...

I'll try to get an account sometime. I've never used Zope before, but it doesn't look that difficult to find out. Is there someone I need to talk to to get an account, or do I just register on some web page somewhere? If Jim (and co.) don't read Doc-SIG, should I send email to someone at STNG telling them what I'm working on? If so, who?

> Fourthly, I haven't had time to read the whole STminus document (heh, I
> only just saw your email!), but I did note the large red box halfway
> down the actual STminus definition page. Just above it you say::
>
>     Note that the empty literal ('') is a valid literal.
>
> I have a sneaky feeling that this is not so in the current version of
> STpy (it *may* have been earlier - it's something I'm ambivalent about,
> since I can't see much *use* for an empty literal string).

I have a suspicion that STminus's current definition does *not* actually provide a subset of the intersection of STNG and STpy. This should hopefully become more clear once I make lots of test cases, and can run them through the parsers for all three languages (although, just because one parses something one way, doesn't mean that that's the intended behavior, but..)

> Oh - one last point - please, please, please put
> <p>
at the start of > paragraphs - HTML really does mandate it, despite the fact that IE seems > not to care (I've come across browsers in the past that *did* treat the > absence of
> <p>
as meaning pure whitespace, causing all "paragraphs" to
> run together...). Personally, I recommend HTML Tidy as a tool for
> checking/reformatting HTML - not that I always do what it *says*, but at
> least I then know when I'm being naughty...

Fixed. Sorry about that; over time, I've come to respect HTML so little that I don't bother to use tags properly sometimes. :)

> Damn, now I'm going to have to learn EBNF. Hard when I've got my
> stupid-hat on - it took me quite a while to realise why "'" was called
> APOS...

I added text descriptions for all the terminal productions. :) In my references, I list the following page on EBNF:

http://www.augustana.ab.ca/~mohrj/courses/2000.fall/csc370/lecture_notes/ebnf.html

However, I'm *sure* that there are better pages out there that describe EBNF; if someone knows of a good one, tell me, and I'll add it to my references.

> I do like the assumption of a dialogue between STNG and STpy - although
> at the moment you're it! (mind you, that's a damn good start, so far as
> I'm concerned).

I'm hoping the people working on STNG will like the idea too. :)

> I'll try to go through the text and look for "intentional differences"
> that I know about.

Thanks.

> (oh - the link [5] at the bottom of the intro page, to
> http://www.cis.upenn.edu/~edloper/pydoc/ebnfla_proof.html
> gives a Not Found error)

Oops, forgot to copy that page over. Fixed now.

-Edward

From klm@digicool.com Fri Mar 9 00:40:39 2001 From: klm@digicool.com (Ken Manheimer) Date: Thu, 8 Mar 2001 19:40:39 -0500 (EST) Subject: [Doc-SIG] Re: Doc-SIG digest, Vol 1 #271 - 2 msgs In-Reply-To: Message-ID:

> To: doc-sig@python.org
> Date: Thu, 08 Mar 2001 03:08:13 EST
> From: "Edward D. Loper"
> Subject: [Doc-SIG] Formalizing StructuredText
>
> I've started working on a project to formalize StructuredText (in
> particular, STNG and STpy). I am using a slightly extended
> version of EBNF to write the formal descriptions.
See the project > proposal for more information: > http://www.cis.upenn.edu/~edloper/pydoc/stminus.html > > A preliminary formalization is available at: > http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html > > I would appreciate any feedback. :) Most excellent!! There's been some recent dialogue within digital creations pro and con ST, and i think one indisputable objection was ST's lack of a clear specification - this is just what the doctor ordered! Not only does it address the problem - it is also very encouraging to see energy being invested from outside the company, in this and tony's efforts. (I believe ST has a nice, solid place in our ongoing plans, by the way. Zope will probably go to storing textish documents in neutral, DOMish data structures, with the option to present as ST, among other options, for editing, etc.) > - I've added a reference to STminus at the end of the page. It might be > an idea to get a Zope Wiki account (if you don't already have one) and > add a reference somewhere further up the hierarchy as well - I don't > know if Jim Fulton and co. have the time or inclination to watch the > Doc-SIG... Some of us do in a sketchy fashion. (I subscribe via digest, and sometimes miss issues.) But tony, those pages are in a wiki precisely to enable the kind of thing you're suggesting! Bravo! Ken klm@digicool.com From tony@lsl.co.uk Fri Mar 9 10:44:32 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 9 Mar 2001 10:44:32 -0000 Subject: [Doc-SIG] Re: Doc-SIG digest, Vol 1 #271 - 2 msgs In-Reply-To: Message-ID: <004501c0a885$ecee97b0$f05aa8c0@lslp7o.int.lsl.co.uk> Ken Manheimer wrote: > (I believe ST has a nice, solid place in our ongoing plans, > by the way. Zope will probably go to storing textish documents > in neutral, DOMish data structures, with the option to present > as ST, among other options, for editing, etc.) 
Of course, STNG and STpy both produce DOM trees (more or less) - so there is always the option of writing ST outputters, and translating via DOM tree manipulations (excuse me - I'll go off in a corner and gibber quietly now).

More seriously, I would be *very* interested to see more feedback (somewhere - to Edward Loper seems an obvious route) between the different ST developers, especially with explanations of *why* particular choices are made (accident, of course, being a useful one!)

> But tony, those pages are in a wiki precisely to
> enable the kind of thing you're suggesting! Bravo!

The problem with the Wiki is that it requires a significant investment of "mind space" to use it - both for monitoring it, and for working out what one should do to amend it so that the results one wants become more likely. It's a heck of a lot easier to dump stuff onto a mailing list... (of course, that *may* be a reason for using a Wiki!)

Anyway, it's good to see the Doc-SIG with new participants, and all this new terminology is fun too (and STminus may yet be one of the more significant things the Doc-SIG has done).

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!"
BARNYARD DANCE! by Sandra Boynton
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From klm@digicool.com Fri Mar 9 17:36:31 2001 From: klm@digicool.com (Ken Manheimer) Date: Fri, 9 Mar 2001 12:36:31 -0500 (EST) Subject: [Doc-SIG] Re: Doc-SIG digest, Vol 1 #271 - 2 msgs In-Reply-To: <004501c0a885$ecee97b0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID:

On Fri, 9 Mar 2001, Tony J Ibbs (Tibs) wrote:

> Ken Manheimer wrote:
> [...]
> More seriously, I would be *very* interested to see more feedback
> (somewhere - to Edward Loper seems an obvious route) between the
> different ST developers, especially with explanations of *why*
> particular choices are made (accident, of course, being a useful one!)
We'll try to chime in! > > But tony, those pages are in a wiki precisely to > > enable the kind of thing you're suggesting! Bravo! > > The problem with the Wiki is that it requires a significant investment > of "mind space" to use it - both for monitoring it, and for working out > what one should to to amend it so that the results one want become more > likely. It's a heck of a lot easier to dump stuff onto a mailing list... I didn't mean to say the wiki was a *good* way to do it, but rather that it's in the wiki so you and others could introduce and change stuff. (In fact, i consider wiki woefully imperfect. WikiForNow was something i worked on as a stopgap, while the Zope CMF is growing. We're hoping there to develop some wiki-type benefits while remeding some wiki shortcomings. For instance, see http://cmf.zope.org/Members/klm/OrganizationObjects .) Ken From ping@lfw.org Fri Mar 9 23:08:21 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 9 Mar 2001 15:08:21 -0800 (PST) Subject: [Doc-SIG] Docstring markup process Message-ID: Paul Prescod made an excellent meta-proposal for docstrings at the recent conference: rather than arguing endlessly about various markup formats, anyone who wants to propose a particular markup format should write a PEP describing that format in detail. Only formats described in a PEP will be under serious consideration. By an agreed deadline, we can vote on the PEPs, and then be done with it. Of course we can discuss the various proposals here, but it's a big step forward to get them written up and all in one place for comparison. I strongly support this process; let's pick a deadline. -- ?!ng "If I have not seen as far as others, it is because giants were standing on my shoulders." 
-- Hal Abelson From ping@lfw.org Sun Mar 11 10:13:38 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 11 Mar 2001 02:13:38 -0800 (PST) Subject: [Doc-SIG] Evolution of library documentation Message-ID: [resent with individual cc addresses, since mail.python.org is down] Hi everyone! The introduction of pydoc places more emphasis on docstrings in the source code. I think this is generally good, since keeping the documentation close to the source makes it more likely to be kept up to date. However, it also produces the potential for duplication of effort in maintaining both the docstrings and the LaTeX file for the library reference. The LaTeX documentation seems to be motivated by the richer metadata, the greater control over formatting, and the ability to present a long tutorial or detailed explanation. At the Python conference, a small group of us discussed the possibility of merging the external and internal documentation; that is, moving the library reference into the module source files. It would no longer be written in TeX so that you wouldn't have to have TeX in order to produce documentation. This would address the duplication problem and also keep all of a module's documentation in one place together with the module. To avoid forcing you to page through a huge docstring before getting to the source code, we would allow a long docstring to go at the end of the file (or maybe collect docstrings from anywhere in the file). To implement this convention, we wouldn't need to change the core because the compiler already throws out string constants if they aren't used for anything. So a big docstring at the end of the file would not appear in the .pyc or occupy any memory on import; it would only be obtainable from the parse tree, and tools like pydoc could use the compiler module to do that. That leaves the metadata and formatting issues. When i suggested this idea (of merging in the external documentation) to Guido, he was initially against it. 
He was very concerned about the loss of information in the TeX markup. In order to even consider switching formats, he requires that we preserve as much metadata as possible from the TeX docs (so that, for example, we can still generate a useful index). But i still think that getting all the docs together in one place is a goal worth at least investigating. So i have gone through the TeX files in the Doc/lib directory and extracted a list of all the TeX markup tags that are used there. Here follows my list; i have attempted to categorize the purpose of the tags by hand. Fred, would you mind looking over this list to see if i have classified the meanings of the tags correctly? Each tag name appears with the number of times that it occurs as a measure of how important it is. This should give us a starting point for evaluating and discussing what kind of metadata and formatting control we have, what is worth preserving, and what we would need to consider supporting in a structured-text-style markup if we were to merge the documentation. After i've had a while to study the list, i will probably post my own annotated list of which ones i would support and which ones i would toss. I encourage you to look at it and do the same. -- ?!ng "If I have not seen as far as others, it is because giants were standing on my shoulders." 
-- Hal Abelson

# ------------------------------------------------------------- BLOCK TAGS

block formatting markup:
    abstract 1        description 28    displaymath 1     document 1
    enumerate 6       flushleft 1       fulllineitems 1   itemize 35
    list 2            seealso 73        sloppypar 3       verbatim 274
    math 4

table formatting:
    longtableii 2     tableii 34        tableiii 24       tableiv 1

descriptive sections for Python objects:
    classdesc 132     datadesc 399      datadescni 29     excclassdesc 4
    excdesc 124       funcdesc 1122     funcdescni 1      memberdesc 170
    methoddesc 1152   methoddescni 4    opcodedesc 104

# ------------------------------------------------------------ INLINE TAGS

special words, symbols, and math:
    ABC 3             ASCII 58          C 12              Cpp 2
    EOF 19            Large 2           NULL 3            POSIX 30
    UNIX 226          copyright 1       e 3               frac 1
    ldots 2           sqrt 1            sum 1

inline formatting markup:
    cdata 11          cfunction 84      character 163     code 2485
    ctype 40          dfn 63            email 3           emph 163
    envvar 47         file 174          footnote 24       kbd 14
    keyword 98        longprogramopt 7  manpage 23        mbox 1
    mimetype 14       platform 44       program 65        programopt 9
    regexp 63         rfc 43            samp 347          strong 85
    textrm 9          url 21            var 4234

metadata fields:
    declaremodule 211 deprecated 15     moduleauthor 57   modulesynopsis 220
    sectionauthor 114 versionadded 82

TeX processing macros:
    documentclass 1   input 226         label 242         nodename 20
    renewcommand 3

table cells:
    lineii 386        lineiii 279       lineiv 15

tagging indexable words:
    bifuncindex 35    index 180         indexii 92        indexiii 17
    indexiv 1         obindex 26        opindex 12        setindexsubitem 13
    stindex 12        stmodindex 1      ttindex 50        withsubitem 28

cross-references:
    citetitle 11      ref 29            refbimodindex 31  refmodindex 2
    refmodule 203     refstmodindex 60  seemodule 84      seepep 1
    seerfc 9          seetext 12        seetitle 1        seeurl 3

Python identifiers:
    class 639         constant 348      dataline 67       exception 310
    funcline 61       funclineni 1      function 954      member 159
    memberline 2      method 866        module 635        pytype 1
    optional 734

headings:
    chapter 25        section 237       subsection 227    subsubsection 44
    title 1

From Edward Welbourne Sun Mar 11 12:28:50 2001 From: Edward Welbourne (Edward Welbourne) Date: Sun, 11 Mar 2001
12:28:50 +0000 (GMT) Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: (message from Ka-Ping Yee on Sun, 11 Mar 2001 02:13:38 -0800 (PST)) References: Message-ID: > At the Python conference, a small group of us discussed the possibility > of merging the external and internal documentation; that is, moving > the library reference into the module source files. good. > the compiler already throws out string constants if they aren't > used for anything. cool. > Guido ... initially against it ... concerned about the loss of > information in the TeX markup. bah. OK, so it'll pressure ST* into being a bit richer ... big deal. > But i still think that getting all the docs together in one place is > a goal worth at least investigating. hey, understatement is meant to be a British thing - what're you doing invading our turf, Ka-Ping ;^? All programmers know: if the code and the docs disagree, mistrust both. Those involved in databases know: if data is duplicated, the copies get out of step with one another. Corollary: if the code and the docs aren't in the same place, you can't trust either. `A goal worth at least investigating' ? Try: A fundamental omission in most existing software management systems. I trust IPC9 went well, Eddy. -- Experienced software engineers know that perhaps 30% of the cost of a software product goes into specifying it, 10% into coding, and the remaining 60% on maintenance. -- Ross Anderson. From tony@lsl.co.uk Mon Mar 12 10:03:10 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 12 Mar 2001 10:03:10 -0000 Subject: [Doc-SIG] Docstring markup process In-Reply-To: Message-ID: <005d01c0aadb$a4c8b2b0$f05aa8c0@lslp7o.int.lsl.co.uk> > Paul Prescod made an excellent meta-proposal for docstrings at the > recent conference: rather than arguing endlessly about various > markup formats, anyone who wants to propose a particular markup format > should write a PEP describing that format in detail. 
Only formats > described in a PEP will be under serious consideration. By an agreed > deadline, we can vote on the PEPs, and then be done with it. Sounds OK by me - although I haven't *heard* any argument for a long time (was there argument at the conference? Why aren't they arguing here where I can hear them!). I rather fondly thought we were working on STNG with minor extensions, specifically '>>>' paragraphs and '#...#' markup, with other stuff to be decided later on when that was working (sounds like what a PEP could say!), but maybe I was mistaken. Of course, writing that down *formally* somewhere is not a bad idea... (it seemed somewhat Pythonic to me to work on the thing as one was de/refining it) with Edward Loper producing STminus to allow us to understand relationships and maybe be able to produce interoperability. > Of course we can discuss the various proposals here, but it's a big > step forward to get them written up and all in one place for > comparison. Do we *have* various proposals? I guess this is one way of finding out... > I strongly support this process; let's pick a deadline. OK, but please make it at least a month away or I'm unlikely to have time to write anything - are we at least allowed to have a quick stab at agreeing something here or thinking of what goes on, or are PEPs to be submitted any old how? (reason for asking is that I was fondly hoping to tidy up docutils a bit, rewriting docstrings where necessary, redo the STpy documentation somewhat, and alpha release within the next fortnight, thus making STpy the 'de-facto' standard for people to organise grumbles at. If PEPs are being written, then that has to go on hold, which is a pain - unless STpy documentation *becomes* a PEP.) Or is this related to Ka-Ping Yee's mega-documentation scheme, addressed elsewhere? Suggestion for meat-and-bones of PEP: 1.
STNG plus '>>>' plus '#...#', maybe plus "tagged paragraphs" - basically what docutils supports now ('cos I *know* it hangs together coherently - working out what ST variants work is an ad-hoc business) 2. Future enhancements to include: - these are also discussed in the STpy document, but need deciding which ones the community wants. The most important is references within a document, and the next most important references to a Python object. 3. Whether ST gets vastly expanded to meet Ka-Ping Yee's latest proposal (discussed in another email). I'm certainly intending to try to produce (1), I guess... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 12 10:44:14 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 12 Mar 2001 10:44:14 -0000 Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: Message-ID: <005e01c0aae1$61c8d890$f05aa8c0@lslp7o.int.lsl.co.uk> Ka-Ping Yee wrote: > [resent with individual cc addresses, since mail.python.org is down] Is it? OK - everyone's going to get two copies... > The introduction of pydoc places more emphasis on docstrings in the > source code. I think this is generally good, since keeping the > documentation close to the source makes it more likely to be kept > up to date. Agreed in so far as it goes. > However, it also produces the potential for duplication > of effort in maintaining both the docstrings and the LaTeX file for > the library reference. Hmm. I've had this argument before. Maintaining two different things is maintaining two different things. If the docstrings are sufficient (and we now have tools to extract them and format them), then well and good. But if they are not, then a different sort of document is just that - different. 
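(A quick check of Ka-Ping's earlier claim - that the compiler throws out unused string constants, while the parse tree keeps them - is easy to make. A minimal sketch follows, using today's ast module in place of the old compiler package he mentioned; the module source and names in it are invented purely for illustration:)

```python
import ast

# Hypothetical module source, used only for illustration.
SOURCE = '''\
"""Module docstring."""

def add(a, b):
    "Return a + b."
    return a + b

"""A long reference document could live here, at the end of the file."""
'''

# The byte-compiler discards a bare string constant that is not a
# docstring, so the trailing text never reaches the code object:
code = compile(SOURCE, "<demo>", "exec")
assert not any(isinstance(c, str) and "long reference" in c.lower()
               for c in code.co_consts)

# ...but it survives in the parse tree, where a tool like pydoc could
# collect it:
tree = ast.parse(SOURCE)
strings = [node.value.value
           for node in tree.body
           if isinstance(node, ast.Expr)
           and isinstance(node.value, ast.Constant)
           and isinstance(node.value.value, str)]
print(strings[-1])  # the trailing documentation string
```

So a trailing "documentation string" costs nothing in the compiled .pyc, yet remains extractable from source.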
> The LaTeX documentation seems to be motivated by the richer metadata, > the greater control over formatting, and the ability to present a > long tutorial or detailed explanation. Yes. Although I'm not worried personally if it's LaTeX or (for instance) DocBook XML. > At the Python conference, a small group of us Ah, the Spanish Inquisition. Which is why I didn't expect it (sorry - not *really* getting at people - well, maybe just a little) > discussed the possibility of merging the external and internal > documentation; that is, moving the library reference into the > module source files. Hmm. I'll rant about this a little later on. > It would no longer be written in TeX so that you wouldn't have > to have TeX in order to produce documentation. Not *necessarily* a bad goal (although I would point out it's *significantly* easier to "have TeX" than, for instance, to "have CVS", which one is also required to have to do development with modern Python (a *serious* problem for some of us)). > This would address the duplication problem and > also keep all of a module's documentation in one place together with > the module. Now, if you said "package" I'd be happy, but since it's "module", I'll gripe. > To avoid forcing you to page through a huge docstring > before getting to the source code, we would allow a long docstring to > go at the end of the file (or maybe collect docstrings from anywhere > in the file). Aagh! No, sorry, my problem wouldn't be with paging (although that *is* a problem - and why is the end of the file so different than the front? - I page from both ends, depending on context!). Source files are for source code. I want to be able to *treat* them as such. It is quite possible for a two page source to have ten or more pages of documentation associated with it. That does *not* belong in the same *file* as the source - if someone *wants* to associate them closely, the correct way to do it is with a *package*. Let's see if I can explain this a bit better.
Files are a useful way of organising data, but good practice doesn't stuff things into one file when they are better organised as two or more. That's why we split source code up into multiple files - a good language like Python allows and encourages this, so that even if one only has one entry point into a package, the writer can still choose to split it up logically into multiple "internal" files. Keeping file size down also has advantages - it makes it easier to navigate the file both "physically" (with an editor) and "conceptually" (remembering what is in the file and why). It's related to the "don't let functions/methods get too big" idea. Files are also, in many filesystems, *typed*. That is, the file "name" has an indication of what is *in* the file. Using this information can be a big win. Docstrings are for inserting "point" documentation, targeted documentation that relates to the particular object the docstring is attached to. This is a Good Thing, and one of the most important additions to Python over the last few years. The key idea here is that targeting - the documentation is in the docstring (and thus in the file) because it belongs *with* what it is documenting. Tutorial, reference and other "grander scope" documentation relates to the source code as a whole. So the "object" it belongs to is the module or package (or perhaps part of it). As such, one *might* argue that, for a single module package, it belongs in that module's docstring. But one then has to decide which of the sorts of documentation "belongs" there, since there is only one slot. I argue for it being whatever the module writer wants (!), but normally/notionally an overview to allow a source code reader (or person browsing with an IDE) to get a handle on what is going on. (That's an important point - docstrings must be suitable for browsing with an IDE.) *Because* "grander scope" sorts of documentation relate to the package or module as a whole, I think they deserve a separate file. 
OK, so if you want it closely coupled, that makes more things packages. Tough. A package can "look like" a module if it wants. Also, *because* one might have more than one sort of "grander scope" documentation for a module/package, you will have to consider *supporting* more than one. Difficult if it is "just" a string tacked on the end. > That leaves the metadata and formatting issues. When i suggested this > idea (of merging in the external documentation) to Guido, he was > initially against it. He was very concerned about the loss > of information > in the TeX markup. In order to even consider switching formats, he > requires that we preserve as much metadata as possible from the TeX > docs (so that, for example, we can still generate a useful index). I agree with Guido (gosh!) on this. My reasons are based on long term use of documentation tools, and also on good programming practice, as well as gut instinct (which is, of course, also based on those things!). The reason for adopting ST (or some variant) for markup in docstrings is, basically, because it is acknowledged that many people will not create docstrings with more markup than that, or with more obtrusive markup than that. I'll say that again slower - there are two reasons for ST (or similar) in docstrings: 1. People *will not* markup heavily (we cannot make them do it, they will not do it), so we need to specify a markup that doesn't have a high learning curve, and that doesn't have many *ways* of marking up 2. People will not use an "obtrusive" markup, like TeX, XML or HTML, because they perceive it as "difficult to read". Those two are, of course, different faces of the same thing. Now, *if* we are to retain all of the markup meaning that the TeX documentation has, we will *have* to have more complex markup. ST is predicated on the idea that it is very simple to read (it is not accidental that it looks very much like email). 
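(By way of illustration, here is a sketch of the sort of lightly marked-up docstring under discussion - a '>>>' paragraph that doubles as a runnable example. The function and its wording are invented for illustration, and the '>>>' paragraph can be checked mechanically with the standard doctest module:)

```python
import doctest

def double(value):
    """Return 'value' *doubled*.

    The '>>>' paragraph below is lightweight documentation that is
    also a runnable, checkable example:

    >>> double(21)
    42
    """
    return value * 2

# Collect and run the examples found in the docstring.
# (When run as a script, doctest.testmod() does the same job.)
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(double, "double", globs={"double": double}):
    runner.run(test)
# runner.failures counts failed examples; 0 means the '>>>' paragraph
# checks out against the code it documents.
```

The point being that the markup stays readable as plain email-like text, while still carrying machine-verifiable meaning.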
STpy is already straining at that a bit by introducing '#...#' (which we think we need). And I am not convinced that there is an ST-natural way of quoting a single quote as a literal character (which is the sort of thing one *has* to be able to do for proper markup of a detailed text on some issues). Thus, despite the ability to write a book (the Zope book, for instance) in ST, it is required to stay not much more complex than it is, or people won't use it. Worse, if one tries to continue using "simple" markup in ST, one is going to end up with strained analogies, and with almost any non-alphanumeric character having a special meaning. Yuck (can we say Perl?). The obvious way round that is to start doing, well, markup - for instance, '@class(..)' or somesuch (like Pod, I think? - or GNU texinfo). In which case we're inventing our own little markup language again, with none of the reasons for doing it that went into ST. And I for one reckon that we probably don't have a Guido-of-the-markup-languages hanging around on our list (it's statistically unlikely). Indeed, if one *needs* markup, then the obvious thing to do is to steal someone else's (I for one don't care much *which* markup language one steals - TeX, Pod, texinfo and DocBook XML all have their advantages and disadvantages. I thought we'd delegated that decision to Fred Drake). > But i still think that getting all the docs together in one place is > a goal worth at least investigating. Depends on how tightly coupled "one place" is - the same *directory*, maybe. The same *file* - naff idea. > So i have gone through the TeX > files in the Doc/lib directory and extracted a list of all the TeX > markup tags that are used there. Here follows my list; i > have attempted > to categorize the purpose of the tags by hand. At which point I think I rest my case - there are *lots* of these. I sincerely hope that we don't adopt this proposal as stated.
I *wouldn't* object to a proposal that said that documentation source files should (maybe) live with source code files (although people who don't want to download the documentation might well object!). And I am open to the formatting language that is used for such "grander scope" documentation, although I think we should not be trying to invent our own (I suspect that a DocBook XML variant is probably what we want, since it is a skill that seems to have application elsewhere). I've *got* to go and do paid work now. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From ping@lfw.org Mon Mar 12 11:30:42 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 12 Mar 2001 03:30:42 -0800 (PST) Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: <005e01c0aae1$61c8d890$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: On Mon, 12 Mar 2001, Tony J Ibbs (Tibs) wrote: > > Hmm. I've had this argument before. Okay. Well, i think it's good to have this particular debate. It's worth discussing, so please bear with me as i argue it out with you. > Not *necessarily* a bad goal (although I would point out it's > *significantly* easier to "have TeX" than, for instance, to "have CVS", This really surprised me. CVS is installed by default, i believe, on all modern Linux distributions, and i have yet to install TeX, which is much bigger and more complex. How is it that you perceive the opposite? > > This would address the duplication problem and > > also keep all of a module's documentation in one place together with > > the module. > > Now, if you said "package" I'd be happy, but since it's "module", I'll > gripe. But the library reference manual is arranged by module, and there is a chapter of documentation on each individual module.
It also makes sense since the modules are the organizational units that you import and name in your code. > Aagh! No, sorry, my problem wouldn't be with paging (although that *is* > a problem - and why is the end of the file so different than the > front? - I page from both ends, depending on context!). I do think the inconvenience is mitigated by putting the docs at the end -- but i acknowledge that having bigger files is a concern. I don't see this as a 100% win myself -- it just seems that keeping the code and docs in the same file has advantages large enough to outweigh the inconvenience. > Tutorial, reference and other "grander scope" documentation relates to > the source code as a whole. Can you delineate clearly what you consider "grander scope" documentation as opposed to "point" documentation on a particular module? I'd like to better understand what you mean by "different" in the sense of different enough that something should be in a separate file. > (That's an important point - docstrings must be suitable > for browsing with an IDE.) Agreed. > Also, *because* one might have more than one sort of "grander scope" > documentation for a module/package, you will have to consider > *supporting* more than one. Could you give an example? > And I for one reckon that we probably don't have a > Guido-of-the-markup-languages hanging around on our list (it's > statistically unlikely). By Guido-of-the-markup-languages did you mean "benevolent dictator" or "good designer" or "long-term keeper of the faith" or something else? > 1. People *will not* markup heavily (we cannot make them do it, they > will not do it), so we need to specify a markup that doesn't have a high > learning curve, and that doesn't have many *ways* of marking up [...] > > markup tags that are used there. Here follows my list; i > > have attempted to categorize the purpose of the tags by hand. > > At which point I think I rest my case - there are *lots* of these.
Although it may seem surprising, i don't immediately conclude that there are so many tags that we can't possibly design a reasonably useful markup syntax. Many of the tags are redundant or produce shades of meaning finer than i consider really necessary. I'm not claiming it's possible until i really give it a try, but i do think it's worth a serious attempt. -- ?!ng "Computers are useless. They can only give you answers." -- Pablo Picasso From tony@lsl.co.uk Mon Mar 12 13:35:23 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 12 Mar 2001 13:35:23 -0000 Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: Message-ID: <006201c0aaf9$4a59c670$f05aa8c0@lslp7o.int.lsl.co.uk> (do we still need to spam everyone individually? hmm - just once more) Ka-Ping Yee wrote, in response to my earlier missage (I'm thus the bit double-chevroned): > > Hmm. I've had this argument before. > > Okay. Well, i think it's good to have this particular debate. It's > worth discussing, so please bear with me as i argue it out with you. No, debates are good - in particular, in the context of Doc-SIG, I've changed my mind at least once because of debate (and significantly so, since I was initially vehemently against ST). And I understand that you're trying to improve things (I just worry that it will turn into improoooovement) - if I get too argumentative below, please understand that I tend to talk too loud when excited (and documentation issues have been getting me worried/excited for well-on 20 years now, I'm afraid). > > Not *necessarily* a bad goal (although I would point out it's > > *significantly* easier to "have TeX" than, for instance, to > > "have CVS", > > This really surprised me. CVS is installed by default, i believe, > on all modern Linux distributions, and i have yet to install TeX, > which is much bigger and more complex. How is it that you perceive > the opposite? Well. 
CVS is not present by default on Windows, and it is not present at all where I work (from where I type). We use RCS (with in-house jacketing - don't ask Eddie!) for Unix software, and a Microsoft thingy for NT work, at the moment. Moreover, I can't just install new software on a machine 'cos I want to. At home, whilst I have Linux, I have a crappy modem connection (so CVS is a poor choice 'cos it assumes good connectivity), and also when I briefly tried installing CVS (my Debian setup did *not* initially have it installed, 'cos I didn't ask for it), I cocked up what I asked it to do, I *think*. Regardless, I don't have the *time* to learn CVS, or the time to install it at home - those hours could be spent doing other things (and hours are *very* precious - yes, I know I keep harping on about that, sorry, but it's true). TeX, now. You download the package (on Linux you can probably omit this step, he says, throwing back the comment about what is already present on "modern" distributions). You tell it to install itself. It uses up lots of space on your system. You run the appropriate thingy on the files. You use dvi2ps or whatever and presto bongo. As I understand it, nowadays *installing* TeX and friends on a PC is a doddle, and it's pretty easy on Unix - one probably doesn't even need to compile stuff. And *using* TeX and friends is fairly easy too. *Understanding* what is going on when something goes wrong may be another matter, but I've already been there and done that, so the learning curve is pretty flat. So basically we've got a "my package is harder to install than your package" argument here, and I expect we'll both lose to someone using something less popular. The advantage TeX and friends have is that they're designed (nowadays) to be installed relatively easily by people who are not system admin types, and don't want to be. 
> > > This would address the duplication problem and > > > also keep all of a module's documentation in one place > > > together with the module. > > > > Now, if you said "package" I'd be happy, but since it's > > "module", I'll gripe. > > But the library reference manual is arranged by module, and there > is a chapter of documentation on each individual module. It also > makes sense since the modules are the organizational units that you > import and name in your code. Accident of history, that, surely? We didn't used to have packages, so all of the existing documentation more or less had to be by the module. Now that packages are around, that constraint is no longer true, and indeed we begin to get documentation for the XML package and so on (and if there *isn't* grand scope documentation for these, then that's the fault of normal lack-of-volunteer-itis, surely?). Hmm. "the duplication problem". Eddie notwithstanding, I'm not convinced it always *is* a problem. I don't always *want* my documentation to reflect truth-in-implementation - sometimes the documentation is deliberately behind (or even ahead) of the code. > > Aagh! No, sorry, my problem wouldn't be with paging > > (although that *is* a problem ... > > I do think the inconvenience is mitigated by putting the docs at the > end -- but i acknowledge that having bigger files is a concern. > I don't see this as a 100% win myself -- it just seems that keeping > the code and docs in the same file has advantages large enough to > outweigh the inconvenience. I suspect that we have two major differences, and this is one. I believe that putting the "full" documentation at the end of the file is bad, and not a win - I don't believe that this "same file" idea is either a win, or even a Good Idea, particularly. But I said that. Philosophically, I *want* to be able to point to a different file and say "that's documentation". 
(hmm - on the type-SIG some while back all sorts of people seemed happy with a separate interface file (mind you, *there* I disagreed, for the same reason I wouldn't like to put the *docstrings* in a separate file).) Incidentally, for a *package*, where do you stand? Are you more willing for a separate file there, or do you want one file to be magically decided on as "special" and to have the documentation therein? > > Tutorial, reference and other "grander scope" documentation > > relates to the source code as a whole. > > Can you delineate clearly what you consider "grander scope" > documentation as opposed to "point" documenation on a particular > module? I'd like to better understand what you mean by "different" > in the sense of different enough that something should be in a > separate file. Well, at the moment we have the language reference manual, the tutorial and the library reference manual. Oh, and HOW-TOs. All overlap each other (heh, so they should share common source!!! - erm, no). I would hope that in future, for a package we might have the following: * docstrings (at least) - this serves source code readers/IDEs, and *may* provide input for other things in the absence of anything else. But people like me are going to write stuff in docstrings you don't *want* in other documents. Regardless. * for a "standard" module/package, its entry in the library manual * for a non-standard module/package, equivalent (one hopes!) * for many packages (and some modules), a HOW-TO document - regular expressions are an example here, where AMK has written such. * for some packages, a tutorial document, perhaps a subsection *for* the tutorial The docstrings I termed "point" documentation because each docstring refers to a particular point in the source code - a particular object or whatever (heck, pydoc uses this to find them!). The "grander scope" documentation is anything that looks at a package as a whole. 
Such things should be written in a different mood, and quite possibly (if one can) by different people (as the HOW-TOs are frequently written by someone who didn't write the code). I mean, would you want *me* to write tutorial user documentation for STpy? Wouldn't it tend to be a bit too long? Anyway, back to the point. If a module *does* have all of those, which one do you choose to put at the back of the source file? And if it only has one, what happens when it gains another? > > Also, *because* one might have more than one sort of "grander scope" > > documentation for a module/package, you will have to consider > > *supporting* more than one. > > Could you give an example? Hmm. distutils is one. docutils/STpy/pydoc/whatever will be another (surely they deserve integrated documentation for *some* purposes, and docutils alone already has at least two documentation files, close on three, all for different purposes). HOW-TOs are another, as a class (and *they* sometimes span packages, even - a "string" HOW-TO will need to talk about Unicode and string.py and maybe buffer.py or whatever it is called, and so on). > By Guido-of-the-markup-languages did you mean "benevolent dictator" or > "good designer" or "long-term keeper of the faith" or something else? Oh, sorry, I meant the "good designer" sense (although the others are needed after that, but they weren't what I meant). > Although it may seem surprising, i don't immediately conclude that > there are so many tags that we can't possibly design a reasonably > useful markup syntax. Many of the tags are redundant or produce > shades of meaning finer than i consider really necessary. Hmm. Unfortunately, losing shades of meaning early on means you can never regain them, and one person's shade of meaning is another's heartfelt "but they're not the same". But this, I think, is the second place of disagreement (the "new markup language" disease).
Strangely enough, I worry less about this than the other one - I trust
Fred and co. to ensure that we can *produce* documentation for printing
out that is worth using, and I've used too many different methods of
markup to worry overmuch about what good or rubbish system I'm required
to use - it's unlikely to be as bad as DSR (Digital Standard Runoff).
Although if we *want* to do proper markup, I wish I could convince
people that you actually *do need a proper markup language* (heck, we
already all know that if you want to do proper programming you need a
proper programming language, so why is this so hard to convey?)

Interestingly, I've seen this game (of reducing tags because they're
"not needed") played out in an entirely different arena - Great
Britain's national mapping agency (OS(GB) - that is, Ordnance
Survey(GB) - Northern Ireland has its own) reduced the number of
feature codes they use to distinguish map objects drastically some
years back, mainly to enable cost effective digitising of the
non-digital map base (and they *did* have rather an over-presence of
railway related codes - shows who *used* to be important in the
country!). I still believe that's going to cause them grief (trans.
"money"), and in the not so distant future either - but that's related
to work...

> I'm not claiming it's possible until i really give it a try, but
> i do think it's worth a serious attempt.

Please do bear in mind the "philosophy" of ST then, whilst trying.

Hmm - best stop fiddling with this, and send it - my tummy is
rumbling...

[[[Last thought: I wonder if the ability of pydoc to reproduce
(something that *looks* like) the library documentation for some (maybe
even many) modules - case in point, string - is an influence on your
stance? My second thought on that is, beware of thinking that the
appearance is all. To those who love/want markup, the appearance may be
important, but the *meaning* is what they want to be able to extract
from the text.
My first thought is that, actually, I *don't* particularly have an
objection to a module's library documentation being generated directly
from its docstrings if a. that's all there is, b. Fred et al say that
is sufficient (which I suspect they'd rather not, if they want better
markup, but I'll let them decide), and c. no one volunteers to write
more (more is generally not a bad thing, mind, even for the string
module).

Hmm. Must stop thinking and eat.]]]

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm
(once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Mon Mar 12 14:34:33 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Mon, 12 Mar 2001 14:34:33 -0000
Subject: [Doc-SIG] Docstring markup process
In-Reply-To:
Message-ID: <006501c0ab01$8df5d830$f05aa8c0@lslp7o.int.lsl.co.uk>

> anyone who wants to propose a particular markup format
> should write a PEP describing that format in detail.
> Only formats described in a PEP will be under serious
> consideration. By an agreed deadline, we can vote on
> the PEPs, and then be done with it.
>
> Of course we can discuss the various proposals here, but it's a big
> step forward to get them written up and all in one place for
> comparison.

I have printed out PEP 1, and notice that it seems to say that only
someone with access to the python-dev list can propose a PEP (or
rather, it heavily implies that generation of the PEP happens there).

I *know* life can't be that simple, as I'm sure some of the existing
PEPs have started on other lists. So what is the situation? Does one
just put something together and send it to Barry Warsaw with some
plausible excuse?

Ka-Ping - are you proposing anyone in particular to start a PEP for
usage of ST in docstrings, or shall I divert sideways and wrest
something out of the STpy documentation?
Since I'm sporadically trying to update the ZWiki entry for this stuff,
and aiming to update the STpy document *anyway*, I guess this would be
an opportunity to convert it into STpy (nowt like the documentation
being self-referencing).

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
Well we're safe now....thank God we're in a bowling alley.
- Big Bob (J.T. Walsh) in "Pleasantville"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tavis@calrudd.com Mon Mar 12 18:03:24 2001
From: tavis@calrudd.com (Tavis Rudd)
Date: Mon, 12 Mar 2001 10:03:24 -0800
Subject: [Doc-SIG] suggestions for a PEP
Message-ID: <01031209305701.18635@lucy>

--------------Boundary-00=_OHJ3M3E2ZQZ6GAK7DZGH
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit

Good morning,
I like Ka-Ping's suggestion of a formal PEP and have made some notes
summarizing the ideas I like from recent postings to doc-sig. They
could be written up as a PEP, along with a formal spec for Structured
Text. There's nothing truly original here, just a synthesis of other
people's ideas:

1- module API documentation should be in the same file as source

2- a FORMALIZED version of structured text should be used for inline
   formatting. There's no need to repeat the justifications here. The
   final version of structured text should include a facility for
   storing meta-data in a field format that is easily identifiable to
   both the human eye and the parsing tool. (e.g. authors, version,
   keywords, spam)

3- no changes should be required to the python parser

4- the module's namespace should not be polluted and its memory
   requirements should not be inflated by use of inline documentation

5- therefore, the existing __doc__ docstrings should be used for very
   short synopses, and extended documentation that is discarded at the
   byte-compile stage should be written in string literals that appear
   immediately after the existing docstrings.
These extra string literals would be written in ST, while the
__doc__strings would be in plain text. These two forms of API docs
should complement and not duplicate each other. See the example module
attached to this message.

6- the documentation parsing tools should be capable of producing
   output in many formats (manpages, plain text, html, latex, for a
   start),

7- the doc parsing tools should not need to import and run the module
   to produce its documentation (for security reasons alone)

8- module Library Reference documentation should also be kept in the
   same file as the module source. It should complement the API docs
   with examples, extended discussions of usage, tutorials, test code,
   etc., but should not duplicate the API reference material.

9- the Library Reference docs should be written in string literals, as
   with the extended API docs proposed in pt. 5, but there should be a
   prefix token such as """LIBREF: at the start of each chunk to signal
   to the doc tools that the following text is not part of the API ref.
   The token would allow this documentation to be split up into chunks
   that can appear anywhere in the source file (a la perl's POD).

10- the Library Reference documentation should also be written in ST
    as using LaTeX here would force the module author to learn yet
    another mark-up language, require the documentation user to install
    yet another processing tool (although this isn't an issue on
    Linux), and would place too much emphasis on the separation between
    the API and library reference docs and discourage synchronization
    as the module evolves! The same argument applies to maintaining the
    status quo of external doc files.

Any extra meta-data that is needed for proper indexing, etc. (to meet
Guido's concerns) should be included as fields in the string literals
as is done in JavaDoc (but not necessarily with that syntax).

What do you think?
Cheers,
Tavis Rudd

p.s.
Other issues to consider: - caching of documentation so it doesn't have to be regenerated every time it's used - documenting Packages - inheriting documentation (Edward Loper's idea) - hiding API docs for __privateInternals (ditto) - documenting extensions in other languages - comments within the markup language --------------Boundary-00=_OHJ3M3E2ZQZ6GAK7DZGH Content-Type: text/plain; charset="iso-8859-1"; name="test.py" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="test.py" IwojIE11bHRpbGluZSBjb21tZW50IGF0IHN0YXJ0IG9mIG1vZHVsZSAKIyAtIGNvdWxkIGJlIGlu Y2x1ZGVkIGFzIHRoZSBtb2R1bGUgZGVzY3JpcHRpb24gaWYKIyAgIHRoZXJlIGlzIG5vIGRvY3N0 cmluZyAtIGUuZy4gYXMgcHlkb2MgY3VycmVudGx5IGRvZXMKIwoiIiJNb2R1bGUgRG9jc3RyaW5n IiIiCiIiIgpleHRyYURvY3N0cmluZyBmb3IgdGhlIG1vZHVsZQoKTXkgcHJvcG9zYWwgaXMgdG8g dXNlIHRoZSBleGlzdGluZyBfX2RvY19fIGxvY2F0aW9uIGZvciBhIG9uZS1saW5lIHN5bm9wc2lz LCBhbmQKd2hlcmUgbmVlZGVkIHBsYWNlIGEgc2Vjb25kIHN0cmluZyBsaXRlcmFsIGFmdGVyIHRo ZSBfX2RvY19fIGxvY2F0aW9uIGZvciBtb3JlCmRldGFpbGVkIEFQSSBkb2N1bWVudGF0aW9uIHRo YXQgaXMgZm9ybWF0dGVkIGluIFNUcHkuICBSYXRoZXIgdGhhbiBpbnRyb2R1Y2luZyBhIG5ldwpf X2Zkb2NfXyBhdHRyaWJ1dGUgaW50byBweXRob24sIGFzIEVkd2FyZCBMb3BlciBzdWdnZXN0ZWQs IHRoZXNlIHN0cmluZyBsaXRlcmFscwpzaG91bGQgYmUgZGlzY2FyZGVkIGF0IHRoZSBweXRob24g Ynl0ZS1jb21waWxlIHN0YWdlLiAgVGhlcmUncyBubyBuZWVkIGZvciB0aGlzCmRvY3VtZW50YXRp b24gYXQgcnVuLXRpbWUgYW5kIHRoZXJlJ3MgY2VydGFpbmx5IG5vIG5lZWQgdG8gcmVxdWlyZSBj aGFuZ2VzIHRvCnB5dGhvbidzIHBhcnNlci4KCkFzIHlvdSdsbCBzZWUgYmVsb3csIHRoaXMgd291 bGQgYWxzbyBhbGxvdyBmb3IgYXR0cmlidXRlIGRvY3N0cmluZ3MsIGEgbGEKTWFyYy1BbmRyZSdz IFBFUCAoSSd2ZSBmb3Jnb3R0ZW4gdGhlIG51bWJlcikuCgoiIiIKIyBkb24ndCB1c2UgX19hdXRo b3JfXywgX192ZXJzaW9uX18sIGV0Yy4gYXMgbW9kdWxlIGF0dHJpYnV0ZXMuCiMgVGhlcmUncyBu byBuZWVkIHRvIHBvbGx1dGUgdGhlIG5hbWVzcGFjZS4gUmF0aGVyIGluY2x1ZGUgdGhlbQojIGFz IGZpZWxkcyBpbiB0aGUgZXh0cmFEb2NzdHJpbmcKCgpNT0RVTEVfQ09OU1RBTlRfMSA9IDAKIiIi ZXh0cmFEb2NzdHJpbmcgZm9yIGEgbW9kdWxlIGNvbnN0YW50IiIiCgpNT0RVTEVfQ09OU1RBTlRf 
MiA9IDEKIiIiZXh0cmFEb2NzdHJpbmcgZm9yIGFub3RoZXIgbW9kdWxlIGNvbnN0YW50IiIiCgoK ZGVmIGdsb2JhbEZ1bmMoYXJnKToKICAgICIiIkZ1bmN0aW9uIERvY3N0cmluZyIiIgogICAgIiIi ZXh0cmFEb2NzdHJpbmcgZm9yIGdsb2JhbCBmdW5jdGlvbiIiIgoKICAgIGFWYXIgPSBhcmcqMiAK ICAgICIiImV4dHJhRG9jc3RyaW5nIGZvciBhbiBleHByZXNzaW9uIHdpdGhpbiBhIGdsb2JhbCBm dW5jdGlvbiAoaXMgdGhpcyByZWFsbHkgbmVlZGVkPz8pIiIiCiAgICBwcmludCBhVmFyIAogICAg CgpjbGFzcyBTYW1wbGVDbGFzczoKICAgICIiIkNsYXNzIERvY3N0cmluZyIiIgogICAgIiIiZXh0 cmFEb2NzdHJpbmcgZm9yIGEgY2xhc3MKCiAgICAtIHRlc3Qgc3RyaW5nIHdpdGggYSB3aG9sZSBi dW5jaCBvZiB0ZXh0LCB3aGljaCBjb3VsZCBiZSBmb3JtYXR0ZWQgd2l0aCBTVHB5LCBldGMuCiAg ICAKICAgIFNOSVAgLS0tIFNldmVyYWwgdG9vbHMgaGF2ZSBiZWVuIHdyaXR0ZW4gb3IgcHJvcG9z ZWQgZm9yIHByb2Nlc3NpbmcgUHl0aG9uIGRvY3VtZW50YXRpb24KICAgIHN0cmluZ3Mgd2l0aCBz cGVjaWZpYyBmb3JtYXR0aW5nIGNvbnZlbnRpb25zIFsxXSwgWzJdLCBbM10sIFs0XS4gRXZlbnR1 YWxseSwgaXQKICAgIHdvdWxkIGJlIG5pY2UgaWYgc3VjaCBhIHRvb2wgYmVjYW1lIHBhcnQgb2Yg dGhlIFB5dGhvbiBiYXNlIGxpYnJhcnkuIEhvd2V2ZXIsIGZvcgogICAgdGhpcyB0byBoYXBwZW4s IHRoZSBjb21tdW5pdHkgbmVlZHMgdG8gY29tZSB0byBhIGNvbnNlbnN1cyBvbiB3aGF0IGZvcm1h dHRpbmcKICAgIGNvbnZlbnRpb25zIGRvY3VtZW50YXRpb24gc3RyaW5ncyBzaG91bGQgdXNlLiBJ biB0aGlzIGRvY3VtZW50LCBJIGRpc2N1c3Mgc29tZSBvZgogICAgdGhlIGlzc3VlcyB0aGF0IHN1 Y2ggY29udmVudGlvbnMgbXVzdCBkZWFsIHdpdGguIEkgYXR0ZW1wdCB0byBwcmVzZW50IGEgZmFp cmx5CiAgICBjb21wcmVoZW5zaXZlIGxpc3Qgb2YgaXNzdWVzLCBidXQgSSB3aWxsIGJlIGdsYWQg dG8gYW1lbmQgdGhpcyBkb2N1bWVudCBhcyBtb3JlCiAgICBpc3N1ZXMgYXJlIHBvaW50ZWQgb3V0 IHRvIG1lLiAgQmVjYXVzZSBJIHdhcyBpbnRlcmVzdGVkIGluIGV4cGxvcmluZyBzb21lIG9mCiAg ICB0aGVzZSBpc3N1ZXMsIEkgd3JvdGUgbXkgb3duIGRvY3VtZW50YXRpb24gZXh0cmFjdGlvbiB0 b29sLCBFcHlkb2MgKGVkbG9wZXIncwogICAgcHlkb2MpLiBUaGlzIHRvb2wgd2FzIHByaW1hcmls eSBpbnRlbmRlZCBhcyBhIHByb3RvdHlwZSwgdG8gbGV0IG1lIHBsYXkgd2l0aAogICAgZGlmZmVy ZW50IHdheXMgb2YgcHJvY2Vzc2luZyBkb2N1bWVudGF0aW9uIHN0cmluZ3MuIEZvciBtb3JlIGlu Zm9ybWF0aW9uLCBvciB0bwogICAgc2VlIHdoYXQgdHlwZSBvZiBvdXRwdXQgRXB5ZG9jIHByb2R1 
Y2VzLCBzZWUgdGhlIGRvY3VtZW50YXRpb24gZm9yIEVweWRvYwogICAgKGtlZXBpbmcgaW4gbWlu ZCB0aGF0IGl0J3MganVzdCBhIHByb3RvdHlwZSA7LSkgKS4gTWFueSBvZiB0aGUgaWRlYXMgY29u dGFpbmVkIGluCiAgICB0aGlzIGRvY3VtZW50IGFyZSBub3QgbWluZS4gSSBkb24ndCB0YWtlIGNy ZWRpdCBmb3IgYW55IG9mIHRoZW0uIEhvd2V2ZXIsIEknbSBub3QKICAgIG9yZ2FuaXplZCBlbm91 Z2ggdG8gZmlndXJlIG91dCB3aGVyZSBtb3N0IG9mIHRoZSBpZGVhcyBkaWQgY29tZSBmcm9tLgog ICAgIiIiCgogICAgCiAgICBhbkF0dHJpYnV0ZSA9IDEyMzQKICAgICIiImV4dHJhRG9jc3RyaW5n IGZvciBhIGNsYXNzIGRhdGEgYXR0cmlidXRlIiIiCiAgICBhbm90aGVyQXR0cmlidXRlID0gJ2Fi Y2RlZmcnCiAgICAiIiJleHRyYURvY3N0cmluZyBmb3IgYW5vdGhlciBjbGFzcyBkYXRhIGF0dHJp YnV0ZSIiIgogICAgCiAgICBkZWYgbWV0aG9kKHNlbGYsIGFyZyk6CiAgICAgICAgIiIiQSB0ZXN0 IGZ1bmMgZG9jc3RyaW5nICIiIgogICAgICAgICIiImV4dHJhRG9jc3RyaW5nIGZvciBhIGNsYXNz IG1lbWJlciBmdW5jdGlvbiIiIgogICAgICAgIHByaW50IGFyZwogICAgICAgIHJldHVybiAxCgoK CiMgYW5kIG9mIGNvdXJzZSB1c2UgY29tbWVudHMgdG8gZG9jdW1lbnQgYW55dGhpbmcgdGhhdCBv bmx5IG5lZWRzIHRvIGJlCiMgcG9pbnRlZCBvdXQgdG8gdGhvc2Ugd29ya2luZyB3aXRoIG1vZHVs ZSdzIHNvdXJjZSBkaXJlY3RseSBlLmcuCmdsb2JhbEZ1bmMoJ0JpZ2dsZXMgRGljdGF0ZXMgYSBM ZXR0ZXIuICcpICMgdGhpcyBjb3VsZCBkb2VzIGJ1Z2dlciBhbGwgLSBkb24ndCBib3RoZXIgbWFp bnRhaW5pbmcgaXQuCmVyaWMgPSBTYW1wbGVDbGFzcygpCmVyaWMubWV0aG9kKCdzcGFtJykKCgoK IyBBbmQgZmluYWxseSBhIHN0cmluZyBsaXRlcmFsIGZvciB0aGUgZm9ybWFsIGRvY3VtZW50YXRp b24gdG8KIyBpbmNsdWRlIGluIHRoZSBweXRob24gbGlicmFyeSByZWZlcmVuY2UgYXMgS2EtUGlu ZyBzdWdnZXN0ZWQgaW4gaGlzCiMgcG9zdCBvbiBNYXJjaCAxMS4gIFRoaXMgaXMgY29uY2VwdHVh bGx5IHNlcGFyYXRlIGZyb20gdGhlCiMgQVBJIHJlZmVyZW5jZSBkb2N1bWVudGF0aW9uIHN0b3Jl ZCBpbiB0aGUgZG9jc3RyaW5ncyBhbmQgZXh0cmFEb2NzdHJpbmdzIGFib3ZlLgojIEl0IHNob3Vs ZCBjb21wbGltZW50LCBub3QgZHVwbGljYXRlLCB0aGUgQVBJIHJlZmVyZW5jZSBhbmQgcHJvdmlk ZSBhbnkKIyBtZXRhLWRhdGEgbmVjZXNzYXJ5IGZvciBwcm9wZXIgaW5kZXhpbmcgaW4gdGhlIHB5 dGhvbiBsaWJyYXJ5IHJlZmVyZW5jZSBwYWdlcy4KCiIiIkxJQlJBUllfUkVGOgoKU3RhcnQgd2l0 aCBhIHRva2VuIHRvIHRlbGwgdGhlIGRvY3VtZW50YXRpb24gcGFyc2luZyB0b29sIHRoYXQgd2hh 
dCBmb2xsb3dzIGlzIGZvcgp0aGUgbGlicmFyeSByZWZlcmVuY2UgYW5kIG5vdCB0aGUgQVBJIHJl Zi4gIFRoaXMgd291bGQgcmVtb3ZlIHRoZSBuZWVkIGZvciB0aGlzCnN0cmluZyBsaXRlcmFsIHRv IGJlIGluY2x1ZGVkIGFzIHRoZSB2ZXJ5IGxhc3QgdGhpbmcgaW4gdGhlIHNvdXJjZSBmaWxlLCBh bmQgd291bGQKYWxzbyBhbGxvdyB0aGUgbGlicmFyeSByZWZlcmVuY2UgY29kZSB0byBiZSBzcGxp dCB1cCBpbnRvIGNodW5rcyB0aHJvdWdob3V0CnRoZSBzb3VyY2UgY29kZSAoYSBsYSBwZXJsJ3Mg UE9EKS4gIEkgdGhpbmsgdGhpcyBpcyB3aGF0IEthLVBpbmcgbWVhbnQgYnkKIihvciBtYXliZSBj b2xsZWN0IGRvY3N0cmluZ3MgZnJvbSBhbnl3aGVyZSBpbiB0aGUgZmlsZSkuIgoKVGhpcyBkb2N1 bWVudGF0aW9uIHN0cmluZyBjb3VsZCBiZSB3cml0dGVuIGluIExhVGVYLCBvciBzdHJ1Y3R1cmVk IHRleHQuICBNeSBwZXJzb25hbApwcmVmZXJlbmNlIGlzIGZvciBhIGZvcm1hbGl6ZWQgU3RydWN0 dXJlZCBUZXh0IGxhbmd1YWdlLiAgSSB1c2UgTGFUZVggZXh0ZW5zaXZlbHkKZm9yIGxhcmdlciBk b2N1bWVudHMgKGUuZy4gbXkgdGhlc2lzKSwgYnV0IHRoaW5rIGl0J3MgaW5hcHByb3ByaWF0ZSBm b3IgbW9kdWxlIGRvY3MKZm9yIHRoZXNlIHJlYXNvbnM6Ci0gaXQgZGVtYW5kcyB0aGUgcHJlc2Vu Y2Ugb2YgZXh0cmEgcHJvY2Vzc2luZyB0b29scwotIGl0IHJlcXVpcmVzIHRoZSBtb2R1bGUgY3Jl YXRvciB0byBsZWFybiB5ZXQgYW5vdGhlciBtYXJrdXAgbGFuZ3VhZ2UKLSBpdCBvZmZlcnMgbm8g cmVhbCBhZHZhbnRhZ2Ugb3ZlciBTVCBmb3Igc21hbGwgZG9jdW1lbnRzICg8MTAgcGFnZXMpCi0g aGF2aW5nIHR3byBtYXJrLXVwIGxhbmd1YWdlcyBpbiB1c2UgZm9yIG9uZSBtb2R1bGUgcGxhY2Vz IHRvbyBtdWNoIGVtcGhhc2lzCiAgb24gdGhlIHNlcGFyYXRpb24gYmV0d2VlbiB0aGUgQVBJIGFu ZCBsaWJyYXJ5IHJlZmVyZW5jZSBkb2NzIGFuZCBkaXNjb3VyYWdlcwogIHN5bmNocm9uaXphdGlv biBhcyB0aGUgbW9kdWxlIGV2b2x2ZXMhCgoiIiIKCgo= --------------Boundary-00=_OHJ3M3E2ZQZ6GAK7DZGH-- From edloper@gradient.cis.upenn.edu Mon Mar 12 19:36:23 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 12 Mar 2001 14:36:23 EST Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: Your message of "Fri, 09 Mar 2001 10:35:57 GMT." 
<004401c0a884$b9ea1890$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <200103121936.f2CJaNp18187@gradient.cis.upenn.edu>

> BUT I think that it would be better to strategically insert references
> to STminus at "the appropriate places" in the Zwiki (mind you, I've
> still to figure out how the people using Wikis target new stuff that
> they should be looking at).

I added a page to Zope (under CurrentIssues), and I'll try to actually
put more content there when I can. :)

> > I have a suspicion that STminus's current definition does *not*
> > actually provide a subset of the intersection of STNG and STpy..
>
> I think it's very close, actually.

I guess it depends on how much the implementations diverge from the
intentions.. At least for STNG, there are a number of current
differences. I've been writing a large test set, and plan to post a
link to it, and to the results of running STminus on it, later today..
(still needs a little more work). At that point, I'm hoping we can get
a better idea of whether STNG and STpy really act like STminus. (My
guess is that most differences are unintentional ones)

> > (although, just because one parses something one way,
> > doesn't mean that that's the intended behavior, but..)
>
> Well...
>
> In the case of STminus, I think we could say that it does (!)

Definitely, since STminus is implemented directly from its formal
definition.

> In the case of STpy, if it does something surprising then either that's
> a bug, or an unforeseen consequence of something that *isn't* a bug, in
> which case it either needs designing around or explaining.
> I doubt that STNG is too much different (although I suspect they prefer
> the "explain around" to the "change" mechanism - stability (of some
> sort) over perceived complexity).

I'm hoping that STNG will be willing to make at least a few changes..
For example, changing 'x*y' and 'y*z' to be 2 literals rather than one
emph area.
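[The 'x*y' / 'y*z' problem is easy to reproduce with a naive emphasis
rule - a hypothetical sketch, not the actual STNG, STpy, or STminus
parser:]

```python
import re

# Naive rule: *...* marks emphasis, matched non-greedily.
naive_emph = re.compile(r"\*(.+?)\*")

text = "compute x*y and then y*z"
marked = naive_emph.sub(r"<em>\1</em>", text)
# The two unrelated asterisks pair up into one emphasis span:
print(marked)  # compute x<em>y and then y</em>z
```

[Making a lone '*' undefined, or treating both as literals, avoids
exactly this trap.]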
> (right - so STpy/docutils is going to be the Common Lisp of ST, STminus
> is clearly Scheme, so what is STNG? I feel it is likely to have more
> "unexpected results" because of a wish for a faster engine - Tcl? no,
> that's unfair)

Well, I'm hoping that all 3 at least have well-defined results *where
they do define the results*. I certainly hope that STNG will only give
"unexpected results" for a small subclass of strings.. :)

> One comment - when defining *StructuredText*, you say "all paragraphs
> and list items be separated by at least two blank lines" - shouldn't
> that be "one blank line" which is "two newlines".

The text was wrong, the production was (I think) correct. It should
have said that paragraphs are separated by "at least one blank line."
I'll fix it..

> I tend to think of the STNG document structure as being:
>
> BlankLine        = S* NL
> TextLine         = <> NL
> Paragraph        = TextLine+
> StructuredTextNG = BlankLine* Paragraph
>                    (BlankLine+ Paragraph)*
>                    BlankLine*

Almost.. But since I define paragraphs *not* to include their trailing
newline, you need at least two *newlines* between paragraphs. Also, the
way you wrote it, there's an ambiguity as to whether trailing spaces
belong in the paragraph and the blank line.. (I wrote my implementation
of ebnfla so it checks for all possible ambiguities, so this type of
thing is easier to detect when you're actually playing with rules).
Finally, your rule doesn't allow for strings that contain a final blank
line that isn't terminated by a newline. I'll try to give a better
explanation of the rule in
http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html when I get a
chance.

> This assumes that the empty document (only consisting of zero or more
> blank lines) isn't allowed, which makes the production simpler - I
> suppose that first "Blankline* Paragraph" could become a "?" group (0 or
> 1 occurrences) if needs be...

I assumed that the empty document was acceptable.. Unless there's a
reason to make it unacceptable.
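[The "one blank line = two newlines" separator rule under discussion
can be sketched in a few lines of Python - a hypothetical helper for
illustration, not how any of the real parsers work:]

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs separated by at least one blank line.

    A blank line may carry trailing whitespace, so the separator is a
    newline followed by one or more whitespace-only lines.
    """
    chunks = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    # An empty document yields no paragraphs at all.
    return [c for c in chunks if c.strip()]

doc = "First paragraph,\nstill first.\n\nSecond.\n   \nThird."
print(split_paragraphs(doc))
```

[Note how the whitespace-only line between "Second." and "Third." still
separates paragraphs - the trailing-spaces ambiguity mentioned above is
resolved here by assigning them to the separator.]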
> It's that first paragraph that's the problem - if it had to have a
> starting blank line life would be a lot simpler (indeed, I think STNG
> solves this problem by pretending it does!).

If we had a starting break, we would still need::

    StructuredText = (S* NL)*
                     (NL (S* NL)+ Paragraph | NL (S* NL)+ ListItem)*
                     (NL S*)*

instead of::

    StructuredText = (S* NL)* (Paragraph | ListItem)?
                     (NL (S* NL)+ Paragraph | NL (S* NL)+ ListItem)*
                     (NL S*)*

> Surely for the common-denominator, you don't need to separate out list
> items from paragraphs in this production?

I separated out list items from paragraphs to make it easier to replace
the rule for STpy. We could very easily say instead::

    StructuredText = (S* NL)* Entity? (NL (S* NL)+ Entity)* (NL S*)*
    Entity = Paragraph | ListItem

If you think that's easier to read. It defines the same language, so it
doesn't really matter to me. :)

> Or, if you do:
>
> Paragraph = ( ListItem | TextLine )
>             TextLine*

I tried to make all of my productions correspond to their actual
entities.. So you shouldn't need to do (much) postprocessing on the
output of STminus. For example, the Paragraph production should give an
entire paragraph, not just its first line. I think I may add ULItem,
OLItem, and DLItem productions for similar reasons (without changing
the language defined by the productions)

> Hmm. Anyway, I'm still very impressed - keep up the good work!

Thanks! You've done some impressive work, yourself.

From edloper@gradient.cis.upenn.edu Mon Mar 12 19:50:17 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Mon, 12 Mar 2001 14:50:17 EST
Subject: [Doc-SIG] Docstring markup process
Message-ID: <200103121950.f2CJoHp19946@gradient.cis.upenn.edu>

> Paul Prescod made an excellent meta-proposal for docstrings at the
> recent conference: rather than arguing endlessly about various
> markup formats, anyone who wants to propose a particular markup format
> should write a PEP describing that format in detail.
> Only formats described in a PEP will be under serious consideration.
> By an agreed deadline, we can vote on the PEPs, and then be done with
> it.

As far as I can tell, the current problems lie less in not being able
to agree on a markup format, and more in not being able to define one.
STminus is working to fix that, but it will take several iterations
before we have something that would be worthy of putting in a PEP.

I myself am remaining as neutral as I can as to what the actual format
should be. I figure that there are enough other people out there to
make sure that it's simple, easy, etc.. But my two main concerns are
that the markup format be:

1. Well defined -- i.e., there should be a "correct" parse for most
   strings. The remaining strings should have *explicitly* "undefined"
   results.

2. Safe -- there should be *no* "unexpected results," except where the
   results are "undefined." This means, for example, that the results
   of using a single '*' in a string should be undefined, because if
   it's defined to produce an asterisk, then people will assume they
   can write 'x*y' that way, and later in the same document write 'y*z'
   and get very counter-intuitive results.

> Of course we can discuss the various proposals here, but it's a big
> step forward to get them written up and all in one place for comparison.
>
> I strongly support this process; let's pick a deadline.

I would appreciate it if we can make the deadline far away enough that
we can have a real formal definition for whatever it is we're
proposing.

-Edward

From edloper@gradient.cis.upenn.edu Mon Mar 12 19:53:42 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Mon, 12 Mar 2001 14:53:42 EST
Subject: [Doc-SIG] reserved characters
Message-ID: <200103121953.f2CJrgp20338@gradient.cis.upenn.edu>

I'm thinking of adding a number of "reserved characters" to STminus,
that specific versions of ST may decide to use or not use. They would
redefine their own ReservedCharacter production accordingly..
So:

1. Is this a Good Idea?

2. What characters should be included? Clearly at least '#' for STpy.
   Other possibilities are backquote, at sign, exclamation mark, etc.

I'd like to reserve as few as we can get away with, since the more
characters are reserved, the less useful a program written to work on
*any* StructuredText document can be.

-Edward

From ping@lfw.org Mon Mar 12 17:50:37 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Mon, 12 Mar 2001 09:50:37 -0800 (PST)
Subject: [Doc-SIG] Docstring markup process
In-Reply-To: <006501c0ab01$8df5d830$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID:

On Mon, 12 Mar 2001, Tony J Ibbs (Tibs) wrote:
> I *know* life can't be that simple, as I'm sure some of the existing
> PEPs have started on other lists. So what is the situation? Does one
> just put something together and send it to Barry Warsaw with some
> plausible excuse?

Pretty much, yes. If you're serious enough about a proposal to write a
PEP for it, and it's clear that the goals are at least reasonable, it
should be easy to ask Barry to assign you a PEP number. Or if the
process has to be formalized, i'm sure one of us will be happy to vouch
for any docstring proposal that looks reasonable and to submit it on
the author's behalf.

> Ka-Ping - are you proposing anyone in particular to start a PEP for
> usage of ST in docstrings,

No. Indeed, i think the whole point of Paul's suggestion was to remove
the expectation that we would somehow all *first* agree on something
and then delegate someone to do the work of describing it in a PEP;
rather, the onus is now on whoever wants to champion a proposal.

> Suggestion for meat-and-bones of PEP:
>
> 1. STNG plus '>>>' plus '#...#', maybe plus "tagged paragraphs" [...]
> 2. Future enhancements to include: [...]
> 3. Whether ST gets vastly expanded [...]

So don't think of it as "the PEP". Write *your* PEP and do the best
you can to cover the bases.
You can address things like "the library reference docs should be moved
into the module docstrings for the following reasons, and this is how
we would do it" or "the library reference docs should not be moved for
the following reasons..." etc.

So yes, please write!

-- ?!ng

"Computers are useless. They can only give you answers."
    -- Pablo Picasso

From edloper@gradient.cis.upenn.edu Mon Mar 12 20:38:03 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Mon, 12 Mar 2001 15:38:03 EST
Subject: [Doc-SIG] Evolution of library documentation
In-Reply-To: Your message of "Mon, 12 Mar 2001 10:44:14 GMT."
    <005e01c0aae1$61c8d890$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <200103122038.f2CKc3p26323@gradient.cis.upenn.edu>

I think that for the most part, I tend to side with Tibs on this..

From my perspective, docstrings should include clear, concise, and
unambiguous *definitions* of python entities. In other words, by
reading a docstring, you should know *exactly* what a
function/method/class/etc. is guaranteed to do. Note that they can
leave that underspecified. For example, a function that is defined to
"return a list containing the prime numbers between 1 and n, exclusive"
is free to order that list however it wants. This helps enormously when
trying to read source code, because it lets you know what parts of a
function are there because they're part of the function's definition,
and what parts are just an implementation decision.

Note that that is generally exactly the type of documentation you want
in a reference manual. I think that the idea of building the reference
manuals from docstrings makes a lot of sense. I do *not* think that
including tutorials, howtos, big examples with explanations, etc.
belong in the docstrings.

> (although I would point out it's
> *significantly* easier to "have TeX" than, for instance, to "have CVS",
> which one is also required to have to do development with modern Python
> (a *serious* problem for some of us).
This is pretty much completely a digression, but I figured I'd chime
in. Regardless of how easy it is to download CVS or LaTeX, it is much
easier to learn how to *use* CVS than to learn how to use LaTeX. Given
a knowledgeable teacher, you can learn everything you need to know
about CVS in an hour. I'm not sure anyone ever learns everything they
need to know about LaTeX. :) (This is coming from many years of
experience using both pieces of software -- and I personally think that
they're *both* great, and everyone should learn them both. :) )

> > This would address the duplication problem and
> > also keep all of a module's documentation in one place together with
> > the module.
>
> Now, if you said "package" I'd be happy, but since it's "module", I'll
> gripe.

The module's definition should be kept with the module (or enough
background explanation to allow one to define every
class/function/method/var in the module). It *may* make sense to keep
other types of documentation in the module (??) or the package, but
it's less clear. But if they are given a place, I don't think it should
be in docstrings.

> > To avoid forcing you to page through a huge docstring
> > before getting to the source code, we would allow a long docstring to
> > go at the end of the file (or maybe collect docstrings from anywhere
> > in the file).
>
> Aagh! No, sorry, my problem wouldn't be with paging (although that *is*
> a problem - and why is the end of the file so different than the
> front? - I page from both ends, depending on context!).

In general, well-written definitions should be fairly short, so this
shouldn't be TOO much of a problem. But one issue that I do remember
people having is that docstrings are kept at runtime (which I think is
great for what I'm saying they should do), and people are concerned
that they will eat up too many resources.. Is this really a problem or
am I just misremembering something?

> Source files are for source code.
> I want to be able to *treat* them as such. It is quite possible for a
> two page source to have ten or more pages of documentation associated
> with it. That does *not* belong in the same *file* as the source - if
> someone *wants* to associate them closely, the correct way to do it is
> with a *package*.

I think that the definitions, however big, should be kept with the
source code. (Of course, if someone needs 10 pages to define the
behavior of 2 pages of source, something's wrong). But I agree that
everything else should either be kept in the package (where
appropriate) or at a higher level (for docs that span packages).

> Also, *because* one might have more than one sort of "grander scope"
> documentation for a module/package, you will have to consider
> *supporting* more than one. Difficult if it is "just" a string tacked on
> the end.

I think that we should leave the organization of "grander scope"
documentation to a different project.. (Of course, it's still an
important project.)

> The reason for adopting ST (or some variant) for markup in docstrings
> is, basically, because it is acknowledged that many people will not
> create docstrings with more markup than that, or with more obtrusive
> markup than that.

I think it might make sense to reserve one character, or maybe 2, for
advanced markup. (We would also want to be able to backquote it
somehow, but we'll leave discussion of that for later..). So, for
example, we could say that '@' is reserved for advanced markup, and
then it can be used by people who:

1. want more advanced features

2. are willing to use a "real" markup, which is more "obtrusive" and
   difficult to read/write.

I think that we should limit how much more complex "basic" ST gets, as
much as possible..

> Worse, if one tries to continue using "simple" markup in ST, one is
> going to end up with strained analogies, and with almost any
> non-alphanumeric character having a special meaning. Yuck (can we say
> Perl?).
I will be very upset if ST takes that type of turn.

> The obvious way round that is to start doing, well, markup - for
> instance, '@class(..)'
> or somesuch (like Pod, I think? - or GNU texinfo). In which case we're
> inventing our own little markup language again, with none of the reasons
> for doing it that went into ST.

I agree that it doesn't make sense to make up our own markup language..
So maybe reserve (non-backslashed) '<' and '>' and use XML for advanced
markup? Of course, depending on what we decide we need, we may be able
to get away with much less. For example, define things like::

    Parameters:
        p1 -- foo
        p2 -- bar

To have special meaning in the context of docstrings. Any standard
(STpy-like) ST will just render them as a heading with a list. You can
also define special forms like::

    author -- Edward Loper
    version -- 2.71828

That takes care of most of the structural-type markup, and still looks
ok if you just read it, or parse it with non-docstring-specific ST
tools.. Then you just have to worry about syntax for inline markup. I
don't have any great ideas there, other than having a whole class of
"advanced inline markup" tags like @somethingorother(...)

Anyway, I have a meeting to get to.. :)

-Edward

From barry@digicool.com Mon Mar 12 21:37:01 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Mon, 12 Mar 2001 16:37:01 -0500
Subject: [Doc-SIG] Docstring markup process
References: <006501c0ab01$8df5d830$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <15021.16765.317180.679742@anthem.wooz.org>

>>>>> "TJI" == Tony J Ibbs writes:

    TJI> I have printed out PEP 1, and notice that it seems to say
    TJI> that only someone with access to the python-dev list can
    TJI> propose a PEP (or rather, it heavily implies that generation
    TJI> of the PEP happens there).

No. PEPs can be written and proposed by anybody. Discussion on
python-dev and/or python-list is encouraged to see if the idea is
actually PEP-able. Remember that anybody can post to python-dev.
If you're not a member, you get the added "convenience" of being able to read the list only via the web archives. python-list should be sufficient for all the initial discussion on a PEP though. I'd wager that successful shepherding of a PEP through to acceptance is probably a good way to earn the brownie points needed to be welcomed onto python-dev (and hence, although not strictly tied, to the PSF). TJI> I *know* life can't be that simple, as I'm sure some of the TJI> existing PEPs have started on other lists. So what is the TJI> situation? Does one just put something together and send it TJI> to Barry Warsaw with some plausible excuse? Actually, you should write a draft (using the style described in PEP 1 and looking at existing examples), and post it to python-list and/or python-dev. If there's general consensus that the idea is appropriate for PEP status, send me the latest version of the draft and I'll assign it a number. I'll make the necessary changes to PEP 0 and, if you don't have checkin permissions, I'll check in your initial draft. -Barry From klm@digicool.com Mon Mar 12 21:58:06 2001 From: klm@digicool.com (Ken Manheimer) Date: Mon, 12 Mar 2001 16:58:06 -0500 (EST) Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: Message-ID: Edward Loper: > Tony Ibbs?: > > "Edward D. Loper" > > BUT I think that it would be better to strategically insert references > > to STminus at "the appropriate places" in the Zwiki (mind you, I've > > still to figure out how the people using Wikis target new stuff that > > they should be looking at). Typically, people check the wiki's RecentChanges page, looking for page names of interest near the top - "near" getting bigger the less recent your last visit. (Automated notifications would be nicer, but our energy for this kind of thing is going elsewhere...) > I added a page to Zope (under CurrentIssues), and I'll try to actually > put more content there when I can. :) Very nice to see - thanks! 
> > > I have a suspicion that STminus's current definition does *not*
> > > actually provide a subset of the intersection of STNG and STpy..
> >
> > I think it's very close, actually.
>
> I guess it depends on how much the implementations diverge from
> the intentions.. At least for STNG, there are a number of current
> differences. I've been writing a large test set, and plan to
> post a link to it, and to the results of running STminus on it,
> later today.. (still needs a little more work). At that point, I'm

Great boon that you're making tests - sounds like you're encountering bugs or unwanted looseness in STNG. We'll want to fix such.

> hoping we can get a better idea of whether STNG and STpy really
> act like STminus. (My guess is that most differences are unintentional
> ones)
> [...]
> > In the case of STpy, if it does something surprising then either that's
> > a bug, or an unforeseen consequence of something that *isn't* a bug, in
> > which case it either needs designing around or explaining.
> > I doubt that STNG is too much different (although I suspect they prefer
> > the "explain around" to the "change" mechanism - stability (of some
> > sort) over perceived complexity).
>
> I'm hoping that STNG will be willing to make at least a few changes..
> For example, changing 'x*y' and 'y*z' to be 2 literals rather than
> one emph area.

I'm pretty sure we will want to fix bugs and track a good base standard (which is where STminus seems to be heading) in STNG. The problem is going to be finding time to do so - at least for the next few weeks, all the likely suspects are inundated - but we are interested, and will eventually make time to track. (In case it needs saying, patches would be welcome...-)

Ken Manheimer
klm@digicool.com

From edloper@gradient.cis.upenn.edu Tue Mar 13 01:49:08 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Mon, 12 Mar 2001 20:49:08 EST
Subject: [Doc-SIG] suggestions for a PEP
Message-ID: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu>

I'd be happy to write up a PEP, but I'm curious what things people think it should include. In particular, which of the following should it include:

1. The definition of a particular markup language?
2. Specification of the *semantics* of a markup language (as opposed
   to just its syntax)
3. Specification of what is "appropriate" to put in a docstring?
4. Specification of the "intended semantic content" of various "slots"
   like docstrings and strings immediately following docstrings?
5. Specifications of tools for docstrings? Possibly of tools that
   would eventually be included in the standard library?
6. whatever else you can think of..

Some of these questions require further work to answer (such as 1, where no one can currently give a good definition of any version of StructuredText).. Others will probably involve some agreement by the community..

I think it's important to distinguish the tools that process docstrings from any attempt to define what should go *in* docstrings (or similar places), whether that means what type of information or what type of markup. The PEP should address what goes *in* docstrings, but shouldn't necessarily have much to say about the tools that process the docstrings. (Leave those for a standard library extension - do those get PEPs?)

Anyway, I wanted to comment on some of what Tavis Rudd wrote:

>> 1- module API documentation should be in the same file as source

I will assume that "API" documentation is what I was advocating earlier. I.e., a clear, concise, unambiguous *definition* of a Python object, such that you can tell exactly what is guaranteed by the object just by reading the definition. If that's *not* what you mean, please say what you do mean.

>> 2- a FORMALIZED version of structured text should be used for inline
>> formatting. There's no need to repeat the justifications here.
Yay! :) I would add "safe" as well, but I'll argue that another day..

>> The final version of structured text should include a facility for
>> storing meta-data in a field format that is easily identifiable to both
>> the human eye and the parsing tool.
>> (e.g. authors, version, keywords, spam)

My favorite suggestion is to just use top-level description list items and top-level headings with a single level of description list items to store meta-data.. and to have "reserved" keys for description lists (in these contexts). E.g.::

    def primes(n):
        """
        Return a list of all the prime numbers from 1 to n, exclusive.

        Parameters:
          n -- The upper limit for the prime numbers to return. A prime
               number will be returned if and only if it is less than 'n'.
        Types:
          n -- int
        ReturnType -- 'list' of 'int'
        Version -- 1.0
        Author -- Edward Loper
        See -- #other_primes#
        """
        ...

Also, I think that *all* of these special marked meta-data fields must be optional. (Of course, if the user wants to use a program that checks to make sure that ReturnType is defined for every method, they can.. but it's not required in general.)

>> 3- no changes should be required to the python parser
>>
>> 4- the module's namespace should not be polluted and its memory
>> requirements should not be inflated by use of inline documentation
>>
>> 5- therefore, the existing __doc__ docstrings should be used for very
>> short synopses, and extended documentation that is discarded at the
>> byte-compile stage should be written in string literals that appear
>> immediately after the existing docstrings. These extra string literals
>> would be written in ST, while the __doc__strings would be in plain
>> text. These two forms of API docs should complement and not duplicate
>> each other.

I think that there definitely *are* instances where you want to get at these strings from within python, esp. if you're using the interpreter.
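The reserved-key convention sketched above ('Key -- value' description-list items) is simple enough that a field extractor fits in a few lines. The following is only an illustration of the idea, not part of any tool discussed here; the function name and the particular set of reserved keys are invented for the example:

```python
import re

# Hypothetical set of reserved meta-data keys, chosen for this example only.
RESERVED_KEYS = {"Version", "Author", "ReturnType", "See"}

def extract_fields(docstring):
    """Return a dict of reserved 'Key -- value' fields found in a docstring."""
    fields = {}
    for line in docstring.splitlines():
        # A description-list item looks like:  Key -- value text
        match = re.match(r"\s*(\w+)\s+--\s+(.*)", line)
        if match and match.group(1) in RESERVED_KEYS:
            fields[match.group(1)] = match.group(2).strip()
    return fields

doc = """
Return a list of primes.

Version -- 1.0
Author -- Edward Loper
"""
print(extract_fields(doc))  # → {'Version': '1.0', 'Author': 'Edward Loper'}
```

Non-reserved keys (like the parameter name 'n' in the example above) are simply ignored, which keeps all meta-data fields optional, as proposed.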
One thing I really loved about python when I was learning it was that I could get decent help on just about anything very easily. I therefore propose the following:

1. STdoc strings appear after __doc__ strings, as I said before.
2. For now, these strings are thrown away by the compiler.
3. At some future date, the compiler could be modified so that, at user
   option, it would produce ".pyd" files as well as ".pyc" files. These
   contain all the STdoc strings from the file, and can be accessed via
   the interpreter somehow. Python would *not* format them, it would
   just copy them. Maybe create a dictionary from identifier name to
   string, and pickle it.

>> 6- the documentation parsing tools should be capable of producing
>> output in many formats (manpages, plain text, html, latex, for a
>> start),

definitely, although this may take some time to implement. I would add "dynamic navigation of docs from within the python interpreter" to the list of possible "outputs."

>> 7- the doc parsing tools should not need to import and run the module
>> to produce its documentation (for security reasons alone)

Leaves open the question of what to do with C extensions, etc...

>> 8- module Library Reference documentation should also be kept in the
>> same file as the module source. It should complement the API docs
>> with examples, extended discussions of usage, tutorials, test code,
>> etc., but should not duplicate the API reference material.

Tutorials, examples, etc. do *not* belong in the same source file. That makes it much too hard to work with the source file. How much performance hit would we take if we turned *every* "standard" module into a package? That way we could have string.__tutorial__ etc or whatever, if we wanted to.. Among other (very good) reasons, this would mean that only the writer(s) of a module can write tutorials, discussions of usage, etc. for it..

>> 9- the Library Reference docs should be written in string literals, as
>> with the extended API docs proposed in pt. 5, but there should be a
>> prefix token such as """LIBREF: at the start of each chunk to signal
>> to the doc tools that the following text is not part of the API ref.
>> The token would allow this documentation to be split up into chunks
>> that can appear anywhere in the source file (a la perl's POD).

Possibly in a different file.. I find Tibs's arguments pretty convincing..

>> 10- the Library Reference documentation should also be written in ST
>> as using LaTeX here would force the module author to learn yet another
>> mark-up language, require the documentation user to install yet
>> another processing tool (although this isn't an issue on Linux), and
>> would place too much emphasis on the separation between the API and
>> library reference docs and discourage synchronization as the module
>> evolves! The same argument applies to maintaining the status quo of
>> external doc files.

Sounds reasonable to me.. Anyone want to try taking a few modules and converting their docs to ST, just to see what issues come up?

>> Any extra meta-data that is needed for proper indexing, etc.
>> (to meet Guido's concerns) should be included as fields in the string
>> literals as is done in JavaDoc (but not necessarily with that syntax).

I think all this information should appear in description list items with reserved keys, or under reserved heading keys, as I mentioned above..

>> - caching of documentation so it doesn't have to be regenerated
>>   every time it's used

Seems like an implementation detail of tools, not part of a PEP describing what should go into docstrings.

>> - documenting Packages

There definitely need to be provisions for that.

>> - inheriting documentation (Edward Loper's idea)

Well, really javadoc's or someone before them..

>> - hiding API docs for __privateInternals (ditto)

This seems like an implementation detail for tools..
(But I agree it *should* be implemented on most tools :) )

>> - documenting extensions in other languages

Much easier if we can import modules. But I guess safety's important. Oh well.

So I'll try to write up a PEP when I get a chance. It sounds like Tibs might write a proposal too. I think that Tibs and I seem to have similar views on a lot of issues, so if we want diversity in our PEPs, maybe someone else should work on one too. :) (Of course, this sort of seems like redundant work, but I guess it's for the best or something)

-Edward

From tavis@calrudd.com Tue Mar 13 05:14:17 2001
From: tavis@calrudd.com (Tavis Rudd)
Date: Mon, 12 Mar 2001 21:14:17 -0800
Subject: [Doc-SIG] two simple, but important questions
In-Reply-To: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu>
References: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu>
Message-ID: <01031220543800.20079@lucy>

I'm new to this SIG. After reading the archives I can't find any consensus about two critical issues that I feel should be at the forefront of any documentation PEP: issues that need to be addressed before the syntax of the markup language. It seems like excellent progress is being made on the Structured Text and tools fronts, but what do members of Doc-SIG feel about these issues?

ONE:
----
Should the API documentation be written solely in docstrings that are accessible at runtime through __doc__? (option a) Note to Edward: by API documentation I mean exactly what you said.

Or should more detailed documentation (i.e. the stuff in structured text) be written in a form that is not compiled into the byte-code, thus sparing the memory and processing overhead and keeping the namespace clean? (option b) One way of doing this is to place a second """string literal""" after the __doc__ docstring.

Or, should it be kept separate from the __doc__ docstring but still be imported into the runtime environment (under a name like __fdoc__), at the cost of having to change the way python works internally? (option c)

TWO:
----
Should the documentation tools (a) import and run module code in order to produce the documentation or (b) should they follow the safer route of parsing the documentation out of the raw source code (.py)? I suppose if your perspective on issue one is to go for option b, then you'd have to get the extended docs from the raw source code.

MY PERSPECTIVE:
--------------
ONE: I'd argue for option b, because:

- it requires absolutely no changes to python's internals and is
  backwards compatible --->> therefore it can be implemented today, not
  when python 2.? is released as in option c.
- it keeps the namespace clean
- it maintains all the advantages of having a short synopsis of a
  function, class or object available through __doc__ at runtime, but
  also allows for more extensive documentation to be provided, and
  marked up with Structured Text
- *** all of the documentation can still be accessed at the interactive
  prompt (all you need is a help function as a wrapper for the doctool,
  and maybe a pager like in pydoc)
- it keeps module loading time minimal, once the .pyc file has been
  generated initially, and it keeps memory use low

TWO: For security and performance issues alone I argue for option b. Inheritance, etc. can be figured out by parse-the-source-code doctools just as easily as with import-the-module doctools. Edward, I don't see how import-the-module doctools would have any advantage for documenting extensions to python in other languages. The standard __doc__ strings would still be available, and parse-the-source-code doctools would be able to handle c, c++, java source without too much effort, plus numerous excellent tools already exist for this..
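The parse-the-source route argued for above (option b of issue TWO) can be illustrated with today's standard library. This sketch uses the `ast` module, which did not exist in 2001; it shows the general approach, not any tool discussed on this list:

```python
import ast

def docstrings_from_source(source):
    """Collect docstrings from source text without importing the module."""
    tree = ast.parse(source)
    # The module docstring, plus one entry per class/function definition.
    docs = {"<module>": ast.get_docstring(tree)}
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            docs[node.name] = ast.get_docstring(node)
    return docs

SOURCE = '''
"""Module synopsis."""

def primes(n):
    """Return a list of primes below n."""
'''
print(docstrings_from_source(SOURCE))
```

Because nothing is executed, the tool is safe to run on untrusted code, exactly the security property option b is after.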
Cheers,
Tavis

From tony@lsl.co.uk Tue Mar 13 10:43:09 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Tue, 13 Mar 2001 10:43:09 -0000
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: <01031209305701.18635@lucy>
Message-ID: <006c01c0abaa$64f790f0$f05aa8c0@lslp7o.int.lsl.co.uk>

Tavis Rudd wrote:
> 2- a FORMALIZED version of structured text should be used for inline
> formatting. There's no need to repeat the justifications here.
> The final version of structured text should include a
> facility for storing meta-data in a field format that is easily
> identifiable to both the human eye and the parsing tool.
> (e.g. authors, version, keywords, spam)

OK. Historically (yes, Doc-SIG has been around long enough to have history), this is where efforts start to founder. The cycle goes:

* long quiet period
* flurry of agreement that we want an ST variant and some near
  agreement on what we want
* someone says they're starting an implementation
* people (ahem - in the past including me) start to discuss the
  *formalisms* that need to be enforced on people to allow information
  to be automatically extracted from docstrings
* flurry of argument
* list goes quiet for long period

Do you see the problem? What I am working towards (and I shall have to add this to the PEP I've just started work on) is phasing this carefully:

1. Decide on STpy (or an ST variant), with minimal extensions
2. Produce an application that parses it
3. Get it accepted, and get people using it
4. THEN, once we've changed the culture, spring our wonderful scheme
   for semi-formal markup on them, that allows them to extract special
   information.

Two quick comments before you jump up and down at that:

a. I don't actually believe that you are going to get most Python
   programmers to DO semi-formal markup for you. I *do* believe
   there's a good chance we'll get many people to write at least a
   little human readable text. So guess what I'm after...
b. If we provide ST to markup said text, and it is easy to use, then
   most people will use it. And then, hey presto, we'll magically get
   *some* added value towards data extraction (at least, for instance,
   the sort of function signatures that IDLE and Pythonwin will
   present in a tool-tip).
c. (ok - three things) you're more likely to win the "and please add
   more markup" battle *after* people have gotten used to the markup.
d. (oh, I give up) see also
   http://www.tibsnjoan.co.uk/STpy.html#taggedparas which describes
   things like::

       Author: Guido van Rossum

   and::

       Arguments:
         fred -- this is a useless name for an argument

Although it doesn't go into detail, the idea *is* that there should be some "requirement of structure" for such tags - that is, that arguments should be followed by a descriptive list, and so on. This is unlikely to be enforced early on, and may only be so via the DTD for the final DOM tree, but the idea *is* there. And although it says there that it should be left for second phase implementation, the start of support is already in docutils.

> 3- no changes should be required to the python parser

of course not.

> 4- the module's namespace should not be polluted and its memory
> requirements should not be inflated by use of inline documentation

erm - sorry? if you want your inline docs to be available to a browser (and I *do*) then they've pretty well got to be around somewhere!

> 5- therefore, the existing __doc__ docstrings should be used for very
> short synopses, and extended documentation that is discarded at the
> byte-compile stage should be written in string literals that appear
> immediately after the existing docstrings. These extra string
> literals would be written in ST, while the __doc__strings would be in
> plain text. These two forms of API docs should complement and not
> duplicate each other.

Sorry - I can't be bothered to reformat that - blame Outlook if you like. I disagree strongly. Keep it simple.
A docstring is a docstring is a docstring. And in it goes the documentation for the entity. Using extra, magical, string literals is uncool. And anyway, I've been writing docstrings in something close to ST (without thinking about it) for donkey's years - we WANT markup in docstrings! That's why we're doing this - not to impose on other people, but for ourselves!

> See the example module attached to this message.
>
> 6- the documentation parsing tools should be capable of producing
> output in many formats (manpages, plain text, html, latex, for a
> start),

See HappyDoc - that's not a problem. But the *parsing* tools don't produce output - the output tools do (docutils is a parsing tool - it parses docstrings. It happens to have a not-very-sophisticated docstring-finder built in (but both pydoc and HappyDoc do better jobs, in different ways, 'cos that's what they're for), and it happens to have a not-very-clever HTML outputter (but see the previous comment), but those are just for testing and proof-of-concept and examplitude...)

> 7- the doc parsing tools should not need to import and run the module
> to produce its documentation (for security reasons alone)

Debatable, and depends on what you're after. pydoc does (it's part of its requirements if it's to be used as a "help" facility) and HappyDoc doesn't. That's to do with the *other* tools, not to do with the docstring tool. Think modules (or packages if you prefer) - a package to understand docstring contents, and render them in a form other modules can use, but that package need not know how to find them or what to do with them after unpicking them.

> 8- module Library Reference documentation should also be kept in the
> same file as the module source. It should complement the API docs
> with examples, extended discussions of usage, tutorials, test code,
> etc., but should not duplicate the API reference material.

I've had that argument.
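The parsing-tool-versus-output-tool split described above can be caricatured in a few lines: one function builds a neutral tree, and any number of independent renderers consume it. The function names and the tree shape here are invented for illustration only; this is not the docutils API:

```python
def parse(docstring):
    """Parsing tool: turn a docstring into a neutral tree of paragraphs."""
    paragraphs = [p.strip() for p in docstring.split("\n\n") if p.strip()]
    return {"type": "document", "children": paragraphs}

def render_html(tree):
    """Output tool: one of many possible renderers over the same tree."""
    return "\n".join("<p>%s</p>" % p for p in tree["children"])

tree = parse("A docstring.\n\nWith two paragraphs.")
print(render_html(tree))
```

A manpage or LaTeX renderer would be another function over the same tree, which is the point: the docstring-finder, the parser, and the outputters need know nothing about each other.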
> 9- the Library Reference docs should be written in string literals,
> as with the extended API docs proposed in pt. 5, but there should be
> a prefix token such as """LIBREF: at the start of each chunk to
> signal to the doc tools that the following text is not part of the
> API ref. The token would allow this documentation to be split up into
> chunks that can appear anywhere in the source file (a la perl's POD).

Erm - yuck. BTW, if we *want* POD, and that's how POD does it, then we should *adopt* POD (in which case, after adoption, it is no longer yuck). But I'm agin it, for reasons argued in the past.

> 10- the Library Reference documentation should also be written in ST
> as using LaTeX here would force the module author to learn yet
> another mark-up language, require the documentation user to install
> yet another processing tool (although this isn't an issue on Linux),
> and would place too much emphasis on the separation between the API
> and library reference docs and discourage synchronization as the
> module evolves! The same argument applies to maintaining the status
> quo of external doc files.

Markup languages (actually, LaTeX is a TeX macro language, so is *technically* still a typesetting language, with added markup potential - this is important to understanding it properly) are just languages. It does the soul good to learn another one, just as it does with programming languages. I feel I need an Alex Martelli argument here, so please imagine one for me.

> Any extra meta-data that is needed for proper indexing, etc.
> (to meet Guido's concerns) should be included as fields in the string
> literals as is done in JavaDoc (but not necessarily with that syntax).

The question of referring to Python data (e.g., #module.fred#) has been, shall we say, heavily deferred for the moment, 'cos it's contentious.
Ka-Ping has *very* clever approaches to semi-automating it, and I have a sneaky feeling that if he keys it off '#..#' strings, many of my own past objections would go away (erm, have you bothered to read this far, Ka-Ping?). But there are also proposals for extra markup (such as '^#..#'). It's another of those things best left to second-phase, in my opinion.

> What do you think?

I think that your comments make sense in context, but (not your fault) are often going over ground that has been trodden before. I *heavily* feel that we need to get a *useful* (but not too ambitious) candidate implementation out the door before we get bogged down again, and if people hadn't started up all this discussion this week I'd be well on the way to it by now (although, actually, writing a PEP *is* a very good idea, and should have been done earlier).

> p.s. Other issues to consider:
> - caching of documentation so it doesn't have to be regenerated
>   every time it's used

Tool issue.

> - documenting Packages

Goes in __init__.py - this sort of stuff just falls out naturally, I'm afraid, which is why we know Guido is a DGLD (damn good language designer)

> - inheriting documentation (Edward Loper's idea)

docstrings inherit like any other value, surely?

> - hiding API docs for __privateInternals (ditto)

Tool issue - given a DOM tree one can prune it, and anyway adding a qualifier to docutils to say "don't collect *these* sorts of thing" is trivial (but there are *so many* trivial things, and they are so *obviously* trivial, that one has to leave some of them until later on)

> - documenting extensions in other languages

A big issue - leave it alone for now!

> - comments within the markup language

Why? I used to argue for these, Eddy convinced me not to. You'll have to come up with a convincing argument of *exactly* why you need these...

Tibs - I'm sorry if any of that appears brusque, but I've got urgent paid work to do as well, so have to type fast...
-- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 10:49:31 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 10:49:31 -0000 Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: <200103121936.f2CJaNp18187@gradient.cis.upenn.edu> Message-ID: <006d01c0abab$48c21490$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I added a page to Zope (under CurrentIssues), and I'll try to actually > put more content there when I can. :) Thanks > I guess it depends on how much the implementations diverge from > the intensions.. My aim is to make the documentation and implementation fit. This isn't as dishonest as it sounds, because I have found that only by *doing* the implementation does one discover all of the implications of the form of ST one has chosen, and it's only fair to then document them... > I've been writing a large test set, and plan to > post a link to it, and to the results of running STminus on it, > later today.. (still needs a little more work). Great. > At that point, I'm > hoping we can get a better idea of whether STNG and STpy really > act like STminus. (My guess is that most differences are > unintentional ones) I agree. As you may have gathered, there *should* be a docutils 0.0.5, or maybe even 0.1alpha, soon, but unfortunately not this week, because of PEP writing, etc. So it may be worth waiting on that before testing docutils (but then again, maybe not). > Definitely, since STminus is implemented directly from its formal > definition. I like that in a tool. > Well, I'm hoping that all 3 at least have well-defined results *where > they do define the results*. I certainly hope that STNG will only > give "unexpected results" for a small subclass of strings.. 
:) There's unexpected (oh - I didn't want it to do that) and unexpected (ah - I see that that *does* follow from what I asked it to do - interesting). <<>> No, I defer to your expertise - I need to read what you wrote in more detail, but the case is compelling, so I leave it to you. > I tried to make all of my productions correspond to their actual > entities.. So you shouldn't need to do (much) postprocessing on > the output of STminus. For example, the Paragraph production > should give an entire paragraph, not just its first line. I think > I may add ULItem, OLItem, and DLItem productions for similar > reasons (without changing the language defined by the productions) OK - I see. That *is* a good reason. > Thanks! You've done some impressive work, yourself. Hmm - no, what I'm doing is much simpler, in basis - if I had real time for it it wouldn't take more than a couple of weeks, and there's nothing complicated in there. Anyway, so long as we're both having (some sort of) fun! Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 10:52:54 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 10:52:54 -0000 Subject: [Doc-SIG] Docststring markup process In-Reply-To: <200103121950.f2CJoHp19946@gradient.cis.upenn.edu> Message-ID: <006e01c0abab$c1beacf0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > As far as I can tell, the current problems lie less in not being able > to agree on a markup format, and more in not being able to define > one. STminus is working to fix that, but it will take > several iterations before we have something that would be worthy > of putting in a PEP. 
I actually would like to see the PEP for STpy I'm working on and the (eventual) PEP for STminus *both* accepted - that is, it's useful to have the "full blown Common Lisp" approach for doing Python specific work, but it's also useful to have the "tight and well defined Scheme" approach for writing stuff that you *know* will "compile" under both systems... > > Of course we can discuss the various proposals here, but it's a big > > step forward to get them written up and all in one place > for comparison. Which is why I have changed my mind (gosh, Doc-SIG does that to me) and decided the PEP thing *is* a good idea (don't you just hate those Python people who change your mind?) > I would appreciate it if we can make the deadline far away enough > that we can have a real formal definition for whatever it is > we're proposing. We're not aiming at Python 2.1 with any of this anyway. I say let things slide for at least another fortnight before thinking of even thinking of a deadline... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 11:03:44 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 11:03:44 -0000 Subject: [Doc-SIG] reserved characters In-Reply-To: <200103121953.f2CJrgp20338@gradient.cis.upenn.edu> Message-ID: <006f01c0abad$456f2e20$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I'm thinking of adding a number of "reserved characters" to STminus, > that specific versions of ST may decide to use or not use. They > would redefine their own ReservedCharacter production accordingly.. > So: > 1. Is this a Good Idea? Well, ST's *do* have reserved characters, sort of, but the problem is that they're not reserved in all circumstances. 
So, in STpy, '#' is special, but only in the context of after a space or beginning of line and before a non-space that isn't '#', and so on (i.e., when it is acting as a quotation character). I think that for STminus's purposes, it might make sense to make characters reserved, *perhaps*, but for the "full fledged" ST's it doesn't (they're much more Perl like in this respect, whether Perl works like that or not (I don't know) since they assume that people can cope with the meaning of a character changing depending on its environment/use). > 2. What characters should be included? Clearly at least '#' for > STpy. Other possibilities are backquote, at sign, exclamation > mark, etc. I'd like to reserve as few as we can get away with, > since the more characters are reserved, the less useful a > program written to work on *any* StructuredText document can > be. ZWiki (and other Wikis?) use an initial '!' to mean "not a Zwiki reference" - so for instance StructuredText would be a reference, but !StructuredText would not. '[' and ']' are "force a reference" characters in Zwiki, and will be used for similar purpose in STpy. But again, it depends on context. Hmm - on the whole I'm not sure if this helps. The only thing I *might* give away is '@', since it is rarely used in text, and not meaningful in Python (but you discuss that in another email) Tibs typing-as-we-go -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 11:18:28 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 11:18:28 -0000 Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: <200103122038.f2CKc3p26323@gradient.cis.upenn.edu> Message-ID: <007001c0abaf$53ef1da0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. 
Loper wrote: > I think that for the most part, I tend to side with Tibbs on this.. although I think you may have expressed it better... > Note that that is generally exactly the type of documentation you > want in a reference manual. I think that the idea of building > the reference manuals from docstrings makes a lot of sense. I do > *not* think that including tutorials, howtos, big examples with > explanations, etc. belong in the docstrings. Well, a reference manual is sometimes at a higher level than what I put in docstrings. But maybe I should be prefixing more methods with '_'. > This is pretty much completely a digression, but I figured I'd chime > in. Regardless of how easy it is to download CVS or LaTeX, it is > much easier to learn how to *use* CVS than to learn how to use > LaTeX. Given a knowledgable teacher, you can learn everything you > need to know about CVS in an hour. I'm not sure anyone ever learns > everything they need to know about LaTeX. :) Well, point taken, although *not* given a knowledgable teacher, I found getting started with CVS to be a non-starter, whereas I got up to speed with TeX and LaTeX rather fast without any help 15 or 18 years ago (when it was a lot more difficult). Maybe it depends on one's mind and interests... (and I shan't pretend that one ever learns all one needs to know about LaTeX until one implements bits of it, which *I* certainly haven't done - personally I prefer to use TeX neat if I want to do my own stuff) > I think that we should leave the organization of "grander scope" > documentation to a different project.. (Of course, it's still > an important project.) Yes, yes, yes (indeed, I thought that's what was already being done, quietly in the background, by our Mr Drake) > I think it might make sense to reserve one character, or maybe 2, for > advanced markup. (We would also want to be able to backquote it > somehow, but we'll leave discussion that for later..). 
So, > for example, > we could say that '@' is reserved for advanced markup, and then it > can be used by people who: > 1. want more advanced features > 2. are willing to use a "real" markup, which is more "obtrusive" > and difficult to read/write. I'm willing to reserve '@' for later use (the whole issue of how one quotes things in ST is a difficult one, single quote itself is enough of a problem. The "true ST" approach, I think, is to say "borderline case - punt it - we're not that complex", which *may* well be the correct answer). BUT whilst I don't mind reserving it, I'll probably oppose using it! (but allowing you to reserve it postpones the argument, which is a Very Good Thing, and it gives us a *marker* for having postponed the argument) > I think that we should limit how much more complex "basic" ST gets, > as much as possible.. STpy is almost as complex as I want it to get already - I want to add in [references] and then feature freeze. Which would have become clear in the alpha (he whines) > I agree that it doesn't make sense to make up our own markup > language.. > So maybe reserve (non-backslashed) '<' and '>' and use XML for > advanced markup? I don't want to reserve '<' and '>'. If you want to placemark that in your head, then place hold it as "I'll propose, some time in the indefinite future, that we markup an XML style entity using something like, for instance, just an idea:: @fred ... @/fred being equivalent to:: <fred> ... </fred> and we'll sort it out later on". This leaves the heated arguments about whether we *really* want XML, or DocBook, or what, and whether those spaces matter, and whether it should *actually* be '@', until the future, where they belong. Keeping a marker for them (so we don't forget to discuss them in the future) is a good thing. Worrying about them now isn't. (and also, lots of things "fall out in the wash" from things like ST - one sometimes finds that, later on, one already *has* what one wanted, anyway).
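The '@fred ... @/fred' notation floated above is explicitly "just an idea", but it is mechanical enough to sketch. Below is a hypothetical illustration (the tag name, regex, and function name are all invented for this sketch and belong to no ST implementation) of how such paired at-sign tags could be rewritten as XML-style elements:

```python
import re

# Hypothetical sketch only: '@tag body @/tag' is a proposed placeholder
# notation, not an agreed STpy feature. The backreference \1 ensures the
# closing '@/tag' matches the opening tag name.
TAG_RE = re.compile(r'@(\w+)\s(.*?)\s@/\1', re.DOTALL)

def at_tags_to_xml(text):
    """Rewrite '@tag body @/tag' spans as '<tag>body</tag>'."""
    return TAG_RE.sub(
        lambda m: '<%s>%s</%s>' % (m.group(1), m.group(2), m.group(1)),
        text)
```

Whether the target notation should actually be XML, DocBook, or something else is exactly the argument being postponed here.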
> Of course, depending on what we decide we need, we may be able to get > away with much less. For example, define things like:: > > Parameters: > p1 -- foo > p2 -- bar > > To have special meaning in the context of docstrings. Well, actually it's "Arguments". And the special meaning isn't *enforced* in the current implementation or documentation, but it may be in the future (so much to do, so little whatever). > Any standard > (STpy-like) ST will just render them as a heading with a list. > You can also define special forms like:: > > author -- Edward Loper > version -- 2.71828 That's:: Author: Edward Loper Version: 2.71828 and it doesn't work yet (because ':' doesn't yet start a paragraph) - it may or may not work in the alpha release. > That takes care of most of the structural-type markup, and still looks > ok if you just read it, or parse it with non-docstring-specific ST > tools.. Strangely enough, this was David Ascher's reason for coming up with the idea. Look in the doc-sig archives around November/December 1999 for talk about these issues. > Then you just have to worry about syntax for inline markup. I don't > have any great ideas there, other than having a whole class of > "advanced inline markup" tags like @somethingorother(...) Given the '@' reserved, we can leave that... Tibs (two more to go) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 11:27:34 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 11:27:34 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu> Message-ID: <007101c0abb0$99519bb0$f05aa8c0@lslp7o.int.lsl.co.uk> > I'd be happy to write up a PEP, but I'm curious what things people > think it should include. 
In particular, which of the following > should it include: > 1. The definition of a particular markup language? > 2. Specification of the *semantics* of a markup language (as > opposed to just its syntax) > 3. Specification of what is "appropriate" to put in a docstring? > 4. Specification of the "intended semantic content" of various > "slots" like docstrings and strings immediately following > docstrings? > 5. Specifications of tools for docstrings? Possibly of tools > that would eventually be included in the standard library? > 6. whatever else you can think of.. I'm tempted to answer, but time is running out, and in fact the easiest answer is probably to write my own PEP. > Some of these questions require further work to answer (such as 1, > where no one can currently give a good definition of any version > of StructuredText).. Others will probably involve some agreement > by the community.. I'm sure the community will love to argue... (We do realise, don't we, that when we put a PEP on the main python-list, all the arguments from the last 5 years are going to be endlessly rehashed? Yes, I thought we did.) > I think it's important to distinguish the tools that process > docstrings from any attempt to define what should go *in* > docstrings (or similar places), whether that means what type > of information or what type of markup. The PEP should address > what goes *in* docstrings, but shouldn't necessarily have much > to say about the tools that process the docstrings. (Leave > those for a standard library extension (do those get PEPs?).) Actually, my experience is that one can't do that separation until one has started on the tool - otherwise the tool never gets written (because of the community arguing referred to above). Been there, done that. Also, ST is sort of difficult to think about without an implementation to throw strings at and see how they behave.
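The point about needing an implementation to throw strings at can be made concrete with doctests, the same mechanism the head of this thread says DocNodes.py and TextNodes.py already use. The renderer below is a toy stand-in invented for this sketch (not docutils or STNG code); it shows how each markup edge case becomes an executable example:

```python
import doctest
import re

def emphasise(text):
    """Toy stand-in for an ST renderer: wrap *word* spans in <em>.

    A hypothetical sketch, not any real implementation. Edge cases
    become executable examples:

    >>> emphasise('this *is* emphasis')
    'this <em>is</em> emphasis'
    >>> emphasise('2 * 3 * 4 is just arithmetic')
    '2 * 3 * 4 is just arithmetic'
    """
    # '*' only opens emphasis when followed by a non-space and closes
    # when preceded by one -- the same kind of context rule discussed
    # for '#' above.
    return re.sub(r'(?<![\w*])\*(\S(?:[^*]*\S)?)\*(?![\w*])',
                  r'<em>\1</em>', text)

if __name__ == '__main__':
    doctest.testmod()
```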
> My favorite suggestion is to just use top-level description list items > and top-level headings with a single level of description list items > to store meta-data.. and to have "reserved" keys for description lists > (in these contexts). E.g.:: > > def primes(n): > """ > Return a list of all the prime numbers from 1 to n, exclusive. > > Parameters: > n -- The upper limit for the prime numbers to return. A > prime number will be returned if and only if it is > less than 'n'. > > Types: > n -- int > > ReturnType -- 'list' of 'int' > Version -- 1.0 > Author -- Edward Loper > See -- #other_primes# > """ > ... > > Also, I think that *all* of these special marked meta-data fields must > be optional. (Of course, if the user wants to use a program that > checks to make sure that ReturnType is defined for every method, they > can.. but it's not required in general). What he said. Twice (although he's got some of the markup wrong, but that's my fault, not his). > I therefore propose the following: > 1. STdoc strings appear after __doc__ strings, as I said before. I've disagreed with this elsewhere - we'll have to differ for the moment. > 2. For now, these strings are thrown away by the compiler Hmm. > 3. At some future date, the compiler could be modified so that, at > user option, it would produce ".pyd" files as well as ".pyc" > files. These contain all the STdoc strings from the file, and > can be accessed via the interpreter somehow. Python would *not* > format them, it would just copy them. Maybe create a dictionary > from identifier name to string, and pickle it. pyo files throw away docstrings already. It's this time machine thing, you see. > >> 6- the documentation parsing tools should be capable of > producing output in > >> many formats (manpages, plain text, html, latex, for > a start), > > definitely, although this may take some time to implement. 
I > would add > "dynamic navigation of docs from within the python interpreter" to > the list of possible "outputs." Time machines again. HappyDoc and pydoc do lots of what is wanted, and seem both to be expanding (in wonderfully different directions - btw, have you *looked* at inspect.py - it's ever so neat). > Possibly in a different file.. I find Tibb's arguments > pretty convincing.. You see why I'm willing to differ with him on other issues? Can't but help liking someone with that much sense (now, I just need to convince him of all my other views...) > So I'll try to write up a PEP when I get a chance. It sounds like > Tibbs might write a proposal too. I think that Tibbs and I seem > to have similar views on a lot of issues, so if we want diversity > in our PEPs, maybe someone else should work on one too. :) (Of > course, this sort of seems like redundant work, but I guess it's > for the best or something) Actually, I think our PEPs will be sufficiently different in aim to both be useful (and we *do* disagree on some things, flippancy aside). But, as I said elsewhere, I also suspect that I will want *both* our PEPs to be adopted. (different aims can produce different useful things. That's why I keep referring to HappyDoc and pydoc - superficially similar, but actually rather different in aim and implementation) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 11:36:10 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 11:36:10 -0000 Subject: [Doc-SIG] two simple, but important questions In-Reply-To: <01031220543800.20079@lucy> Message-ID: <007201c0abb1$cd4312e0$f05aa8c0@lslp7o.int.lsl.co.uk> Tavis Rudd wrote: > I'm new to this SIG. 'Sok - more people means more brains to think useful thoughts. 
> After reading the archives and you're already coming across well! > I can't find any consensus > about two critical issues that I feel should be at the > forefront of any documentation PEP: There's probably a good reason for that.. > issues that need to be addressed before > the syntax of the markup language. the most obvious of which *might* be a disagreement with that statement, of course... > ONE: > ----- > Should the API documentation be written solely in docstrings > that are accessible at runtime through __doc__? (option a) Well, this one is definitely one to leave for later arguments (see, I said) > Or should more detailed documentation (i.e. the stuff in > structured text) be > written in a form that is not compiled into the byte-code, > thus sparing the > memory and processing overhead and keeping the namespace > clean? (option b) I think this has been answered elsewhere. > One way of doing this is to place as second """string > literal""" after the > __doc__ docstring. An idea I revile (sorry, I'm in a hurry) > Or, should it be kept separate from the __doc__ docstring > but still be imported into the runtime environment (under a name like > __fdoc__), at the cost of having to change the way python > works internally? > (option c) Heck, I hate that too. > TWO: > ---- > Should the documentation tools (a) import and run module code > in order to > produce the documentation or (b) should they follow the safer > route of > parsing the documentation out of the raw source code (.py)? That's easy, the answer is yes. More precisely, it depends on what you want to do. pydoc, to meet its mandate, has to do one. HappyDoc (and tools like it) have to do the other. 
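The pydoc/HappyDoc split above corresponds to two extraction routes, sketched here in present-day Python (the `ast` module postdates this discussion, and all names in the sketch are illustrative): route (a) imports the object and reads its docs at runtime; route (b) parses the raw source without ever executing it.

```python
import ast
import inspect

# Route (a), pydoc-style: import the object, then read __doc__ at runtime.
def runtime_doc(obj):
    return inspect.getdoc(obj)

# Route (b), HappyDoc-style: parse the raw source text, so untrusted
# module-level code never runs.
def source_docs(source):
    """Map names to docstrings parsed from source, without importing."""
    tree = ast.parse(source)
    docs = {'<module>': ast.get_docstring(tree)}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            docs[node.name] = ast.get_docstring(node)
    return docs

SOURCE = '"""Module docs."""\n\ndef f():\n    """Docs for f."""\n'
```

Route (b) also gives the "default args as written" benefit mentioned below, since the source text is available verbatim.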
> MY PERSPECTIVE: > -------------- > ONE: > I'd argue for option b, because: My arguments are elsewhere, but in summary: docstrings are good - put documentation in them - whatever you *want* to put in them (heh, getting *any* documentation is a win) documentation is good - if you have more than will fit in a docstring, put it in a text file. See, I think I'm a traditionalist. Of course, I won't moan if you want to do literate programming instead, but that's a different kettle of fish. > - it requires absolutely no changes to python's internals and > is backwards compatible --->> therefore it can be implemented today not > when python 2.? is released as in option c. Heh - all the software we've been writing works in 1.5.2... > - it keeps the namespace clean us too > TWO: > For security and performance issues alone I argue for option b. Unfortunately (well, not really), you've already got both options, and option (a) is BDFL approved.
(option b) >> One way of doing this is to place as second """string literal""" after the >> __doc__ docstring. >> >> Or, should it be kept separate from the __doc__ docstring >> but still be imported into the runtime environment (under a name like >> __fdoc__), at the cost of having to change the way python works internally? >> (option c) Given the option of producing .pyo files, I would lean towards option (a). The problem with your solution here (using a tool to read the .py file) is that it requires that everyone have the .py files for all the modules they're using. But sometimes it's useful to distribute just the .pyc file.. The main advantage I see of putting formatted docs anywhere other than the docstring is that some existing programs (Zope?) already use the docstrings for specific purposes, that are incompatible with what we would propose... >> Should the documentation tools (a) import and run module code in order to >> produce the documentation or (b) should they follow the safer route of >> parsing the documentation out of the raw source code (.py)? In general, if you want the docs on a module, then you should trust it enough to import it anyway. Presumably if you're trying to get its docs, it's because you want to *use* it... That said, there are some definite advantages of processing the source code.. e.g., you can list default args as what they really are, instead of something like <function <lambda> at 0xabcdef>. and you can tell what's imported and what's not. But I would say that both options should be available to tools. It may turn out that tools that implement one option will be more successful, and so will become widely accepted and everyone will forget about the other option. Or it might be that the best approach is to use both, when possible.. --- As a final note, I think that if this PEP is going to have any chance of succeeding, it needs to have a relatively small and well-defined scope.
So maybe we can divide it in two: PEP A: a standard markup/string format for "formatted documentation strings" PEP B: a proposal on how "formatted documentation strings" should be accessed, where they should be stored, etc. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 16:55:43 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 11:55:43 EST Subject: [Doc-SIG] PEP 216 Message-ID: <200103131655.f2DGthp23889@gradient.cis.upenn.edu> Just out of curiosity, how is this new proposed PEP going to relate to PEP 216 ("docstring format")? Will it replace it? If not, then maybe we should just work on extending PEP 216? If you're interested in this PEP stuff, and haven't read PEP 216, you should probably go read it..: http://python.sourceforge.net/peps/pep-0216.html Does anyone know if Moshe Zadka is still actively working on this? -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:17:18 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:17:18 EST Subject: [Doc-SIG] suggestions for a PEP Message-ID: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> Tibs wrote: >> 1. Decide on STpy (or an ST variant), with minimal >> extensions >> 2. Produce an application that parses it >> 3. Get it accepted, and get people using it >> 4. THEN, once we've changed the culture, spring >> our wonderful scheme for semi-formal markup >> on them, that allows them to extract special >> information. What do you mean by semi-formal markup? If it's just fields (i.e., global structure markup) then somewhere around 2, I would also add: * Tell people what the semi-formal markup scheme will most likely look like, but say that it's not implemented yet. That way, people can add semi-formal markup, and when it gets implemented, they magically get prettier outputs. If you mean coloring (i.e., local structure markup), then I'd say definitely leave it for later. :) >> a.
I don't actually believe that you are going to get most Python >> programmers to DO semi-formal markup for you. I *do* believe there's a >> good chance we'll get many people to write at least a little human >> readable text. So guess what I'm after... I think that a bunch of programmers would be willing to do global markup (special places to describe arguments, etc). I don't think they would do formal coloring markup.. >> http://www.tibsnjoan.co.uk/STpy.html#taggedparas >> which describes things like:: >> >> Author: Guido van Rossum >> >> and:: >> >> Arguments: >> fred -- this is a useless name for an argument I agree with the idea, but I strongly think that things like author should be description list items. Advantages: * it requires no changes to current STpy * it's more compatible with other STs * it seems conceptually "cleaner" * we don't need to give special meaning to yet another character (":") * it's *much* easier to describe a general mechanism, so other people can extend it for their own tools. * if they do something wrong (e.g., put unordered list items below "Arguments:"), then we can just format it like normal ST. (and give a warning, if we're nice) Problems will come up with your formalism with lines like: Author: Mr. bob frank # does the . mean it's got a sentence? Things to do: eat, sleep # how does this get treated? Unless you can come up with a compelling reason for using:: Author: Guido instead of:: Author -- Guido I think we should go with the latter. >> And although it says there that >> it should be left for second phase implementation, the start of support >> is already in docutils. I think it will be helpful to people if we can get this straight now. That way, they can use it while documenting code in the interim period. But it shouldn't get in the way of getting everything else done, if we can't manage to figure it out now. >> erm - sorry?
if you want your inline docs to be available to a browser >> (and I *do*) then they've pretty well got to be around somewhere! (I assume) he means he doesn't want things like __author__ and __version__ at the top level of a module. >> docstrings inherit like any other value, surely? Um.. yes, but that's not always what you want. Basically, if I define:: class A: def f(x): "this is f(x)" return 1 class B(A): def f(x): return 2 Then I might want B.f to inherit its docs from A.f. This would be especially nice for things like UserList, so my classes will have docs for their methods without me having to duplicate explanations. But in a sense, this is a tool issue.. Although, it will affect whether people tend to copy comments for overridden functions.. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:18:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:18:33 EST Subject: [Doc-SIG] Formalizing StructuredText (yeh!) Message-ID: <200103131718.f2DHIXp27257@gradient.cis.upenn.edu> Ken Manheimer said: >> Great boon that you're making tests - sounds like you're encountering bugs >> or unwanted looseness in STNG. We'll want to fix such. I've got about 160 test cases so far, and expect that number to grow. In playing with STminus, I've often been surprised how many of the test cases actually catch errors that I make. :) But I'd welcome more test cases from other people.. More on that when I post the sttest module (later today?)... >> I'm pretty sure we will want to fix bugs and track a good base standard >> (which is where STminus seems to be heading) in STNG. The problem is >> going to be finding time to do so - at least for the next few weeks, all >> the likely suspects are inundated - but we are interested, and will >> eventually make time to track. (In case it needs saying, patches would be >> welcome...-) Unfortunately, time is always a problem.
But I'm very glad to hear that STNG is interested in working with us to tighten up the definitions of StructuredTexts.. I don't think I'll have time to make any patches though (much less read the STNG source code to figure out where they would even go)... I plan to spend most of my time on STminus and the docstring PEP.. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:29:39 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:29:39 EST Subject: [Doc-SIG] Docststring markup process In-Reply-To: Your message of "Tue, 13 Mar 2001 10:52:54 GMT." <006e01c0abab$c1beacf0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103131729.f2DHTdp28610@gradient.cis.upenn.edu> > I actually would like to see the PEP for STpy I'm working on and the > (eventual) PEP for STminus *both* accepted - that is, it's useful to > have the "full blown Common Lisp" approach for doing Python specific > work, but it's also useful to have the "tight and well defined Scheme" > approach for writing stuff that you *know* will "compile" under both > systems... Of course, even clisp has a formal definition.. ;) I'd like STminus to expand to eventually be able to provide a formal definition of STpy. Of course, it would be an underspecified definition, so it wouldn't say what to do with things like:: This **is a *pretty **messed *up **docstring. But it should still describe every *reasonable* use of STpy... Now, that may not be an entirely reasonable goal.. but only time will tell. :) But, at any rate, the idea of doing 2 PEPs is to see which one people like better. I'm not planning on proposing STminus001 as a formatting convention. I think that's not very reasonable. But maybe STpyminus099 (i.e., STminus with py extensions (such as #..# and list items without blank lines), version 99). Well, I guess I'll have to wait to see what you write up for your PEP to decide..
-Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:35:44 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:35:44 EST Subject: [Doc-SIG] reserved characters In-Reply-To: Your message of "Tue, 13 Mar 2001 11:03:44 GMT." <006f01c0abad$456f2e20$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103131735.f2DHZip29411@gradient.cis.upenn.edu> > Well, ST's *do* have reserved characters, sort of, but the problem is > that they're not reserved in all circumstances. So, in STpy, '#' is > special, but only in the context of after a space or beginning of line > and before a non-space that isn't '#', and so on (i.e., when it is > acting as a quotation character). The idea would be that specific ST's would be allowed to use reserved characters if they want, or remove them from the reserved character list if they want. So in STminus, we would have:: ReservedChars = # | @ And in STpy, we would have:: ReservedChars = # This production would take care of making sure that those characters don't appear in normal text. > I think that for STminus's purposes, it might make sense to make > characters reserved, *perhaps*, but for the "full fledged" ST's it > doesn't (they're much more Perl like in this respect, whether Perl works > like that or not (I don't know) since they assume that people can cope > with the meaning of a character changing depending on its > environment/use). Certainly. Each full-fledged ST would probably un-reserve the characters they don't use. The main idea here would be to add a "hook" for us to add in future things like advanced markup, if we later decide we want to. The potential problem is: 1. we make our markup language 2. everyone loves it, it gets used everywhere 3. everyone decides that advanced markup is a good idea after all 4. no one wants to make changes that will mess up all the docs that they've already written. 
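The `ReservedChars` production above could be mirrored in a simple checker. This is an illustrative sketch only (the dialect names, the character sets, and the function are all hypothetical, not part of STminus or STpy): each dialect declares its own reserved set, and the checker flags reserved characters appearing in ordinary text.

```python
# Hypothetical per-dialect reserved-character sets, mirroring
# productions like "ReservedChars = # | @" from the message above.
RESERVED = {
    'STminus': set('#@'),
    'STpy': set('#'),
}

def find_reserved(text, dialect):
    """Return (position, char) pairs for reserved characters in text."""
    reserved = RESERVED[dialect]
    return [(i, c) for i, c in enumerate(text) if c in reserved]
```

A real implementation would also have to honour the context rules discussed earlier (a character being special only after whitespace, and so on); this sketch deliberately ignores that.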
> '[' and ']' are "force a reference" characters in Zwiki, and will be > used for similar purpose in STpy. But again, it depends on context. Hm. I'm not sure I like the sound of that. Care to elaborate? > The only thing I *might* give away is '@', since it is rarely used in > text, and not meaningful in Python (but you discuss that in another > email) So if we decide we want reserved characters, we might limit it to this. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:37:17 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:37:17 EST Subject: [Doc-SIG] quoting Message-ID: <200103131737.f2DHbHp29600@gradient.cis.upenn.edu> Could someone point me to an explanation of why we *don't* want to use backslashes for backslashing characters? :) e.g., \* for a literal asterisk. It seems so much like the natural thing to do, and surely someone who's coding in python will be familiar with the convention of backslashing characters.. -Edward From tavis@calrudd.com Tue Mar 13 17:58:38 2001 From: tavis@calrudd.com (Tavis Rudd) Date: Tue, 13 Mar 2001 09:58:38 -0800 Subject: [Doc-SIG] two simple, but important questions In-Reply-To: <200103131652.f2DGqQp23534@gradient.cis.upenn.edu> References: <200103131652.f2DGqQp23534@gradient.cis.upenn.edu> Message-ID: <01031309583800.00505@lucy> Edward suggested: > scope. So maybe we can divide it in two: > PEP A: a standard markup/string format for "formatted documentation > strings" > PEP B: a proposal on how "formatted documentation strings" should > be accessed, where they should be stored, etc. Or maybe it would be best to just have one PEP, with Edward's B (my questions, etc.) kept very small (a paragraph should do it once there's agreement here). After reading Tibb's and Edward's responses to my questions, I'm starting to lean towards option a (keeping everything in the docstring) for the simple reason that it's the path of least resistance (/debate).
However, option b (having the option of a second string literal that follows the docstring and is discarded by python at compile-time) does provide the following things that option a can't: - the ability to gracefully document module and class data attributes (constants etc.) as Marc-Andre Lemburg proposed in PEP 224 - a means for the verbose to document to their heart's content without worrying about increasing memory usage and load-time (but is this ever really going to be significant??) - a means to include the Library reference documentation in the same file, as Ping proposed Of course, if the tools were to give module creators the option of putting extended documentation in like this, it doesn't mean that everyone must do it this way. If you, as a coder, don't like the idea of discardable string literals after the __doc__s, stick with the standard docstring. > In general, if you want the docs on a module, then you should trust it > enough to import it anyway. Presumably if you're trying to get its > docs, it's because you want to *use* it... My concern here was related to indexes/searches of all available modules on a system, and to cases where module creators haven't followed the if __name__ == '__main__' convention. But after thinking a bit harder, I realize that this is really an implementation issue for the tools and that security risks can be kept minimal with either route anyway. Maybe the PEP should explicitly mention this as being an implementation detail for the tool makers. Cheers, Tavis p.s. Edward there's no Tr in my name, just Ta From klm@digicool.com Tue Mar 13 18:20:08 2001 From: klm@digicool.com (Ken Manheimer) Date: Tue, 13 Mar 2001 13:20:08 -0500 (EST) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Message-ID: > To: doc-sig@python.org > Subject: Re: [Doc-SIG] suggestions for a PEP > Date: Mon, 12 Mar 2001 20:49:08 EST > From: "Edward D. Loper" > > [...] > So I'll try to write up a PEP when I get a chance.
It sounds like > Tibbs might write a proposal too. I think that Tibbs and I seem > to have similar views on a lot of issues, so if we want diversity > in our PEPs, maybe someone else should work on one too. :) (Of > course, this sort of seems like redundant work, but I guess it's > for the best or something) I may be misguided here, but my impression is that it's not a goal to have more PEPs. Rather, the idea is to allow for multiple PEPs that convey competing viewpoints when competing viewpoints exist. If you diverge just in marginal particulars, you can present the particulars as alternatives, with explanation about their relative merits. In general, if it's suitable for you and tibbs to do a joint PEP, and you'd be comfy with it, it sounds like the PEP might be the stronger for it. Ken Manheimer klm@digicool.com From edloper@gradient.cis.upenn.edu Tue Mar 13 23:07:45 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 18:07:45 EST Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: Your message of "Tue, 13 Mar 2001 11:18:28 GMT." <007001c0abaf$53ef1da0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103132307.f2DN7kp12393@gradient.cis.upenn.edu> > Well, a reference manual is sometimes at a higher level than what I put > in docstrings. But maybe I should be prefixing more methods with '_'. It may make sense to have some place other than docstrings for reference manual docs, if someone chooses to use it. But I think that docstrings could give a very good "default" reference manual entry. Of course, I tend to be very thorough in my commenting & documenting. :) > > I think that we should leave the organization of "grander scope" > > documentation to a different project.. (Of course, it's still > > an important project.) > > Yes, yes, yes (indeed, I thought that's what was already being done, > quietly in the background, by our Mr Drake) Good to hear. 
> (the whole issue of how one quotes things in ST is a difficult one,
> single quote itself is enough of a problem. The "true ST" approach, I
> think, is to say "borderline case - punt it - we're not that complex",
> which *may* well be the correct answer).

I don't see why it has to be a difficult issue, but hopefully someone will explain it to me. :) But I disagree that the "true ST" approach is to say "borderline case - punt it." I think that's the approach of most *current implementations* of ST, not of ST itself. If ST is to become accepted and useful, we *do* need to define borderline cases, or at least make them explicitly undefined. It really sucks to try to use a language/markup that keeps changing under your feet, and is inconsistent between tools. :)

> BUT whilst I don't mind reserving it, I'll probably oppose using it!
> (but allowing you to reserve it postpones the argument, which is a Very
> Good Thing, and it gives us a *marker* for having postponed the
> argument)

Hm.. so unless anyone objects, we'll reserve it for now, with the understanding that it may become unreserved later, or it may become meaningful..

> STpy is almost as complex as I want it to get already - I want to add in
> [references] and then feature freeze. Which would have become clear in
> the alpha (he whines)

I'm still unclear about this [references] thing; explain it? But I do agree that ST is complex enough. (btw, why do we need 3 different unordered list bullets? That seemed like enormous overkill to me, and getting rid of 'o' as a bullet would solve some problems with foreign languages.. Any chance that we could just standardize on *either* '*' or '-'? I guess the STNG people won't like that though...)

> Well, actually it's "Arguments". And the special meaning isn't
> *enforced* in the current implementation or documentation, but it may be
> in the future (so much to do, so little whatever).
I think of arguments as values you give a function, and parameters as the slots to receive them. But I mainly chose "parameters" to be consistent with javadoc etc. Doesn't really matter to me what we call it.

> > author -- Edward Loper
> > version -- 2.71828
>
> That's::
>
>     Author: Edward Loper
>     Version: 2.71828
>
> and it doesn't work yet (because ':' doesn't yet start a
> paragraph) - it may or may not work in the alpha release.

As I argued in a previous email, I really think it should be description list items, but I'll wait for you to argue your case.. :)

> > Then you just have to worry about syntax for inline markup. I don't
> > have any great ideas there, other than having a whole class of
> > "advanced inline markup" tags like @somethingorother(...)
>
> Given the '@' reserved, we can leave that...

Agreed. Any further inline markup will be left for another day.

-Edward

From edloper@gradient.cis.upenn.edu Tue Mar 13 23:14:59 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 18:14:59 EST Subject: [Doc-SIG] formalizing StructuredText: issues Message-ID: <200103132314.f2DNExp13135@gradient.cis.upenn.edu>

2 issues have come up recently in my attempts at formalizing ST:

1. what is the most elegant way to capture indentation constraints? EBNF+la can't do it very well. This is important when I try to write productions for literal blocks (ones preceded by '::'), since where they end depends on indentation. Perhaps use rules like::

       literal_block = (?P<indent> S+) line NL ((?P=indent) S+ line NL)+

   ?

2. Is there an elegant way to let people say things like 'TestSet's for the plural of 'TestSet' (instead of 'TestSets')? Currently, the second "'" in 'TestSet's would be ignored, as it would be with what's or it's. This isn't a problem if we use '#': #TestSet#s. But I could see it confusing people..

Any ideas appreciated.

-Edward

From edloper@gradient.cis.upenn.edu Wed Mar 14 05:33:37 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 00:33:37 EST Subject: [Doc-SIG] Test set! Message-ID: <200103140533.f2E5Xcp24096@gradient.cis.upenn.edu>

I've finished the first version of my StructuredText test suite module. Excerpted from the module's documentation:

    This module defines an organized, comprehensive set of test cases
    for STminus, STNG, and STpy. Each test case consists of a
    self-documenting StructuredText string. The test cases are
    hierarchically organized into 'TestSets'.

To just look at what tests are currently contained in the test set, look at STminus002's test results, at:

    http://www.cis.upenn.edu/~edloper/pydoc/stminus002-test.html

It may seem like I've included too many test cases, but I've surprised myself at how often test cases will unexpectedly stop working when I make minor changes to STminus. Any suggestions for additional test cases are most welcome. I'll be adding test cases as time goes by.

If you want to actually use the test set, download its module from:

    http://www.cis.upenn.edu/~edloper/pydoc/sttest.py

The module defines 2 useful variables:

    everything -- a 'TestSet' containing all tests.
    test_hierarchy -- a hierarchy containing all tests.

The easiest way to use it is to evaluate::

    everything.tests('+STNG')

or::

    everything.tests('+STpy')

which will return a list of strings, containing the test cases for each variant (currently these will return the same thing).

If you want to do regression testing, you should:

1. run the test suite, output to a format of your choice, and check all the answers by hand.
2. save said output to a file
3. to do regression testing after you change your implementation, run the test suite again, and diff it against your saved "known-correct" results.

-Edward

From edloper@gradient.cis.upenn.edu Wed Mar 14 07:06:57 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 02:06:57 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Tue, 13 Mar 2001 13:20:08 EST."
Message-ID: <200103140706.f2E76vp05314@gradient.cis.upenn.edu>

> I may be misguided here, but my impression is that it's not a goal to have
> more PEPs. Rather, the idea is to allow for multiple PEPs that convey
> competing viewpoints when competing viewpoints exist.

I agree. Unfortunately, I don't think there are too many people out there who have competing viewpoints and are willing to take the time to write a PEP right now.

> If you diverge just
> in marginal particulars, you can present the particulars as alternatives,
> with explanation about their relative merits. In general, if it's
> suitable for you and tibbs to do a joint PEP, and you'd be comfy with it,
> it sounds like the PEP might be the stronger for it.

For now, Tibs and I will do separate PEPs. I think that working alone, we're likely to cover different aspects of the problem. For that reason, I don't want to see a draft of Tibs' PEP until I'm done (and I probably won't show him mine either. :) ) If our drafts are compatible, and we feel like it, we may then merge our drafts into one PEP. At the very least, we'll probably steal ideas from each other's PEPs. At the end of all that, we'll have either 1 or 2 PEPs (or more if other people are motivated).. We can then try to marshal acceptance and go forward with it/them. That's my plan, anyway..

-Edward

From tony@lsl.co.uk Wed Mar 14 10:08:04 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:08:04 -0000 Subject: [Doc-SIG] PEP 216 In-Reply-To: <200103131655.f2DGthp23889@gradient.cis.upenn.edu> Message-ID: <007d01c0ac6e$a91c0620$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward Loper asked:

> Just out of curiosity, how is this new proposed PEP going to relate
> to PEP 216 ("docstring format")? Will it replace it? If not, then
> maybe we should just work on extending PEP 216?
> If you're interested
> in this PEP stuff, and haven't read PEP 216, you should probably go
> read it..:
>
> http://python.sourceforge.net/peps/pep-0216.html

As I see it, PEP 216 currently says that the docstring format adopted will be STNG (with an implicit assumption that it will be a variant of that). What we are now talking about, and what was asked for, is to define that variant (or several different variants to be voted on) via the PEP mechanism. So PEP 216 stands, albeit with very slightly amended wording, and we gain one or more new PEPs. Which makes organisational sense to me.

> Does anyone know if Moshe Zadka is still actively working on this?

I assume he is keeping at least an occasional eye on it, and if he doesn't chime in when I've got my PEP (and alpha release) ready, I'll actually email him to ask him to change the text of PEP 216 slightly, and add a reference to the new PEP (and then, to any others in the future).

(don't forget the "Python help" PEP is relevant too, as well as the "attribute docstrings" PEP)

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Wed Mar 14 10:21:12 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:21:12 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> Message-ID: <007e01c0ac70$7e66e100$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote: (well, actually some of it is me)

> >> a. I don't actually believe that you are going to get most Python
> >> programmers to DO semi-formal markup for you. I *do* believe there's a
> >> good chance we'll get many people to write at least a little human
> >> readable text. So guess what I'm after...
> > I think that a bunch of programmers would be willing to do global > markup (special places to describe arguments, etc). I don't think > they would do formal coloring markup.. No, no - I meant in *new* code, rather than in old. I want documentation in new code as well! (as for the standard library - we have precedent for farming out bits of the code to be worked on by volunteers in the original addition of docstrings. I bet much of it will be pretty close to OK anyway). > I agree with the idea, but I strongly think that things like author > should be description list items. Advantages: > * it requires no changes to current STpy > * it's more compatible with other STs > * it seems conceptually "cleaner" > * we don't need to give special meaning to yet another > character (":") > * it's *much* easier to describe a general mechanism, so > other people can extend it for their own tools. > * if they do something wrong, (e.g., put unordered list items > below "Arguments:"), then we can just format it like normal > ST. (and give a warning, if we're nice) > > Problems will come up with your formalism with lines like: > > Author: Mr. bob frank # does the . mean it's a got a sentence? > Things to do: eat, sleep # how does this get treated? > > Unless you can come up with a compelling reason for using:: > > Author: Guido > > instead of:: > > Author -- Guido > > I think we should go with the latter. Aagh! Quick, do a "from __future__ import docutils" 'cos that's all being addressed in the alpha release and *it's already coded*. OK. The rules for "paragraph labels" (I changed the term) are fairly simple. 
Basically, there are two dictionaries::

    label_dict = { "Label" : "xml-label" }

and::

    label_validate = { "Label" : ["para","dlist"] }

The first says that we will recognise text like::

    Label: this is a single line labelled paragraph

and::

    Label: This is the labelled paragraph

and::

    Label: Key -- the first labelled item

The second dictionary says that all of those examples are legitimate (when validation is switched on). I decided *not* to complicate matters by allowing::

    Label: Only indentation makes this look like a paragraph.

because there are already enough rules about when paragraphs start. This fits in reasonably well with David Ascher's original ideas (thrashed out at the end of 1999), and to my eye the use of a colon is more natural than overloading descriptive lists. More explanation can (please) wait until the alpha release, which I *hope* will be end of next week, with the PEP.

> >> And although it says there that
> >> it should be left for second phase implementation, the start of support
> >> is already in docutils.
>
> I think it will be helpful to people if we can get this straight now.

Yes, but "now" can be next week, 'cos then I won't have to explain things *quite* so many times, please?

> >> docstrings inherit like any other value, surely?
>
> Um.. yes, but that's not always what you want. Basically, if
> I define::
>
>     class A:
>         def f(x):
>             "this is f(x)"
>             return 1
>
>     class B(A):
>         def f(x):
>             return 2
>
> Then I might want B.f to inherit its docs from A.f. This would be
> especially nice for things like UserList, so my classes will have
> docs for its methods without me having to duplicate explanations.

Then assign them. __doc__ is a perfectly valid "slot" to assign to (and is the precursor to the whole idea of function values - see the relevant PEP). See, as I said, time machines...
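[Editorial note: Tibs' "then assign them" suggestion can be made concrete. A minimal sketch in modern Python (explicit 'self' added; the class and docstring text are illustrative, not from the original thread): overriding a method does not carry the parent's docstring across, but the attribute can simply be copied.]

```python
class A:
    def f(self, x):
        "Return a constant; this text lives on A.f."
        return 1

class B(A):
    def f(self, x):        # overrides A.f, so B.f.__doc__ starts as None
        return 2

# Tibs' point: __doc__ is an ordinary, assignable slot on the function.
B.f.__doc__ = A.f.__doc__
```

After the assignment, `help(B.f)` shows the inherited text, while `B().f(0)` still runs the overriding body and returns 2.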
Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 10:23:18 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:23:18 -0000 Subject: [Doc-SIG] Docststring markup process In-Reply-To: <200103131729.f2DHTdp28610@gradient.cis.upenn.edu> Message-ID: <007f01c0ac70$c9eab250$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I'd like STminus > to expand to eventually be able to provide a formal definition of > STpy. I'm glad you say that, 'cos I'd like it to be so as well. > But, at any rate, the idea of doing 2 PEPs is to see which one people > like better. I'm not planning on proposing STminus001 as a formatting > convention. I think that's not very reasonable. But Maybe > STpyminus099 (i.e., STminus with py extensions (such as #..# and list > items without blank lines), version 99). I think that it is reasonable to have "sibling" PEPs, which don't compete, if they help elucidate things. And I think we *should* be adopting the STminus work "officially" - i.e., it merits a PEP on those grounds alone. But just my opinion (!!!) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 10:28:33 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:28:33 -0000 Subject: [Doc-SIG] reserved characters In-Reply-To: <200103131735.f2DHZip29411@gradient.cis.upenn.edu> Message-ID: <008001c0ac71$85352b30$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. 
Loper wrote: > The idea would be that specific ST's would be allowed to use > reserved characters if they want, or remove them from the reserved > character list if they want. So in STminus, we would have:: > > ReservedChars = # | @ > > And in STpy, we would have:: > > ReservedChars = # Hmm - I think I'd be happier calling them "special" characters, because they *aren't* "globally" reserved - they can always be used freely in contexts where they aren't keying their special action - for instance:: First we have a #Python literal#, but # this looks like some sort of comment - that third '#' is NOT treated specially. > This production would take care of making sure that those characters > don't appear in normal text. But see above - they (sort of) can. > > I think that for STminus's purposes, it might make sense to make > > characters reserved, *perhaps*, but for the "full fledged" ST's it > > doesn't Ah, well, that *does* make more sense - but I'd still prefer to call them "special" so you can, erm, unreserve them slightly in future versions if you wish. And also so they are clearly related in use to other special characters, like '*'. > > '[' and ']' are "force a reference" characters in Zwiki, and will be > > used for similar purpose in STpy. But again, it depends on context. > > Hm. I'm not sure I like the sound of that. Care to elaborate? It's much the same as the way PEPs work, with '[fred]' keying a reference to an item "labelled" 'fred' (in some way), but being a Wiki page, references off-page are meant to be *very* easy to generate, and the way one does that is by the CapitalLettersInWords convention. However, sometimes that needs circumventing (for instance, 'Tibs' won't trigger such an event, so I would need to type '[Tibs]'). It's OK, honest, it works well. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! 
(Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Wed Mar 14 10:36:07 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:36:07 -0000 Subject: [Doc-SIG] quoting In-Reply-To: <200103131737.f2DHbHp29600@gradient.cis.upenn.edu> Message-ID: <008101c0ac72$94322f60$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper asked:

> Could someone point me to an explanation of why we *don't* want to use
> backslashes for backslashing characters? :) e.g., \* for a literal
> asterisk. It seems so much like the natural thing to do, and surely
> someone who's coding in python will be familiar with the convention
> of backslashing characters..

I'll have a quick try, although it is actually the intersection of several reasons.

First, practical. In a docstring, you can't just type '\', you'll have to type at least '\\', and maybe even more. This is a pain.

Secondly, in ST *text* (and all of the ST family are intended for writing plain text as well, including STpy), it doesn't read well - it's not something one would naturally type, unlike '...' which *is*, well, quoting.

Thirdly, I actually have a gut feeling that using the same escape character for ST text as one uses for strings is going to be awkward anyway (it *does* tend to lead to the "four backslashes means one" phenomenon, and it's just rather awkward to handle mentally).

Fourthly, I think it is instructive to note that there has never been much demand in the STClassic world to solve this, and even STNG isn't worrying about it too hard - that means that most people, most of the time, have managed either not to care or to work around the problem.

And fifthly, note that there *is* only one truly awkward case, which is how to quote a single quote. In STpy one can force use of #'# (gosh, I need to add that to my test cases, but it *should* work), or give up and use "'", or just talk around the issue.
(a literal asterisk is, of course, just '*')

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Wed Mar 14 10:43:01 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:43:01 -0000 Subject: [Doc-SIG] two simple, but important questions In-Reply-To: <01031309583800.00505@lucy> Message-ID: <008201c0ac73$8af67cc0$f05aa8c0@lslp7o.int.lsl.co.uk>

Tavis Rudd (who is being very patient with us jumping up and down on his arguments) wrote:

> After reading Tibb's and Edward's responses to my questions,
> I'm starting to lean towards option a (keeping everything in the
> docstring) for the simple reason that it's the path of least
> resistance (/debate). However, option b (having the option of a
> second string literal that follows the docstring and is discarded
> by python at compile-time) does provide the following things that
> option a can't:
> - the ability to gracefully document module and class data attributes
> (constants etc.) as Marc-Andre Lemburg proposed in PEP 224

Ah - not a problem. In the current world, you just explain them in the class/module docstring (not a bad thing). In MAL's proposal, you add new docstrings AFTER each value's default setting. So old scheme would have::

    class Useful:
        """A really useful class.

        There are two class variables defined:

        #name#   -- the name of something or other
        #number# -- its quantity
        """

        # Let's have some default values for those
        name = "Fred"
        number = 9

and the new scheme would (I believe) have::

    class Useful:
        """A really useful class.

        (and I'd maybe *still* have some reference in the class
        docstring to the class values)
        """

        name = "Fred"
        """#name# is the name of something or other"""

        number = 9
        """#number# is its quantity"""

> - a means for the verbose to document to their heart's content without
> worrying about increasing memory usage and load-time (but is this
> ever really going to be significant??)

Let's assume not - people can always strip stuff out with -OO if they want.

> - a means to include the Library reference documentation in the same
> file, as Ping proposed

Which is, let us say, hotly debated.

> Of course, if the tools were to give module creators the option of
> putting extended documentation in like this, it doesn't mean that
> everyone must do it this way. If you, as a coder, don't like the idea
> of discardable string literals after the __doc__s, stick with the
> standard docstring.

Ooh, I saw you do that. No, allowing people to do something I don't want them to be able to do, on the grounds they might not do it, isn't what I want (see, at heart I want to be part of the PSU (erm, I didn't say that)).

Tibs (who also has odd names) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Wed Mar 14 10:50:09 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:50:09 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Message-ID: <008301c0ac74$89a4da00$f05aa8c0@lslp7o.int.lsl.co.uk>

Ken Manheimer wrote:

> I may be misguided here, but my impression is that it's not a
> goal to have more PEPs. Rather, the idea is to allow for multiple
> PEPs that convey competing viewpoints when competing viewpoints
> exist. If you diverge just
> in marginal particulars, you can present the particulars as
> alternatives,

oh, I give up - it's going to have to stay formatted funnily.
I hate Outlook > with explanation about their relative merits. In general, if it's > suitable for you and tibbs to do a joint PEP, and you'd be > comfy with it, > it sounds like the PEP might be the stronger for it. I agree in principle, but maybe not in particular. The STpy PEP is going to have to be rather long if it is going to give even an informal account of the rules. Not to mention trying to forestall *some* of the inevitable arguments about why those rules won't work (since it isn't always obvious how sub-tle (pronounced "sub","tl") ST actually is in its workings. And I think STminus is a different sort of thing (or two things) - it's an attempt to codify a true formal definition of an ST variant (a first), it's an attempt to produce a common subset (which is a *very* useful idea, both for pedagogy and also for interoperability), and (ok, three) it's an attempt to produce an alternative implementation (but this is the least of the three, so far as I'm concerned - it just happens to be necessary to do the work of the first two). So I think STminus deserves a PEP *on those terms* - assuming PEPs are also meant to be placeholders for "something interesting is happening which could have large effects on the future". Whereas the STpy PEP is just a traditional "I intend to ask people to work like this, and here is an example implementation that does it". Does that make sense? Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 10:57:11 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:57:11 -0000 Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: <200103132307.f2DN7kp12393@gradient.cis.upenn.edu> Message-ID: <008401c0ac75$858cc440$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. 
Loper wrote:

> > (the whole issue of how one quotes things in ST is a difficult one,
> > single quote itself is enough of a problem. The "true ST" approach,
> > I think, is to say "borderline case - punt it - we're not that
> > complex", which *may* well be the correct answer).
>
> I don't see why it has to be a difficult issue, but hopefully someone
> will explain it to me. :) But I disagree that the "true ST" approach
> is to say "borderline case - punt it." I think that's the approach
> of most *current implementations* of ST, not of ST itself. If ST is
> to become accepted and useful, we *do* need to define borderline cases,
> or at least make them explicitly undefined. It really sucks to try
> to use a language/markup that keeps changing under your feet, and is
> inconsistent between tools. :)

I *think* we're talking at slightly different angles, and actually agreeing. By "punt it" I mean "defer providing *any way at all* of doing this thing (that is, specifically, making a single quote character literal, which *is* the problem), on the grounds that *in practice* it may be that no one actually needs it". And that is a very ST way of doing it.

> But I do agree that ST is complex enough. (btw, why do we need
> 3 different unordered list bullets? That seemed like enormous
> overkill to me, and getting rid of 'o' as a bullet would solve
> some problems with foreign languages.. Any chance that we could
> just standardize on *either* '*' or '-'? I guess the STNG people
> won't like that though...)

Two reasons, I expect. First, it is useful for people reading the ST text itself to be able to use multiple bullets::

    * first item

      - second item

may be easier to read than::

    * first item

      * second item

particularly when lists get complex. This is certainly something that most document presentation tools will do for you. And secondly (less importantly) the bullet style *might* be used as a hint to the renderer/formatter of what the user wants to see.
But the first is the "real" reason, I think. (just keep remembering that ST is meant to be read "bare", as well as post-processed and formatted.) > I think of arguments as values you give a function, and parameters > as the slots to receive them. But I mainly chose "parameters" to > be consistant with javadoc etc. Doesn't really matter to me what > we call it. Ah - "Arguments" was suggested some while back, that's all - otherwise I don't much care either. > > > author -- Edward Loper > > > version -- 2.71828 > > > > That's:: > > > > Author: Edward Loper > > Version: 2.71828 > > > > and it doesn't work yet (because ':' doesn't yet start a > > paragraph) - it may or may not work in the alpha release. > > As I argued in a previous email, I really think it should be > description > list items, but I'll wait for you to argue your case.. :) I did a bit elsewhere, but it's mostly in the archives, I'm afraid, and it wasn't actually my case at all. But there was a consensus that this worked well with a colon, and that it made sense as a way of introducing "XML" structures. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 11:02:36 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 11:02:36 -0000 Subject: [Doc-SIG] formalizing StructuredText: issues In-Reply-To: <200103132314.f2DNExp13135@gradient.cis.upenn.edu> Message-ID: <008501c0ac76$46fe8e60$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > 2 issues have come up recently in my attempts at formalizing ST: > 1. what is the most elegant way to capture indentation constraints? > EBNF+la can't do it very well. This is important when I try to > write productions for literal blocks (ones preceeded by '::'), > since where they end depends on indentation. 
Perhaps use rules
> like::
>
>     literal_block = (?P<indent> S+) line NL ((?P=indent) S+ line NL)+
>
> ?

Erm - that requires thinking too hard for me to answer at the moment, so I'll avoid it.

> 2. Is there an elegant way to let people say things like 'TestSet's for
> the plural of 'TestSet' (instead of 'TestSets')? Currently,
> the second "'" in 'TestSet's would be ignored, as it would be with
> what's or it's. This isn't a problem if we use '#': #TestSet#s.
> But I could see it confusing people..

If I understand you correctly, you want to be able to type::

    'Word'text

and have it work as if you had a literal "Word" followed by non-literal text, without any intervening space. So as a general case, the answer is "no" (as you might expect when it is explained like that). Now, if one narrows it down to "non-literal text which is constrained to be a single lower case 's'", then of course it would be possible to do, but I would worry that it is getting a bit overcomplex. On the other hand, it may well be a useful thing to do. I would avoid it for now, and consider it as an enhancement for later (btw, I would also ask the same question on the ZWiki, to see what they say - I'd probably be heavily influenced by their reply).

(oh - and STpy shouldn't "ignore" the second apostrophe, it should decide that the whole thing doesn't contain a literal string - the literal string can't continue past the second apostrophe since they're not allowed inside literal strings).

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
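[Editorial note: Edward's named-group idea can be tried out directly with Python's re module. This is a sketch with an assumed pattern, not the actual STminus grammar: the named group fixes the first line's indentation, and the backreference forces every later line of the block to repeat it exactly.]

```python
import re

# Hypothetical literal-block matcher: <indent> captures the first line's
# leading whitespace; (?P=indent) requires later lines to match it.
literal_block = re.compile(
    r"(?P<indent>[ \t]+)\S.*\n"    # first indented line sets the indent
    r"(?:(?P=indent)\S.*\n)*"      # following lines must repeat it exactly
)

text = "    line one\n    line two\n  shallower line\n"
block = literal_block.match(text)
# The match stops before the third, more shallowly indented line, so the
# block's extent is determined purely by indentation, as Edward wanted.
```

Here `block.group("indent")` is the four-space indent and `block.group(0)` covers only the first two lines.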
From tony@lsl.co.uk Wed Mar 14 11:11:21 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 11:11:21 -0000 Subject: [Doc-SIG] suggestions for a PEP (and new schedule) In-Reply-To: <200103140706.f2E76vp05314@gradient.cis.upenn.edu> Message-ID: <008601c0ac77$80386880$f05aa8c0@lslp7o.int.lsl.co.uk>

There are significant disadvantages to working through one's email without any lookahead (even if it is faster). Edward D. Loper wrote:

> For now, Tibs and I will do separate PEPs. I think that
> working alone,...

I agree with Edward, not for the first time. By the way, flagging future events:

1. I'll be away from the office tomorrow (Thursday), so I'd appreciate it if no one did anything interesting until Friday (!!!)

2. I am aiming to try to finish both my PEP and the alpha release of docutils by the end of next week, or failing that (at the very worst) by the end of the week after. docutils will include:

   * proper docstrings, with nice doctesting

   * better support for labelled paragraphs (things like::

         Author: Tibs

     and::

         Arguments:
         #fred# -- an argument

   * the option of choosing validation or not, and some useful validation if it is chosen

   * a sensible decision on what to do with badly indented paragraphs (including complaining if validation is on, of course)

   * support for in-document references (this is new new code, as opposed to amended code)

   * an example of how the PEP itself can be done as a STpy variant (including the necessary Python code) - this is a good way of showing how to customise docutils, and should require minimal change to the PEP text.

   * a rewritten STpy.html document (i.e., the STpy spec) to reflect current status and thoughts, with at least the starts of "curvy warning" bits to talk about the nooks and crannies of STpy

There *may* be a 0.0.5 release before it's all finished - depends how long it takes and how much sleep I can do without.

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 11:14:38 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 11:14:38 -0000 Subject: [Doc-SIG] Test set! In-Reply-To: <200103140533.f2E5Xcp24096@gradient.cis.upenn.edu> Message-ID: <008701c0ac77$f5987930$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I've finished the first version of my StructuredText test > suite module. Grand stuff - I shan't have time to look at it until later (busy preparing for tomorrow), but "yeh!!". And I wouldn't worry about too many test cases - my experience is also that one needs lots of little tests with minor variations to catch the niggles. Tibs (and now to work) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From gherman@darwin.in-berlin.de Wed Mar 14 13:04:47 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 14 Mar 2001 14:04:47 +0100 Subject: [Doc-SIG] Re: Evolution of library documentation References: Message-ID: <3AAF6C6F.5FA1B2B2@darwin.in-berlin.de> Ka-Ping Yee wrote: > > At the Python conference, a small group of us discussed the possibility > of merging the external and internal documentation; that is, moving > the library reference into the module source files. It would no longer > be written in TeX so that you wouldn't have to have TeX in order to > produce documentation. This would address the duplication problem and > also keep all of a module's documentation in one place together with > the module. To avoid forcing you to page through a huge docstring > before getting to the source code, we would allow a long docstring to > go at the end of the file (or maybe collect docstrings from anywhere > in the file). 
> > To implement this convention, we wouldn't need to change the core > because the compiler already throws out string constants if they aren't > used for anything. So a big docstring at the end of the file would not > appear in the .pyc or occupy any memory on import; it would only be > obtainable from the parse tree, and tools like pydoc could use the > compiler module to do that. I know I'm a bit late to jump in on this topic (I guess a few days' delay can be considered late in a mailing list thread), but nevertheless I would like to make one point that I feel has not been adequately addressed yet. Following Ping's thoughts, quickly as they move, he is proposing nothing else but an equivalent of Don Knuth's well-known literate programming scheme in Python. Ping, am I right? I believe the number of literate programming folks who followed their master in the syntactical challenge of writing code using what was called the Web system (a combination of TeX with other languages like C and Pascal) is rather low, precisely because the syntax to mangle both was maybe ok for Knuth but far from easy for most others. Obviously, Python has something of a promise here... ... but apart from keeping the syntax of docstrings easy to understand there is one issue that Web solved that Python doesn't (this is where I have to disagree with Ping), at least not right out of the box. While it is possible today to write docstrings like this and also execute the code below as expected:

    def step1(): print '1!'
    def step2a(): print '2a!'
    def step2b(): print '2b!'

    def foo(bar):
        'step 1'
        step1()
        'step 2'
        if bar > 0:
            'step 2a'
            step2a()
        else:
            'step 2b'
            step2b()

    >>> foo(9)
    1!
    2a!
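The parse-tree extraction Ping describes is exactly what the stdlib ast module makes easy today (ast postdates this thread; in 2001 the compiler module played this role). A minimal sketch in modern Python, using an inline copy of the example above:

```python
import ast

# Dinu's example as a source string; ast.parse never executes it,
# so the undefined step1/step2a/step2b helpers are harmless here.
SOURCE = """
def foo(bar):
    'step 1'
    step1()
    'step 2'
    if bar > 0:
        'step 2a'
        step2a()
    else:
        'step 2b'
        step2b()
"""

def bare_strings(source):
    # A bare string statement parses as an Expr node wrapping a string
    # Constant; the compiler discards these from the bytecode, but the
    # parse tree still carries them.
    return [node.value.value
            for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)]

print(bare_strings(SOURCE))  # ['step 1', 'step 2', 'step 2a', 'step 2b']
```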
I think without some excellent code parsing/analysing support it will be quite some challenge to implement something to get hold of all these 'additional' docstrings in order to finally get at something close to the books (or some equivalent hypertext system) rendered with Web/TeX in former times (don't know of any very recent one) or with Mathematica, the most recent one to appear shortly: http://www.wolfram-science.com Fortunately, at IPC9 there were tools announced to analyse Python code much better/easier than ever before, like the compiler module (I think Jeremy gave that presentation). And I'm really putting some hope into that. But finally, we'll probably need to know how far we'd like to go the way of Web/Mathematica? Ping, any ideas? Regards, Dinu -- Dinu C. Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'insanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." (Kent Beck, "Extreme Programming Explained") From edloper@gradient.cis.upenn.edu Wed Mar 14 15:07:00 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 10:07:00 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Wed, 14 Mar 2001 10:21:12 GMT." <007e01c0ac70$7e66e100$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103141507.f2EF70p27334@gradient.cis.upenn.edu> > > I think that a bunch of programmers would be willing to do global > > markup (special places to describe arguments, etc). I don't think > > they would do formal coloring markup.. > > No, no - I meant in *new* code, rather than in old. I want documentation > in new code as well! Um. So did I. I'm not sure where you got the impression that I didn't. Hopefully having special places to describe arguments is useful in new code?
I wasn't even thinking about the std library.. > Aagh! Quick, do a "from __future__ import docutils" 'cos that's all > being addressed in the alpha release and *it's already coded*. So, my main objection to this format (given that I'm the one that has to formalize it) is that it means we're giving meaning to yet another character: ':'. I read through the list archives, but didn't seem to find eventual consensus. But if we're going to have it, here's how I would *like* it to be defined: 1. *Any* paragraph starting with '\w+:' is a "label paragraph." 2. Any ':' in any other location is treated as a normal ':' 3. It is illegal for a label paragraph to contain a newline. 4. If a label paragraph has (non-space) content after the ':', then that is parsed as the only subparagraph of the label paragraph. 5. If a label paragraph does not have content after the ':', then following indented paragraphs are subparagraphs. 6. If a label paragraph has content after ':' *and* is followed by indented paragraphs, then... I'm not sure. Could be illegal. Could be that the following paras are children of the para that's on the same line.. Ideas? > > OK. The rules for "paragraph labels" (I changed the term) are fairly > simple. Basically, there are two dictionaries:: > > label_dict = { "Label" : "xml-label" } If we're going to be xml-like in specifying contents, then we might want to consider using actual xml DTD-like strings to specify this (so "Label" : "ANY"). :) > I decided *not* to complicate matters > by allowing:: > > Label: > Only indentation makes this look like a paragraph. > > because there are already enough rules about when paragraphs start. Agreed. Although this is bound to confuse people.. :) But according to my rules, it will produce an error if they have error checking turned on.. > This fits in reasonably well with David Ascher's original ideas > (thrashed out at the end of 1999), and to my eye the use of a colon is > more natural than overloading descriptive lists.
Again, I don't like giving even more meaning to ':' when we could just give '--' the same meaning, and say it's a "tool issue" to decide whether a key has special meaning.. But as long as we give ':' special meaning *only* and *always* when it appears directly after the first word in a paragraph, I can accept it. > More explanation can (please) wait until the alpha release, which I > *hope* will be end of next week, with the PEP. Looking forward to it. If my emails take too much time to answer, just punt them for now. :) I've been trying to get some answers out of your source code, but I don't have the time to figure out everything that's going on in there.. > > I think it will be helpful to people if we can get this straight now. > > Yes, but can now be next week, 'cos then I won't have to explain things > *quite* so many times, please? Yes, I was thinking of "now" as "before the PEP is accepted." :) > > >> docstrings inherit like any other value, surely? > > Um.. yes, but that's not always what you want. Basically, if > > I define:: > > > > class A: > > def f(x): > > "this is f(x)" > > return 1 > > > > class B(A): > > def f(x): > > return 2 > > > > Then I might want B.f to inherit its docs from A.f. This would be > > especially nice for things like UserList, so my classes will have > > docs for its methods without me having to duplicate explanations. > > Then assign them. __doc__ is a perfectly valid "slot" to assign to (and > is the precursor to the whole idea of function values - see the relevant > PEP). I didn't think this was possible, because the following fails:: B.f.__doc__ = A.f.__doc__ But really you have to do this: B.f.im_func.__doc__ = A.f.__doc__ It might be worth mentioning this in documentation, because I think that the whole idea is not quite intuitive, and it's something that the code writers would have to do.. So we should suggest that they do it when applicable! :) > See, as I said, time machines... Time travel's fun!
:) But I'm sure you've been through all this too many times to count. I'm a bit new to all these issues, so if I seem dense sometimes, please forgive.. :) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 14 15:18:47 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 10:18:47 EST Subject: [Doc-SIG] reserved characters In-Reply-To: Your message of "Wed, 14 Mar 2001 10:28:33 GMT." <008001c0ac71$85352b30$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103141518.f2EFIlp28692@gradient.cis.upenn.edu> > Edward D. Loper wrote: > > The idea would be that specific ST's would be allowed to use > > reserved characters if they want, or remove them from the reserved > > character list if they want. So in STminus, we would have:: > > > > ReservedChars = # | @ > > > > And in STpy, we would have:: > > > > ReservedChars = # > > Hmm - I think I'd be happier calling them "special" characters, because > they *aren't* "globally" reserved I don't mind calling them special. > - they can always be used freely in > contexts where they aren't keying their special action - for instance:: > > First we have a #Python literal#, but # this looks like some sort of > comment > > - that third '#' is NOT treated specially. Ack, no! :) This is potentially the type of "non-safe" behavior I want to avoid.. This leads to people saying:: x = y*z in their comments, and then later adding:: I *like* multiplication to the same comment, and getting everything from z...I in bold, and an asterisk after like. I think that the use of mismatched delimiters should be undefined. In particular, it should *not* be defined to produce those delimiters as plain text. So in any parser I would write, it would give an error for the string you gave.. Currently, the *only* character that I think should count "only in special circumstances" is the single quote. I don't see any way around that, because of words like "don't." 
But in my formalization, I'm planning to make "non-markup" use of (most?) other special characters undefined (except when they appear in 'literals' or #inlines#). (Well, "...":.. is also "special," but I'm strongly considering requiring that all double quotes be matched (except in literals and inlines) > Ah, well, that *does* make more sense - but I'd still prefer to call > them "special" so you can, erm, unreserve them slightly in future > versions if you wish. And also so they are clearly related in use to > other special characters, like '*'. > > > > '[' and ']' are "force a reference" characters in Zwiki, and will be > > > used for similar purpose in STpy. But again, it depends on context. > > > > Hm. I'm not sure I like the sound of that. Care to elaborate? > > It's much the same as the way PEPs work, with '[fred]' keying a > reference to an item "labelled" 'fred' (in some way), but being a Wiki > page, references off-page are meant to be *very* easy to generate, and > the way one does that is by the CapitalLettersInWords convention. > However, sometimes that needs circumventing (for instance, 'Tibs' won't > trigger such an event, so I would need to type '[Tibs]'). It's OK, > honest, it works well. This means that we can no longer say '[a, b, c]' without quotes. What types of things would you refer to in this way? Why can't we just use #fred#, when referring to docstrings of other objects? This is yet another 2 characters that are given meaning, and taken away from the documenter as possible text characters.. I want to make sure that whatever we're getting is worth that. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 14 15:31:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 10:31:35 EST Subject: [Doc-SIG] quoting In-Reply-To: Your message of "Wed, 14 Mar 2001 10:36:07 GMT." 
<008101c0ac72$94322f60$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103141531.f2EFVZp00445@gradient.cis.upenn.edu> > I'll have a quick try, although it is actually the intersection of > several reasons. > > First, practical. In a docstring, you can't just type '\', you'll have > to type at least '\\', and maybe even more. This is a pain. You can always use r""...""" > Secondly, in ST *text* (and all of the ST family are intended for > writing plain text as well, including STpy), it doesn't read well - it's > not something one would naturally type, unlike '...' which *is*, well, > quoting. So basically it seems like the question comes down to "can we get away with quoting." For most things, we can. But if we're proposing a standard markup for documenting Python, I think it would be very bad if we designed it such that some programs could not be reasonably documented with it.. You can quote most things with '...'. Disadvantages: 1. everything in '...' is monospace, so there's no way to produce a "normal" '*'. This may not be a problem. 2. '...' must be preceded and followed by whitespace, so the entire ws-delimited token containing the symbol we want to quote must be quoted. In turn, this means that the entire symbol containing a special character must be rendered in monospace, and can't use any coloring (emph, strong, hrefs). You can't quote "'" with '...'. But you can with #'#. Disadvantages: 1. This seems to get very confusing and non-intuitive, to me anyway. 2. There are strange interactions.. For example, you can't quote the symbol "#'", and a number of others. But if a python program happens to use that, they can't reasonably use STfoo for documenting their code.. Backslashing is more powerful in some sense than quoting.. A single backslash character will let you say anything.. A quoting mechanism will always have cases left out.
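Edward's raw-string point can be made concrete: in an ordinary docstring a literal backslash must be typed twice, while a raw docstring takes it exactly as typed. A minimal modern-Python sketch (the backslash-star "escape" itself is only hypothetical markup):

```python
def doc_plain():
    "escape the star like this: \\*"

def doc_raw():
    r"escape the star like this: \*"

# Both docstrings contain exactly one backslash character; the raw
# form simply lets the author type what the reader will see.
assert doc_plain.__doc__ == doc_raw.__doc__ == "escape the star like this: \\*"
```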
> Thirdly, I actually have a gut feeling that using the same escape > character for ST text as one uses for strings is going to be awkward > anyway (it *does* tend to lead to the "four backslashes means one" > phenomenon, and it's just rather awkward to handle mentally). We shouldn't ever get more than 2 backslashes for this problem, and it should just be one if people use r""...""", which they can do unless they're planning on inserting strange characters in their documentation string (presumably a bad idea anyway?). > Fourthly, I think it is instructive to note that there has never been > much demand in the STClassic world to solve this, and even STNG isn't > worrying about it too hard - that means that most people, most of the > time, have managed either not to care or to work around the problem. They're in a very different world than we are. When you try to describe an algorithm that processes text, you sometimes really need to be able to use "'" and '#' and '*', and maybe even "#'". I'm still not convinced either way (that we definitely need backslashing or that we definitely don't). So I'll keep brooding on it.. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 14 15:44:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 10:44:33 EST Subject: [Doc-SIG] suggestions for a PEP (and new schedule) In-Reply-To: Your message of "Wed, 14 Mar 2001 11:11:21 GMT." <008601c0ac77$80386880$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103141544.f2EFiXp01992@gradient.cis.upenn.edu> > 1. I'll be away from the office tomorrow (Thursday), so I'd appreciate > it if no one did anything interesting until Friday (!!!) Muahaha. Now I can sneak in my PEP, propose it, and get it accepted all before he gets back!! ;) > * the option of choosing validation or not, and some useful validation > if it is chosen Is this validating "label paragraphs" or checking for "bad things" like un-matched delimiters? Assuming the latter, yay!!
:) > * a sensible decision on what to do with badly indented paragraphs > (including complaining if validation is on, of course) Yay! > * support for in-document references (this is new new code, as opposed > to amended code) Still not sure about these, but I'll reserve judgement until I see them and their docs. > * a rewritten STpy.html document (i.e., the STpy spec) to reflect > current status and thoughts, with at least the starts of "curvy warning" > bits to talk about the nooks and crannies of STpy sounds useful. > There *may* be a 0.0.5 release before it's all finished - depends how > long it takes and how much sleep I can do without. Bah. Sleep is for the weak! Oh, and maybe for the tired, too. -Edward From tony@lsl.co.uk Wed Mar 14 15:52:46 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 15:52:46 -0000 Subject: [Doc-SIG] suggestions for a PEP (and new schedule) In-Reply-To: <200103141544.f2EFiXp01992@gradient.cis.upenn.edu> Message-ID: <008c01c0ac9e$d06b7f50$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote (well, so did I, in part): > > 1. I'll be away from the office tomorrow (Thursday), so I'd > > appreciate it if no one did anything interesting until Friday (!!!) > > Muahaha. Now I can sneak in my PEP, propose it, and get it accepted > all before he gets back!! ;) Hmm. Now, is that a case of "oh no, I shouldn't have said anything, he's going to wreck everything" or a case of "damn, he's sussed my secret plan"... > Bah. Sleep is for the weak! Oh, and maybe for the tired, too. I've got two small children. This means that I am/will be sleep deprived for ooh, at least a few years more (it's not that they wake you up, it's just that children are another way of using up time). But it must be fun or we wouldn't be doing it (now, where's that definition of "fun" gone, again?)
Tibs (who *programs* in his spare time - but luckily knows other weirdos who don't think that's odd) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 15:53:32 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 15:53:32 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103141507.f2EF70p27334@gradient.cis.upenn.edu> Message-ID: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > in new code as well! > > Um. So did I. I'm not sure where you got the impression that I > didn't. Hopefully having special places to describe arguments is > useful in new code? I wasn't even thinking about the std library.. Erm - sorry, probably "too early in the morning"-itis > > Aagh! Quick, do a "from __future__ import docutils" 'cos that's all > > being addressed in the alpha release and *it's already coded*. > > So, my main objection to this format (given that I'm the one that > has to formalize it) is that it means we're giving meaning to > yet another character: ':'. I read through the list archives, but > didn't seem to find eventual consensus. But if we're going to have > it, here's how I would *like* it to be defined: > > 1. *Any* paragraph starting with '\w+:' is a "label paragraph." > 2. Any ':' in any other location is treated as a normal ':' > 3. It is illegal for a label paragraph to contain a newline. > 4. If a label paragraph has (non-space) content after the ':', > then that is parsed as the only subparagraph of the label > paragraph. > 5. If a label paragraph does not have content after the ':', > then following indented paragraphs are subparagraphs. > 6. If a label paragraph has content after ':' *and* is followed > by indented paragraphs, then... I'm not sure. Could be illegal. 
> Could be that the following paras are children of the para > that's on the same line.. Ideas? Off the top of my head, I *think* the rules it works by are remarkably similar. I *don't* think it enforces the "any paragraph with label-chars then a colon must be a label paragraph to be valid" rule (although in validation mode it maybe should). I'm fairly sure that if it is a one-line paragraph with text after the colon, then it *will* grumble (again, in validation mode) if there are subparagraphs. [beware that in non-validation (passthrough) mode docutils will try to "do something sensible", since it is very frustrating for an end-user to be unable to get documentation out because the documentation writer made mistakes, and because there are applications (like, maybe, Wiki pages) where there is nowhere for the validation messages to go.] STpy, on the other hand, should define non-valid occurrences *as* non-valid (whilst not forbidding the existence of "passthrough" tools). > > OK. The rules for "paragraph labels" (I changed the term) are fairly > > simple. Basically, there are two dictionaries:: > > > > label_dict = { "Label" : "xml-label" } > > If we're going to be xml-like in specifying contents, then we > might want to consider using actual xml DTD-like strings to > specify this (so "Label" : "ANY"). :) I had thought about something like that, and dithered... Basically, allowing absence to mean "anything" means that someone defining a new profile who honestly doesn't care about validation (in this manner) need do nothing, which swayed it for me. > > I decided *not* to complicate matters > > by allowing:: > > > > Label: > > Only indentation makes this look like a paragraph. > > > > because there are already enough rules about when paragraphs start. > > Agreed. Although this is bound to confuse people.. :) But > according to my rules, it will produce an error if they have > error checking turned on..
I can't see a way out of confusing people in some circumstances, unfortunately. It *is* possible that future releases (of STpy and docutils) might relax this, but I'm not convinced it's worth the think-space. > > This fits in reasonably well with David Ascher's original ideas > > (thrashed out at the end of 1999), and to my eye the use of > a colon is > > more natural than overloading descriptive lists. > > Again, I don't like giving even more meaning to ':' when we could > just give '--' the same meaning, and say it's a "tool issue" to > decide whether a key has special meaning.. But as long as we > give ':' special meaning *only* and *always* when it appears directly > after the first word in a paragraph, I can accept it. Whereas I don't like reusing '--' for something that *isn't* a descriptive list (these things fit in a different place in my head, but maybe not in yours). On the other hand, the use of the colon *was* always specified as being so restricted. > > More explanation can (please) wait until the alpha release, which I > > *hope* will be end of next week, with the PEP. > > Looking forward to it. If my emails take too much time to answer, > just punt them for now. :) I've been trying to get some answers > out of your source code, but I don't have the time to figure out > everything that's going on in there.. So far, all the emails I've been answering (no matter how fast and carelessly) have been useful for clarifying stuff in my head. And I've printed out this last one, because I want to check my implementation of the labeled paragraphs against your list. > > > I think it will be helpful to people if we can get this > straight now. > > > > Yes, but can now be next week, 'cos then I won't have to > explain things > > *quite* so many times, please? > > Yes, I was thinking of "now" as "before the PEP is accepted." :) Oops - sorry. Another artefact of being in a hurry.
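Read concretely, rules 1, 3 and 4 of Edward's label-paragraph list might come out as the following sketch; it is an illustration of the proposed spec, not Tibs's actual docutils code:

```python
import re

# Rule 1: a paragraph starting "word:" is a label paragraph.
LABEL = re.compile(r"^(\w+):\s*(.*)$")

def classify(paragraph):
    if "\n" in paragraph:              # rule 3: labels are one line only
        return ("plain", paragraph)
    m = LABEL.match(paragraph)
    if not m:                          # rule 2: ':' elsewhere is literal
        return ("plain", paragraph)
    label, rest = m.group(1), m.group(2)
    # Rule 4: trailing text becomes the sole subparagraph; rule 5's
    # indented subparagraphs would be attached by the caller.
    return ("label", label, rest or None)

assert classify("Author: Tibs") == ("label", "Author", "Tibs")
assert classify("Arguments:") == ("label", "Arguments", None)
assert classify("We meet at 10:30") == ("plain", "We meet at 10:30")
```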
> I didn't think this was possible, because the following fails:: > > B.f.__doc__ = A.f.__doc__ > > But really you have to do this: > > B.f.im_func.__doc__ = A.f.__doc__ Well, I didn't realise that, but (a) that counts as "works" (for an odd value of "works", admittedly) and (b) it may change with function attributes - don't know. > It might be worth mentioning this in documentation, because I > think that the whole idea is not quite intuitive, and it's something that the > code writers would have to do.. So we should suggest that they do it > when applicable! :) Oh definitely - can I leave you to remember it, though? > Time travel's fun! :) But I'm sure you've been through all > this too many times to count. I'm a bit new to all these issues, > so if I seem dense sometimes, please forgive.. :) Well, it's only twice at most, and new ideas *do* keep coming up all the time, and besides, all the newcomers this time round are *very valuable*. Hmm. The next two emails actually require some coherent thought to answer (stuff about quoting and '#...#' and so on), so I'll defer them for now. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 15 21:37:46 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 16:37:46 EST Subject: [Doc-SIG] formalizing StructuredText Message-ID: <200103152137.f2FLblp17063@gradient.cis.upenn.edu> I've been working on expanding the domain of STminus (a formalized version of StructuredText, expressed in an EBNF variant).. And the following questions came up. (Some of them may not make much sense if you're not familiar with StructuredText.) These are generally not questions that have "correct" answers, so I'm wondering what people think I should make STminus do. 
(Of course I'm interested in what STpy and STNG have to say about these things too). * Are list items required to have contents? I.e., can a list item be just a bullet? This only makes sense to me if you used it in an environment like:: 1. text... 2. text... * Apostrophes can appear in the middle of a word or at the end of a word, like "isn't" and "dogs'". Is it illegal to have multiple apostrophes in the same word? There are no English words that use multiple apostrophes, but I'm not sure about other languages (although there are probably some languages that have words with apostrophes at the beginning of a word, ("'til"?) and StructuredText clearly won't deal with those..) * When parsing various structures, like paragraphs and list items and bold items, what whitespace is kept? E.g., if I were to export to XML, would the trailing whitespace on paragraphs be included? Or the whitespace between a description list key and the hyphen? * Can #inline# expressions contain newlines? I assume not ('literal' expressions can't.) * What are valid expressions for starting an ordered list item? Currently STNG uses "([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)" i.e., a series of letters followed by a dot, a series of numbers followed by a dot, or a number followed by space. This seems wrong to me, because it implies that the following are ordered list items:: Hi. This is a list item. 12 is a fun number. And it does not allow for expressions like: 1.2. This is a list item. Also, note that since in STpy variants (which will include my proposed markup for formatted docstrings), list items can begin without an intervening space.. So we would get:: The first line is a paragraph but the second line is a list item. (Since it starts with letters followed by a dot) Even if we restrict ourselves to Roman numerals, we have problems:: Hopefully someone who is smarter than I can figure this out. But I don't see a way to use roman numerals safely.. So maybe we could just use "([0-9]+\.)+"?
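Edward's closing suggestion is easy to check with Python's re module; a quick sketch (the required trailing space is an added assumption, not part of the quoted pattern):

```python
import re

# "([0-9]+\.)+" accepts dotted numeric labels such as "1." and "1.2."
# while rejecting the ambiguous cases quoted above.  Requiring a
# following space is an extra assumption made here for illustration.
ORDERED_ITEM = re.compile(r"([0-9]+\.)+\s")

assert ORDERED_ITEM.match("1. This is a list item.")
assert ORDERED_ITEM.match("1.2. This is a list item.")
assert ORDERED_ITEM.match("Hi. This is a list item.") is None   # letters rejected
assert ORDERED_ITEM.match("12 is a fun number.") is None        # no dot, rejected
```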
* What restrictions are there on hrefs ("name":http://some.url)? According to STNG, they can use relative URLs ("name":whatever). These end up being pretty tricky to formalize.. * Can href names span multiple lines? * Can href names contain coloring? (I'd like to say no) * Should the string '":' only be allowed for hrefs? Or maybe '":(?!\s)', so you can say "this": that? * What do you do with things like:: This *is "too* confusing":http://some.url (Keeping in mind that things like this should be ok):: Normally *quotes " don't have* any special meaning," so they don't have to nest properly.. Well, that's all for now. I'll post more issues as they come up. :) -Edward From Edward Welbourne Thu Mar 15 18:54:14 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 18:54:14 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <007101c0abb0$99519bb0$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <007101c0abb0$99519bb0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > (We do realise, don't we, that when we put a PEP on the main > python-list, all the arguments from the last 5 years are going to be > endlessly rehashed? Yes, I thought we did.) yes, but doc-sig regulars have had so much practice at these arguments (at most a year between iterations) you should be able to moderate the general list-population's discussion quite tightly ;^> Eddy. From Edward Welbourne Thu Mar 15 19:48:36 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 19:48:36 +0000 (GMT) Subject: [Doc-SIG] reserved characters In-Reply-To: <200103141518.f2EFIlp28692@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103141518.f2EFIlp28692@gradient.cis.upenn.edu> Message-ID: >> - that third '#' is NOT treated specially. > > Ack, no! :) This is potentially the type of "non-safe" behavior > I want to avoid..
This leads to people saying:: > > x = y*z > > in their comments, and then later adding:: > > I *like* multiplication well, the first one `should' be inside some sort of mark-up which will tell your parser that it's a chunk of python code; which suppresses the magic meaning of * so this shouldn't present a problem. Likewise, > This means that we can no longer say '[a, b, c]' without quotes. well, hey, a doc-string is text and an interlude of python code (that's what a [list, of, items] is, right ?) should be quoted (well, marked up as being a chunk of python code). [Then again, I tend to put big parenthetical texts into square brackets rather than curved ones, but I'm guessing Tony's proposed usage as link-generators will only apply when there's only one word in the interval ?] So do we really want to be able to say it without (some sort of) quoting ? Eddy. -- Language is a set of conventions that evolve by anarchy. True lovers of language respect both its conventions and the anarchy from which those conventions emerge. -- Tom Tadfor Little From Edward Welbourne Thu Mar 15 19:39:16 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 19:39:16 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103141507.f2EF70p27334@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103141507.f2EF70p27334@gradient.cis.upenn.edu> Message-ID: Tony and Edward said: >> Then assign them. __docs__ is a perfectly valid "slot" to assign to (and >> is the precursor to the whole idea of function values - see the relevant >> PEP). > > I didn't think this was possible, because the following fails:: > > B.f.__doc__ = A.f.__doc__ > > But really you have to do this: > > B.f.im_func.__doc__ = A.f.__doc__ erm ... Python 1.5.2 (#5, Oct 4 1999, 13:36:16) [GCC 2.7.2.3] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def hello(there): return there ... 
>>> hello.__doc__ = 'hello there' >>> hello.__doc__ 'hello there' if it fails in 2.*, eek ! (Sorry if this has already been said in one of the other thirty-something e-mails I've yet to reach in my in-box, but I'll have forgotten it by the time I get through them.) Eddy. -- "There arises from a bad and unapt formation of words a wonderful obstruction to the mind." - Francis Bacon From Edward Welbourne Thu Mar 15 20:33:00 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 20:33:00 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu> Message-ID: Edward said: > Among other (very good) reasons, this would mean that only the writer(s) > of a module can write tutorials, discussions of usage, etc. for it.. erm ... are you suggesting this is how it should be or ... well, anyway, I definitely think someone other than the module author *should* be able to supply the tutorial, and preferably *without* having to edit the module's *source*. I am not as good at *explaining* things as I am at *designing* and *implementing* them; and when I'm *maintaining* some code I really don't want the tutorial getting in my way. Granted having the tutorial in a separate file from the source does imply (so I'm now at odds with my earlier message - Tony's messages since have been reminding me of things I've forgotten since last time doc-sig visited these topics) a danger of getting out of sync ... but having a huge tutorial doc in the file just means I^H the maintainer gets into the habit of paging through large swathes of documentation without looking at them, hence maintains the tutorial only as much as if it *had* been in a different file, while risking missing things which *look like* tutorial (etc.) docs but are actually pertinent to the implementation. 
The source code should contain documentation addressing the following needs: * the maintainer needs to know what the code does and why * someone manipulating an object (or function) supplied by the code needs to be able to interrogate it (e.g. at the interpreter's prompt, but equally possibly in code) to find out what it is and can do - and how to get it to do what it can do - without needing to know where it comes from. [I'll call this `someone' the interrogator for distinction from the maintainer.] I am tempted to be wilfully contentious and add: and nothing else; but, in any case, these two are the priorities for doc strings (particularly the latter - hints to the maintainer can go in comments). All other docs belong some place else, to which the module docstring should refer via the xref mechanisms ST* provide. Neither of these forms of documentation needs anything like the sophistication of markup that *is* genuinely needed by tutorials, reference manuals, etc. and maintaining in-code docs *will not happen* if it has more sophistication to its markup than the bare minimum that suffices to deliver the above goals. So the docs in other files must be in a richer markup language than the docstrings can afford to call for; yet another reason to put them in a separate file. Granted I'd sooner the `other doc' language was a richer ST, rather than a totally different language (much as I like TeX) from the one used in docstrings. But let the module *source* be as small as is compatible with the needs of maintainer and interrogator. Sometimes a very powerful tool, requiring minimal explanation for the maintainer and not much for the interrogator, needs screeds and screeds of tutorial, reference, etc. documentation - typically to lead the user into only using it in the ways that are actually safe (powerful tools being somewhat prone to needing such care). > Possibly in a different file.. I find Tibb's arguments pretty convincing.. Tibs is like that. 
Get used to it and *use* him ;^> >> - documenting Packages > There definitely need to be provisions for that. erm ... we already have __init__.py, so a __doc__.py might be cool ? Possibly alongside __doc__.stng or some such containing the ref docs ? >> - documenting extensions in other languages > Much easier if we can import modules. But I guess safety's important. > Oh well. erm ... if someone's idea of having .pyd files goes through, mayhap extension.so could come with extension.pyd providing the matching docs, as if it had been compiled out of a notional extension.py Importing modules is entitled to have side-effects such as changing the way other modules work; and these changes are apt to be designed on the premise that the module is being imported for the sake of *using* it, not introspection. So at least *some* of the doc-tools must be prepared to do their own parsing; albeit the interrogator's perspective takes for granted that the module *is* imported. Eddy. -- I believe in getting into hot water; it keeps you clean. -- G. K. Chesterton. From Edward Welbourne Thu Mar 15 22:51:40 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 22:51:40 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> Message-ID: Edward said: > I agree with the idea, but I strongly think that things like author > should be description list items. ... > Unless you can come up with a compelling reason for using:: > > Author: Guido > > instead of:: > > Author -- Guido > > I think we should go with the latter. Hmm. There are more things in heaven and earth than are dreamed of in your philosophy, Horatio. Dunno if this compels but here I go.
Consider:: Author -- Guido Version -- 3.14 A random paragraph at the same indentation level, so it *isn't* a sub-paragraph of the Version list item; but it equally isn't a list item in the implied descriptive list This is gibberish. However, it appears to be called for in Edward's scheme of things, in the places where Tibs calls for the same with the ` --' replaced by a `:'. The Tibs-form would be:: Author: Guido Version: 3.14 A random paragraph as before. which works out less like gibberish when my brain's parser comes to try to work out what it means - substantially because I'm reading a *python* program, in which : has the kind of meaning that this usage is asking it to have. I would be more in sympathy with Edward's account of the matter if it proposed:: Document information Author -- Guido Version -- 3.14 A random paragraph which isn't a sub-paragraph of the document information. While I'm at it: I loathe and despise the apparent demand (seemingly from both schools - Tibs: does your form require the space between the Author and Version lines ?) for blank lines in various places which feel (to me) just plain wrong; I want to write (descriptive lists and, in particular, ...) the Edward-form of the last as Document information Author -- Guido Version -- 3.14 A random paragraph which isn't a sub-paragraph of the document information. and believe I shall not be alone (among python programmers) in tending to neglect those blank lines (in *all* descriptive lists) and being irritated by any tool which insists on me adding them. Gratuitous whitespace `halves' the amount of information I can fit in front of my eyeballs at a given moment and that *matters* to the *maintainability* of my code - an issue which I take very seriously.
Given the choice between pandering to the tastes of a tool for extracting the documents and being kind to the poor sod who has to maintain the code I am typically going to side with the latter - substantially because I'm likely to be the maintainer, and I'm no more likely to remember what I was thinking when I wrote the code than is anyone else, even though they didn't write it and I did - I don't use my wetware for *memory*, that's what silicon and ferrite are for. Eddy. From Edward Welbourne Thu Mar 15 23:41:45 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 23:41:45 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Edward Loper and Tibs were discussing having B.f borrow A.f's __doc__ when B inherits from A, Tony said assign ... >> But really you have to do this: >> >> B.f.im_func.__doc__ = A.f.__doc__ > Well, I didn't realise that, but (a) that counts as "works" (for an > odd value of "works", admittedly) and (b) it may change with function > attributes - don't know. ah. Just tested what I should have earlier - Edward's right: >>> def hello(there): return there ... >>> hello.__doc__ = 'hello there' >>> hello.__doc__ 'hello there' >>> class A: ... def hello(there): return there ... >>> A.hello.__doc__ = 'hello there' Traceback (innermost last): File "<stdin>", line 1, in ? TypeError: attribute-less object (assign or del) >>> A.hello.im_func.__doc__ = 'hello there' >>> A.hello.__doc__ 'hello there' It works for functions but not if they're hanging off the namespace of a class ? The attribute shows up on the class method after I set it on the class method's im_func ? This is wrong, obscene and ugly ! Naughty Guido - explain ! But, like I say elsewhere, we shouldn't *need* to be assigning, and: this smells like we should be using acquisition. Eddy.
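[Editorial sketch for readers following along in a later Python: the restriction Eddy trips over above was a quirk of the old bound/unbound method objects. In Python 3 a def looked up on its class is a plain function, so the docstring-borrowing idiom works by direct assignment, no im_func detour needed. Class names here are invented for illustration.]

```python
# Docstring borrowing in modern Python 3: functions retrieved through
# their class are plain function objects, so __doc__ is assignable.

class A:
    def f(self):
        "this is f(x)"
        return 1

class B(A):
    def f(self):          # overrides A.f, loses its docstring
        return 2

# B.f starts with no docstring of its own...
assert B.f.__doc__ is None

# ...so borrow A.f's explicitly:
B.f.__doc__ = A.f.__doc__
assert B.f.__doc__ == "this is f(x)"

# Instances see the borrowed docstring too:
assert B().f.__doc__ == "this is f(x)"
```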
From Edward Welbourne Thu Mar 15 22:58:57 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 22:58:57 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> Message-ID: Tony then Edward said: >> docstrings inherit like any other value, surely? > Um.. yes, but that's not always what you want. Basically, if I define:: > class A: > def f(x): > "this if f(x)" > return 1 > > class B(A): > def f(x): > return 2 > > Then I might want B.f to inherit its docs from A.f. Albeit (at least at 1.5.2, as Tony has pointed out) I can set B.f to A.f by assignment, I agree with Edward, sort of, that this is what `should' happen. But ... This sounds more like *acquisition* than inheritance, so I really want to drag Jim Fulton into its discussion. Sadly I haven't had time to play with Zope and pursue acquisition algebra yet. (Bloody capitalism - keeps me too busy to have fun ;^) Hi Jim, Doc-Sig calling ... Eddy. -- If at first you don't succeed, try doing it the way you were told. From edloper@gradient.cis.upenn.edu Fri Mar 16 00:29:01 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 19:29:01 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 15 Mar 2001 19:39:16 GMT." Message-ID: <200103160029.f2G0T1p02926@gradient.cis.upenn.edu> > > I didn't think this was possible, because the following fails:: > > > > B.f.__doc__ = A.f.__doc__ > > > > But really you have to do this: > > > > B.f.im_func.__doc__ = A.f.__doc__ > > erm ... ... > 'hello there' > > if it fails in 2.*, eek ! The problem is that A.f is a method, not a function:: >>> type(A.f) <type 'instance method'> >>> type(A.f.im_func) <type 'function'> And when you read A.f.__doc__, some "magic" returns A.f.im_func.__doc__. But there isn't really an A.f.__doc__.
But as long as you change A.f.im_func.__doc__, the changes will be visible from A.f.__doc__:: >>> A.f.im_func.__doc__ = "new doc" >>> A.f.__doc__ 'new doc' So.. it may make sense to somehow change the magic that associates a method with its function's documentation.. But it's not a serious problem, because you *can* set the docs of a module. I don't really know what "acquisition" is, but one problem with making this an automatic process is that sometimes it's *not* what you want. I guess the question is whether it's what you want more often or not what you want more often. If it's usually what you want, you can disable it with:: class B(A): def f(x): "" # don't inherit docs return x+1 If it's usually *not* what you want, or if we want to keep things simpler, the following seems to work (I don't know why you don't need to use .im_func here):: class B(A): def f(x): return x+1 B.f.__doc__ = A.f.__doc__ I'd be happy with either. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 00:35:14 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 19:35:14 EST Subject: [Doc-SIG] reserved characters In-Reply-To: Your message of "Thu, 15 Mar 2001 19:48:36 GMT." Message-ID: <200103160035.f2G0ZFp03453@gradient.cis.upenn.edu> > > x = y*z > > in their comments, and then later adding:: > > I *like* multiplication > > well, the first one `should' be inside some sort of mark-up which will > tell your parser that it's a chunk of python code; which suppresses the > magic meaning of * so this shouldn't present a problem. I was mainly saying that we should say something like "'*' can only be used as a delimiter for emph/strong, as a bullet, or within a 'literal' (or #inline#) region (or a blockquote, I guess). Also, all delimiters must be matched. So the following string is not a legal ST string:: x*y > well, hey, a doc-string is text and an interlude of python code (that's what a [list, of, items] is, right ?)
should be quoted (well, marked up > as being a chunk of python code). Yes. But if we let them "get away" with not quoting it in special circumstances (like the string "x*y"), then people will end up getting confused.. Note also that '[a, b, c]' is not always python code. It's also mathematical notation.. Same with 'x*y'. But I don't see a problem with requiring that mathematical notation be put in quotes (although that means people can't use symbols like #x'# in their mathematical notation.. but I can live with that) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 00:47:06 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 19:47:06 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 15 Mar 2001 20:33:00 GMT." Message-ID: <200103160047.f2G0l6p04520@gradient.cis.upenn.edu> > Edward said: > > Among other (very good) reasons, this would mean that only the writer(s) > > of a module can write tutorials, discussions of usage, etc. for it.. > > erm ... are you suggesting this is how it should be or ... Sorry, that was unclear. My stance is: Inline documentation should include clear, concise definitions of what guarantees are provided by the Python object they document. I will call that "API documentation." Inline documentation should *not* include tutorials, howtos, extended explanations, unreasonable amounts of background information, etc. That way, the module author writes the "definitive definition" of the module's behavior. But everyone else is free to go write tutorials, etc. > The source code should contain documentation addressing the following > needs: > > * the maintainer needs to know what the code does and why s/the maintainer/anyone reading the source/ Note that what the code *actually* does and what it *promises* to do are very different things (but hopefully the former is a superset of the latter).
For example, code may *promise* to return a list of names, but the order in which it *actually* returns it is not specified.. So the API docs let you distinguish what is part of the design from what is an implementation choice.. > * someone manipulating an object (or function) supplied by the code > needs to be able to interrogate it (e.g. at the interpreter's > prompt, but equally possibly in code) to find out what it is and can > do - and how to get it to do what it can do - without needing to > know where it comes from. [I'll call this `someone' the > interrogator for distinction from the maintainer.] I think this should generally be the same as what I called the API docs.. > Neither of these forms of documentation needs anything like the > sophistication of markup that *is* genuinely needed by tutorials, > reference manuals, etc. I'm not sure reference manuals always need more markup. For example, the reference manual for the entire Java library basically uses almost no markup, and is very readable/useful. But I can see how reference manuals can benefit from extra markup (esp. if you want to write real math equations). > Granted I'd sooner the `other doc' language was a richer ST, rather than > a totally different language (much as I like TeX) from the one used in > docstrings. I think this will have to be a discussion and/or project for another day, though. :) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 00:59:40 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 19:59:40 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 15 Mar 2001 22:51:40 GMT." 
Message-ID: <200103160059.f2G0xep05577@gradient.cis.upenn.edu> > Consider:: > > Author -- Guido > > Version -- 3.14 > > A random paragraph at the same indentation level, so it *isn't* a > sub-paragraph of the Version list item; but it equally isn't a list > item in the implied descriptive list I would interpret that as a description list (Author: Guido, Version: 3.14, ...). But I can see how it might be confusing. > which works out less like gibberish when my brain's parser comes to try > to work out what it means Which is what is really important; if others agree, then I'll be convinced. > substantially because I'm reading a *python* > program, in which : has the kind of meaning that this usage is asking it > to have. But we have to be careful here, since the analogy doesn't completely carry over. Notably, this isn't what most Python programmers would think it is:: Author: Person1 Person2 Person3 The 3rd person is not part of the author list; it's some sort of subparagraph of it, which doesn't make much sense to me.. > While I'm at it: I loathe and despise the apparent demand (seemingly > from both schools - Tibs: does your form require the space between the > Author and Version lines ?) for blank lines in various places which feel > (to me) just plain wrong; I want to write (descriptive lists and, in > particular, ...) the Edward-form of the last as Neither STpy (Tibs' version) nor whatever version of STminus I end up proposing for docstrings will require blank lines before list items. STNG and "vanilla" variants of STminus will be the only ones that require blank lines before list items. Note that blank lines *are* currently needed in all other places, though.. so, for example, if you want to use headings, you must say::

    Heading

    Text...

And not::

    Heading
    Text...

> Given the choice between > pandering to the tastes of a tool for extracting the documents and > being kind to the poor sod who has to maintain the code > I am typically going to side with the latter I definitely agree.
But of course, there's always something to be said for simplicity too.. If someone writing documentation has to remember too many rules, they may get confused.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 01:01:27 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 20:01:27 EST Subject: [Doc-SIG] quoting In-Reply-To: Your message of "Thu, 15 Mar 2001 23:03:26 GMT." Message-ID: <200103160101.f2G11Rp05742@gradient.cis.upenn.edu> > > Could someone point me to an explanation of why we *don't* want to use > > backslashes for backslashing characters? :) e.g., \* for a literal > > because we're working in a context within which \ is already being > processed, so we'll end up having to say \\ when we mean \ and ... it'd > just get ugly. You can always use r"""...""" It just seems like there should be *some* mechanism for backquoting, since there are some things that you can *not* express without it.. Would people still object if we made some mechanism available, but *strongly* discouraged people from using it when they could avoid it (since it makes the plaintext hard to read)? -Edward From Edward Welbourne Fri Mar 16 00:14:03 2001 From: Edward Welbourne (Edward Welbourne) Date: Fri, 16 Mar 2001 00:14:03 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103152137.f2FLblp17063@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103152137.f2FLblp17063@gradient.cis.upenn.edu> Message-ID: > * Apostrophes can appear in the middle of a word or at the end > of a word, like "isn't" and "dogs'". Is it illegal to have > multiple apostrophes in the same word? There are no English > words that use multiple apostrophes, but I'm not sure about oh no.
Excuse me while I switch into Imagine someone's written a document, of which a prominent chunk is a sequence of declarations all sharing a pivotal use of isn't (The King isn't any better at running the country than we are; his chancellor isn't any more financially adept than our merchants; ...). Someone else ends up writing about this document. They're mainly discussing the declarations. So they end up using isn'ts to refer to the uses of isn't and obviously their possessive is isn'ts' Now, when you've persuaded yourself I'm crazy, go check a database of Anglic usage and discover how much more perverted the actual counter-examples are. And please don't inflict them on me, coming up with that one hurt. But I think we can safely say that authors of docstrings will be prepared to retract anything that perverse, once the tools complain. You should apply the same reasoning to some of your other worries. (well bugger me - my #inline(code)# proposal got implemented !) > i.e., a series of letters followed by a dot, a series of > numbers followed by a dot, or a number followed by space. ... a few counter-cases ... and what about 2a. subsidiary cases not to mention 3: some of us like colons but, to be quite frank, ([0-9]+\.)+ sounds fine to me. And don't bother refining that to ([1-9][0-9]*\.)+ 'cos some of us *do* count from zero, OK ? > * What do you do with things like:: > > This *is "too* confusing":http://some.url Find author, apply pain (to taste). Give them the opportunity to retract. If they refuse, apply lethal doses of pain. Then they won't repeat the offence. No problem. Eddy. -- Those wishing to be literal-minded about applying pain to taste may feel free to deploy hot chilli sauce. In the eyes. From edloper@gradient.cis.upenn.edu Fri Mar 16 03:12:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 22:12:35 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 16 Mar 2001 00:14:03 GMT." 
Message-ID: <200103160312.f2G3Cap18781@gradient.cis.upenn.edu> > > * Apostrophes can appear in the middle of a word or at the end > > of a word, like "isn't" and "dogs'". Is it illegal to have > > multiple apostrophes in the same word? There are no English > > words that use multiple apostrophes, but I'm not sure about > > oh no. Excuse me while I switch into (perverted counterexamples...) > But I think we can safely say that authors of docstrings will be > prepared to retract anything that perverse, once the tools complain. > You should apply the same reasoning to some of your other worries. I find myself going back and forth, because it's really not hard (from a tools perspective) to allow words with multiple apostrophes.. The main disadvantage I can see is that people might think that in:: the'big'dog "big" will be rendered as literal... > (well bugger me - my #inline(code)# proposal got implemented !) That was your idea? One day, we should trace back all of these ideas to their originators, and maybe give them credit or something. :) > ... a few counter-cases ... > and what about > > 2a. subsidiary cases > > not to mention > > 3: some of us like colons > > but, to be quite frank, ([0-9]+\.)+ sounds fine to me. It's hard to come up with a rule that's both simple and safe, but covers cases like '2a.' and '3:'. So, unless Tibs or others strongly object, I think we should just stick with '([0-9]+\.)+'. :) > > * What do you do with things like:: > > > > This *is "too* confusing":http://some.url > > Find author, apply pain (to taste). > Give them the opportunity to retract. > If they refuse, apply lethal doses of pain. > Then they won't repeat the offence. > No problem. Perhaps I should rephrase that. What should a *parser* do? I guess "die" is a good answer, though it sounds like you might prefer something along the lines of "erase their hard drive." 
:) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 04:32:10 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 23:32:10 EST Subject: [Doc-SIG] What counts as a url? Message-ID: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu> So I'm working on adding HREFs to STminus. They look like this:: "anchor name":URL Where URL is either a relative URL or an absolute URL.. So I went and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt . It suggests (if I'm reading it correctly) that we could define a URL as:: ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+ Should we use that regexp for URLs? Or perhaps we should go for simplicity, and say that the regexp ends at whitespace:: [^\s]+ In either case, we'll have to be careful to say:: See "this":http://url . instead of:: See "this":http://url. (the '.' gets included in the second url). Is that a problem? If so, what can we do about it? (Keep in mind that it *is* acceptable to have a URL that ends in a '.').. Of course, I don't think people will be including HREFs in their documentation much, anyway.. So the main issue for most people will just be that they can't use '":' in certain environments.. Ideas/thoughts? -Edward From tony@lsl.co.uk Fri Mar 16 10:25:34 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 16 Mar 2001 10:25:34 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Message-ID: <00a901c0ae03$6f89ca50$f05aa8c0@lslp7o.int.lsl.co.uk> Hmm - Eddy's back in the fray (with the combination of Eddy and Edward - heh, that actually works!) I think things are sure to be, erm, interesting (challenging? certainly needful of careful thought before pontificating - oh, well, it wasn't going to last - I'm *good* at pontificating (no, I didn't mean that it's *well received*, I just meant it's a mode I'm good at dropping into)).
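[Editorial sketch: Edward's URL rule from the "What counts as a url?" message above can be tried directly as a Python regular expression. This is an illustration of the proposed RFC-2396-style character class, not any tool's actual implementation; the trim() helper is a hypothetical mitigation for the trailing-dot problem he raises.]

```python
import re

# The proposed URL-character rule: RFC 2396's character repertoire
# plus %-escapes. The '-' is escaped so it is not read as a range.
URL = re.compile(r"(?:[A-Za-z0-9\-_.!~*'();/?:@&=+$,#]|%[0-9A-Fa-f]{2})+")

m = URL.match("http://some.url/page#frag")
assert m.group(0) == "http://some.url/page#frag"

# The problem described in the message: a sentence-final period
# is swallowed into the URL by this rule.
m = URL.match("http://url.")
assert m.group(0) == "http://url."

# One possible (hypothetical) mitigation: trim trailing sentence
# punctuation after matching, at the cost of URLs that genuinely
# end in '.'.
def trim(url):
    return url.rstrip(".,;:")

assert trim("http://url.") == "http://url"
```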
Eddy (i.e., Edward Welbourne, fuller-named) said: > yes, but doc-sig regulars have had so much practice at these > argument (at most a year between iterations) you should be > able to moderate the general list-population's discussion > quite tightly ;^> The problem is (licks finger and holds it up in the air - yes) threefold: 1. Rehashing arguments you've had before several times is boring 2. BUT it IS still worth rehashing them, because sometimes one changes one's mind (been there, done that) 3. unfortunately, it *does* take up a lot of time and typing (and it isn't *really* fair to just say "we discussed that, it's in the archive, tough" - although it's *tempting*!) > On assigning docstrings: > erm ... > > Python 1.5.2 (#5, Oct 4 1999, 13:36:16) [GCC 2.7.2.3] on linux2 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> def hello(there): return there > ... > >>> hello.__doc__ = 'hello there' > >>> hello.__doc__ > 'hello there' > > if it fails in 2.*, eek ! Hmm - trying (also in 1.5.2): >>> class fred: ... def f1(): ... """f1""" ... >>> class jim: ... def j1(): ... pass ... j1.__doc__ = fred.f1.__doc__ ... >>> jim.j1.__doc__ 'f1' Well, that's what I was thinking of - I can't imagine it's broken in 2.n On the other hand, I can't help feeling that what I'd *really* do is give #j1()# a docstring that just said:: """This should behave identically to #fred.f1()#, which see.""" ('cos that's *documentation* of what I mean). Still, whilst I might not want to do it, Python is clearly meant to *allow* what Edward (L) wants to do, so I shan't try to make it harder... acquisition - hmm > [Then again, I tend to put big > parenthetical texts into square brackets rather than curved ones, but > I'm guessing Tony's proposed usage as link-generators will only apply > when there's only one word in the interval ?] Not even one word - it will have to be an identifier (in the XML sense, a name - which is, I think, much the same as a Python identifier). 
(btw, Edward (L), I *think* it should be possible to make dodgy and erroneous constructions emerge in the DOM tree marked up as such - that is, for instance, "badpara" instead of "para" or "dlist" or whatever - or maybe as an attribute - so that a renderer can highlight things appropriately - I've some notes I want to compare with the source code this weekend to see how feasible it is.) I would certainly expect to do #[1,2,3]# in STpy docstrings, rather than [1,2,3] (in STClassic, I would have expected to do '[1,2,3]', so that's not much different). Eddy then got into "agreement with me" mode, which is always a bit worrying - one waits for the other boot to drop (or whatever the saying is) > > Possibly in a different file.. I find Tibb's arguments > > pretty convincing.. > Tibs is like that. Get used to it and *use* him ;^> Hmm. I thought I was the one who always got off on the wrong foot, had a long argument defending an indefensible position, and then finally collapsed with "oh, I see, yes you're clearly right". Which is another sort of usefulness, in its way. > >> - documenting Packages > > There definitely need to be provisions for that. > erm ... we already have __init__.py, so a __doc__.py might be cool ? > Possibly alongside __doc__.stng or some such containing the ref docs ? I would say that the package docstring is the docstring in __init__.py. I would probably avoid having a __doc__. file, 'cos I wouldn't be able to guess exactly what it was for. And I've been using ".stx" for an extension, in compatibility with STClassic and STNG - but maybe we *should* be using different extensions for different files. I hereby propose: '.stx' -- the file contains text compatible with either STClassic or, more usefully, STminus '.stng' -- the file contains STNG text. It may or may not be parsable by ST parsers '.stpy' -- the same position for STpy text Is this worth "formalising" by also posting it onto the ZWiki? Or is it just overcomplex?
Is it a bad thing we've lost any reference to "tx" in the string? Eddy then wandered off into the paragraph labels debate: > While I'm at it: I loathe and despise the apparent demand > (seemingly from both schools - Tibs: does your form require > the space between the Author and Version lines ?) for blank > lines in various places which feel (to me) just plain wrong; > I want to write (descriptive lists and, in particular, ...) > the Edward-form of the last as > > Document information > Author -- Guido > Version -- 3.14 > > A random paragraph which isn't a sub-paragraph of the > document information. > > and believe I shall not be alone (among python programmers) in > tending to neglect those blank lines (in *all* descriptive lists) > and being irritated by any tool which insists on me adding them. > Gratuitous whitespace `halves' the amount of information I can > fit in front of my eyeballs at a given moment and that *matters* > to the *maintainability* of my code - an issue which I take very > seriously. Hmm - I'm well aware of how far Eddy will take the elision of vertical whitespace to keep stuff compact. Strangely enough, the example Eddy gives (in Edward form) *is* legitimate STpy, since the list items start new paragraphs. If it were cast in my form: Document information Author: Guido Version: 3.14 then it doesn't work, since labelled paragraphs don't start a new paragraph. For the moment, tough. However, I do tend to share the view that this is unnatural - I *think* that most people will expect the above to work, because that "word followed by colon" thingy is something we're used to. Unfortunately, to integrate new paragraphs starting on these things into the code in a neat manner is more than ten minutes work, and there are more important things to do for the alpha release.
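[Editorial sketch: the "word followed by colon" expectation Tibs describes can be made concrete in a few lines. This is a hypothetical illustration of the behaviour most people would expect, not the docutils code; the LABEL pattern and split_labels name are invented.]

```python
import re

# Each 'Word: value' line starts a new labelled item, even with
# no blank line between consecutive labels.
LABEL = re.compile(r"^([A-Za-z]\w*):\s+(.*)$")

def split_labels(text):
    items = []
    for line in text.splitlines():
        m = LABEL.match(line)
        if m:
            items.append((m.group(1), m.group(2)))
    return items

doc = """Author: Guido
Version: 3.14"""
assert split_labels(doc) == [("Author", "Guido"), ("Version", "3.14")]
```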
I think this is the sort of issue that should be raised thereafter, in
discussion of the PEP, and I would certainly support changing the code
if there were agreement on this issue (heh, I'm aiming at 2.2 now, since
2.1 is obviously hopelessly soon, so the pressure is off a bit).

> Given the choice between
>     pandering to the tastes of a tool for extracting
>     the documents
> and
>     being kind to the poor sod who has to maintain the code
> I am typically going to side with the latter

Oh yes, me too (it's just a matter of lack of time - firstly for coding,
and secondly for convincing myself that *yet another* special case isn't
one too many for the STpy mindspace - I kid myself I'm getting a feel
for ST-zen, and want to *know* things fit in well).

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 10:50:39 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 10:50:39 -0000
Subject: [Doc-SIG] formalizing StructuredText
In-Reply-To: <200103152137.f2FLblp17063@gradient.cis.upenn.edu>
Message-ID: <00aa01c0ae06$f07414b0$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> * Are list items required to have contents? I.e., can a list
>   item be just a bullet? This only makes sense to me if you
>   used it in an environment like::
>
>       1.
>
>       text...

I'm not sure. The obvious reason for allowing empty list items is to
allow people to start a list and fill things in later - possibly even
easier to argue with descriptive lists::

    fish -- eat them on Friday
    dogs -- don't eat them in western countries
    snakes --
    horses -- OK in France

To be honest, I'm not even sure what the current docutils code does. Not
that that matters too much, since it would only need an RE tweaking.
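The kind of RE tweak in question might look like this. A hypothetical
sketch only - this is not the RE docutils actually uses, and the names
are made up:

```python
import re

# Hypothetical descriptive-list RE that tolerates an empty description
# after the " -- " separator; not the RE actually used by docutils.
DLIST_ITEM = re.compile(r'^(?P<key>\S.*?)\s+--\s*(?P<desc>.*)$')

for line in ['fish -- eat them on Friday', 'snakes --']:
    m = DLIST_ITEM.match(line)
    print(m.group('key'), '->', repr(m.group('desc')))
```

The only change needed for empty items is that the text after the
separator may match the empty string.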
Given I do support an empty '::' paragraph (although for, erm, odd
reasons) and I do tend to write incomplete lists when working on
documentation, I think I'm convincing myself. (Mind you, I don't like
your specific examples, 'cos they're laid out in a way I don't like -
but your examples become OK if my examples get allowed, so...)

> * Apostrophes can appear in the middle of a word or at the end
>   of a word, like "isn't" and "dogs'". Is it illegal to have
>   multiple apostrophes in the same word?

As in "*should* it be illegal"? Don't know - I would tend to think so
pending proof otherwise, unless my current code doesn't care (!)

> There are no English
> words that use multiple apostrophes, but I'm not sure about
> other languages (although there are probably some languages
> that have words with apostrophes at the beginning of a word,
> ("'til"?) and StructuredText clearly won't deal with those..)

Oh dear - but English has "'phone" and various Yorkshire-style things
which start with an apostrophe - sounds like another thing I need to
check out.

> * When parsing various structures, like paragraphs and list
>   items and bold items, what whitespace is kept? E.g., if I
>   were to export to XML, would the trailing whitespace on
>   paragraphs be included? Or the whitespace between a
>   description list key and the hyphen?

Trailing whitespace is removed right at the start. "Structural"
whitespace like that between a descriptive list "key" and the hyphens is
lost. Whitespace comprising newline and indentation is conflated to a
single space (except in literal paragraphs). How whitespace in literal
strings is rendered (i.e., as &nbsp; equivalents, or not) is *probably*
left as an issue for the implementor of the renderer (I haven't decided
yet, for STpy).

> * Can #inline# expressions contain newlines? I assume not
>   ('literal' expressions can't.)
Oh yes they can, in docutils, and that's because I want them to, which
means that in STpy they can as well (although it's obviously not
documented yet). This is the sort of issue that only comes up when
implementing (and I count your producing an EBNF as implementing, I
guess, in this case).

Reasoning - implementation first. When reassembling lines back into
paragraph text, it is easy to either reinsert line breaks (and maybe
indentation again) or just a single space. If you want to make RE
handling easier, a single space wins big time. But that means you can't
stop literal strings spanning newlines. Oh.

Philosophical second. OK - I hadn't thought of that (goes I). But I
*had* been being irritated by trying to use pseudo-STpy in my emails,
'cos (heh, another "'" at the start of a word!) I'm using Outlook in
which it is very hard to tell where lines will be broken, which means
use of '..' is hard. And since an email variant of STpy is both easy to
imagine and should be easy to do, this is a pity. But if the behaviour
of *all* quoted things over linebreaks is well defined, and the same
(and the implementation above is that natural usage) then the problem
goes away.

Incidentally, that is also why I'm not sure yet about whether spaces in
string literals should be "hard" or "soft". I'm still thinking on it (I
tend towards "hard", but worry about 'very long string literals which
will not fit on a single line when being rendered and thus look really
stupid going off the right hand margin').

> * What are valid expressions for starting an ordered list item?
>   Currently STNG uses "([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)"
>   i.e., a series of letters followed by a dot, a series of
>   numbers followed by a dot, or a number followed by space.
>   This seems wrong to me, because it implies that the following
>   are ordered list items::
>
>       Hi. This is a list item.
>
>       12 is a fun number.
>
>   And it does not allow for expressions like:
>
>       1.2. This is a list item.
I thought it *did* allow an optional dot? Oh well, memory again. The
requirement for a dot in STpy was specifically to stop that problem.

> Also, note that since in STpy variants (which will include
> my proposed markup for formatted docstrings), list items can
> begin without an intervening space.. So we would get::
>
>     The first line is a paragraph but the second line is a list
>     item. (Since it starts with letters followed by a dot)

Erm - space -> blank line. That one shouldn't fly in STpy, because it
would have to be::

    i.t.e.m. (Since

because it's meant to be "one letter, or one or more digits"

> Even if we restrict ourselves to Roman numerals, we have
> problems::
>
>     Hopefully someone who can figure this out who is
>     smarter than
>     I. But I don't see a way to use roman numerals safely..

Hmm - yersss. My tack on that one is, unfortunately, that it is a case
of "so don't do that" - the basic problem with ST is that there are
*some* things one can't do, because it is striving for naturalness
otherwise. But on the other hand:

> So maybe we could just use "([0-9]+\.)+"?

Personally, I wouldn't much mind if it were only letters and "arabic"
("indian"?) digits. The reason for having all three forms is that a
renderer MIGHT use the form the user used to decide what form the
rendering should use (and those three forms are common to all list
formatters in common use). Given that's something people might care
about, it makes sense (of course, I believe ST implementations have
tended NOT to make such use of the forms, but still).

> * What restrictions are there on hrefs ("name":http://some.url)
>   According to STNG, they can use relative URLs ("name":whatever).
>   These end up being pretty tricky to formalize..
>
> * Can href names span multiple lines?
> * Can href names contain coloring? (I'd like to say no)
> * Should the string '":' only be allowed for hrefs?
>   Or maybe '":(?!\s)', so you can say "this": that?
> * What do you do with things like::
>
>       This *is "too* confusing":http://some.url
>
>   (Keeping in mind that things like this should be ok)::
>
>       Normally *quotes " don't have* any special meaning,"
>       so they don't have to nest properly..

Hah - URLs (URIs?) are impossible to do right in ST (of whatever form).
There's a reference in TextRE.py to a page that describes the problems.

The rules I'm working towards are probably going to be something like:

1. If it looks vaguely like a URL, expect it to be mistaken
   for one (and "vaguely" will, of course, be defined)
2. URLs will not be allowed to span multiple lines, and if
   they contain spaces (and maybe some other characters)
   then those will need to be encoded (in the "normal"
   manner)
3. Colouring does not occur in URLs.

Confusing examples I'll leave until after the alpha release, I'm afraid
(but the general rule is probably "if it's confusing, don't be surprised
if it is, indeed, confusing").

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 11:04:35 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 11:04:35 -0000
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: <200103160059.f2G0xep05577@gradient.cis.upenn.edu>
Message-ID: <00ab01c0ae08$e30941e0$f05aa8c0@lslp7o.int.lsl.co.uk>

Me, pulling things out at random...

Edward Loper wrote:
> I would interpret that as:
>
>     <Author>Guido</Author>
>     <Version>3.14</Version>
> ...

Post-alpha docutils, we are definitely going to have to have a
discussion about what we want to use as terms in the DTD - which is
good, since I'm not convinced that the tags I've come up with are the
best possible choice.

> But we have to be careful here, since the analogy doesn't completely
> carry over.
> Notably, this isn't what most Python programmers would
> think it is::
>
>     Author:
>       Person1
>       Person2
>
>       Person3
>
> The 3rd person is not part of the author list; it's some sort of
> subparagraph of it, which doesn't make much sense to me..

Oh yes they are (chorus of "behind you" (obPanto reference)).

Well, actually, it's more complex than that. The example is *either*
similar to::

    Author:
    Person1
    Person2

(but in which case it also isn't a labelled paragraph, because "Person1"
didn't start a new sub-paragraph, and non-subparagraph labelled
paragraphs can't have more than one line, if you see what I mean), *or*
it is similar to::

    Author:
      Person1
      Person2
      Person3

in which case Person3 *is* an author.

It will always be necessary to do::

    Heading
        Text...

to get a legacy-style ST heading, it would be too confusing otherwise.

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!"
BARNYARD DANCE! by Sandra Boynton
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 11:09:23 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 11:09:23 -0000
Subject: [Doc-SIG] quoting
In-Reply-To: <200103160101.f2G11Rp05742@gradient.cis.upenn.edu>
Message-ID: <00ac01c0ae09$8e546c50$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> It just seems like there should be *some* mechanism for backquoting,
> since there are some things that you can *not* express without it..
> Would people still object if we made some mechanism available, but
> *strongly* discouraged people from using it when they could avoid
> it (since it makes the plaintext hard to read)?

I also feel we *probably* want some means of quoting.
*But* before introducing one, I want an exhaustive list of cases other
than the need to quote a single quote within literal texts (and this
would include, if people want it, a *really* careful argument for why
one needs to be able to quote '#' in '#..#', although I feel this is
probably needed too).

*After* we have an exhaustive list of all the places we *need* text
escaping, *then* we can try to define an STpy-like manner of doing it.
Note that this is known to be swampy ground, else STNG would have
something in place already. The difficulty of coming up with something
"natural to read" (and I'm still not convinced that '\' fits the bill!)
makes this an item I want to defer, probably until after release 1.0 of
STpy and STminus.

(Yes, I know it *feels* like something one must have, and not having it
makes STpy and STminus difficult to self-document, but the ST-lesson is
that, in general, difficult cases don't actually come up in real life.)

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!"
BARNYARD DANCE! by Sandra Boynton
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 11:13:44 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 11:13:44 -0000
Subject: [Doc-SIG] formalizing StructuredText
In-Reply-To: <200103160312.f2G3Cap18781@gradient.cis.upenn.edu>
Message-ID: <00ad01c0ae0a$2a4b7c70$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper:
> > (well bugger me - my #inline(code)# proposal got implemented !)
>
> That was your idea? One day, we should trace back all of these ideas
> to their originators, and maybe give them credit or something. :)

It was Eddy's idea.
I believe I still have enough of the archives of the doc-sig to be able
to reconstruct most of the attribution for "new" ideas, so yes, it might
be a nice thing to note who came up with what (although people like
David Ascher also deserve note for sheer effort as well (btw, I think he
came up with ':' tags)).

> > 2a. subsidiary cases is not allowed under STClassic, so one
> > wouldn't expect to add it.
> > 3: some of us like colons

Yes, well.

> It's hard to come up with a rule that's both simple and safe, but
> covers cases like '2a.' and '3:'. So, unless Tibs or others
> strongly object, I think we should just stick with '([0-9]+\.)+'. :)

I'll think about this harder after alpha release, but it sounds like a
non-silly idea for STminus, at least.

> > > * What do you do with things like::
> > >
> > >     This *is "too* confusing":http://some.url
> >
> > Find author, apply pain (to taste).
> > Give them the opportunity to retract.
> > If they refuse, apply lethal doses of pain.
> > Then they won't repeat the offence.
> > No problem.
>
> Perhaps I should rephrase that. What should a *parser* do?
> I guess "die" is a good answer, though it sounds like you might
> prefer something along the lines of "erase their hard drive." :)

A parser will have no problems with that text. It will parse it and give
an answer. Whether it is what the user would expect is another matter
(if it's docutils, it will probably be what *I* expect!).

btw, I assume that is going into your test suite as an awkward case?

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 11:23:19 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 11:23:19 -0000
Subject: [Doc-SIG] What counts as a url?
In-Reply-To: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu>
Message-ID: <00ae01c0ae0b$809ba090$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> So I'm working on adding HREFs to STminus. They look like this::
>
>     "anchor name":URL

OK.

> Where URL is either a relative URL or an absolute URL.. So I went
> and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt .
> It suggests (if I'm reading it correctly) that we could define
> a URL as::
>
>     ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+

Ah - do we want URLs or URIs? I can never remember the difference.

I am loath to stop people from using the full generality of "pointers to
the web", and this means delving into nasty stuff. See
http://www.foad.org/~abigail/Perl/url2.html for some interesting
details. I think we need to avoid that.

> Should we use that regexp for URLs? Or perhaps we should go for
> simplicity, and say that the regexp ends at whitespace::
>
>     [^\s]+
>
> In either case, we'll have to be careful to say::
>
>     See "this":http://url .
>
> instead of::
>
>     See "this":http://url.

Hmm, that breaks with ST tradition, and indeed my code treats that final
"." as not being part of the URI. Hmm.

> Is that a problem? If so, what can we do about it?
> (Keep in mind that it *is* acceptable to have a URL that ends in a '.')..

I'll think on it, for my part (and read some specs).

> Of course, I don't think people will be including HREFs in their
> documentation much, anyway.. So the main issue for most people
> will just be that they can't use '":' in certain environments..

Erm, I wouldn't bet on that. And we *are* trying to retain
compatibility/usefulness as a tool for working on text files as well,
remember, where this sort of thing is more likely.

Tibs (slightly worriedly)

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine!
(Unless Laser-Scan ask nicely to borrow them.)

From edloper@gradient.cis.upenn.edu Fri Mar 16 15:29:05 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Fri, 16 Mar 2001 10:29:05 EST
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: Your message of "Fri, 16 Mar 2001 10:25:34 GMT." <00a901c0ae03$6f89ca50$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <200103161529.f2GFT5p10509@gradient.cis.upenn.edu>

> Hmm - trying (also in 1.5.2):
>
>     >>> class fred:
>     ...     def f1():
>     ...         """f1"""
>     ...
>     >>> class jim:
>     ...     def j1():
>     ...         pass
>     ...     j1.__doc__ = fred.f1.__doc__
>     ...
>     >>> jim.j1.__doc__
>     'f1'
>
> Well, that's what I was thinking of - I can't imagine it's broken in 2.n

Hm.. That works. It seems very strange and magical to me that *within*
jim, you can say #j1.__doc__ = foo#, but *outside* of jim, you have to
say #j1.im_func.__doc__ = foo#..

> Not even one word - it will have to be an identifier (in the XML sense,
> a name - which is, I think, much the same as a
> Python identifier).

XML defines::

    Name     = (Letter | '_' | ':') (NameChar)*
    NameChar = Letter | Digit | '.' | '-' | '_' | ':' |
               CombiningChar | Extender

(CombiningChar and Extender are for international support, I think)

So the regexp would be something like::

    [a-zA-Z_:][a-zA-Z0-9.-_:]*

Which is not quite the same as a python identifier.. But if that's what
we want, then I'll go ahead and add it to STminus (although I still
don't understand its semantic value -- how do you identify things as
targets of these links?)

> (btw, Edward (L), I *think* it should be possible to make dodgy and
> erroneous constructions emerge in the DOM tree marked up as such - that
> is, for instance, "badpara" instead of "para" or "dlist" or whatever -
> or maybe as an attribute - so that a renderer can highlight things
> appropriately - I've some notes I want to compare with the source code
> this weekend to see how feasible it is.)

Sounds good.
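A side note on the ASCII regexp above: inside a character class, ".-_"
is actually a *range* (0x2E to 0x5F), not three literal characters, so
the hyphen needs to move to the end of the class. A hedged sketch of the
corrected version:

```python
import re

# The ASCII approximation of the XML Name production has a subtle bug:
# inside a character class, ".-_" is a range (0x2E-0x5F), not three
# literal characters. Putting the hyphen last gives the intended class.
XML_NAME = re.compile(r'[a-zA-Z_:][a-zA-Z0-9._:-]*$')

print(bool(XML_NAME.match('para')))      # True
print(bool(XML_NAME.match('bad-para')))  # True
print(bool(XML_NAME.match('1para')))     # False - may not start with a digit
```

With the hyphen in the middle, the buggy class would also accept
characters like "?" and "@", which happen to fall inside the accidental
range.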
Currently, the STminus implementation will tell you what character it
failed at, but not much more than that. (Of course, I don't think anyone
will actually *use* the current STminus implementation, so...)

> > >> - documenting Packages
> > > There definitely need to be provisions for that.
> > erm ... we already have __init__.py, so a __doc__.py might be cool ?
> > Possibly alongside __doc__.stng or some such containing the ref docs ?
>
> I would say that the package docstring is the docstring in __init__.py.

Hm.. I think the package (API) docstring is different from the reference
docs for the package. So maybe the reference docs could go in
__doc__.stpy, maybe they could go elsewhere.. I think we should leave
that to a future project (the one that figures out where all the docs
that are not inline API docs go).

> I hereby propose:
>
>     '.stx'  -- the file contains text compatible with either
>                STClassic or, more usefully, STminus
>     '.stng' -- the file contains STNG text. It may or may not
>                be parsable by ST parsers
>     '.stpy' -- the same position for STpy text
>
> Is this worth "formalising" by also posting it onto the ZWiki? Or is it
> just overcomplex? Is it a bad thing we've lost any reference to "tx" in
> the string?

Sounds good to me. Why do we care about losing "tx"? The only
association I have with "tx" is "transfer"..

> Hmm - I'm well aware of how far Eddy will take the elision of vertical
> whitespace to keep stuff compact.
>
> Strangely enough, the example Eddy gives (in Edward form) *is*
> legitimate STpy, since the list items start new paragraphs. If it were
> cast in my form:
>
>     Document information
>       Author: Guido
>       Version: 3.14
>
> then it doesn't work, since labelled paragraphs don't start a new
> paragraph.

Hmm.. here's an idea that was (I think) inspired by something Eddy said
earlier..

1. list items (and maybe labels) can start without preceding
   whitespace.
2. it's an error to have a set of list items "at the same level" as
   paragraphs -- the entire list must be indented.

The result being that the following is illegal:

    I do not intend to start a new list item but I like the number
    12. It's a good number.

That might keep people from accidentally starting list items.. Instead,
if they wanted the list item, they'd have to say:

    I do intend to start a new list item and I like the number
    12.
      12. This is a list item.

Or something like that..

> For the moment, tough. However, I do tend to share the view that this is
> unnatural - I *think* that most people will expect the above to work,
> because that "word followed by colon" thingy is something we're used to.

Hmm.. If you can decide what you're eventually going to do, I'd
appreciate it, so I can code that into STminus.. STminus is going to be
a formal description, so will be much less easy to change than STpy
(well, not technically difficult to change, but if it becomes an
accepted standard and programs are written assuming that it is
"correct"...) But if it's going to stay a subset of STpy, it needs to
know where STpy is going..

-Edward

From edloper@gradient.cis.upenn.edu Fri Mar 16 16:00:30 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Fri, 16 Mar 2001 11:00:30 EST
Subject: [Doc-SIG] formalizing StructuredText
Message-ID: <200103161600.f2GG0Vp13146@gradient.cis.upenn.edu>

> I'm not sure. The obvious reason for allowing empty list items is to
> allow people to start a list and fill things in later - possibly even
> easier to argue with descriptive lists::
>
>     fish -- eat them on Friday
>     dogs -- don't eat them in western countries
>     snakes --
>     horses -- OK in France
>
> To be honest, I'm not even sure what the current docutils code does. Not
> that that matters too much, since it would only need an RE tweaking.

Ok. For now, STminus will say that empty list items are valid.
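The proposed indentation rule can be sketched in a few lines. Everything
here is hypothetical - no implementation exists, and the function name
and regexp are made up for illustration:

```python
import re

# Rough sketch of the proposed rule: a numbered line only counts as a
# list item if it is indented relative to the preceding paragraph line.
# Hypothetical: this is implemented nowhere, and the names are made up.
def accidental_item(prev_line, line):
    looks_like_item = re.match(r'\s*[0-9]+\.\s', line) is not None
    indent = len(line) - len(line.lstrip())
    prev_indent = len(prev_line) - len(prev_line.lstrip())
    return looks_like_item and indent <= prev_indent

print(accidental_item('I like the number', '12. It is a good number.'))    # True
print(accidental_item('I like the number', '  12. This is a list item.'))  # False
```

Under this rule a parser could flag the first case as an error rather
than silently starting a list.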
> Given I do support an empty '::' paragraph

Although that gets treated as a special case.. The empty paragraph
doesn't get preserved..

> > * Apostrophes can appear in the middle of a word or at the end
> >   of a word, like "isn't" and "dogs'". Is it illegal to have
> >   multiple apostrophes in the same word?
>
> As in "*should* it be illegal"? Don't know - I would tend to think so
> pending proof otherwise, unless my current code doesn't care (!)

Yeah, all of my questions were "should" questions.

> > There are no English
> > words that use multiple apostrophes, but I'm not sure about
> > other languages (although there are probably some languages
> > that have words with apostrophes at the beginning of a word,
> > ("'til"?) and StructuredText clearly won't deal with those..)
>
> Oh dear - but English has "'phone" and various Yorkshire-style things
> which start with an apostrophe - sounds like another thing I need to
> check out.

I think we're just going to need to give up on word-initial apostrophes.
I don't see any way around it. How else can you distinguish the initial
apostrophe in 'phone from the apostrophe in the literal 'phone'? Or even
the word: 'phones' (possessive of 'phones)

> Trailing whitespace is removed right at the start. "Structural"
> whitespace like that between a descriptive list "key" and the hyphens is
> lost. Whitespace comprising newline and indentation is conflated to a
> single space (except in literal paragraphs). How whitespace in literal
> strings is rendered (i.e., as &nbsp; equivalents, or not) is *probably*
> left as an issue for the implementor of the renderer (I haven't decided
> yet, for STpy).

Whitespace in literals and inlines should be preserved.

> > * Can #inline# expressions contain newlines? I assume not
> >   ('literal' expressions can't.)
>
> Oh yes they can, in docutils, and that's because I want them to, which
> means that in STpy they can as well (although it's obviously not
> documented yet).
> This is the sort of issue that only comes up when
> implementing (and I count your producing an EBNF as implementing, I
> guess, in this case).

Hm. ick. I don't like that.

> Reasoning - implementation first. When reassembling lines back into
> paragraph text, it is easy to either reinsert line breaks (and maybe
> indentation again) or just a single space. If you want to make RE
> handling easier, a single space wins big time. But that means you can't
> stop literal strings spanning newlines. Oh.

I can see ways around that. But for now, I'll just say that I think that
what "makes more sense" in this case should trump what "is easy to
implement"... (also, if you want to be more compatible with STNG, they
don't allow newlines in literals.. :) )

> Philosophical second. OK - I hadn't thought of that (goes I). But I *had*
> been being irritated by trying to use pseudo-STpy in my emails, 'cos
> (heh, another "'" at the start of a word!) I'm using Outlook
> in which it is very hard to tell where lines will be broken, which means
> use of '..' is hard. And since an email variant of STpy is both easy to
> imagine and should be easy to do, this is a pity. But if the behaviour
> of *all* quoted things over linebreaks is well defined, and the same
> (and the implementation above is that natural usage) then the problem
> goes away.

There will always be problems using ST if you can't control your own
line breaks. Otherwise, you'll get list items where you don't want them,
etc..

I like making literals single-line because:

* it tends to keep them short, which is a good thing
* we can handle whitespace in a more sensible way -- spaces are
  preserved, and are hard. That way if someone wants to talk about
  the python string #'[ ]'#, they can.
* apostrophes already seem dangerous to me, what with words like 'cos
  that can accidentally start literals and words like cats' that can
  accidentally end them.
If literals don't span multiple lines, then a parser has a much better
chance of noticing that something's wrong.

> Incidentally, that is also why I'm not sure yet about whether spaces in
> string literals should be "hard" or "soft". I'm still thinking on it (I
> tend towards "hard", but worry about 'very long string literals which
> will not fit on a single line when being rendered and thus look really
> stupid going off the right hand margin').

I don't see why someone would ever really need a very long literal.. And
if they don't mind it being broken up, they can split it up themselves..

> > * What are valid expressions for starting an ordered list item?
[...]
> > And it does not allow for expressions like:
> >
> >     1.2. This is a list item.
>
> I thought it *did* allow an optional dot? Oh well, memory again. The
> requirement for a dot in STpy was specifically to stop that problem.

Used to. Doesn't now. Who knows if/when/how it'll change. :)

> That one shouldn't fly in STpy, because it
> would have to be::
>
>     i.t.e.m. (Since
>
> because it's meant to be "one letter, or one or more digits"

Hm. So no roman numerals in STpy? ok.

> > So maybe we could just use "([0-9]+\.)+"?
>
> Personally, I wouldn't much mind if it were only letters and "arabic"
> ("indian"?) digits. The reason for having all three forms is that a
> renderer MIGHT use the form the user used to decide what form the
> rendering should use (and those three forms are common to all list
> formatters in common use).

I don't think that people documenting modules will ever really care.
Also, it seems like this might get rendered differently by different
formatters, anyway. I've been using LaTeX for a long time, without ever
feeling the need to tweak which ordered bullets it decides to use.. I
have a feeling that the same is true of most people..
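The difference between the two candidate REs is easy to demonstrate.
The REs are the ones quoted in the thread; the variable names are mine:

```python
import re

# STNG's permissive ordered-list RE versus the stricter proposal,
# both as quoted in the discussion. Variable names are made up.
STNG_OL = re.compile(r'([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)')
STRICT_OL = re.compile(r'([0-9]+\.)+')

for line in ['1. item', '1.2. nested', 'Hi. not a list', '12 is a fun number']:
    print(repr(line), bool(STNG_OL.match(line)), bool(STRICT_OL.match(line)))
```

STNG's RE accepts "Hi." and "12 " (the false positives complained about
above), while the stricter form rejects both and still accepts "1.2.".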
> Given that's something people might care
> about, it makes sense (of course, I believe ST implementations
> have tended NOT to make such use of the forms, but still).

I doubt any implementation ever will, either.. :)

> > * What restrictions are there on hrefs ("name":http://some.url)
> >   According to STNG, they can use relative URLs ("name":whatever).
> >   These end up being pretty tricky to formalize..
> >
> > * Can href names span multiple lines?
> > * Can href names contain coloring? (I'd like to say no)
> > * Should the string '":' only be allowed for hrefs?
> >   Or maybe '":(?!\s)', so you can say "this": that?
> > * What do you do with things like::
> >
> >     This *is "too* confusing":http://some.url
> >
> >   (Keeping in mind that things like this should be ok)::
> >
> >     Normally *quotes " don't have* any special meaning,"
> >     so they don't have to nest properly..
>
> Hah - URLs (URIs?) are impossible to do right in ST (of whatever form).
> There's a reference in TextRE.py to a page that describes the problems.

Hrm.. My questions were actually more concerned with the name part than
with the url part. I assume that you're using the same basic markup here
that STNG does (from reading STpy.html, it seems like you are). So which
of the following are legal? ::

    "Here the *name* 'contains' markup":url

    "This name spans
     multiple lines":url

    "the following is not a url":

    Do *quotes "have to* nest" properly with coloring?

> 1. If it looks vaguely like a URL, expect it to be mistaken
>    for one (and "vaguely" will, of course, be defined)

I assume you mean this for when they just include an absolute url in the
text, like http://foo.bar .

> 2. URLs will not be allowed to span multiple lines, and if
>    they contain spaces (and maybe some other characters)
>    then those will need to be encoded (in the "normal"
>    manner)

Agreed..

> 3. Colouring does not occur in URLs.

Agreed.. Although how you decide where the url ends isn't obvious..
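One purely illustrative answer to "where does the url end": take the
maximal run of non-whitespace, then strip trailing sentence punctuation.
This is a hypothetical heuristic (function name included), not what
docutils or STNG actually does:

```python
import re

# Hypothetical heuristic for deciding where a URL ends: take the
# maximal run of non-whitespace, then strip trailing punctuation that
# is more likely sentence-level than part of the URL. Not what any
# ST implementation actually does; the name url_at is made up.
def url_at(text):
    m = re.match(r'\S+', text)
    return m.group().rstrip('.,;:)') if m else None

print(url_at('http://some.url. More text'))  # http://some.url
print(url_at('http://foo.bar/baz,'))         # http://foo.bar/baz
```

As noted above, this loses URLs that genuinely end in a "." - which is
exactly the trade-off under discussion.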
-Edward

From edloper@gradient.cis.upenn.edu Fri Mar 16 16:02:37 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Fri, 16 Mar 2001 11:02:37 EST
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: Your message of "Fri, 16 Mar 2001 11:04:35 GMT." <00ab01c0ae08$e30941e0$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <200103161602.f2GG2bp13365@gradient.cis.upenn.edu>

> Post-alpha docutils, we are definitely going to have to have a
> discussion about what we want to use as terms in the DTD - which is
> good, since I'm not convinced that the tags I've come up with are the
> best possible choice.

I was just using fairly random tags, for illustrative purposes, but I do
think it's a good idea to discuss that.

-Edward

From tony@lsl.co.uk Fri Mar 16 16:09:37 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 16:09:37 -0000
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: <200103161529.f2GFT5p10509@gradient.cis.upenn.edu>
Message-ID: <00bb01c0ae33$7ff2fd50$f05aa8c0@lslp7o.int.lsl.co.uk>

(hmm - I keep hitting reply-to-all, and obviously you do the equivalent
- partly 'cos it means a copy *does* definitely go to the doc-sig. But
do we really want copies directly to ourselves as well? (I'm unfussed
either way))

Edward D. Loper wrote:
> Hm.. That works. It seems very strange and magical to me that
> *within* jim, you can say #j1.__doc__ = foo#, but *outside* of
> jim, you have to say #j1.im_func.__doc__ = foo#..

Hmm. Odd. Maybe something to ask over on the Python list per se (if no
one else, Alex Martelli might explain something at length about it)

> XML defines::
>
>     Name     = (Letter | '_' | ':') (NameChar)*
>     NameChar = Letter | Digit | '.' | '-' | '_' | ':' |
>                CombiningChar | Extender
>
> (CombiningChar and Extender are for international support, I think)
>
> So the regexp would be something like::
>
>     [a-zA-Z_:][a-zA-Z0-9.-_:]*

Hmm.
I might prefer to say "a Python identifier", then, as I don't need the "namespace" bit (which is what the colon is for). > (although > I still don't understand its semantic value -- how do you identify > things as targets of these links? If one looks at a modified version of a PEP, for instance, then one would have something like:: This, of course, is discussed at length in Fredricksen [F1], but you can find that yourselves - of course, Jimson [J2] refutes it emphatically ..[F1] Fredricksen and Cohorts, The magazine magazine, Some Publishing House, 1978, vol3 issue 97 ..[J2] Jimson, archived at http://www.jimson.notspam/j2.pdf This *may* have been defined in the last-but-one round of docstring syntax, rather than the last one (memory fails me). The "target" of the references is (etc. - you get the idea). I would imagine that in alpha1 they don't trigger a new paragraph, but later on they probably should (same arguments as ':' labels). I'll need to dig up my paper copies of the relevant discussion, 'cos on the face of it it would be nice to be able to do [1] as well (like the PEPs do), but that can't be so if we're enforcing identifier-syntax for the "footnote" ref. There was a justification for it, but that's in the same place as the rest. Short term, it lets me "emulate" a PEP, which is a Good Example to be able to emulate. I know there was a better reason as well... > > I would say that the package docstring is the docstring in > > __init__.py. > > Hm.. I think the package (API) docstring is different from the > reference docs for the package. So maybe the reference docs could > go in __doc__.stpy, maybe they could go elsewhere.. I think we > should leave that to a future project (the one that figures out > where all the docs that are not inline API docs go). I agree on the leaving it. (and why wouldn't the reference docs for a package be in package/reference.stpy, then?) > Sounds good to me. Why do we care about losing "tx"? 
The only > association I have with "tx" is "transfer".. I would *guess* it started out as "stx" or "stxt" and the latter never caught on. > Hmm.. here's an idea that was (I think) inspired by something > Eddy said earlier.. > 1. list items (and maybe labels) can start without preceding > whitespace. > 2. it's an error to have a set of list items "at the same level" > as paragraphs -- the entire list must be indented. > > The result being that the following is illegal: > > I do not intend to start a new list item but I like the number > 12. It's a good number. > > That might keep people from accidentally starting list items.. > Instead, if they wanted the list item, they'd have to say: > > I do intend to start a new list item and I like the number > 12. > 12. This is a list item. > > Or something like that.. Hmm. Messy to think about in the implementation (and in this sort of thing, I'm beginning to trust how easy it is to do in Python, as a guideline to how easy it is to think about). But worth bearing in mind. There is an outstanding conceptual problem with indentation and lists, anyway. If one has:: Some non-list text, followed by 1. A list then it is clear how the list is indented - i.e., not specially. On the other hand, if one has:: Some non-list text, followed by 1. A list then STpy leaves it fuzzy what happens, and currently docutils will put a block around the list, causing it to be indented "extra". On the other hand, sometimes people want that effect (the difficulties of mixing presentation and markup, and not being specifically a typesetting language - ho hum). I think this whole sort of thing will need addressing either at alpha/beta time, or possibly post-1.0. > Hmm.. If you can decide what you're eventually going to do, I'd > appreciate it, so I can code that into STminus.. 
STminus is going > to be a formal description, so will be much less easy to change > than STpy (well, not technically difficult to change, but if it > becomes an accepted standard and programs are written assuming that > it is "correct"...) But if it's going to stay a subset of STpy, it > needs to know where STpy is going.. I *should* know what I *want* to have happen by the alpha release (even if the "example application" doesn't yet do it). And I am *fairly* sure that what I will *want* to have happen is what Eddy wants as well. But we'll see. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Fri Mar 16 16:12:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 11:12:38 EST Subject: [Doc-SIG] quoting In-Reply-To: Your message of "Fri, 16 Mar 2001 11:09:23 GMT." <00ac01c0ae09$8e546c50$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161612.f2GGCcp13936@gradient.cis.upenn.edu> > I also feel we *probably* want some means of quoting. > > *But* before introducing one, I want an exhaustive example of other > cases than the need to quote a single quote within literal texts (and > this would include, if people want it, a *really* careful argument for > why one needs to be able to quote '#' in '#..#', although I feel this is > probably needed too). I think there are two issues here: 1. #..# and '..' have *semantically* different meaning. Although we *can* do #'# or '#', is it always appropriate? 2. Any string containing both "'" and '#' can't be written in current ST. Those are the only strings that can't be written in current ST (as either literals or inlines). (They can actually be written, but only in a literal block). I think that the first issue is an important one. 
It bothers me to have to say #'til# when I'm talking about an English word, because the markup suggests that I'm talking about Python code. > *After* we have an exhaustive list of all the places we *need* text > escaping, *then* we can try to define an STpy-like manner of doing it. I think that (2) is an exhaustive list of the places we *need* text escaping, if you don't mind making people use apostrophes to quote things like '*' and '"a":http://foo'. > The difficulty of coming up with something "natural to read" (and I'm > still not convinced that '\' fits the tag!) makes this an item I want to > defer, probably until after release 1.0 of STpy and STminus. I personally think that '\' is the most natural backslashing character, especially if the audience is programmers. But maybe I'm alone in thinking that... > (Yes, I know it *feels* like something one must have, and not having it > makes STpy and STminus difficult to self-document, but the ST-lesson is > that, in general, difficult cases don't actually come up in real life.) This is the ST-lesson from using ST to make web pages, not to document source code.. I fear we may find that documenting source code requires a number of things you could avoid in writing web pages.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 16:27:56 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 11:27:56 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 16 Mar 2001 11:13:44 GMT." <00ad01c0ae0a$2a4b7c70$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161627.f2GGRup14941@gradient.cis.upenn.edu> > > It's hard to come up with a rule that's both simple and safe, but > > covers cases like '2a.' and '3:'. So, unless Tibs or others > > strongly object, I think we should just stick with '([0-9]+\.)+'. :) > > I'll think about this harder after alpha release, but it sounds like a > non-silly idea for STminus, at least. 
I think that there are two ideas of what STminus does that are floating around: 1. it's a "simple," "clean" version of ST 2. it's an intersective subset of STNG and STpy Unfortunately, (1) and (2) are really mutually exclusive.. If STminus is to *actually* be an intersective subset of STNG and STpy, then it needs to take all the quirks of each into account, so it can make strings where STNG and STpy disagree undefined.. > > > > * What do you do with things like:: > > > > > > > > This *is "too* confusing":http://some.url > > > > > > Find author, apply pain (to taste). > > > Give them the opportunity to retract. > > > If they refuse, apply lethal doses of pain. > > > Then they won't repeat the offence. > > > No problem. > > > > Perhaps I should rephrase that. What should a *parser* do? > > I guess "die" is a good answer, though it sounds like you might > > prefer something along the lines of "erase their hard drive." :) > > A parser will have no problems with that text. It will parse it and give > an answer. Whether it is what the user would expect (if it's docutils, > it will probably be what *I* expect!). Hm. let's try once more. What answer *should* a parser give? ('error' is a possible answer) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 16:32:15 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 11:32:15 EST Subject: [Doc-SIG] What counts as a url? In-Reply-To: Your message of "Fri, 16 Mar 2001 11:23:19 GMT." <00ae01c0ae0b$809ba090$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161632.f2GGWGp15248@gradient.cis.upenn.edu> > > Of course, I don't think people will be including HREFs in their > > documentation much, anyway.. So the main issue for most people > > will just be that they can't use '":' in certain environments.. > > Erm, I wouldn't bet on that. And we *are* trying to retain > compatibility/usefulness as a tool for working on text files as well, > remember, where this sort of thing is more likely. Yeah. 
But I've been writing a PEP recently, so mainly my thoughts are of "what are the obstacles to getting the Python community to accept xyz". :) I agree that it's an important issue for a larger scope, though.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 16:44:09 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 11:44:09 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Fri, 16 Mar 2001 16:09:37 GMT." <00bb01c0ae33$7ff2fd50$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161644.f2GGi9p15914@gradient.cis.upenn.edu> > (hmm - I keep hitting reply-to-all, and obviously you do the > equivalent - partly 'cos it means a copy *does* definitely go to the > doc-sig. But do we really want copies directly to ourselves as well? > (I'm unfussed either way)) I'm actually on the digest mailing list, so if you want me to get it faster, cc it to me. If you don't care, you can just send it to the mailing list. I've started sending my replies to you to the mailing list. > > XML defines:: > > Name = (Letter | '_' | ':') (NameChar)* > > NameChar = Letter | Digit | '.' | '-' | '_' | ':' | > > CombiningChar | Extender > > (CombiningChar and Extender are for international support, I think) > > > > So the regexp would be something like:: > > > > [a-zA-Z_:][a-zA-Z0-9.-_:]* > > Hmm. I might prefer to say "a Python identifier", then, as I don't need > the "namespace" bit (which is what the colon is for). If we also like things like [2], why don't we just say that it's '\w+'? > If one looks at a modified version of a PEP, for instance, then one > would have something like:: > > This, of course, is discussed at length in Fredricksen [F1], > but you can find that yourselves - of course, Jimson [J2] > refutes it emphatically > > ..[F1] Fredricksen and Cohorts, The magazine magazine, > Some Publishing House, 1978, vol3 issue 97 > > ..[J2] Jimson, archived at http://www.jimson.notspam/j2.pdf Ahh. 
Ok, now I know what we're talking about. In STpy.html, you say that the syntax is "name":[link]... And the "footnote" syntax is:: ..[link] http://foo/bar If we're implementing this, is there a chance that we could just ditch the in-line hrefs altogether, in favor of out-of-line ones? It would mean that we wouldn't have to deal with such issues as "what counts as a url", etc. Of course, some of my previous questions still apply.. E.g., can you say: "this name *contains* 'markup'":[link]? Oh, one other question: how does it make sense to ever use relative urls inside documentation strings?? I can see how it would be useful in other contexts.. but not in doc strings. > I agree on the leaving it. (and why wouldn't the reference docs for a > package be in package/reference.stpy, then?) reference.stpy sounds like a fine name to me. > > Sounds good to me. Why do we care about losing "tx"? The only > > association I have with "tx" is "transfer".. > > I would *guess* it started out as "stx" or "stxt" and the latter never > caught on. With "stx" it's not obvious anyway, so I wouldn't worry about it. > There is an outstanding conceptual problem with indentation and lists, > anyway. If one has:: > > Some non-list text, followed by > 1. A list > > then it is clear how the list is indented - i.e., not specially. On the > other hand, if one has:: > > Some non-list text, followed by > 1. A list > > then STpy leaves it fuzzy what happens, and currently docutils will put > a block around the list, causing it to be indented "extra". Hm. I disagree. I don't think that the user should have that much control over formatting, anyway. I tend to think that ST should be a markup language, not a typesetting language.. > On the other > hand, sometimes people want that effect (the difficulties of mixing > presentation and markup, and not being specifically a typesetting > language - ho hum). Too bad for them. 
:) If they really need *that much* control over their typesetting, they shouldn't be using ST. -Edward From tony@lsl.co.uk Fri Mar 16 16:55:07 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 16 Mar 2001 16:55:07 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103161627.f2GGRup14941@gradient.cis.upenn.edu> Message-ID: <00bd01c0ae39$db1c7ca0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I think that there are two ideas of what STminus does that > are floating around: > 1. it's a "simple," "clean" version of ST > 2. it's an intersective subset of STNG and STpy > > Unfortunately, (1) and (2) are really mutually exclusive.. If STminus > is to *actually* be an intersective subset of STNG and STpy, then > it needs to take all the quirks of each into account, so it can > make strings where STNG and STpy disagree undefined.. Hmm - I think that's got to be for you to decide. It *may* be that both are sufficiently useful to want to have (I'd go for that), in which case you get to (a) choose two names and (b) choose which you do first. You *could* call the "minimal compatibility" one STminusminus, of course... > > > > > * What do you do with things like:: > > > > > > > > > > This *is "too* confusing":http://some.url > > > > > > Perhaps I should rephrase that. What should a *parser* do? > > > I guess "die" is a good answer, though it sounds like you might > > > prefer something along the lines of "erase their hard drive." :) > > Hm. let's try once more. What answer *should* a parser > give? ('error' is a possible answer) Damn, he's being persistent. Don't you just hate that. Thinks... No, seriously, the only way (with RE technology) that you are going to detect an error there is by adding that pattern to your "long list of error patterns" RE. 
To an RE-using system that looks for things going '*..*' and things going '"..":', there will simply be no ambiguity - the one that finds it first will win, leaving odd bits of "definitely not markup, guv" text lying strewn around it. Specifically, the above would *either* be:: plain: 'This ' emph: 'is "too' plain: ' confusing":http://some.url' *or* it would be:: plain: 'This *is ' urltext: 'too* confusing' [url: http://some.url] In neither case is there any ambiguity - it just depends on the order in which things are done. It's because it's done with REs, you see - there isn't any *real* understanding of document structure going on. All the poor user can do is stare at the result, and wonder why it isn't what they expected. Sounds like some compilers I've known. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 16 16:57:42 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 16 Mar 2001 16:57:42 -0000 Subject: [Doc-SIG] quoting In-Reply-To: <200103161612.f2GGCcp13936@gradient.cis.upenn.edu> Message-ID: <00be01c0ae3a$375557d0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: <<>> I think that probably *is* the case, exactly stated, with implications and reasons why STpy is not the same as STNG. Thanks. OK. I personally am not too happy with '\' for the escape, but I *would* like a solution to this. I hereby propose (unilaterally, us together) that we actually adopt the following proposal: 1. The only special quoting circumstances are putting a single quote within '..' and a hash within #..#. 2. The only place that STpy supports use of an escape character is in those places. 3. '\'' shall mean the same as a literal string containing single quote. 4. #\## shall mean the same as a Python literal string containing a single hash. 5. 
Within either of those literal string contexts, that is the *only time* that backslash is special. Thus it is sufficient to write #\#\\## to get a Python literal string containing a hash, a backslash and a hash. 6. If this is too confusing, don't do it. If you do do it, use a net. Or a bouncy castle. And in any case, don't blame us. and see if it works implementation-wise (but I don't promise to have it *working* in the alpha release). (Of course, the *other* escape tradition is simple-doubling - which would give us #### to get a single hash. Which is *very* confusing, and I don't want to go near it with an RE.) Hmm - must go. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Fri Mar 16 17:00:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 12:00:38 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 16 Mar 2001 16:55:07 GMT." <00bd01c0ae39$db1c7ca0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161700.f2GH0cp17194@gradient.cis.upenn.edu> (concerning:) > > > > > This *is "too* confusing":http://some.url > No, seriously, the only way (with RE technology) that you are going to > detect an error there is by adding that pattern to your "long list of > error patterns" RE. To an RE-using system that looks for things going > '*..*' and things going '"..":', there will simply be no > ambiguity - the one that finds it first will win, leaving odd bits of > "definitely not markup, guv" text lying strewn around it. Specifically, > the above would *either* be:: Well, it depends on how you're detecting errors... > plain: 'This ' > emph: 'is "too' > plain: ' confusing":http://some.url' Here, you could say that the string '":' without a matching '"' is illegal, and raise an error.. 
> plain: 'This *is ' > urltext: 'too* confusing' > [url: http://some.url] Here you could say that non-matching '*'s are illegal, and raise an error.. > In neither case is there any ambiguity - it just depends on the order in > which things are done. It's because it's done with REs, you see - there > isn't any *real* understanding of document structure going on. But from the point of view of someone formalizing the language, saying "there's an ambiguity" is no good. I have to either explicitly say "it's illegal" (=undefined) or "xyz is the correct answer." -Edward p.s., I'm not sure it's safe for us both to be writing email at the same time. We might overload other peoples' mailboxes. :) From edloper@gradient.cis.upenn.edu Fri Mar 16 17:07:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 12:07:13 EST Subject: [Doc-SIG] quoting In-Reply-To: Your message of "Fri, 16 Mar 2001 16:57:42 GMT." <00be01c0ae3a$375557d0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> > 1. The only special quoting circumstances are putting a single quote > within '..' and a hash within #..#. > 2. The only place that STpy supports use of an escape character is in > those places. > 3. '\'' shall mean the same as a literal string containing single quote. > 4. #\## shall mean the same as a Python literal string containing a > single hash. > 5. Within either of those literal string contexts, that is the *only > time* that backslash is special. Thus it is sufficient to write #\#\\## > to get a Python literal string containing a hash, a backslash and a > hash. > 6. If this is too confusing, don't do it. If you do do it, use a net. Or > a bouncy castle. And in any case, don't blame us. I think we should add that '\\' is a single backslash and #\\# is too. Otherwise, there's no way to end a literal with a backslash.. 
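[Editorial aside: the backslash proposal discussed in this exchange is small enough to prototype. The following unescape sketch is an illustration of the proposed rules plus the '\\' addition — not docutils code: inside a '...' or #...# literal, backslash is special only before the closing delimiter or another backslash, and is passed through literally everywhere else.]

```python
# Minimal sketch (illustrative only, not docutils code) of the proposed
# escaping rules for literal text: inside #...# or '...', a backslash
# escapes only the closing delimiter or another backslash; any other
# backslash is left alone.
def unescape(text, delim):
    out, i = [], 0
    while i < len(text):
        if text[i] == '\\' and i + 1 < len(text) and text[i + 1] in (delim, '\\'):
            out.append(text[i + 1])  # drop the backslash, keep the escaped char
            i += 2
        else:
            out.append(text[i])      # everything else passes through verbatim
            i += 1
    return ''.join(out)

# Point 5's example: the content of #\#\\## unescapes to hash, backslash, hash.
print(unescape(r'\#\\#', '#'))
```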
> (Of course, the *other* escape tradition is simple-doubling - which > would give us #### to get a single hash. Which is *very* confusing, and > I don't want to go near it with an RE.) I agree that we shouldn't use that. -Edward From mal@lemburg.com Fri Mar 16 17:08:37 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 16 Mar 2001 18:08:37 +0100 Subject: [Doc-SIG] What counts as a url? References: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu> Message-ID: <3AB24895.8DB07998@lemburg.com> "Edward D. Loper" wrote: > > So I'm working on adding HREFs to STminus. They look like this:: > > "anchor name":URL > > Where URL is either a relative URL or an absolute URL.. So I went > and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt . > It suggests (if I'm reading it correctly) that we could define > a URL as:: > > ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+ > > Should we use that regexp for URLs? Or perhaps we should go for > simplicity, and say that the regexp ends at whitespace:: > > [^\s]+ > > In either case, we'll have to be careful to say:: > > See "this":http://url . > > instead of:: > > See "this":http://url. > > (the '.' gets included in the second url). Is that a problem? If > so, what can we do about it? (Keep in mind that it *is* acceptable > to have a URL that ends in a '.').. > > Of course, I don't think people will be including HREFs in their > documentation much, anyway.. So the main issue for most people > will just be that they can't use '":' in certain environments.. > > Ideas/thoughts? FYI, I use this RE in my apps: r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' I don't think it makes sense to include schemes which are not supported by your everyday browser, so only the most common ones are included. 
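[Editorial aside: the pattern above can be tried directly. A small check (mine, with CPython's re module — not part of the original exchange) suggests the trailing '\b' already gives the behaviour asked about earlier: a sentence-ending '.' before whitespace or end of string falls outside the match.]

```python
import re

# The URL pattern quoted verbatim from the message above.
URL = re.compile(r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)')

# The trailing \b forces the match to end on a word character (with an
# optional final '/'), so a sentence-ending full stop is excluded.
text = 'See http://www.python.org/ and then http://foo.bar.'
print(URL.findall(text))
# ['http://www.python.org/', 'http://foo.bar'] -- the final '.' is dropped
```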
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From edloper@gradient.cis.upenn.edu Fri Mar 16 17:58:51 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 12:58:51 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 16 Mar 2001 16:35:10 GMT." <00bc01c0ae37$1142d200$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161758.f2GHwpp20696@gradient.cis.upenn.edu> Tibs noted: > We need to be careful about that word "whitespace" (which I note you > sometimes still use to mean "blank lines" as well. Yeah, I've been playing a bit fast and loose with terminology in my emails.. :) Speaking of terminology, I want to make sure that we're using somewhat consistent terminology. In particular, I think my use of the following terms may not coincide with what you call things. What are your terms for the following? * inline = region marked with #hashes#. * paragraph = a text paragraph; not a list item or a heading or a label * basic block = paragraph or list item or heading or label (or table?) * blank line = (S* NL) | (S* EOS) * literal block = region following a '::'. * invalid string = string that is not given a meaning by an ST variant. (in the terms used by the STminus proposal, strings that are not assigned a structure by a language). Tibs continued: > When I am talking, I have some assumptions (which, of course, may not be > evident): > > 1. by the time discourse occurs, all tabs have gone away Agreed. We should probably also discard/transform any whitespace that isn't space or newline (e.g., form feed, carriage return). > 2. blank lines are blank lines - white space in them is ignored, > thrown away (lost for good) Is this true in literal blocks? Also, I'm guessing you collapse multiple consecutive blank lines into one. > 3. 
trailing whitespace is thrown away Trailing whitespace for the string as a whole? For each basic block? For each line? Is this true in literal blocks? > 4. literal paragraphs retain leading whitespace following "the > rules" (which say they are actually indented relative to the > preceding non-literal paragraph - this makes much more sense > in ST than "with respect to the left margin"). Agreed. Although how do you put something at zero indentation? Maybe indent from 1 space over from the preceding paragraph? > So, at the end of that, the term "whitespace" should be replaced by the > term "spaces". Newlines (sometimes I call them "line breaks", which may > be a better term) are a different thing. So we won't use the term whitespace. Instead, we'll use the terms space, newline, and blank line. > Clearly for a string literal that does not contain a newline, spaces are > to be transcribed to spaces (probably - flag a rendering issue as to > whether they're *hard* spaces (the correct number) or *unbreakable* > spaces (the correct number AND no newlines)). I vote for unbreakable, but it may be possible to persuade me. > Equally clearly, if one does not allow newlines in string literals, > that's the end of the matter. We've done our job. Which is what I vote for. :) > Unfortunately for simplicity, I saw that I could choose to lose newlines > *if I so wished*, and after a bit of thinking I decided I did so wish, > for the reasons I gave. In *that* case, one has to consider what the > sequence:: > > > > means within a literal string, and clearly the only consistent thing *for* it to mean is a single space. Well, it could also mean a single newline (or
'<br>' in html). But we shouldn't even go there. :) > > Hm. ick. I don't like that. > > Yes, well, that's the problem, and I need to think how much I *do* like > it, and then argue it out. Here's my current take on linebreaks in literals. Feel free to add/comment: Advantages of allowing linebreaks in literals: * you can have longer literals * you can press alt-q in emacs to have it re-word-wrap your paragraph, and not think about it (as much; you still have to worry about list items, labels (in the future), and maybe other things). * implementation reasons * the meaning of spaces and newlines in literals is not obvious to the un-initiated (no matter which meaning we choose). Advantages of not allowing linebreaks in literals: * you force people to use shorter literals * spaces in literals are meaningful in an obvious way * you're more likely to catch errors, because you're keeping things local. * it's conceptually simpler (i.e., easier to explain). Of course, if we say that linebreaks are not allowed in literals, docutils can still go on allowing them there, while just saying that it's "making a best guess" where a parser I wrote would probably flag a warning/error. > > I don't see why someone would ever really need a very long literal.. > > And if they don't mind it being broken up, they can split it up > > themselves.. > > Hmm. I have done in the past (but as ever, can't remember detailed > examples). It seems to me that either: 1. it's a literal that you don't mind having broken up, so you can break it up yourself (although then I question if it's really a single literal?) 2. it's a literal that you think shouldn't be broken up, so you shouldn't break it up in the plaintext -- put it on one line, and readers will have to understand that it's more than 75 characters because it shouldn't be broken up. > > Used to. Doesn't now. Who knows if/when/how it'll change. :) > > Oh dear. I find myself saying that a lot when I play with STNG. 
:) Hopefully getting a formal definition will start to change that.. > > Hm. So no roman numerals in STpy? ok. > Aagh - no, you're right. My mistake. > (although, at a different tack, I don't think "e" is a roman numeral?) No, it's not, but you get the point. :) I used that example because STNG currently allows *any* sequence of letters followed by a dot. > I'm not yet convinced about individual alphabetics - I *do* tend to use > that style myself quite a lot. I think that simplicity should be an important design goal for ST. But I might let single letters followed by a dot slide.. Esp. if parsers could give warnings when ordered lists were not ordered in a sensible way, as in:: This is not intended to be an ordered list, says I. But it starts with "I" instead of "A", so it will flag an error. > > "Here the *name* 'contains' markup":url > > Aagh - it's an order thing, 'cos at the moment URL recognition is done > by colourising. Given I don't want to worry about "internal markup" yet, > that *may* mean URLs must be done immediately after literals, and before > other markup. Hm.. I'm confused. So you would get::
Here the *name* 'contains' markup ? Or:: "Here the name contains markup":url ? Or something else? But at any rate, my question was more one of what ST "should" do, not what it does do.. One other case to consider is:: *"I would prefer this":url* to "*this*":url > > "This name spans multiple > > lines":url > > Remember my code doesn't see newlines any more inside paragraphs, so > that's no problem... But if we decide that literals/code don't span newline...? :) Still seems to me that names should be able to span newlines, though. > > "the following is not a url": > > If it happens to parse as a URL, it is, if it happens not to parse as a > URL, it isn't - either way, it's the writer's fault for doing daft > things. Yes, but do we get an error because we used '":' in a silly context (if we're asking the parser to tell us about errors)? > > Do *quotes "have to* nest" properly with coloring? > > No, and I don't expect ever to try to make the code worry about it - > that would get grabbed as one or the other, under *any* scheme I'm ever > likely to write. But from the point of view of formalizing things, I have two choices here: 1. say that it contains a bold region, and the quotes are just rendered as quotes 2. say that it's undefined (i.e., an invalid string). > And regardless of whether one *should* be able to have a dot at the end > of a URL, I think in practice we may need to forbid it so we can have a > fullstop there instead (as I said, I think STClassic may do that, and > STNG certainly *did*). Ok. But that will need special mention somewhere. So we don't include the final dot if it's followed by a space, end of line, or end of string, right? But what about ".."? This seems like it will be very messy to formalize.. :( > > > 2. URLs will not be allowed to span multiple lines, > > > [...] > > Agreed.. 
> > (although, of course, as I said above, I currently have this blindness > about linebreaks - but you may argue me out of that yet Of course, since URLs shouldn't have spaces in them anyway, this isn't a problem. -Edward From guido@digicool.com Sat Mar 17 02:58:48 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 16 Mar 2001 21:58:48 -0500 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 15 Mar 2001 23:41:45 GMT." References: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103170258.VAA14151@cj20424-a.reston1.va.home.com> > >>> def hello(there): return there > ... > >>> hello.__doc__ = 'hello there' > >>> hello.__doc__ > 'hello there' > >>> class A: > ... def hello(there): return there > ... > >>> A.hello.__doc__ = 'hello there' > Traceback (innermost last): > File "<stdin>", line 1, in ? > TypeError: attribute-less object (assign or del) > >>> A.hello.im_func.__doc__ = 'hello there' > >>> A.hello.__doc__ > 'hello there' > > It works for functions but not if they're hanging off the namespace of a > class ? The attribute shows up on the class method after I set it on > the class method's im_func ? This is wrong, obscene and ugly ! This is intentional. We actually changed this in 2.1b1; in the alpha version it *was* assignable. It turns out there are some potential future uses where assigning to attributes of class (or instance) methods should only affect the method in that class, not in any base classes; in particular, it was pointed out that this can be confusing: class C: def f(self): pass f.foo = 1 class D(C): pass D.f.foo = 2 # Would also change value of C.f.foo! Similarly, class C: def f(self): pass f.foo = 1 x = C() x.f.foo = 2 # Would also change value of C.f.foo! The ExtensionClass module in Zope actually implements class-like objects that behave in such a way that at least the first example (D.f.foo = 2) changes the f.foo value for class D but not for class C. So this is not just theoretical! 
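[Editorial aside: Guido's first example is easy to replay. Under a modern Python 3 — where function attributes are assignable and the unbound-method wrapper is gone, so this sketch does not reproduce the 2.1b1 TypeError — the aliasing he warns about is exactly what happens: D.f is looked up on C and is the very same function object.]

```python
class C:
    def f(self):
        pass
    f.foo = 1  # function attributes may be set inside the class body

class D(C):
    pass

# D has no 'f' of its own; attribute lookup finds C's function object,
# so this assignment mutates the base class's function too.
D.f.foo = 2
print(C.f.foo)     # now 2 -- the base class was changed as well
print(C.f is D.f)  # True: one shared function object
```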
--Guido van Rossum (home page: http://www.python.org/~guido/) From Edward Welbourne Sat Mar 17 13:50:26 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 13:50:26 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103160029.f2G0T1p02926@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103160029.f2G0T1p02926@gradient.cis.upenn.edu> Message-ID: > If it's usually *not* what you want, or if we want to keep things > simpler, the following seems to work (I don't know why you don't > need to use .im_func here):: > class B(A): > def f(x): > return x+1 > B.f.__doc__ = A.f.__doc__ Close, but not quite - B isn't in scope yet when you say that: >>> class B: ... def f(s): return ... B.f.__doc__ = 'hello there' ... Traceback (innermost last): File "", line 1, in ? File "", line 3, in B NameError: B However, >>> class B: ... def f(s): return ... f.__doc__ = 'hello there' ... works fine; and the reason you don't need to use .im_func here is that the code of B's body gets executed in an empty namespace; the resulting contents of the namespace get massaged and packaged to become B's namespace, but only *afterwards* (read types-sig in late 1999 to find out why I don't like that: but it's how it is). So, when f.__doc__ got assigned, f was still a function: >>> class B: ... def f(s): return ... print type(f) ... (subject to usual caveats about my installed python still being 1.5.2) > I'd be happy with either. I'd solidly prefer to require an empty doc-string in a method which doesn't want to inherit the doc-string of its parent. In principle, at least when the method is part of the abstract data type spec of the base class, the derived class shouldn't really be messing with `what does this guarantee' statements which (as I understand it) you want doc strings to contain, albeit it may be refining it slightly. 
Plus, having to put the empty string in as doc-string is trivial, doesn't involve typing any magic __tokens__ and happens inside the body of B.f, rather than after it; and I *really* don't like having to *assign* B.f's __doc__ in order to get it to show something it should be inheriting/acquiring from A.f, at least under the rules of abstract data types. Notably, consider the case where a base class defines the API to be supplied by a family of allied classes; the base class exists to document what they all do, each derived class exists to implement one of the flavours of the relevant kind of object. However, shoe-horning that into an implementation may be an issue. Eddy. From Edward Welbourne Sat Mar 17 14:23:10 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 14:23:10 +0000 (GMT) Subject: [Doc-SIG] ping Ping In-Reply-To: <200103160047.f2G0l6p04520@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103160047.f2G0l6p04520@gradient.cis.upenn.edu> Message-ID: > Sorry, that was unclear. My stance is: ah. So: Edward, Tibs and I, who have done this week's talking, all agree on a position which puts `API docs' into the source file and puts tutorials, reference docs, etc. elsewhere. But this discussion got kicked off by Ping (and the Spanish Inquisition, according to Tibs; and Eddy the turn-coat) arguing for the contrary. Hello, Ping, where are you ? We need your response to our gabblings. If only 'cos you're about as good at changing my mind as Tibs is ... Eddy. From Edward Welbourne Sat Mar 17 13:57:45 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 13:57:45 +0000 (GMT) Subject: [Doc-SIG] reserved characters In-Reply-To: <200103160035.f2G0ZFp03453@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103160035.f2G0ZFp03453@gradient.cis.upenn.edu> Message-ID: > Note also that '[a, b, c]' is not always python code. It's also > mathematical notation.. Same with 'x*y'.
as far as I'm concerned, mathematics and programs are the same stuff: crucially, neither is `plain text', so they aren't first-class citizens of the doc-string - they can both expect to be shoved inside #ghettos# of one sort or another. Ergo, no issue here. Eddy. -- Albeit my plaintext denotations can mostly live peacefully in doc-strings: http://www.chaos.org.uk/~eddy/math/ From Edward Welbourne Sat Mar 17 14:29:57 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 14:29:57 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103160059.f2G0xep05577@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103160059.f2G0xep05577@gradient.cis.upenn.edu> Message-ID: >> While I'm at it: I loathe and despise the apparent demand (seemingly > Neither STpy (Tibs' version) nor whatever version of STminus I end up > proposing for docstrings will require blank lines before list items. phew. OK, I misunderstood. What a relief ;^> And a blank line between a heading and what it heads is fine by me, it was just the [DUO]L lists I was nervous about. > ... there's always something to be said for simplicity too.. If > someone writing documentation has to remember too many rules, they may > get confused.. which just says that *sometimes* pandering to the tool's tastes and being kind to the maintainer agree (and for `maintain' you can do an appropriate s/// as before if you wish). Eddy. From Edward Welbourne Sat Mar 17 19:41:03 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 19:41:03 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> Message-ID: hmm. I don't like backslash. I know you can use r'''...''' but I still don't like it. 
Sort of because I'm not guaranteed that I can see the start of a long doc string when I'm editing it, but partly because python's reading of \ in strings is already (... how shall I say this politely ...) sophisticated enough. However, I can add to your list of places where it's `needed': \n is needed in embedded literals because ... we aren't allowing them to span multiple lines, right ? But we have to allow doc-string authors to discuss strings with newlines in them (just like we have to allow them to discuss strings with ' and # in them - and, of course, with \ in them).

Someone who wants to include a code fragment including a comment can perfectly easily put it into one of the block-style structures for enclosing non-plaintext, as I argued when proposing #...# for this role; I'm inclined to apply the same ruling to code fragments like:: script.write('echo HTTP/1.1 200 OK\n# no headers\necho') which describe python code which uses a # other than as a (python) comment character. If you decide you want to let me discuss, inline, a shorter cousin of this: I'll point at my uses of \n in it and ask whether you really want me to reactivate perverse counterexample mode. I realise it'll be mildly irritating (and add to vertical space use) to have to go into a block to say a code fragment any time it's got a # in it; or a verbatim text any time it's got a ' in it (indeed, the latter is the more serious issue); at least when it's such a short fragment one wants it to be inline. But inline fragments are a luxury. Adding an escape character requires us to make provision for escaping the escape (else, as Edward pointed out, we can't *end* a fragment with the escape character). At which point the ability of folk to work out how many backslashes they're looking at depends not only on counting the backslashes they can see, and on working out whether the string is r'...' or not, but also on whether they're inside an inline fragment right now. This *will* confuse pythoneers. Confusion is worse than mild irritation. Ergo, don't do it. Not even to save vertical space ;^| (and, as some earlier cross-talk between myself and Tibs might clue you to notice, I consider vertical space a big sacrifice). Oblige doc-strings which want to talk about a fragment, using the delimiter ST* uses for the relevant kind of inline fragment, to do the fragment as a block, not an inline. This is an easy rule with no fancy repercussions. It doesn't collide with the already sophisticated reading of \ in strings: and it's easy to describe and understand. 
Ergo it is the Right Thing To Do. Please ? Eddy. From Edward Welbourne Sat Mar 17 18:06:55 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 18:06:55 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <00bb01c0ae33$7ff2fd50$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <00bb01c0ae33$7ff2fd50$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Edward, then Tony: >> XML defines:: ... >> So the regexp would be something like:: >> >> [a-zA-Z_:][a-zA-Z0-9.-_:]* > Hmm. I might prefer to say "a Python identifier", then, as I don't need > the "namespace" bit (which is what the colon is for). I don't like the XML Name; having embedded : won't sit well with pythonic reading; but python identifiers are too restrictive. Have we considered the classic spec for labels to appear left of a colon, namely RFC 822 (e-mail headers) and its kin ? I think that basically comes down to r'\w+(-\w+)*' as regex, generally specified (certainly in HTTP's variant on the theme) to be read case-insensitively and conventionally rendering each word Capitalised (e.g. Rfc-822-Compliant is normalised, though RFC-822-compliant is read as the same identifier). We might want to allow _ as well as \w (indeed, we might want to define \w to include _ given that python effectively does so). Eddy. From Edward Welbourne Sat Mar 17 17:14:09 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 17:14:09 +0000 (GMT) Subject: [Doc-SIG] What counts as a url? In-Reply-To: <3AB24895.8DB07998@lemburg.com> (mal@lemburg.com) References: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu> <3AB24895.8DB07998@lemburg.com> Message-ID: See the standard module urlparse.
What we should really do is add, to the standard module urlparse, a function which takes a string and returns the length of the initial segment which is a URL, -ve value on failure (albeit the value is guaranteed to be > 3 if it's not -ve; but -ve, or maybe 0, is the right answer for compatibility with existing string/regex match/find routines) so that the caller can then snip off this chunk and pass it to urlparse.urlparse() ... erm hello, the doc page (at 1.5.2) mentions a `parameters' chunk following a semicolon. Never heard of it myself: This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. ick - it doesn't cut netloc into name:port, let alone supply scheme's default for port when omitted. It also doesn't specify query-parsing (OK, so that ends up making contentious design decisions, so fair enough) or url-decoding (OK, so one can't do that to any unless also to the query fragments, which implies parsing the query first). Ho hum. >> ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+ > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' erm ... I'm fairly sure you're allowed at most one # and at most one ? in an URL: any others *must* be url-encoded as %[0-9A-Fa-f]{2} tokens. I'm fairly sure you aren't allowed an & before the ? and that the # has to appear after the ? and all & Marc's regex doesn't mention = and ? explicitly, but they're definitely allowed in URLs. Are () really allowed in URLs ? How about {} and [] ? I'm fairly sure : and , are allowed in paths. But I'd expect :,{}()[]*! all to be url-encoded, anyway, so they shouldn't appear in the regexen; they're covered by % and \w. There is an RFC for URIs, I mailed it to Edward recently; I guess that'd be >> and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt . so go read the appendices (pedantically).
I know the relevant RFC has a helpful Appendix A giving BNF and Appendix B advising how to parse, complete with a regex for parsing (which presumes you *check* separately, based on the BNF). I really don't like that space between the URL and the full-stop (sorry, `period', to translate into North American Anglic); but, no, I can't see how to avoid it. Other than to treat the end of a URL as `this may have been the end of a sentence', even if it isn't followed by a . so authors of doc-strings know they can treat the URL as sentence-end (unconvinced). oh - Marc: > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' did you really mean `from # to _ inclusive' ^^^ or did you mean to say `#, - or _' ? Hmm, I think you mean the latter: put - last in the [...] But the latter reading claims you've missed out / in the path, and the former claims most entries in your [...] are duplicates of ones in the #-_ range. I'm confused. If the - is last, and we mention = explicitly, we can phrase the character class as [...=;-] with its hair standing on end, which seems entirely appropriate. Why do regexes always feel like the right answer to the wrong question - albethey useful - ? Edward: will working in EBNF spare us these messes ? (I'm assuming that's Extended Backus-Naur Form, subject to spelling.) Tibs: would mxTextTools let us say this stuff less uglily ? I'm inclined to advise running with the way the RFC's appendices approach the problem, though: first, parse according to Appendix B's regex, then (it explains better than I can here) take the fragments into which it's cut your putative URL text and check each fragment for validity according to the appropriate rules in appendix A, which depend on the scheme; if any fail their check, decide that this wasn't a URL anyway.
Albeit this means fully parsing the URL, so maybe the right function to add to urlparse is one which reads, from a string, the longest initial chunk which is a URL, returning a tuple whose first item is the length, remainder are urlparse.urlparse()'s answers (at least when the length is positive). > I don't think it makes sense to include schemes which are not > supported by your everyday browser, so only the most common ones > are included. I think it does make sense to include them, for two reasons: i) we should *recognise that the text is a URL* even when we know not what to do with it, if only so we can warn the user - the principle of least surprise says that if you *have to* surprise the user (whose browser does know about a scheme you're ignoring) you should at least have the decency to warn. ii) forward compatibility - someone may add a scheme that really does deserve to be in there, and the tool should need minimal revision to cope. and I thought most browsers *did* cope with gopher, which you omit ... Yes, I admit it, I'm an old fuddy-duddy. But the right answer is to use the urlparse module, not to ad-hock together your own; if you don't like how urlparse does things, fix it. (Note: I'm as guilty as anyone on this - I'd written a much longer version of this e-mail, complete with my own od-hack regex, before even thinking to look for a module, at which point I instantly *knew* the module was bound to exist - and not be to my liking.) Eddy. 
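An aside for the modern reader: the six-way split Eddy quotes is easy to check. In today's Python the old urlparse module lives at urllib.parse (that relocation is assumed below); the result still exposes the `parameters' chunk, and his name:port complaint has since been addressed by the .hostname and .port accessors on the result object:

```python
# The general structure scheme://netloc/path;parameters?query#fragment,
# as split by urlparse (the module now lives at urllib.parse).
from urllib.parse import urlparse

parts = urlparse('http://example.com:8080/path;type=a?x=1&y=2#frag')
assert parts.scheme == 'http'
assert parts.netloc == 'example.com:8080'
assert parts.path == '/path'
assert parts.params == 'type=a'     # the `parameters' chunk after the semicolon
assert parts.query == 'x=1&y=2'
assert parts.fragment == 'frag'
# netloc is still not pre-split, but the result object exposes the pieces:
assert parts.hostname == 'example.com'
assert parts.port == 8080
```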
From Edward Welbourne Sat Mar 17 18:16:57 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 18:16:57 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103161644.f2GGi9p15914@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103161644.f2GGi9p15914@gradient.cis.upenn.edu> Message-ID: Tony, then Edward: >> On the other hand, sometimes people want that effect (the >> difficulties of mixing presentation and markup, and not being >> specifically a typesetting language - ho hum). > Too bad for them. :) If they really need *that much* control over > their typesetting, they shouldn't be using ST. If they're thinking that hard about layout, they aren't concentrating hard enough on writing API documentation and the result isn't going to be maintainable ('cos when I next fix a bug in their code, and update the docs, I'm going to have zero patience with their fancy layout, so I'm going to normalise it to something straightforward). If we pander to the folk who care more about appearance than information content, we're doomed. Look what happened to poor old HTML ... Eddy. -- Keep It Simple, Styoooooooooooopid. Keep It Straightforward, Simpleton. From ping@lfw.org Sat Mar 17 20:30:05 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 17 Mar 2001 12:30:05 -0800 (PST) Subject: [Doc-SIG] Re: ping Ping In-Reply-To: Message-ID: On Sat, 17 Mar 2001, Edward Welbourne wrote: > Hello, Ping, where are you ? > We need your response to our gabblings. Hi, Eddy. I'm sorry i haven't had time to really participate in this discussion this past week. I've been watching all the e-mail go by but haven't formulated my responses yet. I was originally considering writing my own PEP if i could find the time. Thanks for the invitation. My general feeling about all of the syntax ideas that have been going back and forth is that i'm a little afraid of their complexity.
When i have a moment i'll try to get a handle on what rules are currently on the table and see how many there are, but i'll definitely want to keep them minimal. > ah. So: Edward, Tibs and I, who have done this week's talking, all > agree on a position which puts `API docs' into the source file and puts > tutorials, reference docs, etc. elsewhere. I will maintain the opposing position for now, as devil's advocate. Here are some points to consider, just off the top of my head: - Just to be clear, the suggestion on the table is only to move the library ref manual into the modules, not the language reference or anything like that. - If documentation lives in the modules, we can guarantee that any user of the module has the information they need to understand and use it properly. - Allowing extended documentation in a module does not preclude other people from writing other documents, tutorials, books, etc. on a particular topic. - If documentation for a module doesn't live in the module itself, how will a user find it? One source of motivation for this suggestion was running "perldoc CGI" -- having a copy of CGI.pm guarantees that you have an instantly available and fairly comprehensive description of all the things you can do with CGI.pm. - Keeping modules and associated docs in the same file helps to ensure that the two are in sync when you distribute or edit the file. (It's not possible to have different versions of the code and the docs at the same time; it's less likely that someone will check in changes to one without updating the other, etc.) 
-- ?!ng From Edward Welbourne Sat Mar 17 20:47:14 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 20:47:14 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103170258.VAA14151@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Fri, 16 Mar 2001 21:58:48 -0500) References: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> <200103170258.VAA14151@cj20424-a.reston1.va.home.com> Message-ID: Thank you Guido. That makes sense. Not naughty after all ;^) > The ExtensionClass module in Zope actually implements class-like > objects that behave in such a way that at least the first example > (D.f.foo = 2) changes the f.foo value for class D but not for class C. > So this is not just theoretical! see, I knew it'd be the sort of think Jim Fulton plays with. (OK, `think' was a typo: but it deserves to survive ;^) Which encourages hope. First off, I'm going to disparage your second example: > class C: > def f(self): pass > f.foo = 1 > > x = C() > x.f.foo = 2 # Would also change value of C.f.foo! This is strictly another matter: x.f isn't really the same thing as C.f, it `should' be currie(C.f, x), if you see what I mean, hence a different object from C.f, so setting its .foo shouldn't affect C.f, even with the old semantics (even assuming one's allowed to setattr on a bound method, which sounds *very* dodgy to me). So I would be rude enough to call this example a bug in pre-2.1b1 python and ask that it be left out of the discussion ;^> But the `D.f is C.f' problem is real. So, on the one hand, when a method is inherited, > class C: > def f(self): pass > f.foo = 1 > > class D(C): > pass > D.f.foo = 2 # Would change value of C.f.foo! but, at the same time, if the derived class *does* over-ride the method, >>> class A: ... def f(self): """doc string of A.f""" ... >>> class B(A): ... def f(self): return ... 
>>> B.f.__doc__ # B.f wishes it would `inherit' from A.f, but doesn't >>> Now any attempt at getting the latter desideratum, without the former naughtiness, is going to be sophisticated: but can it be done ? Clearly there *is* a sophisticated munging phase when B's namespace, having been built by execution of B's suite, gets transformed and packaged so that B.f is no longer simply a function; is it possible, at that juncture, to arrange that it will borrow __doc__ off A.f ? (Only if B.f lacks __doc__, naturally.) This would involve some added magic in the type of unbound methods but that's a pretty magical type *anyway* and it *looks* like it should be feasible by applying games similar to (though hopefully less complex than) those used by ExtensionClass. A derived class' re-implementation `should' behave like that of the base, or it abrogates its ADT, so having the same doc as the method being over-ridden should be `usual'. The exceptions incur a tiny cost - they have to supply a doc string, which can be empty if they want, but really they *should* be explaining why they abrogate the ADT anyway, given that the base class's other methods might exercise the replacement - and, without this borrowing, the usual case implies gratuitous duplication - either of the doc string or of the assignment from base. The latter is really ugly - I should not have to type __doc__ in any ordinary piece of code; only in introspectors. Note (for anyone who missed it) that Tony discovered one needn't go via im_func, as long as one assigns *before* B's namespace gets munged: >>> class E(A): ... def f(self): pass ... f.__doc__ = A.f.__doc__ ... >>> E.f.__doc__ 'doc string of A.f' This is because f is still an ordinary function object (in particular, it isn't yet E.f; E doesn't yet exist) when the assignment happened. There is, of course, an obvious problem: multiple inheritance, when more than one base supplies the over-ridden method.
The solution is, of course, to borrow off the method on the earliest of the bases to provide it and leave the implementor to over-ride that by assigning if they must. This will be rare enough not to be an issue; and it will simply work, because either * you assign as above, in which case E.f had a __doc__ before munging, so borrowing off E's bases' .f didn't get invoked; or * you assign after the class body, via im_func, in which case you over-ride what the munging has done and it still works. How's the time machine doing ? Do methods yet `inherit' __doc__ when not over-ridden ? Eddy. From Edward Welbourne Sun Mar 18 11:18:36 2001 From: Edward Welbourne (Edward Welbourne) Date: Sun, 18 Mar 2001 11:18:36 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: (message from Edward Welbourne on Sat, 17 Mar 2001 19:41:03 +0000 (GMT)) References: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> Message-ID: Oh, and the argument I missed because it was too obvious and I'm too exhausted: verbatim means ... verbatim. If my doc string says client code should ... call #obj.out('\#')# ... to achieve some effect, I should expect some client code to contain the fragment:: obj.out('\#') even if some authors do realise that the \ should be elided, while others read the docstring via a rendering tool which elided it for them; and this will more-or-less certainly be a bug: the string given really does have a backslash in it and really isn't just '#'. It would have been better if your tools had forced me to put the fragment in a block, where I *could* have said:: obj.out('#') so would have done so and wouldn't have confused authors of client code. Likewise - if anything more so - for 'verbatim'. If you provide a backslash escape mechanism for verbatim and code fragments, nearly all uses of it will cause bugs (so, in fact, they will *be* bugs in the docs). Eddy. -- Stinginess with privileges is kindness in disguise. -- Guide to VAX/VMS Security, Sep. 1984. 
s/privilege/feature/ -- Eddy, 2001/March/18. From guido@digicool.com Sun Mar 18 17:12:15 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 18 Mar 2001 12:12:15 -0500 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Sat, 17 Mar 2001 20:47:14 GMT." References: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> <200103170258.VAA14151@cj20424-a.reston1.va.home.com> Message-ID: <200103181712.MAA23125@cj20424-a.reston1.va.home.com> > Thank you Guido. > That makes sense. > Not naughty after all ;^) > > The ExtensionClass module in Zope actually implements class-like > > objects that behave in such a way that at least the first example > > (D.f.foo = 2) changes the f.foo value for class D but not for class C. > > So this is not just theoretical! > see, I knew it'd be the sort of think Jim Fulton plays with. > (OK, `think' was a typo: but it deserves to survive ;^) > Which encourages hope. > > First off, I'm going to disparage your second example: > > class C: > > def f(self): pass > > f.foo = 1 > > > > x = C() > > x.f.foo = 2 # Would also change value of C.f.foo! > > This is strictly another matter: x.f isn't really the same thing as C.f, > it `should' be currie(C.f, x), if you see what I mean, hence a different > object from C.f, so setting its .foo shouldn't affect C.f, even with the > old semantics (even assuming one's allowed to setattr on a bound method, > which sounds *very* dodgy to me). So I would be rude enough to call > this example a bug in pre-2.1b1 python and ask that it be left out of > the discussion ;^> Sure! > But the `D.f is C.f' problem is real. > So, on the one hand, when a method is inherited, > > > class C: > > def f(self): pass > > f.foo = 1 > > > > class D(C): > > pass > > D.f.foo = 2 # Would change value of C.f.foo! > > but, at the same time, if the derived class *does* over-ride the method, > > >>> class A: > ... def f(self): """doc string of A.f""" > ... > >>> class B(A): > ... def f(self): return > ... 
> >>> B.f.__doc__ # B.f wishes it would `inherit' from A.f, but doesn't > >>> > > Now any attempt at getting the latter desideratum, without the former > naughtiness, is going to be sophisticated: but can it be done ? Clearly > there *is* a sophisticated munging phase when B's namespace, having been > built by execution of B's suite, gets transformed and packaged so that > B.f is no longer simply a function; is it possible, at that juncture, to > arrange that it will borrow __doc__ off A.f ? > (Only if B.f lacks __doc__, naturally.) Hm, this is entering a whole realm of stuff where Python isn't very helpful. Folks who know Eiffel have suggested inheriting pre- and post-conditions. Others, coming from C++, have suggested automatic calling of base class constructors in derived constructors. Now you suggest inheriting docstrings. Maybe there's something there, but it's definitely Python 3000 material... > This would involve some added magic in the type of unbound methods but > that's a pretty magical type *anyway* and it *looks* like it should be > feasible by applying games similar to (though hopefully less complex > than) those used by ExtensionClass. > > A derived class' re-implementation `should' behave like that of the > base, or it abrogates its ADT, Yes, but Python doesn't really try to enforce that (or even help you). > so having the same doc as the method > being over-ridde should be `usual'. Unclear. It depends a lot on what's in the docstring. I have written lots of docstrings that would be really misleading if they were inherited! > The exceptions incur a tiny cost - > they have to supply a doc string, which can be empty if they want, but > really they *should* be explaining why they abrogate the ADT anyway, > given that the base class's other methods might exercise the replacement > - and, without this borrowing, the usual case implies gratuitous > duplication - either of the doc string or of the assignment from base. 
> The latter is really ugly - I should not have to type __doc__ in any > ordinary piece of code; only in introspectors. > > Note (for anyone who missed it) that Tony discovered one needn't go via > im_func, as long as one assigns *before* B's namespace gets munged: > > >>> class E(A): > ... def f(self): pass > ... f.__doc__ = A.f.__doc__ > ... > >>> E.f.__doc__ > 'doc string of A.f' > > This is because f is still an ordinary function object (in particular, > it isn't yet E.f; E doesn't yet exist) when the assignment happened. f is still an ordinary function even after the class definition is complete -- but if you access it in the conventional way (as E.f) it is munged on the way out. E.__dict__['f'] also gives the function. (Not that I encourage using this!) > There is, of course, an obvious problem: multiple inheritance, when more > than one base supplies the over-ridden method. The solution is, of > course, to borrow off the method on the earliest of the bases to provide > it and leave the implementor to over-ride that by assigning if they > must. This will be rare enough not to be an issue; and it will simply > work, because either > > * you assign as above, in which case E.f had a __doc__ before munging, > so borrowing off E's bases' .f didn't get invoked; or > > * you assign after the class body, via im_func, in which case you > over-ride what the munging has done and it still works. > > How's the time machine doing ? > Do methods yet `inherit' __doc__ when not over-ridden ? > > Eddy. For this one, I prefer to use the time machine in the opposite direction. Let's move this set of ideas to a new design for Py3K. (PS, I regret that this is off-topic for doc-sig -- it should really be moved to python-dev.) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Sun Mar 18 22:21:48 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 18 Mar 2001 23:21:48 +0100 Subject: [Doc-SIG] What counts as a url?
References: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu> <3AB24895.8DB07998@lemburg.com> Message-ID: <3AB534FC.98C14B7C@lemburg.com> Edward Welbourne wrote: > > >> ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+ > > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' > erm ... > > I'm fairly sure you're allowed at most one # and at most one ? in an > URL: any others *must* be url-encoded as %[0-9A-Fa-f]{2} tokens. I'm > fairly sure you aren't allowed an & before the ? and that the # has to > appear after the ? and all & > > Marc's regex doesn't mention = and ? explicitly, but they're definitely > allowed in URLs. Are () really allowed in URLs ? Yes. > How about {} and [] ? No. See the RFC Appendix A for details. > I'm fairly sure : and , are allowed in paths. But I'd expect :,{}()[]*! > all to be url-encoded, anyway, so they shouldn't appear in the regexen; > they're covered by % and \w. > > There is an RFC for URIs, I mailed it to Edward recently; > I guess that'd be > >> and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt . > so go read the appendices (pedantically). FYI, here's a working reference: http://sunsite.dk/RFC/rfc/rfc2396.html > I know the relevant RFC has a helpful Appendix A giving BNF and Appendix > B advising how to parse, complete with a regex for parsing (which > presumes you *check* separately, based on the BNF). > > I really don't like that space between the URL and the full-stop (sorry, > `period', to translate into North American Anglic); but, no, I can't see > how to avoid it. Other than to treat the end of a URL as `this may have > been the end of a sentence', even if it isn't followed by a . so authors > of doc-strings know they can treat the URL as sentence-end (unconvinced). Note that the RE I mentioned was not supposed to parse all URLs allowed by the different standards out there.
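An aside on the `#-_' question in this exchange: inside a character class, a mid-position hyphen denotes a range, so `[#-_]' admits every ASCII character from `#' (0x23) through `_' (0x5F) - `@', `?', `=' and friends included - while a hyphen written last is literal. A quick demonstration:

```python
import re

# Hyphen mid-class: this is the range from '#' (0x23) to '_' (0x5F).
as_range = re.compile(r'[#-_]')
# Hyphen last: just the three literal characters '#', '_' and '-'.
as_literal = re.compile(r'[#_-]')

assert as_range.match('@')            # '@' (0x40) falls inside the range
assert as_range.match('?')            # so does '?' (0x3F)
assert as_literal.match('@') is None  # not one of the three literals
assert as_literal.match('-')          # '-' now matches literally
```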
The bug you found wasn't intended either, BTW ;-) The RE is basically a very simple approximation of what is allowed and finds most instances of URLs in plain text.

> oh - Mark:
> > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)'
> did you really mean `from # to _ inclusive' ^^^ or did you mean to say
> `#, - or _' ? Hmm, I think you mean the latter: put - last in the [...]
> But the latter reading claims you've missed out / in the path, and the
> former claims most entries in your [...] are duplicates of ones in the
> #-_ range. I'm confused.

It's a bug, just like the omission of "/=?" which was covered up by re.compile() using the whole range #-_ of characters...

> If the - is last, and we mention = explicitly, we can phrase the
> character class as [...=;-] with its hair standing on end, which seems
> entirely appropriate.

Good idea ;) Oh, and please also add the slash and all the other characters in #-_ which could be useful in URLs.

> . Why do regexes always feel like the right answer to the wrong
> question - albethey useful - ?
> Edward: will working in EBNF spare us these messes ?
> (I'm assuming that's Extended Baccus-Nauer Form, subject to spelling.)

Appendix A of the RFC has a "Collected" BNF form -- doesn't look any simpler than the RE, though, only less frightening.

> Tibs: would mxTextTools let us say this stuff less uglily ?

Not less ugly, but certainly with more certainty as to what passes and what does not...

> I'm inclined to advise running with the way the RFC's appendices
> approach the problem, though: first, parse according to Appendix B's
> regex, then (it explains better than I can here) take the fragments into
> which it's cut your putative URL text and check each fragment for
> validity according to the appropriate rules in appendix A, which depend
> on the scheme; if any fail their check, decide that this wasn't a URL
> anyway. Albeit this means fully parsing the URL, so maybe the right
> function to add to urlparse is one which reads, from a string, the
> longest initial chunk which is a URL, returning a tuple whose first item
> is the length, remainder are urlparse.urlparse()'s answers (at least
> when the length is positive).

Seems overly complicated to me, but if you really care for standards-conforming URI recognition then I'd suggest going ahead and writing a patch for urllib which defines a function for finding URLs in text, e.g. findurl(text, start, end) -> (urlstart, urlend) or None.

> > I don't think it makes sense to include schemes which are not
> > supported by your everyday browser, so only the most common ones
> > are included.
> I think it does make sense to include them, for two reasons:
>
> i) we should *recognise that the text is a URL* even when we know not
>    what to do with it, if only so we can warn the user - the principle
>    of least surprise says that if you *have to* surprise the user
>    (whose browser does know about a scheme you're ignoring) you should
>    at least have the decency to warn.
>
> ii) forward compatibility - someone may add a scheme that really does
>     deserve to be in there, and the tool should need minimal revision
>     to cope.
>
> and I thought most browsers *did* cope with gopher, which you omit ...
> Yes, I admit it, I'm an old fuddy-duddy.
>
> But the right answer is to use the urlparse module, not to ad-hock
> together your own; if you don't like how urlparse does things, fix it.
> (Note: I'm as guilty as anyone on this - I'd written a much longer
> version of this e-mail, complete with my own od-hack regex, before even
> thinking to look for a module, at which point I instantly *knew* the
> module was bound to exist - and not be to my liking.)
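The proposed findurl() helper might be sketched like this (hypothetical code; urllib.parse is the modern descendant of the urlparse module, and the candidate pattern here is deliberately loose):

```python
import re
from urllib.parse import urlsplit  # modern home of the urlparse machinery

# Hypothetical sketch of findurl(text, start, end) -> (urlstart, urlend)
# or None: find a loose candidate, then let urlsplit() confirm it at
# least decomposes into a scheme and a network location.
_candidate = re.compile(r"\b(?:http|https|ftp)://\S+")

def findurl(text, start=0, end=None):
    if end is None:
        end = len(text)
    m = _candidate.search(text, start, end)
    if m is None:
        return None
    parts = urlsplit(m.group(0))
    if not (parts.scheme and parts.netloc):
        return None
    return m.start(), m.end()

s = "see http://www.lemburg.com/python/ there"
span = findurl(s)  # span brackets the URL within s
```

The split into a cheap candidate scan plus an urlparse-style check keeps the RE simple while pushing the standards questions onto the library.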
--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Pages: http://www.lemburg.com/python/

From edloper@gradient.cis.upenn.edu Mon Mar 19 01:41:47 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Sun, 18 Mar 2001 20:41:47 EST
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: Your message of "Sat, 17 Mar 2001 13:50:26 GMT."
Message-ID: <200103190141.f2J1flp01199@gradient.cis.upenn.edu>

Eddy, re. the issue of inheriting docstrings for methods from base classes, said:

>> I'd solidly prefer to require an empty doc-string in a method which
>> doesn't want to inherit the doc-string of its parent.

I tend to agree; but it seems from Guido's mail like this isn't anything that's likely to happen soon. So I propose the following:

1. For now, do *not* recommend that people use::

       f.__doc__ = parent.f.__doc__

2. For now, recommend that *tools* inherit documentation for a method if f.__doc__ == None, and don't inherit if f.__doc__ == '' or any other string.

3. We can discuss writing a PEP, separate from all the docstring issues we're currently discussing, to handle inheritance of docstrings.

4. If/when such a PEP goes through, the tools could optionally be simplified, but nothing will break (because now the doc strings of the methods in question won't be None anymore).

Sound reasonable?

-Edward

From edloper@gradient.cis.upenn.edu Mon Mar 19 03:19:55 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Sun, 18 Mar 2001 22:19:55 EST
Subject: [Doc-SIG] backslashing
In-Reply-To: Your message of "Sat, 17 Mar 2001 19:41:03 GMT."
Message-ID: <200103190319.f2J3Jtp07955@gradient.cis.upenn.edu>

Hm.. I'm starting to get convinced that escaping with backslashes might not be optimal..

So, as a preface to what follows, I think of '...' and #...# as used for in-line literals. I.e., you can include them in sentences.
Literal blocks are used for blocks that are separated off from the rest of the code. Thus, there's an important *semantic* difference between '...' and literal blocks. Given that, I don't think it's necessarily reasonable to force people to put what really *should* be an in-line literal into a literal block.

>> However, I can add to your list of places where it's `needed': \n is
>> needed in embedded literals because ... we aren't allowing them to span
>> multiple lines, right ?

You might mean two things, here.

1. Including the 2-character string '\n' in a literal, intending it to be rendered as a backslash followed by an n.

2. Including an actual newline in a literal, intending it to be rendered as a line break.

If we assume that '...' is for in-line literals, (1) makes sense, but (2) really doesn't; if you want line breaks in your literal, you should be using a literal block. If someone wants to discuss a string with a newline in it, they should probably use r"#'\n'#" (which will be rendered as:: '\n' in monospaced font). Or, if we're backslashing things, they should use r"#'\\n'#".

>> Someone who wants to include a code fragment including a comment can
>> perfectly easily put it into one of the block-style structures for
>> enclosing non-plaintext, as I argued when proposing #...# for this role;
>> I'm inclined to apply the same ruling to code fragments like::
>>
>>     script.write('echo HTTP/1.1 200 OK\n# no headers\necho')
>>
>> which describe python code which uses a # other than as a (python)
>> comment character. If you decide you want to let me discuss, inline, a
>> shorter cousin of this: I'll point at my uses of \n in it and ask
>> whether you really want me to reactivate perverse counterexample mode.

I assume that "script.write..." is in an r"..." string, otherwise it would be indistinguishable from::

    script.write('echo HTTP/1.1 200 OK # no headers echo')

Which would be rendered as such.. In this case, I'd agree, and say to use a literal block.
But that's because you wouldn't normally include the string you gave in a sentence.. If we're talking about the string "x'", having to use literal blocks may be unreasonable. Consider the fictional example::

    If the user types::

        x'

    then the system should print the value of::

        x'(a)

    and return the value of::

        x'(b)

This really *should* be rendered as a single sentence, but by forcing the doc writer to put everything in literal blocks, we force it to be rendered with each of those symbols in a separate display area...

>> Adding an escape character requires us to make provision for escaping
>> the escape (else, as Edward pointed out, we can't *end* a fragment with
>> the escape character). At which point the ability of folk to work out
>> how many backslashes they're looking at depends not only on counting the
>> backslashes they can see, and on working out whether the string is
>> r'...' or not, but also on whether they're inside an inline fragment
>> right now. This *will* confuse pythoneers.

I do agree. But I'm not sure what the best thing to do is. It's a little bit of a problem, *anyway*, because even if we ignore '\', doc writers have to think harder than they should if they want to use backslashes in their docs. :)

>> Not even to save vertical space ;^|

If it were just an issue of saving vertical space, I'd agree. But I want to make sure that everything reasonable *can* be documented s.t. it will look reasonable when formatted.. I'm less worried about saying "the 0.5% of people using forms like XYZ will have to go to extra trouble." But I may end up agreeing anyway, that the confusion is too much, and that those 0.5% will just have to deal with using literal blocks, and possibly with having ugly formatted docs. :)

We could also discuss ways of indicating that one-line literal blocks are "really" inlines (::: or some such), but I'm currently loath to make ST even more complex.
:) >> Oblige doc-strings which want to talk about a fragment, using the >> delimiter ST* uses for the relevant kind of inline fragment, to do the >> fragment as a block, not an inline. So are you saying we'd have 2 different kinds of literal blocks? We hadn't really discussed that before.. I think that just having literals, inlines, and literal blocks is probably enough, but if you want to make a case, go ahead. :) -Edward From tony@lsl.co.uk Mon Mar 19 09:51:25 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 19 Mar 2001 09:51:25 -0000 Subject: [Doc-SIG] What counts as a url? In-Reply-To: <3AB24895.8DB07998@lemburg.com> Message-ID: <00d001c0b05a$29bc85e0$f05aa8c0@lslp7o.int.lsl.co.uk> M.-A. Lemburg wrote: > FYI, I use this RE in my apps: > > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' > > I don't think it makes sense to include schemes which are not > supported by your everyday browser, so only the most common ones > are included. Except that I'm paranoid (well, no, really just a worried pedant) and don't like trying to embed a complete list of resource/schemes in the RE - for instance, I've known people who would get upset by the absence of both "news" and "gopher" in the above. And if I were writing a Python library to *handle* a new scheme (for instance, perhaps, for Mozilla?) then I might be upset if I couldn't see it in my docstrings. Tibs (on the other hand, this *is* worth refining over time, and we need not get it *perfect* at the start). -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
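One way to avoid baking a complete scheme list into the RE is to keep the list as data, so "news", "gopher", or a brand-new scheme can be added without rewriting the pattern; a minimal, hypothetical sketch:

```python
import re

# Hypothetical sketch: the recognised schemes live in a list, and the
# RE is built from it, so extending the set is a one-line change.
SCHEMES = ["http", "https", "ftp", "news", "gopher"]

def make_url_re(schemes=SCHEMES):
    # longest-first so e.g. "https" is tried before its prefix "http"
    alt = "|".join(re.escape(s) for s in sorted(schemes, key=len, reverse=True))
    return re.compile(r"\b(?:%s)://[\w@&#.!~*'();/?:=+$,%%-]+" % alt)

url_re = make_url_re()
m = url_re.search("try gopher://gopher.quux.org/ perhaps")
```

A tool could then expose the scheme list as configuration, which answers the "I couldn't see my new scheme in my docstrings" complaint without touching the RE itself.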
From tony@lsl.co.uk Mon Mar 19 09:53:35 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Mon, 19 Mar 2001 09:53:35 -0000
Subject: [Doc-SIG] formalizing StructuredText
In-Reply-To: <200103161700.f2GH0cp17194@gradient.cis.upenn.edu>
Message-ID: <00d101c0b05a$76bb3760$f05aa8c0@lslp7o.int.lsl.co.uk>

After a weekend where *some* work got done, significant points:

1. Newlines are preserved again in non-literal paragraphs (Edward Loper convinced me that the benefits outweighed the problems).

2. Newlines are not allowed within literal and Python literal strings.

3. Local references (which look like '[this]' or '[1]') are now supported. The "anchor" for a local reference must be at the start of a paragraph (in future releases I would expect it to *start* a new paragraph if at the start of a line), and looks like::

       ..[this]

4. List items and local references may be "empty" paragraphs, but there may still be some unresolved issues with respect to newlines - I'm not sure that::

       1.
       Some text

   is allowed (it probably should be, if the form with a blank line between those two lines *is* allowed).

5. The RE used for detecting URLs has become more sophisticated. There are some associated rules - first, "odd" characters (which will be listed in the documentation) must be escaped, either as '&entity;' or as '%xx', and secondly, only a select group of characters may form the *last* character of a URL - essentially, [0-9A-Za-z/], or something like that - this means that "normal punctuation" cannot form the end of a URL (I don't regard these as very common!), and thus 'http://www.fred.jim/.' unambiguously ends a sentence with that full stop, it is not part of the URL. This is a Good Thing.

The following are probably mostly in response to Edward Loper:

I said that with REs you didn't detect errors
> > > plain: 'This ' > > emph: 'is "too' > > plain: ' confusing":http://some.url' > > Here, you could say that the string '":' without a matching '"' > is illegal, and raise an error.. That approach is what I meant when I talked about "a long RE for detecting common errors", and it is a sensible approach *if one is validating* - but the results should be warnings, 'cos one of the points of ST, originally, is that users should be able to "push the corners" a bit. > But from the point of view of someone formalizing the language, saying > "there's an ambiguity" is no good. I have to either explicitly say > "it's illegal" (=undefined) or "xyz is the correct answer." Oh, I agree, and it's a good thing to do. But you *do* have a third option, which is the "this behaviour produces undefined results", which is not *quite* the same as "illegal". > p.s., I'm not sure it's safe for us both to be writing email at the > same time. We might overload other peoples' mailboxes. :) Hmm. Of course, it's an attempt at a compromise between a private conversation, and a public dialogue that other people can chip into. Not a very *good* compromise, necessarily... (and damn, folding messages together clearly isn't going to work without spending some serious time on it, so it's back to the cacophony, I'm afraid). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 19 09:54:36 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 19 Mar 2001 09:54:36 -0000 Subject: [Doc-SIG] quoting In-Reply-To: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> Message-ID: <00d201c0b05a$9b90fed0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I think we should add that '\\' is a single backslash and #\\# is too. > Otherwise, there's no way to end a literal with a backslash.. 
Yes, I guess so. Can you hold that thought to hit me over the head with when I forget to document it in the STpy documentation, please?

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!"
BARNYARD DANCE! by Sandra Boynton
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Mon Mar 19 10:27:57 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Mon, 19 Mar 2001 10:27:57 -0000
Subject: [Doc-SIG] formalizing StructuredText
In-Reply-To: <200103161758.f2GHwpp20696@gradient.cis.upenn.edu>
Message-ID: <00d501c0b05f$443acda0$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> Yeah, I've been playing a bit fast and loose with terminology in my
> emails.. :) Speaking of terminology, I want to make sure that we're
> using somewhat consistent terminology. In particular, I think my
> use of the following terms may not coincide with what you call
> things. What are your terms for the following?
>
> * inline = region marked with #hashes#.

Python literal string. And something with 'quotes' is a literal string.

> * paragraph = a text paragraph; not a list item or a heading or
>   a label

Paragraph (not distinguished normally from the other sorts, which *also* have special names). If I had to distinguish this, I'd probably call it a "paragraph with a blank line before it" (remember, that *might* include the other sorts of thing, too).

> * basic block = paragraph or list item or heading or label (or
>   table?)

Paragraph (see above)

> * blank line = (S* NL) | (S* EOS)

blank line

> * literal block = region following a '::'.

literal paragraph (which is a *bit* misleading, as it can include blank lines) and a single (non-literal) paragraph starting with '>>>' is a Python paragraph.

> * invalid string = string that is not given a meaning by an ST
>   variant. (in the terms used by the STminus proposal, strings
>   that are not assigned a structure by a language).

I don't have a term for that, because docutils doesn't work like that. I *have* started to generate paragraphs that have a "badpara" tag, so it would be a "badpara" element (I'm following the old-fashioned ST approach of trying to markup what the user said and assuming they meant it, whilst you're trying to do the formal approach - this does leave a gap in talking).

> Tibs continued:
> > When I am talking, I have some assumptions (which, of course, may
> > not be evident):
> >
> > 1. by the time discourse occurs, all tabs have gone away
>
> Agreed. We should probably also discard/transform any whitespace
> that isn't space or newline (e.g., form feed, carriage return).

Agreed, but something I've ignored for now (unless my code does it without my looking - doubtful).

> > 2. blank lines are blank lines - white space in them is ignored
> > thrown away (lost for good)
>
> Is this true in literal blocks?

Yes - by the "trailing whitespace is removed" rule.

> Also, I'm guessing you collapse multiple consecutive blank lines
> into one.

Yes, but they get un-collapsed again within literal paragraphs (that's quite important, and a major deficiency in STNG, if it's still not done).

(this does not, of course, happen for *Python* literal paragraphs, as they are defined to end at the first blank line - indeed, that (or end of string) is *all* that ends them.)

> > 3. trailing whitespace is thrown away
>
> Trailing whitespace for the string as a whole? For each basic
> block? For each line? Is this true in literal blocks?

For each line. True in all places (you can't, in general, see them, so there we go). For literal blocks, newlines are preserved, but I can't see any obvious point in preserving trailing spaces.

> > 4. literal paragraphs retain leading whitespace following "the
> > rules" (which say they are actually indented relative to the
> > preceding non-literal paragraph - this makes much more sense
> > in ST than "with respect to the left margin").
>
> Agreed. Although how do you put something at zero indentation?
> Maybe indent from 1 space over from the preceding paragraph?

You don't. I've never wanted to (my problems with HTML normally come from trying to do the opposite).

> So we won't use the term whitespace. Instead, we'll use the terms
> space, newline, and blank line.

Good by me - it also requires one to say "space or newline" when that is what one means.

> > Clearly for a string literal that does not contain a newline, spaces are
> > to be transcribed to spaces (probably - flag a rendering issue as to
> > whether they're *hard* spaces (the correct number) or *unbreakable*
> > spaces (the correct number AND no newlines)).
>
> I vote for unbreakable, but it may be possible to persuade me.

Given I've now forbidden newlines in (both types of) string literal again, I'd also go for unbreakable (my HTML output doesn't implement that, but who cares, it's only a testbed, and could be fixed later on).

> > Equally clearly, if one does not allow newlines in string literals,
> > that's the end of the matter. We've done our job.
>
> Which is what I vote for. :)

And I now agree. My position wasn't strong enough to stand against nay-saying, I felt.

> > > "Here the *name* 'contains' markup":url
>
> Hm.. I'm confused. So you would get::
>
>     Here the *name* 'contains' markup
>
> ? Or::
>
>     "Here the name contains markup":url

Well, personally I'd never emit .. - but I had missed the preceding '::', so was answering on a different assumption, I think...

At the moment, markup is not nested. At the moment, literals are scanned for first. So at the moment, a URL text containing a literal string will not be identified as such.
In the future, I aim for markup to nest, and then your example would be legal, and do the "right thing". > One other case to consider is:: > > *"I would prefer this":url* to "*this*":url With no markup nesting, and with the current ordering, the first one is emphasised, so not a URL usage, and the second one is a URL, and so not emphasised. Of course, if we're aiming at 2.2, then it is quite possible nested markup might be available by then - I'm just not prepared to "waste" time on it now... > > > "This name spans multiple > > > lines":url > > Revised answer - that's definitely allowed, as newlines are explicitly allowed in the quoted part of a URL definition. Why? Because it's not harmful, it's a bit surprising if they're not (since they're allowed in *other* ".." situations), and I prefer it that way (erm...). > Still seems to me that names should be able to span newlines, though. So I think we're agreeing. > > > "the following is not a url": That's right. In this instance. > Yes, but do we get an error because we used '":' in a silly context > (if we're asking the parser to tell us about errors)? I can't see, in docutils (STminus is another kettle of fish) that error detection (apart from paragraph indentation and paragraph label detection) is other than a bunch of heuristics, almost certainly one or more REs, that point out *possible* problems to a user wanting validation. So it becomes a matter of identifying the set of REs we want to warn about. > > > Do *quotes "have to* nest" properly with coloring? > > But from the point of view of formalizing things, I have two > choices here: > 1. say that it contains a bold region, and the quotes are just > rendered as quotes > 2. say that it's undefined (i.e., an invalid string). Undefined isn't invalid - it's undefined. At least to me, even in a formal context, that's true (i.e., not "I don't know" but "I shan't decide"). 
On the other hand, once I'm sure I've got the order of markup/colourising correct, I'll be happy to regard it as so, and then you could "freeze" it. But is that a good approach? Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 19 10:35:48 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 19 Mar 2001 10:35:48 -0000 Subject: [Doc-SIG] What counts as a url? In-Reply-To: Message-ID: <00d701c0b060$5cd11850$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Welbourne ("Eddy" for reasons of clarity) wrote: > . Why do regexes always feel like the right answer to the wrong > question - albethey useful - ? Because they are, of course (well, actually, you have it exactly backwards) > Tibs: would mxTextTools let us say this stuff less uglily ? Well, in my opinion, yes, but that's because it's actually a proper parser, so one takes a different approach. Not that I'm volunteering to write it, mind you. > But the right answer is to use the urlparse module, not to ad-hock > together your own; if you don't like how urlparse does things, fix it. > (Note: I'm as guilty as anyone on this - I'd written a much longer > version of this e-mail, complete with my own od-hack regex, > before even thinking to look for a module, at which point I instantly > *knew* the module was bound to exist - and not be to my liking.) There are two problems here: 1. Find the candidate (possible) URL 2. Validate it as such The first is the one we're addressing proximately, and for once I would argue that it is better to find *too many* matches, rather than too few. The second is what Eddy appears to be talking about, with urlparse, etc. It is optional (i.e., one would only do it if validation is selected). 
It *may* be hard to "unstitch" the markup that has already occurred by the time validation is done, so it is likely to get left until later. Given a big problem, leave it until later...

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive continuity of ducks."
- Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Mon Mar 19 10:38:10 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Mon, 19 Mar 2001 10:38:10 -0000
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To:
Message-ID: <00d801c0b060$b19561c0$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward Welbourne wrote:
> Tony, then Edward:
> >> On the other hand, sometimes people want that effect (the
> >> difficulties of mixing presentation and markup, and not being
> >> specifically a typesetting language - ho hum).
>
> > Too bad for them. :) If they really need *that much* control over
> > their typesetting, they shouldn't be using ST.
>
> If they're thinking that hard about layout, they aren't concentrating
> hard enough on writing API documentation and the result isn't going to
> be maintainable ('cos when I next fix a bug in their code, and update
> the docs, I'm going to have zero patience with their fancy layout, so
> I'm going to normalise it to something straightforward).

I agree. I *think* we were talking about indented paragraphs and lists - it may well get simplified.

(heh, we all agree - let's take over the world)

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive continuity of ducks."
- Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
From tony@lsl.co.uk Mon Mar 19 10:44:22 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 19 Mar 2001 10:44:22 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: Message-ID: <00d901c0b061$8f556af0$f05aa8c0@lslp7o.int.lsl.co.uk> Discussion by Eddy about problems with escaping quotes using, well, anything, omitted... OK - for the moment, the alpha documentation for STpy.py will hold off on the issue of quoting quotes, and leave it as an unresolved issue for the future (maybe, if I remember, pointing out that literal paragraphs get around the problem, sort of). We can then have a more detailed argument/discussion after that... Tibs (who doesn't have enough time to think hard on this) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From Edward Welbourne Mon Mar 19 20:30:31 2001 From: Edward Welbourne (Edward Welbourne) Date: Mon, 19 Mar 2001 20:30:31 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103190141.f2J1flp01199@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103190141.f2J1flp01199@gradient.cis.upenn.edu> Message-ID: Edward said: > I tend to agree; but it seems from Guido's mail like this isn't > anything that's likely to happen soon. So I propose the following: a fix in the tools. Perfectly sensible. > 1. For now, do *not* recommend that people use:: > > f.__doc__ = parent.f.__doc__ s/For now, d/D/ Mayhap tell folk this is something they *can do* if they want to duplicate a docstring, but folk should never need to do it this way ... > 2. For now, recommend that *tools* inherit documentation for a > method if f.__doc__ == None, and don't inherit if > f.__doc__ = '' or any other string. I know I'm about to vary my tune but ... someone else has been talking persuasively out-of-band. 
Rather than borrowing the doc directly off the parent ...

If f.__doc__ is None, it would make sense to provide a `default docstring' comprising a `See also' section (with `also' omitted for obvious reasons) referencing the corresponding methods in those of our bases that provide the given method. Optionally, if these methods are themselves using the default, shortcut to the ancestors which *do* provide something useful, though this would be extra work the user could save you by following a few links.

However, if you do it this way, the correct rules for when Z.f's default doc refers to A.f's doc are:

* A defines f and A.f.__doc__ is not None

* there is a chain [A,...,Z] of classes in which each chain[i] is in chain[1+i].__bases__ (sorry about the fiddliness here, but Z may inherit from A via several chains)

* no class in the chain (other than A) defines .f and provides a non-None .f.__doc__

But I'd argue for the simpler approach: just link to all bases which provide the given method and leave *their* default pages to provide chains of links to chase. This would have the advantage that you'd record the inheritance tree via which the given method could have been provided if it hadn't been over-ridden.

Hmm. Indeed, I'd argue that *every* method's (auto-generated) documentation has a proper section which refers to *all* methods, on bases, with the same name, regardless of whether those methods have docstrings. One is apt to need to know.

It is, however, a bit fiddly to determine which bases (here I mean the classes in the tree obtained by chasing __bases__ repeatedly) are providing the given method without it being overridden on the way, i.e. given Z, f, those A for which:

* A defines f

* there is a chain [A,...,Z] of classes in which each chain[i] is in chain[1+i].__bases__, as before

* no class in the chain (other than A) defines .f

The test for `C defines .f' is: 'f' in dir(C), by the way.
For each base B of a class C to be checked:

* if #B.f# raises AttributeError, we can ignore B
* else if #'f' in dir(B)# [and B.f.__doc__ is not None], B.f is interesting
* else we need to check B

where the [and ... None] clause is only involved if you want to do the short-cut. Hmm. Maybe not that fiddly. Code follows (for both alternatives, nearly identical). It isn't noticeably harder to do the job right going straight to the ones with docs than going to the ones with the method but no doc; however, I'm inclined to argue for going via all defined intermediaries, even without docs.

Eddy.

def baseswithmeth(meth, klaz):
    todo, result, seen = [ klaz ], [], []
    while todo:
        c, todo = todo[0], todo[1:]
        seen.append(c)
        try: getattr(c, meth).__doc__
        # the .__doc__ filters out classes with a non-method called meth
        except AttributeError: pass
        else:
            if meth in dir(c): result.append(c)
            else:
                for base in c.__bases__:
                    if base not in todo and base not in seen:
                        todo.append(base)
    return result

def baseswithmethdoc(meth, klaz):
    todo, result, seen = [ klaz ], [], []
    while todo:
        c, todo = todo[0], todo[1:]
        seen.append(c)
        try: doc = getattr(c, meth).__doc__
        except AttributeError: pass
        else:
            if doc is not None and meth in dir(c): result.append(c)
            else:
                for base in c.__bases__:
                    if base not in todo and base not in seen:
                        todo.append(base)
    return result

# Now do you see what Tibs meant about vertical space ?
# He maintains code I left behind when I changed jobs ...

From edloper@gradient.cis.upenn.edu Tue Mar 20 20:40:04 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Tue, 20 Mar 2001 15:40:04 EST
Subject: [Doc-SIG] URLs
Message-ID: <200103202040.f2KKe4p15488@gradient.cis.upenn.edu>

Since [] are not allowed in URLs, and we already have expressions of the form "name":[ref], how does the following sound:

Use "name":[ref] for in-line hrefs. If ref is a single token, and there is a directive of the form:

    ..[ref] url

Then use url as the URL; otherwise, use ref as the URL.
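A minimal, hypothetical sketch of that lookup rule (the function and the directive-matching pattern here are invented for illustration):

```python
import re

# Hypothetical sketch of the proposed "name":[ref] resolution: if a
# "..[ref] url" directive defines ref, its url wins; otherwise ref
# itself is taken to be the URL.
_directive = re.compile(r"^\.\.\[(\S+)\] +(\S+) *$", re.MULTILINE)

def resolve(ref, doc):
    """Map a [ref] token to its target URL, given the document text."""
    table = dict(_directive.findall(doc))
    return table.get(ref, ref)

doc = '''See "the docs":[docs] and "direct":[http://example.org/x].
..[docs] http://python.sourceforge.net/devel-docs/
'''
```

Because ']' cannot occur inside a URL, finding the end of the reference never needs the punctuation heuristics that plain "name":url requires.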
Of course, we'd want to talk to the STNG people about this, but it seems to solve a number of problems:

1. detecting the end of the URL is trivial
2. "name":url. is no longer ambiguous, because we would either say "name":[url]. or "name":[url.]
3. It seems easier to read
4. '":[' is much less likely to occur unintentionally in text than '":' is. So we don't have to worry about people saying things like "foo":bar by accident, instead of intending an href.

If we can agree upon it among ourselves, then I think we should start trying to convince the STNG people.. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 20 20:44:18 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 20 Mar 2001 15:44:18 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Sat, 17 Mar 2001 18:06:55 GMT." Message-ID: <200103202044.f2KKiIp15751@gradient.cis.upenn.edu>

>> Have we considered the classic spec for labels to appear left of a colon, namely RFC 822 (e-mail headers) and its kin ? I think that basically comes down to r'\w+(-\w+)*' as regex, generally specified
>> [...]

Fine with me.

>> We might want to allow _ as well as \w (indeed, we might want to define \w to include _ given that python effectively does so).

I assumed we were treating '\w' as it's defined in the re module, in which case it already does include '_':

    >>> re.match('\w', '_')

Basically re defines '\w' = '[0-9a-zA-Z_]'

-Edward From Edward Welbourne Tue Mar 20 23:23:33 2001 From: Edward Welbourne (Edward Welbourne) Date: Tue, 20 Mar 2001 23:23:33 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <200103190319.f2J3Jtp07955@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103190319.f2J3Jtp07955@gradient.cis.upenn.edu> Message-ID:

> So are you saying we'd have 2 different kinds of literal blocks? We hadn't really discussed that before..
> I think that just having literals, inlines, and literal blocks is probably enough, but if you want to make a case, go ahead. :)

OK, two orthogonal questions about a verbatim fragment:

* inline or block
* python code or `alien text'

giving us four kinds of `verbatim' fragment in doc-strings.

As I've been understanding you, '...' is an inline alien and #...# is an inline python expression. I've been presuming that there are also mechanisms for including (and distinguishing between) blocks whose contents are python or alien in like manner; however, I grant that I've only seen the '::' marker (unless >>> on each line is the markup for python, which I won't like, given that it's on each line) used in such a role, and don't know whether you meant the block it introduces to be read as alien or python. If you don't provide for this distinction, I'm worried.

The python/alien distinction is important, because a python fragment is worth the renderer scanning for identifiers it knows about, so may wish to render as xrefs to pertinent documentation (however, this is the *only* processing the renderer should be doing to it); an alien fragment *should not* be so scanned; it is `truly verbatim' and any similarity between sub-texts of it and anything the doc-system knows about *should* be presumed to be fortuitous - otherwise, we have to put in mechanisms for enabling the author of the doc-string to, somehow, indicate `no, really, this *is not* a use of the python identifier which happens to be spelt the same as it' in a verbatim text, which must abrogate its verbatimness.

While I recognise that the inline/block distinction would ideally pander to one's natural desire to have the text flow nicely, I consider this a layout issue, not a markup one.
[Further: I want to type your fictional example as::

    If the user types::
        x'
    then the system should print the value of::
        x'(a)
    and return the value of::
        x'(b)

without the blank lines you inserted in it; the '::' on the end of each text line, and the return to its indentation at the next, should suffice to enable parsers to know what I meant. Note that I am treating the given fragments as alien verbatim; the user is clearly typing at a prompt which is not a python prompt; and x'(a), x'(b) are presumably reading x' as `the derivative' of some entity named by x.]

I regard layout-control as a luxury, subordinate to keeping the markup language simple. I want to be sure that if something appears in a verbatim fragment, at least when I'm inside an r"""...""" string, one can cut-and-paste the fragment into whatever alien context it belongs in and have it be exactly the right thing; and the formatted output of the given raw string should display the relevant fragment verbatim. This seems more important than being able to inline the tiny proportion (namely the cases using the inline-delimiter) of the uses one has for fragments.

> I think of '...' and #...# as used for in-line literals. I.e., you can include them in sentences. Literal blocks are used for blocks that are separated off from the rest of the code.

(final word, `code': I presume you meant `text'). To me, this is a layout distinction, not a semantic one. Consequently,

> I don't think it's ... reasonable to force people to put what really *should* be an in-line literal into a literal block.

I don't see how `should' can ever be real here. As I understand it, at worst one has to put up with `oh dear, this is going to be ugly, ho hum' rather than `this is going to mean something different'. Obviously, block means something different *to the doc tools* than inline means; but the only meaning I care about is the information content the reader gets hold of at the end.
> even if we ignore '\', doc writers have to think harder than they should if they want to use backslashes in their docs. :)

As long as they use r'...', and don't want to end their string in an odd number of backslashes, I see no problem: please give an example. Raw strings are either invalid or read exactly the way they appear. No thought is required.

    >>> r'\'
      File "<stdin>", line 1
        r'\'
           ^
    SyntaxError: invalid token

so one cannot end a raw string in a single backslash; but

    >>> r'\\'
    '\\\\'
    >>> r'\''
    "\\'"
    >>> r""" \" """
    ' \\" '
    >>> r""" \' """
    " \\' "
    >>> r'''\''''
    "\\'"
    >>> r'\n'
    '\\n'

anything it doesn't reject, it preserves faithfully.

>> Not even to save vertical space ;^|
> If it were just an issue of saving vertical space, I'd agree.

sorry, I didn't make myself clear. Having to add vertical space in these cases is going to annoy *me*, despite which I would rather endure this annoyance than have the entire inline mechanism be (IMO) broken. In the relevant cases, the resulting document, once rendered, will split things into several paragraphs separated by displays, where I would, indeed, prefer to have said the same thing in a single paragraph; this *will* annoy me and will clearly annoy other folk more than the extra vertical space in the source file, but please understand that for me it's the other way round and *despite that* I'm arguing for you to oblige me to break up the text into blocks. Even if you *do* insist on me putting back the blank lines I took out of your fictitious example. (Though I'll be objecting to that independently.)

> We could also discuss ways of indicating that one-line literal blocks are "really" inlines (::: or some such), but I'm currently loath to make ST even more complex. :)

the complexity argument is *exactly* the one I'm focussed on.
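Eddy's raw-string observations can be checked mechanically; a small sketch:

```python
# Raw strings preserve backslashes exactly; the only thing the parser
# rejects is a raw string ending in an odd number of backslashes.
assert len(r'\\') == 2              # two characters, both backslashes
assert r'\n' == chr(92) + 'n'       # a backslash and an 'n', not a newline
assert r""" \" """ == ' \\" '       # the \" pair survives verbatim

# r'\' is a syntax error, so one cannot end a raw string in a single
# backslash; compile() shows the rejection without crashing the script.
try:
    compile(r"r'\'", '<test>', 'eval')
except SyntaxError:
    print("r'\\' is rejected, as shown above")
```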
Out-of-source documentation should provide the means for folk to say things the way they want and have it displayed the way they want: for in-code ST docs, simplicity of markup language is *more important* than making it look nice. It suffices that we ensure that the author can express the information the reader needs. A few rare cases where this will merely be ugly are not worth extra complexity.

> I assume that "script.write..." is in an r"..." string,

that was, indeed, my intent; given which,

> You might mean two things, here.

only meaning 1 is, to my mind, a credible candidate. Meaning 2 doesn't read my text verbatim.

> you wouldn't normally include the string you gave in a sentence.

OK, so I was trying to be realistic, which made my text long. How about (from the docs of an imaginary ftp.py)::

    the send method executes::
        sock.control.write("#")
    after every #chunk# bytes of data have been written to #sock.out#

in which I would normally have wanted to inline the code fragment, but the presence of a # in it conflicts with the #...# delimiter. One could make up shorter examples; but, at least in the present case, one can side-step the problem::

    the send method writes one '#' character to #sock.control# for every #chunk# bytes it writes to #sock.out#.

albeit I may be marking more things with #...# than I need to (am I ?). Indeed, in the small proportion of situations where I can realistically believe in needing to escape the delimiter of an inline fragment, I am inclined to suggest the author think a bit about whether there isn't some other way of phrasing the text so as to avoid the problem.
[It's a bit like how `politically correct' folk used to spend time and effort trying to get us all to settle on a gender-neutral non-neuter pronoun, but grown-up folk have simply learned to side-step the problem - partly by reviving the pronoun `one', partly by avoiding constructions which oblige us to use pronouns in the places where Anglic's provision of them is unfaithful to the author's intent.] If no such rephrasing is possible, the worst we impose on them is that they have to break out into a block structure; which won't *look* right, but will none the less express the information they intended to express. Now, with alien verbatim (as opposed to python verbatim), I realise there is a problem; alien text can have absolutely anything in it, so alien fragments using the delimiter can't be relied on to be rare. Yet, if we're to serve it up verbatim, we should serve it up verbatim. If we can't do it *verbatim*, at least inside `raw' strings, then we should scrap the verbatim inline mechanism. The proposed `fix' breaks its verbatimness, i.e. fixes one thing while breaking another. That's not an acceptable fix. Eddy. From Edward Welbourne Wed Mar 21 01:10:23 2001 From: Edward Welbourne (Edward Welbourne) Date: Wed, 21 Mar 2001 01:10:23 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <00d101c0b05a$76bb3760$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <00d101c0b05a$76bb3760$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > 5. The RE used for detecting URLs has become more sophisticated. There > are some associated rules - first, "odd" characters (which will be > listed in the documentation) must be escaped, either as '&entity;' or as > '%xx', and secondly, only a select group of characters may form the I do not believe that &entity; has any place in an URL. It's a purely SGML/HTML beast, nothing to do with HTTP. The rest of 1--5 looks good. Eddy. 
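Eddy's point that '%xx' is the URL-native escape, while '&entity;' belongs only to SGML/HTML, is easy to demonstrate; a sketch using the modern standard library:

```python
from urllib.parse import quote, unquote

# '%xx' escaping is native to URLs; '&amp;'-style entities are an SGML/HTML
# notion with no meaning to HTTP itself.
raw = 'odd chars & spaces'
escaped = quote(raw)             # '/' is left unescaped by default
print(escaped)                   # odd%20chars%20%26%20spaces
print(unquote(escaped) == raw)   # True: the escaping round-trips
```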
From Edward Welbourne Wed Mar 21 01:02:36 2001 From: Edward Welbourne (Edward Welbourne) Date: Wed, 21 Mar 2001 01:02:36 +0000 (GMT) Subject: [Doc-SIG] quoting In-Reply-To: <00d201c0b05a$9b90fed0$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <00d201c0b05a$9b90fed0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: >> I think we should add that '\\' is a single backslash and #\\# is too. >> Otherwise, there's no way to end a literal with a backslash.. > Yes, I guess so. Can you hold that thought to hit me over the head with > when I forget to document it in the STpy documentation, please? r""" ... '\\' ... """ contains a verbatim literal with two backslashes in it. There is no way to end a raw string with a single backslash either; but it turns out not to be such a huge problem, after all; and, if '\\' is to be only one backslash, one gets to revisit the entire nightmare of ... users wanting to know why r""" ... '\n' ... """ contains a two-character literal fragment, \n, while r""" ... '\\\\\\' ... """ contains a three-character literal fragment but r""" ... '\\\n\\' ... """ contains a four-character literal fragment and ... and I'm too tired to go into this, but it's a nightmare waiting to bite. Eddy. From mal@lemburg.com Wed Mar 21 10:19:20 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 21 Mar 2001 11:19:20 +0100 Subject: [Doc-SIG] URLs References: <200103202040.f2KKe4p15488@gradient.cis.upenn.edu> Message-ID: <3AB88028.605B88A6@lemburg.com> "Edward D. Loper" wrote: > > Since [] are not allowed in URLs, and we already have expressions of > the form "name":[ref], how does the following sound: > > Use "name":[ref] for in-line hrefs. If ref is a single token, and > there is a directive of the form: > > ..[ref] url > > Then use url as the URL; otherwise, use ref as the URL. > > Of course, we'd want to talk to the STNG people about this, but it > seems to solve a number of problems: > 1. detecting the end of the URL is trivial > 2. "name":url. 
is no longer ambiguous, because we would either say > "name":[url]. or "name":[url.] > 3. It seems easier to read > 4. '":[' is much less likely to occur unintentionally in text than > '":' is. So we don't have to worry about people saying things > like "foo":bar by accident, instead of intending an href. > > If we can agree upon it among ourselves, then I think we should start > trying to convince the STNG people.. Sounds like a good idea, but don't you use angular brackets ? These are recommended by the URI RFC, in wide use everywhere and have similar properties... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From tony@lsl.co.uk Wed Mar 21 10:32:36 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 21 Mar 2001 10:32:36 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103202044.f2KKiIp15751@gradient.cis.upenn.edu> Message-ID: <001c01c0b1f2$3ee60d80$f05aa8c0@lslp7o.int.lsl.co.uk> > >> Have we > >> considered the classic spec for labels to appear left of a > colon, namely > >> RFC 822 (e-mail headers) and its kin ? I think that > basically comes > >> down to r'\w+(-\w+)*' as regex, generally specified > >> [...] > > Fine with me. I'm assuming we're talking about paragraph labels. I think we should just go with the English definition of a word, which means [-A-Za-z], and leave it at that. It is *meant* to look like a word. Just because there is a colon there doesn't mean it is related to other fields that happen to end with a colon. 
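One plausible sketch of such a label check (hypothetical helper code with a sample dictionary, not what docutils actually does): a paragraph is a label paragraph when it is one line, contains a colon, and the word left of the colon appears in a dictionary of known labels.

```python
# Hypothetical sketch: one-line paragraph, colon in it, and the text left
# of the colon looked up in a dictionary of known labels.
label_dict = {"Author": "author", "Version": "version", "Returns": "returns"}

def match_label(paragraph):
    if '\n' in paragraph or ':' not in paragraph:
        return None
    left, _, _rest = paragraph.partition(':')
    return label_dict.get(left.strip())

print(match_label('Author: Tibs'))     # author
print(match_label('Not a label'))      # None (no colon)
print(match_label('Unknown: thing'))   # None (not in the dictionary)
```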
The current default labels are::

    label_dict = {"Arguments":"arguments",
                  "Author":"author",
                  "Authors":"author",
                  "Dedication":"dedication",
                  "History":"history",
                  "Raises":"raises",
                  "References":"references",
                  "Returns":"returns",
                  "Version":"version",
                  }

If one is translating (slightly modified format) PEPs, then one would instead use::

    builder.label_dict = {"PEP":"pep",
                          "Title":"title",
                          "Version":"version",
                          "Author":"author",
                          "Status":"status",
                          "Type":"type",
                          "Created":"created",
                          "Post-History":"post-history",
                          "Discussions-To":"discussions-to",
                          }

I think "keep it simple" is required here - these labels are meant to be few and simple, so English words seem sensible to me. I would thus vote against underscores and against digits. Also, validation aside, I don't *use* a regular expression - I look for the right "shape" of paragraph (1 line, colon in it) and check what is to the left of the colon against the dictionary. From *my* point of view the legitimate characters idea only comes in with a validation phase (of course, it would be different for Edward).

> Basically re defines '\w' = '[0-9a-zA-Z_]'

Erm - basically it doesn't - it invokes "locales" which makes life more complex (and I have no idea what sre does about '\w'). From tony@lsl.co.uk Wed Mar 21 10:45:10 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 21 Mar 2001 10:45:10 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: Message-ID: <001d01c0b1f4$00d24250$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward Welbourne wrote:
> OK, two orthogonal questions about a verbatim fragment:
> * inline or block
> * python code or `alien text'
> giving us four kinds of `verbatim' fragment in doc-strings.
>
> As I've been understanding you, '...' is an inline alien and #...# is an inline python expression.

Correct. There are (modulo Spanish Inquisition) reasons for having the two forms:

1.
Python literals commonly include a single quote, but rarely include comments (this was the whole basis of your suggesting this notation, Eddy!). Trying to use single quotes to indicate Python literals would be a right pain. 2. I suspect that it *may* be useful to regard all Python code fragments that contain a single Python entity (be it name or function call) as potential "local links" - i.e., generate a reference from them. I know that I didn't like Ka-Ping Yee's approach of aggressively looking for all *potential* links in unmarked-up text, but once someone has indicated that something is, indeed, Python code, I feel this is a risk I'm more prepared to take. > I've been presuming that there are also > mechanisms for including (and distinguishing between) blocks whose > contents are python or alien in like manner; however, I grant > that I've only seen the '::' marker (unless >>> on each line is > the markup for python, which I won't like, given that it's on each > line) used in such a > role, and don't know whether you meant the block it introduces to be > read as alien or python. If you don't provide for this > distinction, I'm > worried. No. '::' introduces a literal "block", whose contents are not parsed. The contents may include blank lines. '>>>' at the start of a paragraph indicates that *that paragraph* is Python code. Such a paragraph ends at the next blank line (or end of file!). This is intended to allow the visual distinction of text that will be specially handled by doctest (which is now a formal part of the Python package, and whose use is To Be Encouraged (my opinion)). They are thus serving a different purpose. > [Further: I want to type > your fictional > example as:: > If the user types:: > x' ...etc... > without the blank lines you inserted in it; the '::' on the > end of each > text line, and the return to its indentation at the next, > should suffice > to enable parsers to know what I meant. 
The whole issue of *when* we start new paragraphs is likely to be a major research issue after docutils 1.0, I suspect - we already know how easy it is to boobytrap ourselves. I'll leave the rest - I have a feeling that:

a. You're arguing towards agreement (indeed, you may already agree)
b. I'm not going to introduce any form of quoting in STpy alpha1
c. Probably not in STpy 1.0, either

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 21 10:52:02 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 21 Mar 2001 10:52:02 -0000 Subject: [Doc-SIG] URLs In-Reply-To: <200103202040.f2KKe4p15488@gradient.cis.upenn.edu> Message-ID: <001e01c0b1f4$f64ad5d0$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> Since [] are not allowed in URLs, and we already have expressions of the form "name":[ref], how does the following sound:
>
> Use "name":[ref] for in-line hrefs. If ref is a single token, and there is a directive of the form:
>
>     ..[ref] url
>
> Then use url as the URL; otherwise, use ref as the URL.

Inline refs were introduced deliberately to look like footnotes - that is, the rendering of the '[..]' is *meant* to be identical (so [fred] in STpy text should look like [fred] in the final HTML, TeX or whatever, modulo underlining and other indicators). Requiring *inline* refs to have funny quoted text in front of them would reduce their usefulness. So far as I'm concerned, they make sense, I've implemented them in docutils (trivial) and I *like* them (the only problem with them is the introductory '..' on the anchors, and I can't see a way around that that's simple). They match a convention people already use. 'nuff said.
As a separate issue, for *non* inline references, it would indeed be quite nice if we could delimit all URLs in some way (using '<' and '>' would actually be a lot more traditional). But I think this is way too late on the "compatibility with all other forms of ST" basis - i.e., this would be a big break with the past. *But* raise it over on the STNG side. If they went for requiring::

    "some text":<http://some.url/>
    "more text":<fred.html#label>
    "more more", <http://fred.label>
    <http://www.bare.url>

instead of::

    "some text":http://some.url/
    "more text":fred.html#label
    "more more", http://fred.label
    http://www.bare.url

*then* I would go for it (but only then). It would indeed make life a lot simpler.

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Wed Mar 21 16:48:36 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 11:48:36 EST Subject: [Doc-SIG] Re: ping Ping In-Reply-To: Your message of "Sat, 17 Mar 2001 12:30:05 PST." Message-ID: <200103211648.f2LGmbp00795@gradient.cis.upenn.edu>

> My general feeling about all of the syntax ideas that have been going back and forth is that i'm a little afraid of their complexity. When i have a moment i'll try to get a handle on what rules are currently on the table and see how many there are, but i'll definitely want to keep them minimal.

I tend to agree about ST being too complex in some ways. I'm currently working on 2 separate PEPs, one that includes all the features that Tibs wants, and one that has only a more limited set. I think that we may have an easier time selling a simpler proposal to the Python community, but I may be wrong.. We'll wait until all the PEPs are out on the table, and discuss them, I guess.

> > > ah.
> > So: Edward, Tibs and I, who have done this week's talking, all agree on a position which puts `API docs' into the source file and puts tutorials, reference docs, etc. elsewhere.
>
> I will maintain the opposing position for now, as devil's advocate.

Good. We may need more devil's advocates as we get closer to agreement, because we'll need to be able to answer these questions when they come up on python-dev. :)

> - Just to be clear, the suggestion on the table is only to move the library ref manual into the modules, not the language reference or anything like that.

Only the ref manual, or also the howtos, tutorials, etc? And if only the ref manuals, what do we currently think belongs in the ref manuals, other than API docs?

> - If documentation for a module doesn't live in the module itself, how will a user find it?

I think that this is a question that we'd have to answer, whether we put more docs in the module or not -- how does the user find the tutorials, howtos, etc., that were *not* written by the programmer/maintainer? Currently, www.python.org does pretty well at this, but we may want to set up a more principled system.. But in any case, I think this is a separate issue..

> - Keeping modules and associated docs in the same file helps to ensure that the two are in sync when you distribute or edit the file. (It's not possible to have different versions of the code and the docs at the same time; it's less likely that someone will check in changes to one without updating the other, etc.)

2 issues: editing and distribution.

distribution -- maybe we want to turn modules into packages, and include docs in the package? There's not a lot of precedent for this in other languages though..

editing -- I think that keeping modules & docs in the same file will help keep docs in sync with modules *if* we're talking about what has been called "point-documentation"... But I don't think it'll help for howtos, tutorials, etc.
It's unreasonable to edit the tutorial every time you change the code. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 17:02:02 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 12:02:02 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Mon, 19 Mar 2001 09:53:35 GMT." <00d101c0b05a$76bb3760$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103211702.f2LH23p02270@gradient.cis.upenn.edu>

> 1. Newlines are preserved again in non-literal paragraphs (Edward Loper convinced me that the benefits outweighed the problems).
> 2. Newlines are not allowed within literal and Python literal strings.

Yay! I'll code that up in STminus002 as soon as I get a chance. (I should be done with STminus002 relatively soon).

> 3. Local references (which look like '[this]' or '[1]') are now supported. The "anchor" for a local reference must be at the start of a paragraph (in future releases I would expect it to *start* a new paragraph if at the start of a line), and looks like::
>
>     ..[this]

So... are anchors always hrefs? Or can they be generic footnotes? Or references for a references section? How should we deal with these when we're using something other than HTML (e.g., LaTeX) to render the string? If anchors can be footnotes or references, how does the renderer decide what to do with them?

> 4. List items and local references may be "empty" paragraphs, but there may still be some unresolved issues with respect to newlines - I'm not sure that::
>
>     1.
>     Some text
>
> is allowed (it probably should be, if the form with a blank line between those two lines *is* allowed).

I'll add this too. BTW, how are you currently handling things like this::

    1. some text

       some more text

Is that a list item with 2 paragraphs, or a list item with some contents and 1 subparagraph, etc? I.e., how would it get rendered in whatever XML-like thing you're using?

> 5. The RE used for detecting URLs has become more sophisticated.
> There are some associated rules

Hm.. I don't look forward to formalizing this, and trying to get STNG to agree with your regexps :)

> That approach is what I meant when I talked about "a long RE for detecting common errors", and it is a sensible approach *if one is validating* - but the results should be warnings, 'cos one of the points of ST, originally, is that users should be able to "push the corners" a bit.

Or errors, if the user asks for them to be errors. :) Note also that it should be possible to generate the "long RE expression" in a *principled* way, given a formalization, so that it will detect *all* errors, not just *common* errors.

> > But from the point of view of someone formalizing the language, saying "there's an ambiguity" is no good. I have to either explicitly say "it's illegal" (=undefined) or "xyz is the correct answer."
>
> Oh, I agree, and it's a good thing to do. But you *do* have a third option, which is the "this behaviour produces undefined results", which is not *quite* the same as "illegal".

Ok, in the formalization system I set up, I divided everything into "valid" and "undefined". I see a good argument for further dividing "undefined," though.. So I'll redefine my terms, as such:

valid -- The string has a unique, predictable result. This is the same result that it will have in all future versions.

invalid -- The string does not have a unique, predictable result.

illegal -- The string will never have a unique, predictable result.

undefined -- The string does not currently have a unique, predictable result, but it may in a future version.

Is that acceptable terminology? (I'll try to remember to stick to it) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 17:09:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 12:09:13 EST Subject: [Doc-SIG] Local References Message-ID: <200103211709.f2LH9Dp02661@gradient.cis.upenn.edu>

> 3.
Local references (which look like '[this]' or '[1]') are now supported. The "anchor" for a local reference must be at the start of a paragraph (in future releases I would expect it to *start* a new paragraph if at the start of a line), and looks like::

    ..[this]

Clarification on the syntax.. is *anything* that looks like [this] a local reference, or does it have to be preceded by "a parenthetical like"[this] or "a parenthetical and a colon like":[this]? If one of the latter, does [this] get rendered with brackets? Flagged as a warning when validating (in principle, not in current implementation)? What happens if the referent is missing? What is acceptable content for [this]? '[\w_-]+'? -Edward From ping@lfw.org Wed Mar 21 17:14:56 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 21 Mar 2001 09:14:56 -0800 (PST) Subject: [Doc-SIG] Re: ping Ping In-Reply-To: <200103211648.f2LGmbp00795@gradient.cis.upenn.edu> Message-ID:

On Wed, 21 Mar 2001, Edward D. Loper wrote:
> > - Just to be clear, the suggestion on the table is only to move the library ref manual into the modules, not the language reference or anything like that.
>
> Only the ref manual, or also the howtos, tutorials, etc? And if only the ref manuals, what do we currently think belongs in the ref manuals, other than API docs?

What i had in mind was "how to use this module". Let's look at some examples for clarification. As an extreme example, try running "perldoc CGI". CGI.pm contains about 3200 lines of code followed by 3000 lines of detailed documentation. While the module itself is indeed enormous, i think that it is useful to have all of that information about how to use the CGI module instantly available right there in CGI.pm. A more reasonable arrangement would be to split the CGI functionality into several modules, and move the relevant parts of the docs accordingly. For instance, CGI.pm currently tries to do the work of both cgi.py and HTMLgen.
But *if* CGI is going to do all that, it should all be documented in CGI.pm. Look at the sections in these examples and give an opinion on what belongs or doesn't belong with the source code:

    perldoc CGI
    perldoc CPAN
    perldoc CGI::Cookie
    perldoc Data::Dumper
    perldoc Getopt::Long
    perldoc Net::Ping
    perldoc overload

> > - If documentation for a module doesn't live in the module itself, how will a user find it?
>
> I think that this is a question that we'd have to answer, whether we put more docs in the module or not

I think it's relevant. The question is "how far is the user of a module from *some* information on how to use the module?" Doesn't matter if they don't have every article that anyone has ever written about the module -- do they have a starting point?

> editing -- I think that keeping modules & docs in the same file will help keep docs in sync with modules *if* we're talking about what has been called "point-documentation"... But I don't think it'll help for howtos, tutorials, etc. It's unreasonable to edit the tutorial every time you change the code.

If changing the code changes the behaviour of the module so that your examples don't work any more, then yeah, you'd better edit the examples. (Scenario: i've changed a method name from foo() to spam(), so in my editor i search for ".foo(" to do replacement...) It's also harder for me to change foo() to spam() in just the code, check in just that part, and say "oh, i'll change the docs later" -- because i'll be checking in a single file that's inconsistent with itself. -- ?!ng Happiness comes more from loving than being loved; and often when our affection seems wounded it is only our vanity bleeding. To love, and to be hurt often, and to love again--this is the brave and happy life. -- J. E. Buchrose From edloper@gradient.cis.upenn.edu Wed Mar 21 17:19:44 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Wed, 21 Mar 2001 12:19:44 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Mon, 19 Mar 2001 10:06:36 GMT." <00d401c0b05c$488cca00$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103211719.f2LHJjp03408@gradient.cis.upenn.edu>

> Well, prepare to be well miffed (ST has never supported differing starting and ending quotes).

So hey. Although now we have [...] (or "..."[...] or "...":[...] or whatever it really is).

> It should (eventually) in the '"..."' text, but not in the URL itself. This is actually a good reason to forbid apostrophe in URLs,

Of course, you can't reasonably forbid '#' in URLs, so you'll have to put URL recognition before inline recognition *anyway*.. :)

> and may mean I need to put the URL recognition *before* literal recognition - no, that won't work, 'cos then I couldn't say
>
>     'http://www.literal.org/'
>
> Hmm. This is a no-win situation, I'm afraid. Ah - no it's not, because I'm requiring the user to escape spaces in a URL, and not to end with "funny" characters, so it *should* actually come out in the wash - we'll need to make some careful test cases...

I think there's a serious problem here if we are allowing URLs to appear in arbitrary places. For example, consider::

    foo://no#good bar://parse#for this.

It seems perfectly reasonable for #good bar...# to be a literal.. But then it also seems reasonable for those to be urls.. Possible ways out:

1. Say that the opening '#' must have whitespace to its left, and the closing '#' must have whitespace to its right. Of course, that forbids saying things like #Object#s, but I guess I could live with that

2. Use some special demarcation for URLs! :) I'm for this, but am worried about trying to convince the STNG people, esp. if we're proposing using <..>.. Since they're currently saying that such things should be ignored. Of course, they're clearly wrong on that point, too, but it means that I'll have to argue 2 different points at once.
:) Also, if we do this, we have to be sure to stress in the PEP/ST docs that math must go in literals like: 'x*y>z'. (Of course, we'll probably want to stress that anyway). Are there any objections in principle for using <...> to delimit URLs? (Other than that it will be hard to convince STNG people). If not, I think we should start trying to convince STNG people to use <...> for URLs, and to give up on ignoring <...> tokens. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 17:48:02 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 12:48:02 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Mon, 19 Mar 2001 10:27:57 GMT." <00d501c0b05f$443acda0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103211748.f2LHm3p05195@gradient.cis.upenn.edu> > Paragraph (not distinguished normally from the other sorts, which *also* > have special names). If I had to distinguish this, I'd probably call it > a "paragraph with a blank line before it" (remember, that *might* > include the other sorts of thing, too). I think it may be useful to distinguish this (and "paragraph with a blank line before it" is definitely *not* what you want, since it leaves out what I would call a paragraph at the beginning of a document, and it could potentially include any other basic block, if it happens to have a blank line before it (which is required for headings, etc.)). > > * basic block = paragraph or list item or heading or label (or > > table?) > > Paragraph (see above) I think this is somewhat misleading/confusing.. But I guess that's up to you to decide.. > > > 3. trailing whitespace is thrown away > > > > Trailing whitespace for the string as a whole? For each basic > > block? For each line? Is this true in literal blocks? > > For each line. True in all places (you can't, in general, see them, so > there we go). > > For literal blocks, newlines are preserved, but I can't see any obvious > point in preserving trailing spaces. 
I guess that seems reasonable. Within paragraphs, do you collapse multiple spaces into one space? > > Agreed. Although how do you put something at zero indentation? > > Maybe indent from 1 space over from the preceding paragraph? > > You don't. I've never wanted to (my problems with HTML normally come > from trying to do the opposite). Hm.. I'm not sure I agree with this, but I don't think it's important enough to get hung up on. (I would argue that you should be able to put things in column 0, but that the HTML renderer should probably indent preformatted regions relative to everything else). > > > > "the following is not a url": > > That's right. In this instance. So does it get rendered as is (i.e., with two quote signs, one colon sign, a less than sign, and a greater than sign)? > I can't see, in docutils (STminus is another kettle of fish) that error > detection (apart from paragraph indentation and paragraph label > detection) is other than a bunch of heuristics, almost certainly one or > more REs, that point out *possible* problems to a user wanting > validation. So it becomes a matter of identifying the set of REs we want > to warn about. As I (think I) said earlier, it should be possible to do error detection in a principled way, given a formal definition of ST. We should be able to print out *all* problems, not just *possible* problems, if the user really wants us to. This seems very important to me if we want to allow for the possibility of competing implementations of ST. > > > > Do *quotes "have to* nest" properly with coloring? > > > > But from the point of view of formalizing things, I have two > > choices here: > > 1. say that it contains a bold region, and the quotes are just > > rendered as quotes > > 2. say that it's undefined (i.e., an invalid string). > > Undefined isn't invalid - it's undefined. At least to me, even in a > formal context, that's true (i.e., not "I don't know" but "I shan't > decide").
I'm calling undefined a subset of invalid. (invalid=illegal+undefined). > On the other hand, once I'm sure I've got the order of > markup/colourising correct, I'll be happy to regard it as so, and then > you could "freeze" it. But is that a good approach? The markup-nesting problem doesn't actually seem that difficult to me, in principle. I propose that we allow anything to nest within anything, with the restrictions: 1. nothing can nest inside a literal, inline, or href url 2. nothing can nest within itself (even with intervening levels) So the legal nestings are shown in this tree: * literal * inline * emph * literal * inline * strong * literal * inline * href name * inline * literal * href url * href name * strong * literal * inline * href url * strong * literal * inline * emph * literal * inline * href name * inline * literal * href url * href name * emph * literal * inline * href url * href name * literal * inline * strong * literal * inline * emph * inline * literal * href name * emph * literal * inline * href url Also, spaces must come between * and ** delimiters, so you can't say ***this***. (Footnote markers [like_this] would probably pattern like href urls) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 18:35:24 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 13:35:24 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Mon, 19 Mar 2001 20:30:31 GMT." Message-ID: <200103211835.f2LIZOp09967@gradient.cis.upenn.edu> [On making docs deal with inheritance] > > 2. For now, recommend that *tools* inherit documentation for a > > method if f.__doc__ == None, and don't inherit if > > f.__doc__ = '' or any other string. > > I know I'm about to vary my tune but ... someone else has been talking > persuasively out-of-band. Rather than borrowing the doc directly off > the parent ... I think the issue of whether to borrow, or point back, etc., should be one for the tools. 
Which may be a good reason for the language *not* to do anything automatic, like inheriting doc strings. There are similar questions about whether inherited methods should be listed in a separate section or not, etc. But at any rate, we should say that having f.__doc__=None indicates that inheriting docs is acceptable, and f.__doc__='' means that inheriting docs is not acceptable. Of course, all of this will be difficult to do if we're parsing the file instead of loading it as a module; but that's ok. :) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 18:49:53 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 13:49:53 EST Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Tue, 20 Mar 2001 23:23:33 GMT." Message-ID: <200103211849.f2LInrp11620@gradient.cis.upenn.edu> > OK, two orthogonal questions about a verbatim fragment: > * inline or block > * python code or `alien text' > giving us four kinds of `verbatim' fragment in doc-strings. So currently, we do have 4, but they're not exactly the 4 you listed. Instead of "python block," we have "python test case," which is used by an automated testing program. You can use these to show code & its output, but not for exceptions, and a number of other cases. I'm still not sure I like this system, but it seems somewhat reasonable. The syntax for these python test blocks is a paragraph starting with '>>>', and ending at the next blank line. It should include both the input and the output of the commands you run, although no commands should output lines starting with '>>>' or '...'. > The python/alien distinction is important, because a python fragment is > worth the renderer scanning for identifiers it knows about, so may wish > to render as xrefs to pertinent documentation I think we're going to want to be careful not to put xrefs all over in #literal# sections..
E.g., in the description of class Foo, literals that say #Foo# shouldn't link to Foo (which you are already presumably looking at). And if you talk about class #Bar# five times, there shouldn't be 5 xrefs. But we'll leave this for the tools to deal with. > (however, this is the > *only* processing the renderer should be doing to it); Well, that's not currently what's done with the python test blocks.. But that's because we're trying to be compatible with that automated testing program... (will this change if functions/methods get attributes, and test strings move out of doc strings?) > I regard layout-control as a luxury, subordinate to keeping the markup > language simple. I'll agree for now. So no backslashes, and if someone really wants to use "#" in a python literal, or "'" in a literal, then they have to use a separate block. > Out-of-source documentation should provide the means for folk to say > things the way they want and have it displayed the way they want: for > in-code ST docs, simplicity of markup language is *more important* than > making it look nice. It suffices that we ensure that the author can > express the information the reader needs. A few rare cases where this > will merely be ugly are not worth extra complexity. Hm. Mind if I quote that in my PEP? ;) > the send method writes one '#' character to #sock.control# for every > #chunk# bytes it writes to #sock.out#. > albeit I may be marking more things with #...# than I need to (am I ?). It seems to me that if we're going to use #...# for python literals, then we should really use it for them. I see a danger here of people using 'sock.out' if they don't want an xref, and #sock.out# if they do want an xref. I'm not sure that's what we want people to be doing.. But I'm not sure what the best thing to do about it is. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 18:53:14 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Wed, 21 Mar 2001 13:53:14 EST Subject: [Doc-SIG] URLs In-Reply-To: Your message of "Wed, 21 Mar 2001 11:19:20 +0100." <3AB88028.605B88A6@lemburg.com> Message-ID: <200103211853.f2LIrEp12140@gradient.cis.upenn.edu> [On surrounding URLs with delimiters] > Sounds like a good idea, but don't you use angular brackets ? > These are recommended by the URI RFC, in wide use everywhere and > have similar properties... I anticipate problems with selling this to the STNG people. (Although maybe we don't care, because we're already incompatible with them on any string containing <...>). But I'd like to try to convince them that this is a Good Idea, and that not just passing random <...> through is also a Good Idea. So... Where do I go to do my convincing? Do I write a wiki page on the Zope site? Or can I write email somewhere? Anyone else want to help me convince them? :) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 19:03:24 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 14:03:24 EST Subject: [Doc-SIG] Tokens for labels & endnotes In-Reply-To: Your message of "Wed, 21 Mar 2001 10:32:36 GMT." <001c01c0b1f2$3ee60d80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103211903.f2LJ3Op13591@gradient.cis.upenn.edu> > I'm assuming we're talking about paragraph labels. Actually, I think we were talking about [endnotes]. But the same questions apply to labels.. > I think we should just go with the English definition of a word, which > means [-A-Za-z], and leave it at that. It is *meant* to look like a > word. Is that too anglo-centric? > I think "keep it simple" is required here - these labels are meant to be > few and simple, so English words seems sensible to me. I would thus vote > against underlines and against digits. It might be that underlines and digits are more applicable for endnotes. Some people might like this [1] or this [noam_chomsky97].
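A sketch of the permissive option floated above: letters, digits, underscores, and hyphens inside '[...]' can be captured with a single regular expression. The character class here is an assumption for illustration, not agreed ST syntax.

```python
import re

# Hypothetical endnote-token pattern: one or more word characters or
# hyphens between square brackets; accepts both [1] and [noam_chomsky97].
ENDNOTE = re.compile(r'\[([A-Za-z0-9_-]+)\]')

def find_endnotes(text):
    """Return the endnote tokens found in a piece of text."""
    return ENDNOTE.findall(text)

# find_endnotes("like this [1] or this [noam_chomsky97]")
# -> ['1', 'noam_chomsky97']
```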
> Also, validation aside, I don't *use* a regular expression - I look for > the right "shape" of paragraph (1 line, colon in it) and check what is > to the left of the colon against the dictionary. From *my* point of view > the legitimate characters idea only comes in with a validation phase (of > course, it would be different for Edward). This may be different if you want [this to not be an endnote]. > > Basically re defines '\w' = '[0-9a-zA-Z_] > > Erm - basically it doesn't - it invokes "locales" which makes life more > complex (and I have no idea what sre does about '\w'). If LOCALE and UNICODE flags aren't used when compiling a regexp, \w = [a-zA-Z0-9_] (at least according to "the python library reference manual for re":). Furthermore, it will always match '_', regardless of LOCALE and UNICODE (again, according to the ref. manual). -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 19:26:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 14:26:13 EST Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: Your message of "Wed, 21 Mar 2001 09:14:56 PST." Message-ID: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> > I think it's relevant. The question is "how far is the user of a > module from *some* information on how to use the module?" Doesn't > matter if they don't have every article that anyone has ever > written about the module -- do they have a starting point? We all agree that *some* information on how to use the module should be included.. But the question is *what types* of information to include? I think that the in-code documentation should tend to be concise, short, and technical. It should be light on examples, and stick to defining individual elements, with a short overview to describe how the elements fit together. > If changing the code changes the behaviour of the module so that > your examples don't work any more, then yeah, you'd better edit > the examples. 
(Scenario: i've changed a method name from foo() to > spam(), so in my editor i search for ".foo(" to do replacement...) Often, the consequences of changes on the tutorial-type docs are not obvious. Having to think (hard) about whether my changes affect the tutorials/howtos/FAQs/etc. every time I make a change seems unreasonable. On the other hand, API documentation is (by definition) local -- it should be obvious what parts of the documentation need to be updated when I change the code. I guess it just seems like, if I were to have tutorials, etc. in the code myself, I would not be likely to check them every time I make a change. And I think that there are many programmers out there who are lazier than I am. > It's also harder for me to change foo() to spam() in just the code, > check in just that part, and say "oh, i'll change the docs later" -- > because i'll be checking in a single file that's inconsistent with > itself. I have a suspicion that laziness will win out, and people will just say "whatever".. If the docs are in a different file, I can do a CVS diff to see what's changed in the code since the last time I updated the docs, and thus can do updates to the documentation "in batch." -Edward From tavis@calrudd.com Wed Mar 21 20:09:37 2001 From: tavis@calrudd.com (Tavis Rudd) Date: Wed, 21 Mar 2001 12:09:37 -0800 Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> References: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> Message-ID: <01032112093702.04676@lucy> I side with Ping on this one. Just because there's some extra examples and how-to-use-me documentation in a source file doesn't mean that you're obliged to update it every time you work on the source. Ping's argument is that it is more likely that lazy programmers will update that type of information if it's stored in the same file.
If this argument wins out there wouldn't be much extra to put in the modules anyway, as Python's current 'Library Reference' docs are largely a rehash of the API information. By the way, Edward, Tibs, and Eddy, the energy you guys are putting into ST this month is impressive. You've almost doubled the activity of the previous most active month in DOC-SIG. From Edward Welbourne Wed Mar 21 20:21:19 2001 From: Edward Welbourne (Edward Welbourne) Date: Wed, 21 Mar 2001 20:21:19 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <200103211849.f2LInrp11620@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103211849.f2LInrp11620@gradient.cis.upenn.edu> Message-ID: > Hm. Mind if I quote that in my PEP? ;) coo - flattery ;^> Be my guest. > Instead of "python block," we have "python test case," Hmm. To me that's a different (complementary) thing from what I want the python block to be. A test case should be written to actually run for real; a python block should just be illustrating use of the code and might, indeed, be deliberately broken, e.g. as an accompaniment to the explanation of why something is done the slightly odd way it is, so that maintainers will realise what would go horribly wrong if they made the obvious `improvement'. Equally, plenty of the tools I write are intended to be used from within the implementations of other tools; having a test system `run' the illustrations I'd want to supply is pointless - e.g.::

    class selfRepresenting:
        def _emit(self, *bits):
            """Representation support method.

            ... for example:
                def __repr__(self):
                    return self._emit(`self.state`, 'name=' + `self.name`)
            """
            return (_fullname(self.__class__) + '(' + string.joinfields(bits, ', ') + ')')

in which the illustration only gives the __repr__ method of a class implicitly inheriting from selfRepresenting.
To turn it into a workable test which actually tests anything, you'd have to write such a class, provide .state and .name attributes for its members, instantiate this class and (possibly implicitly) call repr() on the resulting object. If you treat it, as given, as a test, all you'll do is verify that the illustrative code gets past the python parser. [Albeit finding that fragment involved trawling through a lot of my code, noticing that for the most part I do illustrations by saying `see class foo, below' or similar; and the above class is purely a sketch I don't think I use.] > In the description of class Foo, literals that say #Foo# shouldn't > link to Foo (which you are already presumably looking at). And if you > talk about class #Bar# five times, there shouldn't be 5 xrefs. But > we'll leave this for the tools to deal with. Yes, that's a tool issue: and my guess is that tool authors will agree with you - Ping, what do you do ? Tool authors should be *at liberty* to do xrefs from the contents of python literals, or (without touching the literals) to xref, in a `see also section', every identifier seen in a literal, or to ... mutatis mutandis. > I'll agree for now. So no backslashes Thank You. Now I can get some sleep at last ... Eddy. From Edward Welbourne Wed Mar 21 20:39:30 2001 From: Edward Welbourne (Edward Welbourne) Date: Wed, 21 Mar 2001 20:39:30 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <001c01c0b1f2$3ee60d80$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001c01c0b1f2$3ee60d80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > I think "keep it simple" is required here

to me that needs to include:

* case insensitive
* digits

because authors of doc-strings are going to be shocked if it behaves otherwise. The former means your dictionary-based approach is not satisfactory - string.tolower the apparent label, then check to see whether the result appears in some list (or other implementation of `collection') of known labels.
Otherwise, your builder.label_dict is going to need further entries for, at least: "Pep":"pep", "Post-history":"post-history", "Discussions-to":"discussions-to", since some folk using the keys you gave *will* use them in the forms shown; and you'll probably also need "Discussions-TO":"discussions-to", etc. Simpler: use tolower. Have canonical forms generally be in Capitalised-Word form (like RFC 822 labels). Indeed, a good way to implement the aforementioned `collection' would indeed be a mapping which is exactly the reverse of the ones you showed us - mapping from the tolower form to the canonical form for each key - so that one recognises a key using:

    try:
        canon = labels[string.tolower(text)]
    except KeyError:
        ... # it isn't a real label

I am entirely happy to have the present *actual dialects* of ST use only letters and dash; however, allow ST-generic to permit numbers, e.g. so that ST variants *can* use "rfc2954-char-set": "RFC2954-Char-Set" in their label dicts, or similar. (No, I have no idea what RFC 2984 is, nor even whether it exists.) >> Basically re defines '\w' = '[0-9a-zA-Z_] > Erm - basically it doesn't - it invokes "locales" which makes life more complex (and I have no idea what sre does about '\w'). and I can't say I care much either way, once you're allowing - in the label. The only need for _ is to separate words, and - is easier to type ;*> Eddy. From tony@lsl.co.uk Thu Mar 22 10:10:11 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:10:11 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103211702.f2LH23p02270@gradient.cis.upenn.edu> Message-ID: <001401c0b2b8$47d23920$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > (I should be done with STminus002 relatively soon). Good.
As you said to Ka-Ping Yee elsewhere, a simple and a complex choice for ST variants is a good choice to have (although I would add, of course I would, that the complexity is either inherited from STClassic, or asked for in the past iterations of this SIG). > > 3. Local references (which look like '[this]' or '[1]') are now > > supported. The "anchor" for a local reference must be at > the start of a > > paragraph (in future releases I would expect it to *start* a new > > paragraph if at the start of a line), and looks like:: > > > > ..[this] > > So... are anchors always hrefs? Or can they be generic footnotes? Or > references for a references section? How should we deal with these > when we're using something other than HTML (e.g., LaTeX) to render > the string? If anchors can be footnotes or references, how does the > renderer decide what to do with them? Erm - no, in HTML terms, anchors are names. The obvious HTML translation of the DOM tree for a local reference and anchor is:: Some text containing a local reference to [this]. [this] is the anchor. In the DOM tree, I have to decide what to put into the "reference", and at the moment I follow HTML/XML conventions and store what you see - that is, the reference element has an attribute whose content is the string '#this'. I use the same attribute name as I use for other links. The advantage of this is twofold - it means we have only one way of linking within the document (which will map easily to both HTML and to XLinks, although we are only using the simplest subset of XLinks!), and it means a user can regard:: [This] is a local reference and:: "This":#this is a local reference as the same, which isn't much use *within* a document, but is *very* useful for allowing links from outside. 
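The HTML fragments in the example above did not survive the archive. A minimal sketch of how such a reference/anchor pair might render, assuming period HTML conventions (the exact tags are an assumption, not Tibs's actual output):

```python
def render_reference(label):
    # A local reference like [this] might become a link to a same-page anchor.
    return '<a href="#%s">[%s]</a>' % (label, label)

def render_anchor(label):
    # The "..[this]" paragraph would supply the matching named anchor.
    return '<a name="%s">[%s]</a>' % (label, label)

# render_reference('this') -> '<a href="#this">[this]</a>'
# render_anchor('this')    -> '<a name="this">[this]</a>'
```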
As to using HTML/XML type links - well, we already had to choose URLs for our external links (or think of it as using simple XLinks if that makes you happier) - this makes consistent sense if we are using a DOM tree to represent our document, anyway. It makes sense to continue this for local references. A tool like TeX would need some untangling of the '#this' to just 'this' for use in its '\xref', but that's hardly difficult. > I'll add this too. BTW, how are you currently handling > things like this:: > > 1. some text > > some more text The list item is at indentation N, the next paragraph at indentation N+3, so that is a list item paragraph and its first child. The "flattening" phase will note that the first item is a list item and the second a paragraph (tags "oitem" and "para"), and bring the paragraph up to be a sibling of the list item. In summary, the initial internal structure is:: and this gets "flattened" to be:: which then gets translated into the DOM tree as elements with those tags (both will, of course, be children of a surrounding '' element). (if we had:: This is a paragraph. And so is this. then the flattening phase would say to itself "aha - a paragraph within a paragraph - presumably the user *meant* something by that", and in this case it would produce:: (clearly we don't regard a paragraph inside a paragraph as being very meaningful in any real sense, but it seems a pity to waste the indentation that the user put in so carefully, and this is the obvious meaning to take). In an HTML rendering, I would expect 'block' to become 'blockquote'.) > > 5. The RE used for detecting URLs has become more > > sophisticated. There are some associated rules > > Hm.. I don't look forward to formalizing this, and trying to get STNG > to agree with your regexps :) STNG has its own REs. They don't make much sense to me (or didn't last time I looked at them). In some cases, they just didn't work very well. Oh well. 
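The internal-structure diagrams in the flattening discussion above were lost in archiving, but the hoisting step can be sketched. The tags "oitem" and "para" come from the message; the tuple representation of nodes is invented purely for illustration:

```python
# Toy node: (tag, text, children).  flatten() hoists every descendant up
# to the top level, so a paragraph parsed as the child of a list item
# ends up as its sibling -- a simplification of the step described above.
def flatten(node):
    tag, text, children = node
    result = [(tag, text)]
    for child in children:
        result.extend(flatten(child))
    return result

tree = ('oitem', '1. some text', [('para', 'some more text', [])])
# flatten(tree) -> [('oitem', '1. some text'), ('para', 'some more text')]
```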
But I don't see why *formalising* it is a problem? > Note also that it should be possible to generate the "long RE > expression" in a *principled* way, given a formalization, so that > it will detect *all* errors, not just *common* errors. This I don't understand - I'm not sure what you mean by "in a principled way", and I'm also not sure what you mean by "all errors, not just common errors". But this will doubtless become clearer to me as STminus progresses (I begin to suspect you may regret that name some day, as it becomes more capable and more clearly sufficient-to-itself). > Ok, in the formalization system I set up, I divided everything into > "valid" and "undefined". I see a good argument for further dividing > "undefined," though.. So I'll redefine my terms, as such: > > valid -- The string has a unique, predictable result. this is the > same result that it will have in all future versions. > invalid -- The string does not have a unique, predictable result > illegal -- The string will never have a unique, > predictable result > undefined -- The string does not currently have a unique, > predictable result, but it may in a future version. > > Is that acceptable terminology? (I'll try to remember to stick to > it) I'm not sure I'd bother to separate the middle two ("never" is a big concept, and four is somehow more uncomfortable with three), but otherwise I'd be happy to go with those... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 10:15:30 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:15:30 -0000 Subject: [Doc-SIG] Local References In-Reply-To: <200103211709.f2LH9Dp02661@gradient.cis.upenn.edu> Message-ID: <001501c0b2b9$0607eb10$f05aa8c0@lslp7o.int.lsl.co.uk> > Clarification on the syntax.. 
is *anything* that looks like [this] a local reference, or does it have to be preceded by "a parenthetical like"[this] or "a parenthetical and a colon like":[this]? Anything that looks like [this] is a local reference. It should be rendered exactly as it looks in the ST text (i.e., as I said in another message, in HTML it might be:: [this] > What happens if the referent is missing? If validation is on, the user gets a warning (i.e., the implementation is expected to be able to detect this case). docutils doesn't do this yet, but it clearly should. > What is acceptable content for [this]? '[\w_-]+'? Acceptable contents is:: [-_A-Za-z] [-_A-Za-z0-9]* | [0-9]+ (i.e., there are two legitimate forms - the first is a traditional "identifier", and the second is a simple integer - this latter is included because it is a common form in text as people write it - cf: PEPs. Looking at it, I'm not sure I should allow a hyphen as the first character of the "identifier" form - that may be a typo.) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 10:29:47 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:29:47 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103211719.f2LHJjp03408@gradient.cis.upenn.edu> Message-ID: <001601c0b2bb$04ed8f80$f05aa8c0@lslp7o.int.lsl.co.uk> > > Well, prepare to be well miffed (ST has never supported differing > > starting and ending quotes). So hey. > > Although now we have [...] (or "..."[...] or "...":[...] or whatever > it really is). Oh, OK. Point taken (I'd already sensed you don't like those, but they seem to me so uncontentious). > Of course, you can't reasonably forbid '#' in URLs, so you'll have > to put URL recognition before inline recognition *anyway*..
:) No, that's not a problem. A '#' in a URL cannot be a starting or ending quote, at least not if I have the URL RE right, because it won't meet the correct conditions (for instance, it can't have a space before or after it). A little messy, but that's how ST works. > I think there's a serious problem here if we are allowing URLs to > appear in arbitrary places. For example, consider:: > > foo://no#good bar://parse#for this. > > It seems perfectly reasonable for #good bar...# to be a literal.. No, that case is unambiguous. The '#' characters are parts of URLs - they definitely are not quotes. The RE for Python literals (wait a moment whilst I reconstruct the bits together) is:: (?P ^ | [ \n] ) \# (?P [^\#\n]+ ) \# (?= [ \n] | $ | [).,;:!?"] ) So the first '#' isn't the start of a literal, because it isn't preceded by , nor is it the end of a previous literal, because it isn't followed by space or punctuation (broadly speaking). The second '#' isn't the end of a literal, because it isn't followed by space or punctuation, nor is it the start of a literal. It is *just* possible that I need to worry about the terminating punctuation and the contents of a URL (and maybe it shouldn't be #_endpunc#, but #_safe_endpunc# in the Python literal URL - that would enforce more context after the punctuation, for safety). Now, I have a sneaky feeling that you don't like that sort of approach, but so far as I can tell it fits *exactly* the "philosophy" of ST, which is to make what the user types, in general, come out as they would naively expect - I *think* a naive user would expect the above not to be doing quoting.
Of > course, that forbids saying things like #Object#s, but I > guess I could live with that As you can see, that's the approach taken - and it is taken that way to be as near identical to the way that single quote literals work as possible (which also have that same problem). > 2. Use some special demarcation for URLs! :) I'm for this, > but am worried about trying to convince the STNG people, > esp. if we're proposing using <..>.. Since they're currently > saying that such things should be ignored. Of course, they're > clearly wrong on that point, too, but it means that I'll have > to argue 2 different points at once. :) Also, if we do this, > we have to be sure to stress in the PEP/ST docs that math > must go in literals like: 'x*y>z'. (Of course, we'll probably > want to stress that anyway). I would also like to delimit URLs - it would make life so much simpler. But I also suspect the STNG people won't agree (of course, we might both be wrong!). I still don't see why 'x*y>z' *has* to go in literals, though - clearly by the current and possible future rules it would work (if we do introduce quoting characters for URLs, I would want to insist they act with the same sort of rules as literals and Python literals, so that maths would be no problem). > Are there any objections in principle for using <...> to delimit > URLs? (Other than that it will be hard to convince STNG people). > If not, I think we should start trying to convince STNG people to > use <...> for URLs, and to give up on ignoring <...> tokens. I think it would be a marvellous idea - people already are used to it in emails, and it makes life simpler all round. Yes, by all means open talks on this matter on the STNG arena. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
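Tibs's Python-literal RE, as quoted above, lost its group names in the archive. A lookaround-based approximation (my own formulation, not his exact expression) reproduces the behaviour he describes, including the unambiguous-URL case:

```python
import re

# Approximation of the '#...#' Python-literal rule: the opening '#' must
# follow start-of-text or whitespace, and the closing '#' must be
# followed by whitespace, end-of-text, or trailing punctuation.
LITERAL = re.compile(r'(?:^|(?<=[ \n]))#([^#\n]+)#(?=[ \n).,;:!?"]|$)')

# LITERAL.findall('use #Object# here')                   -> ['Object']
# LITERAL.findall('foo://no#good bar://parse#for this.') -> []
```

Neither '#' in the URL example has whitespace on the correct side, so nothing matches, just as described.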
From mal@lemburg.com Thu Mar 22 10:31:33 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 22 Mar 2001 11:31:33 +0100 Subject: [Doc-SIG] URLs References: <200103211853.f2LIrEp12140@gradient.cis.upenn.edu> Message-ID: <3AB9D485.A8FB6F35@lemburg.com> "Edward D. Loper" wrote: > > [On surrounding URLs with delimiters] > > Sounds like a good idea, but don't you use angular brackets? > > These are recommended by the URI RFC, in wide use everywhere and > > have similar properties... > > I anticipate problems with selling this to the STNG people. (Although > maybe we don't care, because we're already incompatible with them > on any string containing <...>). > > But I'd like to try to convince them that this is a Good Idea, and > that not just passing random <tags> through is also a Good Idea. If that's what they want to do, they can use the scheme delimiter (:) in URLs to make a separation between HTML-Tags and URLs. AFAIK, the colon is not allowed in HTML-Tagnames (XML is different due to the namespace notation). > So... Where do I go to do my convincing? Do I write a wiki page > on the Zope site? Or can I write email somewhere? Anyone else > want to help me convince them? :) Sending in patches usually helps ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From tony@lsl.co.uk Thu Mar 22 10:40:20 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:40:20 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103211748.f2LHm3p05195@gradient.cis.upenn.edu> Message-ID: <001701c0b2bc$7e2f03a0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D.
Loper wrote: > "paragraph with a blank line before it" is definitely *not* > what you want, since it leaves out what I would call a paragraph > at the beginning of a document, and > it could potentially include any other basic block, if it happens to > have a blank line before it (which is required for headings, etc.)). Actually, I agree with you - it's a messy term, anyway. > > > * basic block = paragraph or list item or heading or label (or > > > table?) > > > > Paragraph (see above) > > I think this is somewhat misleading/confusing.. But I guess that's > up to you to decide.. Again, I agree - it's an artefact of the internals. What about "text block"? > I guess that seems reasonable. Within paragraphs, do you collapse > multiple spaces into one space? No, within lines spaces are (very carefully) left untouched, just in case. > > > Agreed. Although how do you put something at zero indentation? > > > Maybe indent from 1 space over from the preceding paragraph? > > > > You don't. I've never wanted to (my problems with HTML normally come > > from trying to do the opposite). > > Hm.. I'm not sure I agree with this, but I don't think it's important > enough to get hung up on. (I would argue that you should be able > to put things in column 0, but that the HTML renderer should probably > indent preformatted regions relative to everything else). I couldn't see an easy and natural way around it, and I find it hard to conceive of places where *I* would not want to indent, so I gave up (the problem was actually thinking how to decide on an indentation at all, and I was quite pleased with how predictable and useful using the indentation relative to the "parent" or preceding paragraph was). > > > > > "the following is not a url": > > > > That's right. In this instance. > > So does it get rendered as is (i.e., with two quote signs, one colon > sign, a less than sign, and a greater than sign)? That's up to the renderer.
But seriously, it gets *stored* as a node of the DOM tree which has the text within quotes (i.e., the quotes are not preserved) as its text, and the URL as its 'url' attribute. Thus the ST markup (the double quotes and the colon) are not remembered. > We should be able to print out *all* problems, not just *possible* > problems, if the user really wants us to. This seems very important > to me if we want to allow for the possibility of competing >implementations of ST. I don't have a problem with telling the user what is wrong with a text, I just don't understand how to quantify that. Of course, in STminus, you have a different handle on things, but that's because you're deciding up front what is allowable and what is not. A "more traditional" ST approach doesn't know that. But being able to give the user as many warnings of problems as possible has got to be a good thing... > The markup-nesting problem doesn't actually seem that difficult to me, > in principle. I propose that we allow anything to nest > within anything, > with the restrictions: > 1. nothing can nest inside a literal, inline, or href url Agreed. But please don't call it an 'href url' - that's an HTML term! > 2. nothing can nest within itself (even with intervening levels) Pragmatically has to be true, with non-differentiated start and end quotes. These two seem to me to be the sane minimum, and thus sensible. > So the legal nestings are shown in this tree: > > * literal > * inline > * emph > * literal OK, OK, I believe you! > Also, spaces must come between * and ** delimiters, so you > can't say ***this***. Ah, but there's no reason you shouldn't be able to *say **this***, for instance (it's quite unambiguous). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
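The two nesting restrictions agreed above (nothing nests inside a literal, inline, or URL region; nothing nests within itself, even with intervening levels) are easy to state as code. Here is a toy validator - not taken from docutils or STNG, and using an invented (kind, children) tuple-tree just for illustration:

```python
# A toy validator (not from docutils or STNG) for the two nesting
# restrictions from the thread. Nodes are (kind, children) tuples;
# children are plain strings or further nodes.
ATOMIC = ("literal", "inline", "url")  # nothing may nest inside these

def check_nesting(node, open_kinds=()):
    """Raise ValueError if node violates either nesting restriction."""
    kind, children = node
    if kind in open_kinds:
        # rule 2: no region type may nest within itself, at any depth
        raise ValueError("%r may not nest within itself" % kind)
    for child in children:
        if isinstance(child, str):
            continue
        if kind in ATOMIC:
            # rule 1: literal/inline/url regions contain only plain text
            raise ValueError("nothing may nest inside %r" % kind)
        check_nesting(child, open_kinds + (kind,))
```

So a literal inside an emph passes, while emph-inside-strong-inside-emph fails even though the two emph regions are not directly nested - which is what "even with intervening levels" requires.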
From tony@lsl.co.uk Thu Mar 22 10:47:57 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:47:57 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: <200103211849.f2LInrp11620@gradient.cis.upenn.edu> Message-ID: <001801c0b2bd$8e347720$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > So currently, we do have 4, but they're not exactly the 4 you > listed. Instead of "python block," we have "python test case," > which is used by an automated testing program. You can > use these to show code & its output, but not for exceptions, No, exceptions do work now. > and a number of other cases. I'm still not sure I like this > system, but it seems somewhat reasonable. Given the attraction of doctest, it seemed sensible to allow its code blocks to be treated as such. And given it's now in the standard Python package, we'd do well not to ignore it. > The syntax for these python test blocks is a paragraph starting > with '>>>', and ending at the next blank line. It should include > both the input and the output of the commands you run, although > no commands should output lines starting with '>>>' or '...'. Well, that's not up to STpy to say - that's only true if you want to run doctest on it (and may not be the exact rules required there either). > I think we're going to want to be careful not to put xrefs all over > in #literal# sections.. E.g., in the description of class Foo, > literals that say #Foo# shouldn't link to Foo (which you are already > presumably looking at). And if you talk about class #Bar# five times, > there shouldn't be 5 xrefs. But we'll leave this for the tools to > deal with. Agreed. > Well, that's not currently what's done with the python test blocks.. > But that's because we're trying to be compatible with that automated > testing program... (will this change if functions/methods get > attributes, and test strings move out of doc strings?)
One of the points about doctest is that when you are describing how something works (i.e., this *will* be in a docstring) it is useful to *show* how it works (or, of course, doesn't). And if you're doing that, then it makes sense to check that what is typed is correct (now who could deny that?). And if you're doing that, why, you're testing! So whilst doctest does support "out of line" test strings, it will always be the case that it will run on docstrings as well - by original intent. I suggest you read the relevant chapter in the 2.1 documentation. > It seems to me that if we're going to use #...# for python literals, > then we should really use it for them. I see a danger here of people > using 'sock.out' if they don't want an xref, and #sock.out# if they > do want an xref. I'm not sure that's what we want people to > be doing.. > But I'm not sure what the best thing to do about it is. The case for how cross-referencing of Python quantities is done is not yet decided - it hasn't been discussed (again) yet. One of the ways of getting useful input to it will be to see what happens if (intelligent) guessing is done by a tool. It *may* be that we want to indicate which '#..#' things are to be cross-referenced, and there are on-board suggestions for this (the obvious one is to use '[..]' to indicate the desire - yuck - and another idea is to use something like '^#..#'). But I see this as being a *later* discussion. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 10:50:41 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:50:41 -0000 Subject: [Doc-SIG] URLs In-Reply-To: <200103211853.f2LIrEp12140@gradient.cis.upenn.edu> Message-ID: <001901c0b2bd$f04b6590$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D.
Loper wrote: > But I'd like to try to convince them that this is a Good Idea, and > that not just passing random <tags> through is also a Good Idea. > > So... Where do I go to do my convincing? Do I write a wiki page > on the Zope site? Or can I write email somewhere? Anyone else > want to help me convince them? :) I believe the correct way to do this is to put a new page under the "suggestions for things to do to STNG" page, and wait... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 10:54:00 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:54:00 -0000 Subject: [Doc-SIG] Tokens for labels & endnotes In-Reply-To: <200103211903.f2LJ3Op13591@gradient.cis.upenn.edu> Message-ID: <001a01c0b2be$67095890$f05aa8c0@lslp7o.int.lsl.co.uk> > > I'm assuming we're talking about paragraph labels. > Actually, I think we were talking about [endnotes]. But the same > questions apply to labels.. Erm, maybe (sorry I lost the thread) > > I think we should just go with the English definition of a > word, which > > means [-A-Za-z], and leave it at that. It is *meant* to look like a > > word. > > Is that too anglo-centric? Yes. And it will need to be fixed, but not in the first release. (this is a general point about docutils, and at the moment STpy as well, and I think it needs more input from other people at a later stage) > It might be that underlines and digits are more applicable for > endnotes. Some people might like this [1] or this [noam_chomsky97]. For labels I want to exclude '-_', but yes, for endnotes I want to include them. > If LOCALE and UNICODE flags aren't used when compiling a regexp, > \w = [a-zA-Z0-9_] (at least according to "the python library > reference manual > for re":).
> Furthermore, it will always match '_', regardless of LOCALE and > UNICODE (again, according to the ref. manual). My rather desperate hope (not having read the RE section in the new 2.1 manuals yet) is that using REs will give good leverage on the problem mentioned at the top of the email, at which point it *does* become useful to use '\w'. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From mal@lemburg.com Thu Mar 22 10:54:09 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 22 Mar 2001 11:54:09 +0100 Subject: [Doc-SIG] What docs should be in the source file? References: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> Message-ID: <3AB9D9D1.9FD986A5@lemburg.com> Just to drop in an opinion on the subject: I think almost all API related documentation should go into the source file. Concepts, graphics and other things can be kept in different files, e.g. Word files, but the API should be completely defined in the source file. This is what I was targeting with PEP 224 (attribute docstrings), but which will not happen... maybe Ping has an alternative which will let me document attributes too?! -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From tony@lsl.co.uk Thu Mar 22 11:04:06 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 11:04:06 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: Message-ID: <001b01c0b2bf$d02b7320$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Welbourne wrote: > A test case should be written to actually run > for real; a python block should just be illustrating use of > the code and > might, indeed, be deliberately broken, e.g.
as an accompaniment to the > explanation of why something is done the slightly odd way it > is, so that > maintainers will realise what would go horribly wrong if they made the > obvious `improvement'. You really should read the doctest documentation (see the chapter in the 2.1 docs for the best intro) - it *will* test broken examples as well. > Equally, plenty of the tools I write are intended to be used > from within > the implementations of other tools; having a test system `run' the > illustrations I'd want to supply is pointless - e.g.:: > > class selfRepresenting: > def _emit(self, *bits): > """Representation support method. > ... > for example: > > def __repr__(self): > return self._emit(`self.state`, > 'name=' + `self.name`) > """ > return (_fullname(self.__class__) + '(' + > string.joinfields(bits, ', ') + ')') > > in which the illustration only gives the __repr__ method of a class > implicitly inheriting from selfRepresenting. But as you've presented it, that wouldn't naturally be presented as an interactive session at all - one wouldn't write it as:: for example: >>> def __repr__(self): and so on but rather as:: for example:: def __repr__(self): and so on That's *why* the chosen "start of Python paragraph" thing is '>>>' - because it *is* what it looks like. Anyway, we probably aren't disagreeing - the "job" of the '>>>' paragraph is well delimited (and was introduced as an idea last time round the Doc-SIG loop, of course), and it is not the same as the job of the '::' paragraph. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
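Since doctest keeps coming up: the '>>>' paragraph Tibs describes is exactly what the doctest module (new to the standard library in 2.1) consumes, and, as he notes, exceptions are checked too. A minimal sketch follows - the function is invented for illustration, and it is written for a modern Python (under 2.1's integer division the first example would print 0):

```python
import doctest

def reciprocal(x):
    """Return 1/x, documented with a doctest-style '>>>' paragraph.

    >>> reciprocal(4)
    0.25
    >>> reciprocal(0)
    Traceback (most recent call last):
        ...
    ZeroDivisionError: division by zero
    """
    return 1 / x

def run_doctests(func):
    """Run the examples in func's docstring; return the failure count."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # pass globs explicitly so the examples can see the function itself
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.failures
```

Note that the exception example works by matching the "Traceback (most recent call last):" header and the final exception line, with the intervening stack elided by '...' - which is why "exceptions do work now".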
From tony@lsl.co.uk Thu Mar 22 11:10:31 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 11:10:31 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Message-ID: <001c01c0b2c0$b547a410$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Welbourne wrote: > > I think "keep it simple" is required here > to me that needs to include: > * case insensitive > * digits Unfortunately, I strongly disagree. I may be *wrong*, but I disagree. > because authors of doc-strings are going to be shocked if it behaves > otherwise. Well, it's how *I'd* expect it to work (and, of course, neither you nor I are exactly representative examples). > The former means your dictionary-based approach is not > satisfactory - string.tolower the apparent label, then check to see > whether the result appears in some list (or other implementation of > `collection') of known labels. > Otherwise, your builder.label_dict is > going to need further entries for, at least: > > "Pep":"pep", > "Post-history":"post-history", > "Discussions-to":"discussions-to", > > since some folk using the keys you gave *will* use them in the forms > shown; and you'll probably also need Erm - PEP 1 is quite clear over what it wants (it doesn't say one can use case variants, so why should one assume one can?) (((I probably won't do it unless more convinced, but of course doing a string.lower *would* be simple (before doing the dictionary lookup).))) > Have canonical forms generally be in Capitalised-Word form > (like RFC 822 > labels). Indeed, a good way to implement the aforementioned > `collection' would indeed be a mapping which is exactly the reverse of > the ones you showed us - mapping from the tolower form to the > canonical > form for each key - so that one recognises a key using: > > try: canon = labels[string.tolower(text)] > except KeyError: ... 
# it isn't a real label > > I am entirely happy to have the present *actual dialects* of > ST use only > letters and dash; however, allow ST-generic to permit numbers, e.g. so > that ST variants *can* use > "rfc2954-char-set": "RFC2954-Char-Set" > in their label dicts, or similar. > (No, I have no idea what RFC 2984 is, nor even whether it exists.) Hmm. OK - we're looking for compatibility with emails, he guessed wildly. Unfortunately, there is *no way* that I can see of unambiguously and obviously allowing paragraph labels to *start* a paragraph (it just leads to too many pitfalls - we use colons in English too freely). So trying to parse bare emails with STpy is always going to be a problem. However, the case for allowing digits is there, I suppose. I'll think about it (he begrudged). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 11:15:02 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 11:15:02 -0000 Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: <3AB9D9D1.9FD986A5@lemburg.com> Message-ID: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk>
Strangely, whilst I intensely disliked the way PEP 224 was linking strings to "coincidentally" adjacent values (not that I had a better way to do it!), only last night I found myself intensely wishing I had attribute docstrings - they would indeed make documenting things like class variables so much more pleasant - I could just move the text in the comment above the value into some other form and hey presto, documentation **adjacent to the entity documented** *and* user visible. Ho hum. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 22 13:33:10 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 08:33:10 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 10:10:11 GMT." <001401c0b2b8$47d23920$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221333.f2MDXBp17465@gradient.cis.upenn.edu> I wanted to make sure my terminology was clear, because it looked like the indentation got messed up somehow. My terms are: * valid * invalid * illegal * undefined I.e., "illegal" and "undefined" are mutually-exclusive, collectively exhaustive *subsets* of "invalid." We would use "illegal" for strings like:: * This ** is * a ** very * bad ** string * We would currently use "undefined" for strings like:: * Nesting is not yet **implemented** * Both are "invalid" for the current version of STpy, i.e., their meaning is undefined. But we *never* intend to give a meaning to the first one. Of course, an implementation can still give it a structure, if the user asks it to.. But "illegal" strings will *never* be defined under STminus. Hm.. I hope that's clearer.. :) -Edward From edloper@gradient.cis.upenn.edu Thu Mar 22 13:35:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Thu, 22 Mar 2001 08:35:35 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 10:10:11 GMT." <001401c0b2b8$47d23920$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221335.f2MDZZp17578@gradient.cis.upenn.edu> > > > 3. Local references (which look like '[this]' or '[1]') are now > > > supported. So, if I understand correctly, this ST:: This is a [test] ..[test] of local references Would be rendered in HTML as::

<p>
This is a <a href="#test">[test]</a>
</p>
<p>
<a name="test">[test]</a> of local references
</p>

? I'm not sure how you'd render it in LaTeX.. Can anchors appear anywhere in the document? Do they have to be their own paragraphs? Can anchors be treated as footnotes (e.g., by LaTeX)? What can their contents be? E.g., can they contain a list item:: ..[test] * This is a list item ? A list:: ..[test] * Item1 * Item2 ? etc. > it means a user can regard:: > > [This] is a local reference > > and:: > > "This":#this is a local reference > > as the same, which isn't much use *within* a document, but is *very* > useful for allowing links from outside. Are we expecting people to *want* to link into a document from outside? I can't see ever having any use for that when writing API docs... > A tool like TeX would need some untangling of the > '#this' to just 'this' for use in its '\xref', but that's hardly > difficult. Hm.. maybe I just don't know enough LaTeX. :) [about handling multi-paragraph list items] >> 1. some text >> >> some more text > and this gets "flattened" to be:: > > > I would argue that it would be more appropriate to use:: 1. some text some more text Also, what would your "flattening" do with:: 1. some text some more text even more indented > (if we had:: > > This is a paragraph. > > And so is this. > > then the flattening phase would say to itself "aha - a paragraph within > a paragraph - presumably the user *meant* something by that", and in > this case it would produce:: > > > > Can these nest arbitrarily deeply, if they keep indenting? > > > 5. The RE used for detecting URLs has become more > > > sophisticated. There are some associated rules > > > > Hm.. I don't look forward to formalizing this, and trying to get STNG > > to agree with your regexps :) > > STNG has its own REs. They don't make much sense to me (or didn't last > time I looked at them). In some cases, they just didn't work very well. > Oh well. Well, then, we should convince them to change them! :) > But I don't see why *formalising* it is a problem? 
It's just nice to have formalisms that don't contain big difficult-to-explain regular expressions. It makes the formalism harder to understand. > > Note also that it should be possible to generate the "long RE > > expression" in a *principled* way, given a formalization, so that > > it will detect *all* errors, not just *common* errors. > > This I don't understand - I'm not sure what you mean by "in a principled > way", and I'm also not sure what you mean by "all errors, not just > common errors". > But this will doubtless become clearer to me as STminus progresses (I > begin to suspect you may regret that name some day, as it becomes more > capable and more clearly sufficient-to-itself). Anything whose meaning is not defined by the formalism is invalid. It should be possible for a user to ask a tool to tell them if they use any invalid forms -- that way, they are guaranteed that what they have written will be interpreted as specified by the *formalism*, regardless of which implementation/tool they happen to be using (unless that tool has a bug). And given a formalism, it is possible to detect invalid forms in a principled way. -Edward From edloper@gradient.cis.upenn.edu Thu Mar 22 13:47:42 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 08:47:42 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 10:29:47 GMT." <001601c0b2bb$04ed8f80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221347.f2MDlgp18142@gradient.cis.upenn.edu> > > Of course, you can't reasonably forbid '#' in URLs, so you'll have > > to put URL recognition before inline recognition *anyway*.. :) > > No, that's not a problem. A '#' in a URL cannot be a starting or ending > quote, at least not if I have the URL RE right, because it won't meet > the correct conditions (for instance, it can't have a space before or > after it). A little messy, but that's how ST works. Hrm. 
Ok, currently STminus says that words can contain #inlines#. I would *really* appreciate it if you could either run my test cases on STpy, or at least read through them.. Because this is one of them, and I'd really like to know where else the test cases disagree with STpy. > Now, I have a sneaky feeling that you don't like that sort of approach, > but so far as I can tell it fits *exactly* the "philosophy" of ST, which > is to make what the user types, in general, come out as they would > naively expect - I *think* a naive user would expect the above not to be > doing quoting. I'm fine with having things come out exactly as the user expects, as long as we can do so safely.. So even if the user expects:: x * y to come out as an x, an asterisk, and a y, I don't think it should (under the formalism), since that's not *safe*. (any emph region later in the paragraph will seriously confuse things). > I would also like to delimit URLs - it would make life so much simpler. > But I also suspect the STNG people won't agree (of course, we might both > be wrong!). Well, I'll start putting together a case to convince them. > I still don't see why 'x*y>z' *has* to go in literals, > though - clearly by the current and possible future rules it would work > (if we do introduce quoting characters for URLs, I would want to insist > they act with the same sort of rules as literals and Python literals, so > that maths would be no problem). Ick. This makes me cringe. :) You might have noticed, but I want STminus to be "safe", in the sense that there should be no unexpected non-local dependencies. Consider your own sentence, if people think they can leave out the apostrophes:: I still don't see why x*y>z *has* to go in literals, Now, we have a bold "y>z ", and a mysterious '*' after has! Clearly not what we want. (When I say 'x*y>z' *has* to go in the literals, I mean it has to in order to be a "valid" string). 
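Edward's non-locality point is easy to demonstrate with a deliberately naive pairer. This is not STpy's or STNG's rule, just the simplest possible reading of "asterisks delimit emphasis":

```python
import re

# A deliberately naive emphasis pairer - NOT the STpy or STNG rule -
# that simply pairs '*' delimiters left to right, two at a time. It
# shows the non-local dependency: a stray asterisk early in the
# sentence shifts every later pairing.
def naive_emph(text):
    stars = [m.start() for m in re.finditer(r"\*", text)]
    return [text[a + 1:b] for a, b in zip(stars[::2], stars[1::2])]
```

On the quoted sentence this pairer emphasises "y>z " and leaves a stray '*' after "has", exactly the surprise described above; a rule that requires whitespace around delimiters (as STpy's does) avoids it, at the cost of forbidding some markup.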
> I think it would be a marvellous idea - people already are used to it in > emails, and it makes life simpler all round. Yes, by all means open > talks on this matter on the STNG arena. Will do.. -Edward From edloper@gradient.cis.upenn.edu Thu Mar 22 14:01:48 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 09:01:48 EST Subject: [Doc-SIG] URLs In-Reply-To: Your message of "Thu, 22 Mar 2001 11:31:33 +0100." <3AB9D485.A8FB6F35@lemburg.com> Message-ID: <200103221401.f2ME1np18985@gradient.cis.upenn.edu> > If that's what they want to do, they can use the scheme delimiter (:) > in URLs to make a separation between HTML-Tags and URLs. AFAIK, > the colon is not allowed in HTML-Tagnames (XML is different due to the > namespace notation). Ack. Let's not introduce even more notation, if we can help it! :) (besides which, allowing HTML in ST is very un-safe anyway..) > > So... Where do I go to do my convincing? Do I write a wiki page > > on the Zope site? Or can I write email somewhere? Anyone else > > want to help me convince them? :) > > Sending in patches usually helps ;-) Hm.. Actually, this issue is important enough to me that I'd actually be willing to go read all their code & learn it well enough to put in a patch for this. :) Maybe I'll offer that when I suggest the idea. -Edward From tony@lsl.co.uk Thu Mar 22 14:16:11 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 14:16:11 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103221347.f2MDlgp18142@gradient.cis.upenn.edu> Message-ID: <002101c0b2da$a5537a60$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > Hrm. Ok, currently STminus says that words can contain #inlines#. I believe that to be a mistake. 
I'll email you a copy of my current REs separately (no need to burden the list) - that may clarify some things (but unfortunately not all, as I don't *always* use URLs) > I would *really* appreciate it if you could either run my test cases > on STpy, or at least read through them.. I've meant to get round to this... Anyway, I've saved sttest.py to floppy, and will take it home today. > Ick. This makes me cringe. :) You might have noticed, Well, it was what I meant by my comment about not expecting you to like something! > but I want > STminus to be "safe", in the sense that there should be no unexpected > non-local dependencies. Consider your own sentence, if people think > they can leave out the apostrophes:: > > I still don't see why x*y>z *has* to go in literals, > > Now, we have a bold "y>z ", and a mysterious '*' after has! Clearly > not what we want. (When I say 'x*y>z' *has* to go in the literals, > I mean it has to in order to be a "valid" string). But by the rules of ST (well, at least of STNG when I looked at it, and I'm sure by my interpretation of the Classic rules), no we don't - we have a bold "has" and a normal font "x*y>z" - the asterisk therein doesn't meet the criteria for starting or ending emphasis. The problem, I guess, is that that seems equally clearly to me how it would (and, indeed, should) work. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 14:37:42 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 14:37:42 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103221335.f2MDZZp17578@gradient.cis.upenn.edu> Message-ID: <002301c0b2dd$a708eb30$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. 
Loper wrote: > So, if I understand correctly, this ST:: > > This is a [test] > ..[test] of local references > > Would be rendered in HTML as:: > >

> <p>
> This is a <a href="#test">[test]</a>
> </p>
> <p>
> <a name="test">[test]</a> of local references
> </p>

> > ? I'm not sure how you'd render it in LaTeX.. Well, I don't trust all that whitespace after your <p>
tags, but apart from that, yes. LaTeX has its own way of doing things. The best one could do, I believe, without some non-negligible effort, would be something like:: This is a test\xref{test} \label{test} test of local references which would use numbers in place of the '{test}' in the actual document (and the "label" is invisible). To use the names, one would need to write more sophisticated code. Which is why it sometimes isn't too worthwhile worrying about renderers. Avid use of '\setcounter' would probably allow one to do something, but on the whole I would either (a) give up and let LaTeX do what it wants, or (b) use TeX and write something myself. > Can anchors appear anywhere in the document? The original intention was for their use *as* footnotey, reference things at the end. Possibly in a "Reference:" clause. But on the other hand, I don't see why they should actually be so restricted. > Do they have to be their own paragraphs? They have to occur at the start of a paragraph. They are markup, though, not structure. > Can anchors be treated as footnotes (e.g., by LaTeX)? I don't know. Probably a presentation issue. > What can their contents be? E.g., can they contain a list item:: Again, I don't know. I wouldn't be *too* upset if we said "just a simple paragraph". > Are we expecting people to *want* to link into a document from > outside? I can't see ever having any use for that when writing > API docs... I don't have a use for it, myself, directly. > > A tool like TeX would need some untangling of the > > '#this' to just 'this' for use in its '\xref', but that's hardly > > difficult. > > Hm.. maybe I just don't know enough LaTeX. :) Well, actually, I think I was misremembering (see above) - it's a while since I've used LaTeX. > I would argue that it would be more appropriate to use:: > > 1. > some text > some more text > Hmm. My original model for the DOM tree was XHTML, and that is not how that works. Doesn't mean my model is a GOOD one, mind you... 
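For what it's worth, standard LaTeX spells the pair '\label'/'\ref' (there is no standard '\xref' command, as Tibs half-remembers), and the symbolic name does come out as a number, as he says. A minimal sketch of what a renderer might emit - not docutils output, just an illustration:

```latex
% A minimal sketch: LaTeX replaces the symbolic label with the number
% of the sectional unit the \label is attached to, so named anchors
% become numeric cross-references.
\documentclass{article}
\begin{document}

This is a [test] (see section~\ref{test}).

\section{Local references}\label{test}
[test] of local references.

\end{document}
```

Using the *names* rather than numbers in the output would indeed need extra work (or the hyperref package, in later LaTeX practice), which is the "more sophisticated code" mentioned above.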
> Also, what would your "flattening" do with:: > > 1. some text > > some more text > > even more indented It would, erm, flatten the first (list item contains para), and put a block around the second (para contains para - presumably the user had a reason). Consistency, hobgoblins, etc. > Can these nest arbitrarily deeply, if they keep indenting? As above. > > STNG has its own REs. They don't make much sense to me (or > didn't last > > time I looked at them). In some cases, they just didn't > work very well. > > Oh well. > > Well, then, we should convince them to change them! :) I shan't say no... > > But I don't see why *formalising* it is a problem? > > It's just nice to have formalisms that don't contain big > difficult-to-explain regular expressions. It makes the > formalism harder to understand. Oh, big REs make anything harder to understand! > Anything whose meaning is not defined by the formalism is invalid. It > should be possible for a user to ask a tool to tell them if they use > any invalid forms -- that way, they are guaranteed that what they > have written will be interpreted as specified by the *formalism*, > regardless of which implementation/tool they happen to be > using (unless > that tool has a bug). And given a formalism, it is possible to detect > invalid forms in a principled way. Which again comes round to our difference in viewpoint or something - you want to formalise first, and that leads to knowing which documents are invalid. My approach (in this instance, I hasten to add - not in the general, nonST case) is that the user throws their text at STpy (which in practice means an implementation thereof) and sees if it comes out as they expect, with as many warnings to be given as can be if they wish them. The reason for this approach with docutils is mainly that ST doesn't *have* a formalism, and for me the best way of working out what it's meant to be doing has been to work with an implementation. 
STminus *will* have a formalism, and it may even be a formalism for STpy - but both of those are new things. Of course, I'm also biased 'cos the Doc-SIG loop tended to fall over at the "formalising the spec" stage, and STpy/docutils was my attempt to short-circuit that - it doesn't look like it'll happen this time (what is it about 2001? - the types sig is active, catalogs are coming, we've got pydoc and soon an ST of our own) Anyway, must go Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 22 16:20:29 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 11:20:29 EST Subject: [Doc-SIG] Re: docutils REs In-Reply-To: Your message of "Thu, 22 Mar 2001 14:16:13 GMT." <002201c0b2da$a6d93000$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221620.f2MGKTp29623@gradient.cis.upenn.edu>

01> _descriptive = """\
02> (?P # start our *item*
03> (?: # an unnamed group
04> [^\n]* # 0..n of anything but newline
05> '[^'\n]+' # a literal string, containing 1 or more chars
06> [^\n]* # 0..n of anything but newline
07> )* # end group
08>
09> | # or
10>
11> [^\n]* # 0..n of anything but newline
12>
13> (?! # negative lookahead for
14> ' # a quote
15> [^']* # 0..n of anything but quote
16>
17> [ ]+ -- [ ]+ # spaces -- spaces
18>
19> [^']* # 0..n of anything but quote
20> ' # a quote
21> ) # end of negative lookahead
22> ) # end of our *item*
23>
24> [ ]+ -- [ ]+ # spaces -- spaces
25>
26> (?P .*) # 0..n of any character
27> """

What are lines 11-21 for? The only cases I can think of that they capture (that 3-7 don't) are dubious cases like::

    bad 'apostrophe nesting -- in the key

Also, I wanted to make sure you're clear that '^' and '$' match beginning and end of LINE, not of STRING (although the latter is a subset of the former).
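(As a concrete check of the '^'/'$' question: this is what a current Python re module actually does - a quick sketch with today's module, which the 1.5.2-era documentation being argued over describes the same way:)

```python
import re

# Without re.MULTILINE, '$' anchors at end of STRING (or just before a
# trailing newline), not at the end of each LINE:
assert re.search(r'foo$', 'foo\nbar') is None
assert re.search(r'foo$', 'foo\nbar', re.MULTILINE) is not None

# '$' also matches just before a final newline...
assert re.search(r'foo$', 'foo\n') is not None
# ...whereas r'\Z' matches only at the absolute end of the string:
assert re.search(r'foo\Z', 'foo\n') is None
assert re.search(r'foo\Z', 'foo') is not None
```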
You might want to use '\Z' and '\A' to be more clear (although '\Z' still bothers me a little because it matches both '' and '\n' (but not '\n\n').. seems like it should just match ''). I don't think that STNG currently requires whitespace before *emph* or **strong** etc... that's why I coded it like I did. But I think that STpy's approach may be more reasonable.. (we should start making a list of proposed changes to STNG, in order to make STpy and STNG more compatible.. Otherwise, STminus will just end up being a big useless mess :) ).. Hm.. I guess s/we/I/ in that last parenthetical. :-/ I haven't decided yet on whether I'm happy about having this concept of "acceptable ending punctuation.." It sort of seems like *all* punctuation should be ok, or *none*.. But I'll think on it some more. (e.g., should it be ok to have a dash after an *emph region*-like this?) -Edward From edloper@gradient.cis.upenn.edu Thu Mar 22 16:54:49 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 11:54:49 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 10:40:20 GMT." <001701c0b2bc$7e2f03a0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221654.f2MGsnp03119@gradient.cis.upenn.edu> > > I guess that seems reasonable. Within paragraphs, do you collapse > > multiple spaces into one space? > > No, within lines spaces are (very carefully) left untouched, just in > case. That seems inconsistent with stripping trailing whitespace. I want to make sure that things look "the same" whether you view them in text or in html.. So it seems like if you print out a STpy string in vt100 mode, with a 130-character-wide screen, it should collapse spaces & re-word-wrap, just like HTML/LaTeX would.. Of course, you could say that that's just a presentation issue.. But I think that for consistency, we should either: 1. Preserve spaces within lines *and* trailing whitespace.
Specify that display tools (even plaintext ones) should treat spaces as soft, and should word-wrap. 2. Remove leading/trailing whitespace, and collapse sequences of spaces to single spaces. Specify that display tools should word-wrap. (of course, you don't collapse spaces in literal regions, inline regions, literal blocks, or python test blocks.) (unless, of course, you can give me a good reason why we *should* preserve sequences of spaces. > > > > > > "the following is not a url": > > > > > > That's right. In this instance. > > > > So does it get rendered as is (i.e., with two quote signs, one colon > > sign, a less than sign, and a greater than sign)? > > That's up to the renderer. But seriously, it gets *stored* as a node of > the DOM tree which has the text within quotes (i.e., the quotes are not > preserved) as its text, and the URL as its 'url' attribute. Thus the ST > markup (the double quotes and the colon) are not remembered. But "" doesn't match the url pattern, so presumably it doesn't even get detected by the href-finding-regexp? As I understand it, you can say:: "This is a test": of StructuredText and it will be rendered (in HTML) as:: "This is a test": of StructuredText and not as:: This is a test of StructuredText > > The markup-nesting problem doesn't actually seem that difficult to me, > > in principle. I propose that we allow anything to nest > > within anything, > > with the restrictions: > > 1. nothing can nest inside a literal, inline, or href url > > Agreed. But please don't call it an 'href url' - that's an HTML term! > > > 2. nothing can nest within itself (even with intervening levels) > > Pragmatically has to be true, with non-differentiated start and end > quotes. It doesn't *have* to be true.. In principle we could allow:: *This **is *no* good** for me* But I don't think we should. > These two seem to me to be the sane minimum, and thus sensible. So we'll stick with that for now.
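(Restriction 2 just agreed - nothing may nest within itself, even with intervening levels - is cheap to check mechanically. A hypothetical sketch over a pre-tokenised stream of open/close events; the tokeniser itself is not shown:)

```python
# Hypothetical checker for "nothing can nest within itself (even with
# intervening levels)".  Events are ('open'|'close', kind) pairs.
def valid_nesting(events):
    stack = []
    for action, kind in events:
        if action == 'open':
            if kind in stack:        # re-entering an already-open region
                return False
            stack.append(kind)
        else:
            if not stack or stack.pop() != kind:
                return False
    return not stack                 # everything opened was closed

# *This **is ok** for me*  -- fine
assert valid_nesting([('open', 'emph'), ('open', 'strong'),
                      ('close', 'strong'), ('close', 'emph')])
# *This **is *no* good** for me*  -- emph re-opened inside emph
assert not valid_nesting([('open', 'emph'), ('open', 'strong'),
                          ('open', 'emph')])
```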
> > Also, spaces must come between * and ** delimiters, so you > > can't say ***this***. > > Ah, but there's no reason you shouldn't be able to *say **this***, for > instance (it's quite unambiguous). But I thought that regions had to be ended by valid punctuation or space? Does '*' count as valid punctuation, then? (Of course, I expect your regexps to change significantly when you try to do nesting...) But from a more abstract point of view, I think that '***' will end up being too confusing. I don't think it's unreasonable to require that people *say **this** *. At the very least, it seems much easier to read (for those who aren't intimately familiar with ST, i.e., our entire user base :) ) But I guess that if we are to allow it, I think '***' should only be allowed to mean "close both strong and emph" or "open both strong and emph".. So you shouldn't be able to say:: *Too***confusing** to mean:: *Too* **confusing** But just to be clear, I don't think we should allow it at all. :) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 00:25:28 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 19:25:28 EST Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Thu, 22 Mar 2001 10:47:57 GMT." <001801c0b2bd$8e347720$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230025.f2N0PSp06138@gradient.cis.upenn.edu> Tibs said: > Given the attraction of doctest, it seemed sensible to allow its code > blocks to be treated as such. And given it's now in the standard Python > package, we'd do well not to ignore it. Agreed. I think that we should reiterate in our own docs that the test cases in the doc strings should be for illustrative purposes, and that extensive unit testing should be put in __test__. Tibs also said: > So whilst doctest does support "out of line" test strings, it will > always be the case that it will run on docstrings as well - by original > intent. > Well, if you ask it nicely, it won't.
But in general it should, and certainly will by default. :) Tibs later said (in a different email): > But as you've presented it, that wouldn't naturally be presented as an > interactive session at all - one wouldn't write it as:: > > for example: > > >>> def __repr__(self): > and so on > > but rather as:: > > for example:: > > def __repr__(self): > and so on But then Eddy still wants to know whether the literal block is python code or not (for some of the same reasons that we want to have separate #...# and '...' forms, instead of just one of them). I don't see encoding this information as essential. But if we *do* want to encode it, we have to have some way of distinguishing python literal blocks from vanilla literal blocks (so we'll have 5 different literalish types: literals; inlines; literal blocks; doctest blocks; and python literal blocks). -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 00:29:28 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 19:29:28 EST Subject: [Doc-SIG] Tokens for labels & endnotes In-Reply-To: Your message of "Thu, 22 Mar 2001 10:54:00 GMT." <001a01c0b2be$67095890$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230029.f2N0TTp06311@gradient.cis.upenn.edu> Tibs hurt my poor little brain by saying: > For labels I want to exclude '-_', but yes, for labels I want to include > them. I'll assume that the second "labels" should be "local references" (like [this]). > My rather desperate hope (not having read the RE section in the new 2.1 > manuals yet) is that using REs will give good leverage on the problem > mentioned at the top of the email, at which point it *does* become > useful to use '\w'. Yes, I believe a lot of stuff will just fall out nicely with this. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 00:38:08 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Thu, 22 Mar 2001 19:38:08 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 22 Mar 2001 11:10:31 GMT." <001c01c0b2c0$b547a410$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230038.f2N0c8p06743@gradient.cis.upenn.edu> > Edward Welbourne wrote: > > > I think "keep it simple" is required here > > to me that needs to include: > > * case insensitive > > * digits > > Unfortunately, I strongly disagree. I may be *wrong*, but I disagree. I'll assume we're still talking about labels. I think it's unreasonable to expect people to remember whether they're supposed to type "Author:" or "author:".. Is there a good reason to make it case sensitive? I can't imagine ever defining two *different* labels "Author" and "author"... Convince us that it should be case sensitive. :) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 00:45:31 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 19:45:31 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 14:16:11 GMT." <002101c0b2da$a5537a60$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230045.f2N0jVp07081@gradient.cis.upenn.edu> > > STminus to be "safe", in the sense that there should be no unexpected > > non-local dependencies. Consider your own sentence, if people think > > they can leave out the apostrophes:: > > > > I still don't see why x*y>z *has* to go in literals, > > > > Now, we have a bold "y>z ", and a mysterious '*' after has! Clearly > > not what we want. (When I say 'x*y>z' *has* to go in the literals, > > I mean it has to in order to be a "valid" string). > > But by the rules of ST (well, at least of STNG when I looked at it, and > I'm sure by my interpretation of the Classic rules), no we don't - we > have a bold "has" and a normal font "x*y>z" - the asterisk therein > doesn't meet the criteria for starting or ending emphasis. 
The problem, > I guess, is that that seems equally clearly to me how it would (and, > indeed, should) work. Ok, so I was using different rules than you were (I was using STNG's).. So the relevant example would be:: I still don't see why x * y *has* to go in literals. I admit, that's a little more strained. But I still think there's a safety issue here.. (although there is something to be said about having "'" use the same rules as all the other delimiters (or vice versa, I guess)). I'll think on it some more. Somewhat related, do you think we should allow things like:: *two*'regions' Where 2 regions are not separated by space/punctuation? I vote no.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 01:00:32 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 20:00:32 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 14:37:42 GMT." <002301c0b2dd$a708eb30$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230100.f2N10Wp07821@gradient.cis.upenn.edu> Tibs said: > > Can anchors appear anywhere in the document? > > The original intention was for their use *as* footnotey, reference > things at the end. Possibly in a "Reference:" clause. But on the other > hand, I don't see why they should actually be so restricted. I vote for only allowing them at the end, because they confuse me otherwise. :) What do other people think? > > Do they have to be their own paragraphs? > > They have to occur at the start of a paragraph. They are markup, though, > not structure. Um. I'm not sure exactly how you're using those two terms. You mean they're what I would call "local formatting" or "coloring" and not "global formatting" or "structuring"? If so, I disagree. I think they should be a special type of heading, very similar syntactically to labels. So you can say:: ..[foo] bar or:: ..[foo] para 1 para 2 > > I would argue that it would be more appropriate to use:: > > > > 1.
> > some text > > some more text > > > > Hmm. My original model for the DOM tree was XHTML, and that is not how > that works. Doesn't mean my model is a GOOD one, mind you... Hm. I'd rather use a good model. :) But how we convert it to an XML document isn't really a fundamental issue, so I'll just leave it be for now.. > Oh, big REs make anything harder to understand! Yes, but I think that more work needs to go into making formalisms easy to understand than implementations.. > Which again comes round to our difference in viewpoint or something - > you want to formalise first, and that leads to knowing which documents > are invalid. My approach (in this instance, I hasten to add - not in the > general, nonST case) is that the user throws their text at STpy (which > in practice means an implementation thereof) and sees if it comes out as > they expect, with as many warnings to be given as can be if they wish > them. The problem I have with your approach is that it assumes: 1. There is one canonical tool, or all tools work exactly the same. 2. The tools won't change over time. I think that we may be setting ourselves up for annoying problems down the road, in terms of people wanting backwards compatibility so they won't have to rewrite doc strings. Witness how much of a problem backwards compatibility can be for Python in introducing things like nested scopes.. Other people *have* successfully defined (formalized) documentation languages (javadoc, pod), so I don't see why we can't do the same, in principle.. > The reason for this approach with docutils is mainly that ST doesn't > *have* a formalism, and for me the best way of working out what it's > meant to be doing has been to work with an implementation. Which is reasonable. But I think that you should be at least working *towards* a formalism..
> Of course, I'm also biassed 'cos the Doc-SIG loop tended to fall over at > the "formalising the spec" stage, and STpy/docutils was my attempt to > short-circuit that - it doesn't look like it'll happen this time (what > is it about 2001? - the types sig is active, catalogs are coming, we've > got pydoc and soon an ST of our own) Well, hopefully this time we'll manage to stay standing. :) -Edward From Edward Welbourne Thu Mar 22 20:36:19 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 22 Mar 2001 20:36:19 +0000 (GMT) Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> Message-ID: Ping: > What i had in mind was "how to use this module". I'll come back to this: it points to a middle ground ... > As an extreme example, try running "perldoc CGI". CGI.pm contains > about 3200 lines of code followed by 3000 lines of detailed > documentation. While the module itself is indeed enormous, i think > that it is useful to have all of that information about how to use the > CGI module instantly available right there in CGI.pm. having played enough with pod to see that it would be a good tool for the kinds of in-code doc I discuss below, *but no more*, I would argue that pod is actually the perfect illustration of why I *don't* want to go this far (even allowing for CGI.pm to be chopped into little pieces). The docs one gets out of pod are just about OK for showing to techies who only really care about information content and don't mind *too much* if it's rather badly presented - as various remarks in the perlpod man page will make clear, this is intended. Using it for anything more leads to ugly docs (I shalln't be quick to forget the look of *disgust* on the face of a technical author colleague when I asked, yesterday, what to do about exactly some such docs ... 
I'm glad pod was the focus of that, else I'd have curled up and died). Ping: > ... The question is "how far is the user of a module from *some* > information on how to use the module?" Doesn't matter if they don't > have every article that anyone has ever written about the module -- do > they have a starting point? Edward & Tibs have clearly been devoting *much* effort to consideration of how to embed xrefs in the doc strings (and yes, <...> is the morally correct way to delimit URLs; but I'll come back to that in a separate e-mail), so - at least in principle - the doc string contains xrefs to all the flavours of doc that might exist in connection with the module. Even if it only documents the naked API, its xrefs are a starting point. Ping: > It's also harder for me to change foo() to spam() in just the code, > check in just that part, and say "oh, i'll change the docs later" -- > because i'll be checking in a single file that's inconsistent with > itself. Edward: > If the docs are in a different file, I can do a CVS diff to see what's > changed in the code since the last time I updated the docs, and thus > can do updates to the documentation "in batch." and, when I know several of my colleagues will be changing the same file some more in the present release cycle, batching the doc changes may well be The Right Thing To Do - especially if, as where I work, there's a separate documentation group ... and I trust their idea of what constitutes an intelligible presentation of `how to use this tool' better than I trust most techies, myself included. 
So TRTTD may well be to send an e-mail to the doc group saying `I changed module foo in this way, I *have* revised the API docs within it, I think you need to change sections 2, 7 and 11 of the refman, along with all references, in all docs, to method fudge() on class Interpolator', rather than messing up their docs (which are likely maintained in some other doc format anyway, precisely because real doc teams don't believe in the sorts of doc-tool that techies think of as the bee's knees). Furthermore, the changes to documentation, even if I draft them before the doc team goes to work, will probably need to be integrated with several other sets of changes made by colleagues whose projects impact the same source file in the same release cycle. I may need to check in my code changes to get the automatic test tools to run the right tests on all platforms (as opposed to the one or two on which I test it myself before freezing) and I may well be checking in a prototype or first draft of my changes in order to find out which irritating platform-variations are going to force me to revise my approach before I can settle the final issues of the design that goes into the release candidate; so it may not be `laziness' that I leave out my changes to the docs - it may be the prudence of `I shall almost certainly be changing this some more, and shall not know for sure how until later' which makes large amounts of effort on the docs futile. So `oh leave the docs for now' may actually be wise and prudent; and, as Edward says, I (or our doc team) can ask tools where changes to docs are needed. Indeed, in an ideal world, the doc team has taken the design spec I wrote before I began coding and is working on the user-oriented docs at the same time that I'm changing the code.
It is generally best, under *any* version-control system, to avoid having two sets of changes proceeding on the same file at the same time; and I'll be reviewing the doc team's work while the doc team review my changes to the API docs, so we do get to catch glitches. Now, back to Ping's > What i had in mind was "how to use this module". and here I'm with Ping, regarding Edward's `Only the API' line as being too purist - or confusingly phrased; I shalln't be surprised if Edward's idea of `Only the API' does include `how to use this API', so I suspect we aren't as far apart as we seem to imagine. So I have half a guess that the following might bring us closer to agreement: The python source code contains doc strings which explain how to use the code; this is expressed in ST and targeted at maintainers and interrogators - i.e. folk who are either looking at the source or playing with an object their python session has given them, whose behaviour they need to know about, ideally without being obliged to look at the implementation (even assuming they have it). Other files contain documentation of other kinds, possibly in other formats; project management and version control can be used to flag which of these will need to be changed when the code changes. The source docs cross-reference these. The source docs *should* suffice to generate (possibly crude) documentation in (at least) man and HTML formats, which should be of a good enough standard to serve as the *start-point* for writing the reference manual; indeed, if one isn't too fussed about the reference manual being beautiful, they should suffice *as* a reference manual. 
The source doc format *must* be sufficiently straightforward that

* a maintainer looking at the code *will* read and understand them without suffering eye-pain (on which HTML fails for Guido at least)

* a maintainer changing the code *will* be able to see what changes to make to the docs and *will not* be put off making those changes by doubts about how to express them

* an interrogator with a python object `in hand' can (choose their own interrogation tools and, using these) get the object to tell all they need to know to determine what it promises to do (and what it doesn't)

* the author of potential client code can ask tools to find them which source modules to consider using and can glean enough information from the docs of those modules to make informed (ideally: correct) choices.

The maintainer's needs call for simplicity of format, the interrogator's call for richness, albeit with some cross-over both ways; good tools can make a big difference to the richness (e.g. all that stuff about trawling base classes for matching methods, providing default doc strings, etc.). The client-author's needs call for standardisation (hence Tibs' work on labels). Practical experience in the field of software maintenance says unambiguously that simplicity is a very serious issue, especially if one is to have enough standardised semantic markup to ensure that tools can do a good job for the client author. A surfeit of bureaucracy *will* lead to folk changing the code without bothering to keep the source docs in sync (let alone the out-of-source ones). Equally, without suitable standardised markup, client-authors will be unable to find a good supplier of round wheels, so they *will* end up using hexagonal ones `because those are easier to knock together', which will continue to make a mess of the roads. Case in point: regexen for URLs. To meet these needs, the source docs for each method/class/...
(call it: object) *do* need to include:

* a clear statement of what the object *promises* (and doesn't)
* a clear statement of *what it's for* and *how to use it*
* references to more sophisticated docs saying everything else for as many values of `everything else' as authors can be found to write.

If we ask for more than this,

* we'll need such complexity in ST that maintainers won't, so
* it won't be realistic to expect the in-code docs to stay in sync with the code they're in, so
* we won't have the code separate from the docs that will get out of sync with it, so no-one will know which is right.

Note that disagreement between code and some docs won't trigger the `trust neither' rule provided

* it's immediately clear to the reader which one (the one in a different file from the implementation) is wrong, and
* there are *some* docs with the code which agree with it.

Ping: > - Keeping modules and associated docs in the same file helps > to ensure that the two are in sync when you distribute or > edit the file. (It's not possible to have different > versions of the code and the docs at the same time; it's > less likely that someone will check in changes to one > without updating the other, etc.) Edward: > 2 issues: editing and distribution > distribution -- maybe we want to turn modules into packages, and > include docs in the package? There's not a lot of precedent > for this in other languages though.. > editing -- ... (I already addressed editing) The distribution problem defines the boundary quite nicely: reference manuals, how-to guides and tutorials *shouldn't* change when I fix a bug, though the in-code docs might (notably for the internal method which now has to do things slightly differently so that the module actually implements its documented external API). Likewise if I totally re-implement the entire module, but preserve its API; conversely, a perfectly good module may get its reference manual massively overhauled without changing one line of the code.
Of course, a real total re-implementation will change the API, but then it'll equally be part of a `new major release' of the module, so re-writing the separate docs shouldn't seem out of place. (Indeed, a total re-write of the module reference manual will typically reveal changes needed in the API.) Furthermore, if you've got the code you need the API and an overview of how to use the module; these need to be in sync with the actual implementation you've got (and to tell you which *version* you've got). However, you'll probably only use a moderate fraction of the actual modules in your python installation, so you probably *don't* want a separate copy of the tutorial and similar `big picture' docs on every machine on which you install your python distribution; you may be happy to live with the xrefs pointing to www.python.org or you may want to have one copy of all the big picture docs on a central server shared by all pythoneers in a given team. [Which points to an issue for the URL discussion; one really does want to be able to specify URLs relative to `the root URL we selected when we installed our python doc system' which *might* be at www.python.org and *might* be on a machine on the team's local network or *might* be local to the actual machine in use; the installation process will doubtless involve verifying that this URL is accessible and *does* provide the relevant docs.] When it comes to the reference manual, there is even a case for deliberately choosing to isolate it from the source - so that, for instance, I can implement a module which will be portable between versions of python. If, in it, I rely on the in-code docs of my locally-installed re module (say) I may well write code which only works for folk using the same version of python as me. The reference manual for module re *should* tell me gotchas about `we changed this between version 1.5.2 and 2.0 of python, so beware' which (IMO) *should not* be present in the in-code docs of the module.
So I find myself increasingly confident that TRTTD is to draw a dotted line between the things which belong with the code and the things which do not; that all `big picture' docs belong in separate files; that authors of client code really *do* need to use these `big picture' docs as their primary source (for portability); and that the in-code docs should be limited to the API and an account of its proper usage. The big picture docs then get to be revised when the API changes, or when someone finds the energy to improve them, as a separate process from any changes to the code. Eddy. From tony@lsl.co.uk Fri Mar 23 10:22:59 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:22:59 -0000 Subject: [Doc-SIG] Re: docutils REs In-Reply-To: <200103221620.f2MGKTp29623@gradient.cis.upenn.edu> Message-ID: <002a01c0b383$3c18a670$f05aa8c0@lslp7o.int.lsl.co.uk>

> 01> _descriptive = """\
> 02> (?P # start our *item*
> 03> (?: # an unnamed group
> 04> [^\n]* # 0..n of anything but newline
> 05> '[^'\n]+' # a literal string, containing 1 or more chars
> 06> [^\n]* # 0..n of anything but newline
> 07> )* # end group
> 08>
> 09> | # or
> 10>
> 11> [^\n]* # 0..n of anything but newline
> 12>
> 13> (?! # negative lookahead for
> 14> ' # a quote
> 15> [^']* # 0..n of anything but quote
> 16>
> 17> [ ]+ -- [ ]+ # spaces -- spaces
> 18>
> 19> [^']* # 0..n of anything but quote
> 20> ' # a quote
> 21> ) # end of negative lookahead
> 22> ) # end of our *item*
> 23>
> 24> [ ]+ -- [ ]+ # spaces -- spaces
> 25>
> 26> (?P .*) # 0..n of any character
> 27> """
>
> What are lines 11-21 for? The only cases I can think of that
> they capture (that 3-7 don't) are dubious cases like::
>
> bad 'apostrophe nesting -- in the key

I can't offhand remember - the RE growed until it appeared to work, and some of it appeared to rely on the "fuzzy" handling that REs appear (to me) to do in balancing the greediness of different bits of the RE.
It's possible it's skeletal remains which should be excised, I suppose. > Also, I wanted to make sure you're clear that '^' and '$' match > beginning and end of LINE, not of STRING (although the latter is > a subset of the former). Not according to the RE documentation in the Python 1.5.2 reference manual, they don't - that's quite clear in saying start and end of STRING, and recognition of newlines is only in MULTILINE mode. > I don't think that STNG currently requires whitespace before > *emph* or **strong** etc... that's why I coded it like I did. I kept STNG REs around as comments for "of interest" reasons, but personally found them less than useful, so basically have worked from scratch and the ST "documentation". So it's quite possible they're different. > But I think that STpy's approach may be more reasonable.. > (we should start making a list of proposed changes to STNG, > in order to make STpy and STNG more compatible.. Otherwise, > STminus will just end up being a big useless mess :) ).. Well, no, I wouldn't say that. > Hm.. I guess s/we/I/ in that last parenthetical. :-/ my preferred option! > I haven't decided yet on whether I'm happy about having this > concept of "acceptable ending punctuation.." It sort of seems > like *all* punctuation should be ok, or *none*.. I'm not *too* happy about it myself, and actually it's a string that's '%' included into the RE texts where it's needed - this means that (a) it's easy to change, but (b) it should be the same in all places - I thought consistency was a Good Idea. > But I'll > think on it some more. (e.g., should it be ok to have a dash after > an *emph region*-like this?) That looks wrong to me - but then you can see how I use dashes in plain text! There *are* some conventions on how one uses punctuation - for instance, 'this ,' looks wrong to almost everyone. ST just enforces some of them (this is, of course, yet another class of things to consider warning people about). 
Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 10:34:29 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:34:29 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103221654.f2MGsnp03119@gradient.cis.upenn.edu> Message-ID: <002b01c0b384$d7153480$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > No, within lines spaces are (very carefully) left untouched, just in > case. > > That seems inconsistent with stripping trailing whitespace. Actually, I agree, although I hadn't thought of it until you said so. > I want > to make sure that things look "the same" whether you view them in > text or in html.. So it seems like if you print out a STpy string > in vt100 mode, with a 130-character-wide screen, it should collapse > spaces & re-word-wrap, just like HTML/LaTeX would.. Of course, you > could say that that's just a presentation issue.. But I think that > for consistency, we should either: > > 1. Preserve spaces within lines *and* trailing whitespace. > Specify that display tools (even plaintext ones) should treat > spaces as soft, and should word-wrap. I remember when most tools would show trailing whitespace visibly. Those days appear to have gone a long time ago. I'd oppose retaining trailing whitespace. > 2. Remove leading/trailing whitespace, and collapse sequences > of spaces to single spaces. Specify that display tools > should word-wrap. (of course, you don't collapse spaces in > literal regions, inline regions, literal blocks, or python > test blocks.) Word wrapping is a presentation issue - if the renderer is generating etext, or STNG, then it may make sense to *not* word wrap. > (unless, of course, you can give me a good reason why we *should* > preserve sequences of spaces.) No. It's only laziness.
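[The per-line stripping being agreed here is trivial to state in code - a sketch of the behaviour under discussion, not docutils' actual implementation:]

```python
def strip_trailing(text):
    """Strip trailing whitespace from every line, leaving leading
    indentation and internal spacing untouched (a sketch of the
    behaviour discussed above, not docutils' own code)."""
    return "\n".join(line.rstrip() for line in text.splitlines())

# An invisible trailing space would otherwise defeat things like a
# trailing '::' (or a backslash line-continuation in example code):
assert strip_trailing("a literal block:: \n    code") == \
       "a literal block::\n    code"
```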
So far as I can see, it doesn't actually make any difference in any circumstance, so there's no point in my bothering to remove the spaces internally - I don't care about them, whereas I *have* to care about leading spaces, and I remove trailing spaces out of kindness (so that '::' works better, for instance - it doesn't suffer from the "oh dear, that backslash didn't continue my Python line because there's an invisible space after it" problem). > > > > > > > "the following is not a url": > > > > > > > > That's right. In this instance. > > > > > > So does it get rendered as is (i.e., with two quote > signs, one colon > > > sign, a less than sign, and a greater than sign)? > > > > That's up to the renderer. > > But "" doesn't match the url pattern, so presumably it doesn't > even get detected by the href-finding-regexp? As I understand it, > you can say:: Sorry - I slipped into "if were a meta-rendition of a URL" mode. You're right, it would be stored "as is". > > > 2. nothing can nest within itself (even with intervening levels) > > > > Pragmatically has to be true, with non-differentiated start and end > > quotes. > > It doesn't *have* to be true.. In principle we could allow:: > > *This **is *no* good** for me* I suppose so, for one definition of how one would parse it (he said grudgingly). > But I don't think we should. Luckily we agree! > > These two seem to me to be the sane minimum, and thus sensible. > > So we'll stick with that for now. > > > > Also, spaces must come between * and ** delimiters, so you > > > can't say ***this***. > > > > Ah, but there's no reason you shouldn't be able to *say > **this***, for > > instance (it's quite unambiguous). > > But I thought that regions had to be ended by valid punctuation or > space? Does '*' count as valid punctuation, then? (Of course, > I expect your regexps to change significantly when you try to do > nesting...) Hmm. Well, it works: text: **SS*ee*** --> rendering SS*ee* Hmm.
> But from a more abstract point of view, I think that '***' will end > up being too confusing. I don't think it's unreasonable to require > that people *say **this** *. At the very least, it seems much > easier to read (for those who aren't intimately familiar with ST, > i.e., our entire user base :) ) But I've already noted you have a cavalier attitude to extra spaces (pace your HTML) - and I'm not convinced on the "easier to read". Ho hum. > But I guess that if we are to allow it, I think '***' should only > be allowed to mean "close both strong and emph" or "open both > strong and emph".. So you shouldn't be able to say:: > > *Too***confusing** I just tried it - docutils does: text: *ee***SS** --> ee***SS* which seems reasonable enough. > to mean:: > > *Too* **confusing** The latter is clearer, certainly! > But just to be clear, I don't think we should allow it at all. :) I *think* it will all come out in the wash, myself. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 10:39:13 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:39:13 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: <200103230025.f2N0PSp06138@gradient.cis.upenn.edu> Message-ID: <002c01c0b385$80d1a8a0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > Agreed. I think that we should reiterate in our own docs that > the test cases in the doc strings should be for illustrative > purposes, and that extensive unit testing should be put in > __test__. Well, I think we should say what they *are*, which is that they are strings that represent Python code that doctest will happily find (modulo the place the docstring is) and process - that's *not* the same as "illustrative", which implies "not very important" - the point is that they may well be pedagogic... 
On the pursuit of extensive unit testing and where strings should go, I think we should be silent, and leave it to doctest and the unit test software (whichever it is - pyunit?) to say... > But then Eddy still wants to know whether the literal block is python > code or not (for some of the same reasons that we want to have > separate #...# and '...' forms, instead of just one of them). > > I don't see encoding this information as essential. But if we *do* > want to encode it, we have to have some way of distinguishing > python literal blocks from vanilla literal blocks (so we'll have > 5 different literalish types: literals; inlines; literal blocks; > doctest blocks; and python literal blocks). That way lies madness, 'cos what about C code, oh, and maybe some Haskell is very important, and... I think this is too big a task for ST itself - maybe a later job for @ escapes (ducks and covers). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 10:41:49 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:41:49 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103230038.f2N0c8p06743@gradient.cis.upenn.edu> Message-ID: <002d01c0b385$dd989300$f05aa8c0@lslp7o.int.lsl.co.uk> I wrote: > > Unfortunately, I strongly disagree. I may be *wrong*, but I > disagree. Edward D. Loper wrote: > I'll assume we're still talking about labels. Yes. > I think it's unreasonable to expect people to remember > whether they're supposed to type "Author:" or "author:".. > Is there a good reason to make it case sensitive? I can't > imagine ever defining two *different* labels "Author" and > "author"... Convince us that it should be case sensitive.
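[For what it's worth, the case-insensitive matching Edward argues for costs almost nothing to implement - a sketch, with a purely hypothetical label set:]

```python
KNOWN_LABELS = {"author", "version", "history"}   # hypothetical label set

def match_label(word):
    """Return the canonical lowercase label name, or None."""
    name = word.rstrip(":").lower()
    return name if name in KNOWN_LABELS else None

# "Author:" and "author:" then pick out the same label:
assert match_label("Author:") == match_label("author:") == "author"
assert match_label("Banana:") is None
```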
:) Unfortunately, I remember too many arguments where I start out disagreeing vehemently and end up having to give in for the sake of "reasonableness" (cursed are they who see both sides of the argument...). My argument for case sensitivity is that "it damn well should be like that" (all those careless programmers, mutter, mutter). Which probably means I'll need to give in. Oh well. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 10:49:25 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:49:25 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103230045.f2N0jVp07081@gradient.cis.upenn.edu> Message-ID: <000001c0b386$ed3dbe10$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > So the relevant example would be:: > > I still don't see why x * y *has* to go in literals. I agree that's a problem paragraph.
Edward Loper wrote: > I vote for only allowing them at the end, because they confuse > me otherwise. :) What do other people think? Well, I know David Goodger was trying to invent something *like* these anchors to allow him to navigate round a document. I suspect it's something I'm agnostic on, and would thus not choose to pronounce on... > > > Do they have to be their own paragraphs? > > > > They have to occur at the start of a paragraph. They are > markup, though, not structure. > > Um. I'm not sure exactly how you're using those two terms. You mean > they're what I would call "local formatting" or "coloring" and not > "global formatting" or "structuring"? Yes. But I agree it's a dodgy issue distinguishing a paragraph that has an item that can only occur at the start from an element that is "named" by that occurrence. It may well be they *should* be more structurey, if they are required so to act (and that would also make it easier to allow them to start a new paragraph, which we *might* want to do). > > Hmm. My original model for the DOM tree was XHTML, and that > > is not how that works. Doesn't mean my model is a GOOD one, mind you... > > Hm. I'd rather use a good model. :) But how we convert it to an > XML document isn't really a fundamental issue, so I'll just leave > it be for now.. XML isn't, but the choice of DOM is - the reason for choosing DOM was so that an interface between parser and user could be established that was well understood, and could be manipulated easily with available Python tools. The "magic" behind the DOM creation could then be done by any of a series of tools, provided they all produced the same sort of DOM tree (i.e., use the same or similar DTD). I thus see STpy as mapping directly to a DTD (or XML-Schema, or name your poison). Given DOM, XHTML seems a natural example to choose (although I do find it a bit odd in places). The DOM thing is an important point... (I'll have to make sure the PEP stresses that).
> The problem I have with your approach is that it assumes: > 1. There is one canonical tool, or all tools work exactly > the same. > 2. The tools won't change over time Which is what the DOM thing is partly meant to address - but I see you are talking about a different bit of the "approach". > I think that we may be setting ourselves up for annoying problems > down the road, in terms of people wanting backwards compatibility > so they won't have to rewrite doc strings. Witness how much of a > problem backwards compatibility can be for Python in introducing > things like nested scopes.. > > Other people *have* successfully defined (formalized) documentation > languages (javadoc, pod), so I don't see why we can't do the same, > in principle.. No, our problem is that ST *is* defined, but informally. Retrofitting a formalism onto an informal standard *is* a problem, 'cos people have different understandings of how that informal standard will work. As such, docutils takes the "code it and see how it works" approach (Python as formalism), whilst you're taking the "think about it hard and see what it should do" approach (more traditional formalism). But we're both stuck with the format *essentially* already being defined for us. > > The reason for this approach with docutils is mainly that ST doesn't > > *have* a formalism, and for me the best way of working out what it's > > meant to be doing has been to work with an implementation. > > Which is reasonable. But I think that you should be at least working > *towards* a formalism.. Well, give me a chance to finish writing the documentation! Seriously, STpy *is* defined, but it is done in less formal language than EBNF. docutils has been used to inform me as to what sensible decisions and behaviour might be, but it is not the definition. I assume that someone could take STpy and produce EBNF from it. I hope. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley.
- Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 11:24:32 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 11:24:32 -0000 Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: Message-ID: <000601c0b38b$d55c83d0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Welbourne wrote a long missive on the subject, to which I add: what he said. Being me, I can't refrain from a couple of comments: > The maintainer's needs call for simplicity of format, the > interrogator's call for richness, albeit with some cross-over > both ways; good tools can make a big difference to the richness > (e.g. all that stuff about trawling base classes for matching > methods, providing default doc strings, etc.). The > client-author's needs call for standardisation (hence Tibs' work > on labels). (although I may be working myself towards an argument against them, on the "simplicity" stance - we'll see) > Practical experience in the field of software maintenance says > unambiguously that simplicity is a very serious issue, > especially if one is to have enough standardised semantic > markup to ensure that tools can do a good job for the client > author. And this, of course, is why Edward Loper and I are having such a long discursion on the SIG, and particularly why Edward Loper keeps pushing for more formalism and less complexity - he rightly worries that too much complexity will make our markup too difficult to use, and too ambiguous to work with, whilst I fret about users typing things that they feel *should* work... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
From tony@lsl.co.uk Fri Mar 23 11:39:27 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 11:39:27 -0000 Subject: [Doc-SIG] Terminology (was RE: formalizing Structured Text) In-Reply-To: <200103221333.f2MDXBp17465@gradient.cis.upenn.edu> Message-ID: <000701c0b38d$ea90e960$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I wanted to make sure my terminology was clear, because it looked > like the indentation got messed up somehow. My terms are: > * valid > * invalid > * illegal > * undefined Ah - that makes more sense. I wondered if that had happened. In that case, I agree, and will try to conform myself. Now - earlier on you called me to task on my naming of paragraphs and so on (correctly so). I started to work up a list of "common terms", so we could reach agreement (I surely need better terms, as we saw) but ran out of time. Could I ask you to run something together to float on the list? I think (from inaccurate memory) we have something like: text block -- one of the many sorts of paragraph paragraph -- a "vanilla" text block list item -- a text block that starts a list item Python block -- a literal text block introduced by '>>>' literal block -- a literal text block introduced by '::', may contain blank lines markup -- the result of colourising literal string -- what goes within single quotes Python string -- what goes within '#..#' emphasised text -- what goes within '*..*' strong text -- what goes within '**..**' hmm - shift between 'text' and 'string' is clumsy, but may be justified - they *are* strings, sort of. URL -- shorthand (inaccurate) for a URI quoted string -- something in '".."' paragraph label -- we must have a better name for this anchor -- a '..[anchor]' thingy - maybe we actually have an "anchor block" as well localref -- or "local reference" - refers to an anchor, looks like '[this]' Please criticise! 
Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Fri Mar 23 13:38:59 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 08:38:59 EST Subject: [Doc-SIG] ST and DOM Message-ID: <200103231339.f2NDd0p12208@gradient.cis.upenn.edu> So I was just looking through the XHTML DTD, and it doesn't really seem like what we want. But Tibs's points about the DTD representation being important as a well-defined interface to ST are well-taken.. Thus, I'd like to hash out some of the involved issues so I can put the appropriate stuff in my PEP. :) For now, I want to *only* consider global formatting. We'll get to local formatting (=colorising) later. :) There are 2 basic types of global formatting element: basic elements (which are atomic, as far as global formatting goes); and hierarchical elements (which are not). I really think that the DOM tree should capture the *structure* of the formatted string.. To me, that means that it's weird to have elements that define a list item as "a text block that *starts* a list item"... Anyway, I propose that we use something similar to the following scheme: Basic units:: Hierarchical units:: Some notes on this scheme.. Some of these might end up getting changed.. * labelsection can only appear at top-level * anchorsection can only appear at top-level, and after all other elements of structuredtext. * list items may not contain sections; but they can contain just about anything else (except top-level-only things). * anchor sections may not contain sections; but they can contain just about anything else (except top-level-only things). * labelsections can contain anything except top-level-only things. However, particular labels may place further restrictions on their contents..
Now, this is not meant to be a final DTD.. For example, it might make sense to split list, listitem, and bullet into 3: dlist, olist, ulist, etc.. But does this *overall* structure seem reasonable? For comparison, Tibs has a DTD at the bottom of , although I'm not sure if it's up-to-date. It seems to go against some of the things he's been saying on doc-sig lately.. (??). -Edward From tony@lsl.co.uk Fri Mar 23 14:21:22 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 14:21:22 -0000 Subject: [Doc-SIG] ST and DOM In-Reply-To: <200103231339.f2NDd0p12208@gradient.cis.upenn.edu> Message-ID: <000d01c0b3a4$89730980$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > So I was just looking through the XHTML DTD, and it doesn't really > seem like what we want. It is a bit odd, isn't it. > But Tib's points about the DTD representation > being important as a well-defined interface to ST are well-taken.. Good. > Thus, I'd like to hash out some of the involved issues so I can > put the appropriate stuff in my PEP. :) I think that we should agree to agree on a DTD - that has advantage for us in that we can both use the knowledge gained/shared, and it has *definite* advantages for (a) people deciding which PEP they want (if not both) and (b) tool users trying to take advantage of either/both of our packages. We might even get STNG to agree... Is this actually a separate PEP altogether? ("Doc-SIG - the PEP producer") > For now, I want to *only* consider global formatting. We'll get to > local formatting (=colorising) later. :) Reasonable. So we're defining "text blocks" and the structure above them. (for those who don't know it, the major oddity of the XHTML DTD is that it *doesn't* draw this distinction, so one gets the strange sort of concept of: contains: <#text node> which is *distinctly* odd to someone trying to work with a non-XML document, and is one (although not the major) reason why I made my internal datastructure non-DOM). 
> There are 2 basic types of global formatting element: basic > elements (which are atomic, as far as global formatting goes); > and hierarchical elements (which are not). OK - that's how I normally think too. But that distinction comes for free with using a DTD, really. > I really think that the DOM tree should capture the *structure* of > the formatted string.. To me, that means that it's weird to have > elements that define a list item as "a text block that *starts* > a list item"... Anyway, I propose that we use something similar to > the following scheme: Agreed. Some additional elements are needed for callable object docstrings, though - informally, one also needs the "funcdesc" (apologies for the poor name) which is made up of a "signature" and an optional "summary-description" - for instance:: function(fred[,boolean]) -> integer -- This is silly. or function(fred[,boolean]) -> boolean This is silly. (the two examples are identical in "meaning"). This is *important* for docstrings, and should not be forgotten now if we are tailoring a solution for such. Maybe they should be "callable", "callable_signature", "callable_summary" (or maybe one can elide the "callable" on the sub-elements.) The following is probably wrong (and the names are too long!): > Basic units:: > > > > > > > > > Hierarchical units:: > > literalblock | doctestblock | > labelsection)*, > anchorsection*)> > (section | paragraph | list | > literalblock | doctestblock)+)> > > (paragraph | list | > literalblock | doctestblock)*)> > (paragraph | list | > literalblock | doctestblock)*)> > (section | paragraph | list > literalblock | doctestblock)+)> > > Some notes on this scheme.. Some of these might end up getting > changed.. > * labelsection can only appear at top-level Needs debating - I don't necessarily disagree, though. > * anchorsection can only appear at top-level, and after all > other elements of structuredtext. I probably disagree. Probably.
> * list items may not contain sections; but they can contain > just about anything else (except top-level-only things). I *do* agree (I too dislike sections in list items!) > * anchor sections may not contain sections; but they can > contain just about anything else (except top-level-only > things). Makes sense. > * labelsections can contain anything except top-level-only > things. However, particular labels may place further > restrictions on their contents.. Agreed. I would personally prefer to lose "bullet" as such, and retain only "key" or "description" for descriptive lists. I do not wish the renderer to take the bullet (or number sequence) as anything other than a hint, and thus I think it should be an attribute, not an element... Also to be reserved for future consideration: it seems natural to me to build a DOM tree that represents the whole module or package that is being dealt with, and "blat it out" in one go to the final format. This allows one to handle cross-referencing within a package (validate it, that is), rearrange the tree *as a whole*, and so on. So we will also want (optional) infrastructure *above* what you have defined. I would propose that we have a toplevel node called something like "document" (heh, its traditional), and appropriate nodes allowed below that called "module", "function", "class" and "method", with other appropriate nodes and attributes for storing the useful information one might want to cache thereon. This is how docutils currently works (well, more or less). But as I said, for future consideration. > Now, this is not meant to be a final DTD.. For example, it might > make sense to split list, listitem, and bullet into 3: dlist, olist, > ulist, etc.. But does this *overall* structure seem reasonable? I think it probably does make such sense (I'd prefer it that way). But I agree, it's a good start. 
Do we have anyone around, listening, who actually knows how one is *meant* to design a *good* DTD (i.e., I'm sure we can come up with something workable, but are there conventions, known boobytraps, etc., that we can be helped with to get something really good?) > For comparison, Tibs has a DTD at the bottom of > , > although I'm not sure if it's up-to-date. It seems to go against > some of the things he's been saying on doc-sig lately.. (??). It's very old, it was very preliminary, and it's just plain wrong. So ignore it. (main task this weekend: rewrite STpy.html, possibly to be preempted by all the "real life" things I also have to do...) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Which is safer, driving or cycling? Cycling - it's harder to kill people with a bike... My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Fri Mar 23 14:28:14 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 09:28:14 EST Subject: [Doc-SIG] Re: docutils REs In-Reply-To: Your message of "Fri, 23 Mar 2001 10:22:59 GMT." <002a01c0b383$3c18a670$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103231428.f2NESEp14932@gradient.cis.upenn.edu> [question about big long re for descr list items] > I can't offhand remember - the RE grew until it appeared to work, and > some of it appeared to rely on the "fuzzy" handling that REs appear (to > me) to do in balancing the greediness of different bits of the RE. It's > possible it's skeletal remains which should be excised, I suppose. Hm.. I bet that's how the STNG REs got where they are today! ;) If you get a chance, could you try taking those lines out, and see if it still passes your test cases? > Not according to the RE documentation in the Python 1.5.2 reference > manual, they don't - that's quite clear in saying start and end of > STRING, and recognition of newlines is only in MULTILINE mode. Hm. You're right. I was confused. I wonder why I was.
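[The anchoring behaviour being untangled here is easy to check against a modern Python - a quick sketch; the semantics are the same as in 1.5.2, though nowadays `\Z` is the strict end-of-string anchor:]

```python
import re

# '$' (without MULTILINE) matches at the end of the string, and also
# just before a trailing newline -- which is why re.match('$', '\n')
# succeeds:
assert re.match('$', '\n') is not None
assert re.match(r'\Z', '\n') is None   # \Z anchors strictly to the end

# '^' and '$' only become per-line anchors under re.MULTILINE:
text = "first line\nsecond line"
assert re.search(r"^second", text) is None
assert re.search(r"^second", text, re.MULTILINE) is not None
assert len(re.findall(r"line$", text, re.MULTILINE)) == 2
```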
Oh well, I still don't like the fact that '$' matches '\n'. >>> re.match('$', '\n') > > we should start making a list of proposed changes to STNG, > > in order to make STpy and STNG more compatible.. > Well, no, I wouldn't say that. Ho hum. My list of things to do grows by one. :) > > I haven't decided yet on whether I'm happy about having this > > concept of "acceptable ending punctuation.." It sort of seems > > like *all* punctuation should be ok, or *none*.. > > I'm not *too* happy about it myself, and actually it's a string that's > '%' included into the RE texts where it's needed - this means that (a) > it's easy to change, but (b) it should be the same in all places - I > thought consistency was a Good Idea. I definitely agree that, if you do have it, using '%' to splice it in is the Right Thing to do. And that way we can one day try replacing it with an RE for all punctuation, and see how that affects our test cases. :) > > (e.g., should it be ok to have a dash after > > an *emph region*-like this?) > > That looks wrong to me - but then you can see how I use dashes in plain > text! Ok. Bad example. How about saying e-*mail* to put stress on the "mail" part, or *bad*-ass... > There *are* some conventions on how one uses punctuation - for instance, > 'this ,' looks wrong to almost everyone. ST just enforces some > of them (this is, of course, yet another class of things to consider > warning people about). Well, we're not in the business of enforcing punctuation use, so when we can get away with it reasonably, we should let them do whatever they want.. The problem is deciding how it interacts with markup.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 15:27:05 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 10:27:05 EST Subject: [Doc-SIG] ST and DOM In-Reply-To: Your message of "Fri, 23 Mar 2001 14:21:22 GMT." 
<000d01c0b3a4$89730980$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103231527.f2NFR5p19280@gradient.cis.upenn.edu> > I think that we should agree to agree on a DTD I'll agree, sort of.. One of the PEPs I'm writing has reduced functionality, so its DTD will be a subset of the agreed-upon DTD (in some sense, anyway).. > Is this actually a separate PEP altogether? ("Doc-SIG - the PEP > producer") Hm. I think you're getting a bit PEP-happy. But I'll address that issue later.. > > For now, I want to *only* consider global formatting. We'll get to > > local formatting (=colorising) later. :) > > Reasonable. So we're defining "text blocks" and the structure above > them. Well, almost but not *quite*.. For example, I'd say that the following is one text block:: label: paragraph But it's still got global formatting within it.. > > There are 2 basic types of global formatting element: basic > > elements (which are atomic, as far as global formatting goes); > > and hierarchical elements (which are not). > > OK - that's how I normally think too. But that distinction comes for > free with using a DTD, really. I don't see how it comes free.. You can choose to draw the lines where you want.. (e.g., you were saying that anchors were local formatting). I used the following heuristic to divide things up: * Choose the smallest set of hierarchical elements such that: * paragraph is a basic element. * anything that can contain a basic element or a hierarchical element is a hierarchical element. > Agreed. Some additional elements are needed for callable object > docstrings, though - informally, one also needs the "funcdesc" > (apologies for the poor name) which is made up of a "signature" and an > optional "summary-description" - for instance:: > > function(fred[,boolean]) -> integer -- This is silly. > > or > > function(fred[,boolean]) -> boolean > > This is silly. I disagree. Isn't this the whole point of inspect? To get that information? Why include it in the doc string?
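[Edward's point - that the signature belongs to the object, not to the doc string - is exactly what the inspect module serves; in a modern Python the lookup is one call. A sketch; the function itself is invented:]

```python
import inspect

def function(fred, boolean=False):
    "This is silly."
    return int(bool(fred) or boolean)

# The signature can be recovered from the object itself, so writing it
# into the doc string only risks the two drifting apart:
assert str(inspect.signature(function)) == "(fred, boolean=False)"
assert inspect.getdoc(function) == "This is silly."
```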
That just seems to make things very prone to errors. What happens if the signature doesn't match the real signature? etc. > > * labelsection can only appear at top-level > > Needs debating - I don't necessarily disagree, though. I have trouble thinking of what it would mean for labelsections to appear deep within a docstring. > > * anchorsection can only appear at top-level, and after all > > other elements of structuredtext. > > I probably disagree. Probably. I think that if we want anchors to be available anywhere in a docstring, then we need to change them to be local markup, allow them *anywhere* that normal local markup is allowed, and have them be invisible. We would probably also have to change the notation for them. Then, if you want to do an endnote, you just include an anchor at the beginning of the footnote.. Something like:: '[foo]' Foo is a dummy word. Where '' is whatever syntax we decide to use for anchors. I'm not saying this is a *good* thing to do, but I like it better than allowing anchors, as they are currently defined, to appear anywhere. That just seems like a hack. And I don't think the meaning will be obvious to someone reading the plaintext who's not familiar with ST (which it *should* be). > > * list items may not contain sections; but they can contain > > just about anything else (except top-level-only things). > > I *do* agree (I too dislike sections in list items!) The only potential problem I can see is people wanting to use sections in DL items under label sections.. (e.g., when describing a parameter). But I don't think we should let them! :) > Also to be reserved for future consideration: it seems natural to me to > build a DOM tree that represents the whole module or package that is > being dealt with, and "blat it out" in one go to the final format. This allows one to handle cross-referencing within a package (validate it, that is), rearrange the tree *as a whole*, and so on.
So we will also want (optional) infrastructure *above* what you have defined. > > I would propose that we have a toplevel node called something like > "document" (heh, it's traditional), and appropriate nodes allowed below > that called "module", "function", "class" and "method", with other > appropriate nodes and attributes for storing the useful information one > might want to cache thereon. I think we still need a "structuredtext" element (or something similar), and a distinct "module" element.. the reason being that the "structuredtext" element can contain labeled sections, but a module shouldn't.. Instead, it should contain author sections and version sections etc.. So I think we should have 2 separate *top level* interfaces, which share a bunch of stuff: * the "structuredtext" top-level element is produced when we parse any random ST string, without knowing what it represents. * The docstring top-level elements, like "module" and "function" The first would be produced by a parser; and the second by a docstring tool. I took some of your comments into account, and came up with this revised DTD. The same caveats apply to this one that applied to the last one. :) Basic blocks:: Hierarchical blocks:: Docstrings:: ... Note that the description element does *not* include labelsection elements... I said that ordered list bullets are required.. is that reasonable? Should they be '#IMPLIED' instead? -Edward From tony@lsl.co.uk Fri Mar 23 16:45:05 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 16:45:05 -0000 Subject: [Doc-SIG] ST and DOM In-Reply-To: <200103231527.f2NFR5p19280@gradient.cis.upenn.edu> Message-ID: <001201c0b3b8$9cecc190$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > Agreed.
Some additional elements are needed for callable object > > docstrings, though - informally, one also needs the "funcdesc" > > (apologies for the poor name) which is made up of a > "signature" and an > > optional "summary-description" - for instance:: > > > > function(fred[,boolean]) -> integer -- This is silly. > > > > or > > > > function(fred[,boolean]) -> boolean > > > > This is silly. > > I disagree. Isn't this the whole point of inspect? To get that > information? Why include it in the doc string? That just seems > to make things very prone to errors. What happens if the > signature doesn't match the real signature? etc. I believe it is by fiat of the BDFL, in fact - this is a factlet he wishes to have inserted into callable object docstrings. Things like IDLE will use it to produce a pop-up when you type the name of a callable. Technically, it is *not* necessarily the same information you get from the Python code. The first part (signature) declares information you don't get therefrom (the return value) and also uses human readable text for the values, which might arguably be different than their names. The second part is meant to be the "traditional one line summary" of what the callable does. But as I understand it, we are considered unlikely to get a PEP accepted if we don't cater for it. Damn - I can't offhand see a reference to it in my saved Doc-SIG emails, but I'm sure *someone* said words to that effect (someone, are you listening?). > > > * labelsection can only appear at top-level > > > > Needs debating - I don't necessarily disagree, though. > > I have trouble thinking of what it would mean for labelsections > to appear deep within a docstring. I'm not convinced it *does* mean anything, but it still feels like it's not proven yet... > > > * anchorsection can only appear at top-level, and after all > > > other elements of structuredtext. > > > > I probably disagree. Probably.
> > I think that if we want anchors to be available anywhere > in a docstring, then we need to change them to be local > markup, allow them *anywhere* that normal local markup is > allowed, and have them be invisible. We would probably > also have to change the notation for them. Then, if you > want to do an endnote, you just include an anchor at > the beginning of the footnote.. Something like:: > > '[foo]' Foo is a dummy word. > > Where '' is whatever syntax we decide to use for anchors. > > I'm not saying this is a *good* thing to do, but I like > it better than allowing anchors, as they are currently defined, > to appear anywhere. That just seems like a hack. And I don't > think the meaning will be obvious to someone reading the > plaintext who's not familiar with ST (which it *should* be). Hmm. Maybe that's a vote for saying we'll deal with anchors as they are now (and make them into anchorsections, as you wish) and defer the other issue - yes, I could go for that, especially as anchors-as-they-are-now is what was discussed and requested (once upon a time) whereas generic anchor points wasn't. > > > * list items may not contain sections; but they can contain > > > just about anything else (except top-level-only things). > > > > I *do* agree (I too dislike sections in list items!) > > The only potential problem I can see is people wanting to > use sections in DL items under label sections.. (e.g., > when describing a parameter). But I don't think we should > let them! :) I agree - that's normally a "presentation" issue, anyway (as in fighting the default presentation of browsers). And if people jump up and down about it too much, we can always change our minds later on. The rest of your email I am resolutely putting aside (i.e., printing) to look at in more detail later on. But this is Good Stuff - I think we're getting somewhere very useful. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. 
Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From fdrake@acm.org Fri Mar 23 17:43:55 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 23 Mar 2001 12:43:55 -0500 (EST) Subject: [Doc-SIG] Doc/ tree frozen for 2.1b2 release Message-ID: <15035.35675.217841.967860@localhost.localdomain> I'm freezing the doc tree until after the 2.1b2 release is made. Please do not make any further checkins there. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@localhost.localdomain Fri Mar 23 19:11:52 2001 From: fdrake@localhost.localdomain (Fred Drake) Date: Fri, 23 Mar 2001 14:11:52 -0500 (EST) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010323191152.3019628995@localhost.localdomain> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Documentation for the second beta release of Python 2.1. This includes information on future statements and lexical scoping, and weak references. Much of the module documentation has been improved as well. From edloper@gradient.cis.upenn.edu Fri Mar 23 23:14:41 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 18:14:41 EST Subject: [Doc-SIG] docstring signatures Message-ID: <200103232314.f2NNEfp25992@gradient.cis.upenn.edu> Guido, On doc-sig, we're trying to put together some standards/conventions for writing documentation strings, to propose in a PEP. (These conventions could then be used by all manner of docstring-related tools). Tibs said he thought that you wanted to require that such conventions include a "signature" for docstrings of callable objects, such as:: def primes(n): """ primes(n) -> lst -- Return a list of all primes in the range [2,n]. """ ... or:: def primes(n): """ primes(n) -> lst Return a list of all primes in the range [2,n]. """ ... 
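A convention like the one above is mechanically checkable against the live function. Here is a rough sketch using the modern `inspect` module (the `check_signature` helper and its regex are invented for illustration, not part of any proposal on the table):

```python
import inspect
import re

def primes(n):
    """primes(n) -> lst -- Return a list of all primes in the range [2,n]."""
    return [p for p in range(2, n + 1)
            if all(p % d for d in range(2, p))]

# Invented helper: pull the "name(args) -> result" line off the front of
# a docstring and compare it with what inspect reports.
SIG_RE = re.compile(r'^\s*(?P<name>\w+)\((?P<args>[^)]*)\)\s*->\s*\w+')

def check_signature(func):
    match = SIG_RE.match(func.__doc__ or '')
    if match is None:
        return None  # no signature line to check
    claimed = [a.strip() for a in match.group('args').split(',') if a.strip()]
    real = list(inspect.signature(func).parameters)
    return match.group('name') == func.__name__ and claimed == real
```

Here `check_signature(primes)` returns True; a docstring claiming `primes(x)` instead would make it return False, which is exactly the signature-disagrees-with-inspect worry raised in this thread.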
However, it was unclear to me whether that would be affected any by the introduction of tools like inspect.py and pydoc.py's help function. In particular, much of the "signature" information can be obtained by calls to inspect methods; and there is a question of what to do if the "signature" disagrees with inspect. When designing our docstring conventions, should we include signatures, like the one given? Or can we feel free to put information about what is returned by the function, etc., in other places (e.g., under a "Returns: " section)? If you do want us to include signatures, is there somewhere where what they should look like is defined (e.g., whether you should say "primes(n) -> lst" or "primes(int) -> list")? -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 23:42:57 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 18:42:57 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 23 Mar 2001 10:34:29 GMT." <002b01c0b384$d7153480$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103232342.f2NNgvp27539@gradient.cis.upenn.edu> > I remember when most tools would show trailing whitespace visibly. Fine by me, as long as we explicitly say that all spaces in text (not in literals) are soft. It seems like the parser *should* reduce sequences of multiple spaces, but I'll live if it doesn't (c.f., XML parsers are required to reduce sequences of multiple spaces in attribute strings like this: ''). > Word wrapping is a presentation issue - if the renderer is generating > etext, or STNG, then it may make sense to *not* word wrap. Yes, but the reader should understand that their text *can* get word-wrapped at (non-literal) spaces. > > (unless, of course, you can give me a good reason why we *should* > > preserve sequences of spaces. > > No. It's only laziness. Ok. Well, I'll be happier if parsers strip that whitespace eventually.. But I won't worry about it for now. :) [Tibs discusses ***] Ok. 
So, on further thought, *** can be given consistent meaning (assuming a left-to-right-style parsing)::

    CURRENT CONDITION  |  Meaning
    Emph? | Strong?    |
    ------+------------+--------------------------
    no    | no         | start both strong & emph
    no    | yes        | end strong, start emph
    yes   | no         | end emph, start strong
    yes   | yes        | end both strong & emph

If you do give '***', that is the meaning it should receive. Note that '****' shouldn't ever really have a meaning. I guess I'll just have to wait for your nested-coloring regexps. :) (But I still think that '***' is potentially confusing to readers, and that's a Bad Thing). -Edward From Edward Welbourne Fri Mar 23 19:35:06 2001 From: Edward Welbourne (Edward Welbourne) Date: Fri, 23 Mar 2001 19:35:06 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103211835.f2LIZOp09967@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103211835.f2LIZOp09967@gradient.cis.upenn.edu> Message-ID: Me, then Edward: >> I know I'm about to vary my tune but ... someone else has been talking >> persuasively out-of-band. Rather than borrowing the doc directly off >> the parent ... > I think the issue of whether to borrow, or point back, etc., should > be one for the tools. Which may be a good reason for the language > *not* to do anything automatic, like inheriting doc strings. There > are similar questions about whether inherited methods should be listed > in a separate section or not, etc. OK, sounds like more convergence of opinions. And it fits with the benevolent dictat ;^> > But at any rate, we should say that having f.__doc__=None indicates > that inheriting docs is acceptable, and f.__doc__='' means that > inheriting docs is not acceptable. (minor request: #f.__doc__ is None# in preference to use of ==) > Of course, all of this will be difficult to do if we're > parsing the file instead of loading it as a module; but that's ok. :) No harder for your parser than for Guido's ;^> Eddy.
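The truth table for '***' reads as a two-bit state machine: close whatever is open, open whatever was closed. A throwaway sketch (the `triple_star` helper and its event names are invented here, not part of anyone's parser):

```python
def triple_star(emph, strong):
    """Return (markup events, new state) for a '***' token.

    emph and strong are booleans: are we currently inside *emph* /
    **strong** text?  Both flags flip on every '***'.
    """
    events = []
    # First close whatever is currently open...
    if emph:
        events.append('end-emph')
    if strong:
        events.append('end-strong')
    # ...then open whatever was closed.
    if not strong:
        events.append('start-strong')
    if not emph:
        events.append('start-emph')
    return events, (not emph, not strong)
```

Running it over the four states reproduces the table row for row, e.g. `triple_star(False, True)` yields `['end-strong', 'start-emph']`.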
From Edward Welbourne Fri Mar 23 20:44:49 2001 From: Edward Welbourne (Edward Welbourne) Date: Fri, 23 Mar 2001 20:44:49 +0000 (GMT) Subject: [Doc-SIG] URLs In-Reply-To: <3AB88028.605B88A6@lemburg.com> (mal@lemburg.com) References: <200103202040.f2KKe4p15488@gradient.cis.upenn.edu> <3AB88028.605B88A6@lemburg.com> Message-ID: OK, lots of stuff here and I'm a bit lost so I'm going to think out loud at you, so you have a good chance of spotting where my confusion diverges from what you thought you were saying. If I'm confused, how confused are the lurkers ? It makes sense to provide for a bibliographic definition mechanism for defining short names for use in xrefs in terms of full URLs (ideally with some form of commentary). As I understand it, this is what the > there is a directive of the form: > > ..[ref] url bit is about. How about providing for the url to be followed by arbitrary text to be presented, in the `See also' or biblio section as a description of the relevant xref ? This then makes it possible, as discussed, to use [ref] in the body of paragraphs as a link. This is the `in the style of bibliographic citation' idiom that I gather STNG folk are wedded to, and I see every reason to honour their choice. I don't understand the use of "some text":[ref] with the above reading of [ref], since the citation idiom calls for [ref] to be the text that appears in the output, so aren't you throwing away the "some text", so what's it for. However, I can see a use for "anchor text": as a good way to say, inline, that you want the given anchor text to appear (with its "quotes" stripped) as the text of a link to the given URL. I can see how it might be desirable to use "anchor text":[ref] as a short-hand for the above but requesting [ref]'s URL as the URL, provided ref is the subject of a '..[ref] url' directive. In which case inline [ref] is implicitly a short-hand for "[ref]":[ref] - i.e. 
use '[ref]' as the text of a link to the url specified in the '..[ref] url' directive. Indeed, I'd be tempted to at least allow the '..[ref] url' directive to enclose url in <...> for the sake of similarity. The only difference from Edward's > Use "name":[ref] for in-line hrefs. If ref is a single token, and > there is a directive of the form: > > ..[ref] url > > Then use url as the URL; otherwise, use ref as the URL. is then that the `otherwise use ref as the URL' fuzziness gets blown away: we get Inline use of "text":[ref] is then a link, with text "text", to the url specified elsewhere by a '..[ref] url optional comments' directive; inline use of simply [ref] is equivalent to "[ref]":[ref] Inline use of "text": is a link, with text "text", to the specified url, without recourse to an '..' directive. anything else vaguely resembling these is just a lump of text with some surprising uses of punctuation. This gets us the asked-for win in terms of letting URLs end in . or appear at the end of a sentence (or both) without ambiguity, while also gaining the asked-for parsability win *and* saving the `if that happens this otherwise the other' gumbo Edward was giving. Furthermore, use of # in a URL will now be within <...>, so we get spared various parser uglies. If I've understood what STNG does (which is a big if, as it's all by inference from what I think you guys are saying), this either removes or simplifies the problem of persuading the STNG folk, since it no longer clashes with the [ref] forms they're used to, and probably makes their lives a lot easier when it comes to parsing the "text":url idioms Tibs lists. And the above is manifestly simpler and more intuitive IMO ;^> > It would indeed make life a lot simpler. ;^) Tibs: > Inline refs were introduced deliberately to look like footnotes aside: [blah] is surely what *bibliographic citations* look like, not *footnotes* in any typesetting idiom I've ever met. But you meant that, I presume. (not sure who): > 3. 
Local references (which look like '[this]' or '[1]') are now ... > ..[this] ah, so a paragraph starting (or preceded by a line of form ?) '..[this]' is implicitly an accompanied by a:: ..[this] <#something random> directive somewhere in the docstring ? Thus enabling xrefs to that para from within the document using [this] or "anchor text":[this]. And "something random" is putatively "this", I suppose, in which case we've also enabled "anchor text":<#this> Sounds good. > Clarification on the syntax.. is *anything* that looks like [this] a > local reference, or does it have to be preceded by "a parenthetical > like"[this] or "a parenthetical and a colon like":[this]? erm ... any use of [ref] is either just some text with funny punctuation or using the same name, 'ref', as some particular '..' directive. What problem is there in distinguishing ? Is it the fact that the generated page, in which the <#this> anchor is defined, may be made of several doc strings, so that you don't *know* whether there's a ..[this] in one of the other doc strings making up the page ? If one of the latter, does [this] get rendered with brackets? Flagged as a warning when validating (in principle, not in current implementation)? > If one of the latter, does [this] get rendered with brackets? Flagged > as a warning when validating (in principle, not in current > implementation)? either way, [this] gets rendered with brackets: either it's being made to look like a citation, to the URL specified for '..[this]' to refer to, or it's a lump of random text (about which a doc tool may wish to generate a warning, at least if 'this' matches the label-spec). > What is acceptable content for [this]? '[\w_-]+'? Hmm.
Well, ideally we'd support standard citation forms, which would include '[this, that, other]', to be treated like '[this], [that], [other]' but with the excess punctuation ditched (this *is* a standard usage of the citation idiom being mimicked, after all; used when what was said just before it is backed up by three separate texts elsewhere). This can only sensibly be applied to '[refs]' forms, not to '"text":[refs]' forms, for obvious reasons. We'd still be using '[\w_-]+' for the names specified in a '..[ref]' definition, but using '[\w_-]+(, [\w_-]+)*' as the contents of a [...] used as an inline link. But, that aside, and allowing we might insist on the `excess punctuation' being given explicitly (for simplicity/unambiguity), [\w_-]+ sounds like a reasonable deal, albeit I might ditch _ and, in any case, really just ask for the same regex as we use for Labels ... One might plausibly want to allow '&' in ref names (within [...], as opposed to within where, obviously, they're allowed) because of all those papers and books by two authors whose names are the standard way to refer to the book, e.g. ..[K&L] Kaye and Laby, Tables of Physical and Chemical Constants, pub. Longman Scientific and Technical (ignoring the questions of whether the scheme ISBN is implemented yet; pretend the fake ISBN URL were replaced with a suitable URL on Longman's (or some online bookstore's) web site.) But, again, we could demand simplicity and insist on [KandL] without doing anyone any real harm. >> I think we should just go with the English definition of a word, >> which means [-A-Za-z], and leave it at that. It is *meant* to look >> like a word. > Is that too anglo-centric? 
(modulo inclusion/exclusion of _ which I don't care about) No, it's ASCII-centric and we're really working inside ASCII, so it's appropriate; except that I'd want to include digits, at least for [citations] and I'd argue that we should anticipate folk wanting to use python identifiers here (when, e.g., the relevant python object is defined in some other module and the author doesn't want to rely on vagaries of the doc-tool's relationship with include directives), hence requiring _ and digits; i.e. I agree with Edward's > ... underlines and digits are more applicable for endnotes. > Some people might like this [1] or this [noam_chomsky97]. I'd go for either: * citation names are [\d\w_-]+ read case-sensitively * doc-string labels are [a-z\d-]+ once passed through string.lower or * both kinds are [\d\w_-]+ read case-insensitively (here using \w_ purely to keep out of arguments about whether \w includes _ already) without noticeable preference, and accept that all ST-generic doc-string labels are expected to be Anglic words, hence not to *exercise* the \d allowed in the label spec, but to *allow* \d in labels for the sake of ST-specific dialects which may well want, e.g., to use a number in a label. (By the way - Edward, some of your sentences end .. others end in a single . - why ? i.e. is there a reason other than bouncy fingers ?) Eddy. From edloper@gradient.cis.upenn.edu Sat Mar 24 03:44:03 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 22:44:03 EST Subject: [Doc-SIG] anchors and local references In-Reply-To: Your message of "Fri, 23 Mar 2001 20:44:49 GMT." Message-ID: <200103240344.f2O3i3p09660@gradient.cis.upenn.edu> > OK, lots of stuff here and I'm a bit lost so I'm going to think out loud > at you, so you have a good chance of spotting where my confusion > diverges from what you thought you were saying. If I'm confused, how > confused are the lurkers ?
I think part of that confusion comes from me talking without actually knowing what I'm talking about. :) So, I'll go ahead and try to give a brief description of where anchors and local references currently stand (I think), so everyone will be clear on it: /======================================================================\ FUNCTION Anchors and local references are used to add bibliographic references and endnotes to StructuredText. Local references are used to refer to the references/endnotes, and anchors are used to write the references/endnote. (n.b.: we may want to change these terms, because the words themselves suggest something more general). SYNTAX Local references look like [this]. They are normally used either for bibliographic reference, like this: [eddy00]; or for endnotes, like this: [1]. Local references can appear anywhere in the text of a paragraph or list item (and possibly other places, like in a heading). Anchors look like this:: ..[this] This is the anchor for the reference '[this]' The form "..[name]" patterns syntactically almost exactly like the form "name:". In other words, you can do the following:: ..[anchor1] Anchors may span multiple lines. ..[anchor2] Anchors may contain multiple paragraphs. Or even lists. etc. The name of a local reference/anchor should be a single word, but can contain a few punctuation marks (&, _, -, maybe others). The exact contents of a name is yet to be determined, but we can tentatively say it's something like: '[\w&_-]+|[\d]+' Anchors must be the last top-level elements of a StructuredText string. SEMANTICS When a StructuredText string is displayed, a local reference should appear as it does in plaintext. However, it may also be linked in some way to its anchor (e.g., with an href in HTML).
For example, in HTML, '[this]' would be rendered as::

    <a href="#this">[this]</a>

When anchors are displayed, their name should be displayed as some type of heading or list bullet, and their contents should be listed under that section or in that list item. For example, it might be sensible to render anchors using DL's in HTML. Also, if local references are linked to anchors, then the anchor should include the target for the link. So::

    ..[this] anchor

Might be rendered in HTML as::

    <dl>
      <dt><a name="this">this</a></dt>
      <dd>anchor</dd>
    </dl>
..[eddy00] email from Edward Welbourne, received Fri, 23 Mar 2001 20:44:49 +0000 (GMT). ..[1] It may make sense to say that we should use numbers for endnotes and words for bibliographic entries, but we won't say that for now. \======================================================================/ Note that this is *not* used for out-of-line URI references, which is what I thought it was for at one point. Hmm.. hopefully that helped clarify things a little. I'll have a better explanation once I'm done with my PEP. :) > aside: [blah] is surely what *bibliographic citations* look like, > not *footnotes* in any typesetting idiom I've ever met. But *endnotes* do look like that in some typesetting idioms.. > Hmm. Well, ideally we'd support standard citation forms, which would > include '[this, that, other]', to be treated like '[this], [that], > [other]' Fine with me, if others also want it. Of course, I also wouldn't feel bad about making people type [this], [that], [other]. > One might plausibly want to allow '&' in ref names I agree. > (By the way - Edward, some of your sentences end .. others end in a > single . - why ? i.e. is there a reason other than bouncy fingers ?) No, it's not bouncy fingers.. I'm not sure, exactly.. it's an idiom I only use in emails. I think I use it where I would pause slightly longer if I were speaking. But I'd have to go read my own emails over to figure it out for sure. Sometimes I'll even end my sentences with three periods... :) -Edward.. From edloper@gradient.cis.upenn.edu Sat Mar 24 17:39:00 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 24 Mar 2001 12:39:00 EST Subject: [Doc-SIG] Formalizing ST Message-ID: <200103241739.f2OHd0p13768@gradient.cis.upenn.edu> Hmm.. So I'm starting to think that EBNF really isn't the best formalism for capturing global formatting. It works great for local formatting (=coloring), but it just doesn't do a good job of capturing indentation-related constraints..
So I'm thinking of turning STminus into a two-part formalism: one part to describe global formatting, and one to describe local formatting. EBNF or EBNFla would be used for local formatting, but I'm not sure what to use for global formatting. Any ideas? -Edward From guido@digicool.com Sat Mar 24 19:01:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 24 Mar 2001 14:01:05 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Fri, 23 Mar 2001 18:14:41 EST." <200103232314.f2NNEfp25992@gradient.cis.upenn.edu> References: <200103232314.f2NNEfp25992@gradient.cis.upenn.edu> Message-ID: <200103241901.OAA27542@cj20424-a.reston1.va.home.com> > Guido, > > On doc-sig, we're trying to put together some standards/conventions > for writing documentation strings, to propose in a PEP. (These > conventions could then be used by all manner of docstring-related > tools). Tibs said he thought that you wanted to require that such > conventions include a "signature" for docstrings of callable objects, > such as:: > > def primes(n): > """ > primes(n) -> lst -- Return a list of all primes in the range [2,n]. > > > """ > ... > > or:: > def primes(n): > """ > primes(n) -> lst > > Return a list of all primes in the range [2,n]. > > > """ > ... Hm, strange. Tibs must have been channeling someone else. I've used this style of docstrings for C functions, where there's no good way to find out the arguments, but not on Python functions and methods. > However, it was unclear to me whether that would be affected any by > the introduction of tools like inspect.py and pydoc.py's help > function. In particular, much of the "signature" information can be > obtained by calls to inspect methods; and there is a question of what > to do if the "signature" disagrees with inspect. > > When designing our docstring conventions, should we include > signatures, like the one given? 
Or can we feel free to put > information about what is returned by the function, etc., in other > places (e.g., under a "Returns: " section)? Sure. > If you do want us to include signatures, is there somewhere > where what they should look like is defined (e.g., whether > you should say "primes(n) -> lst" or "primes(int) -> list")? > > -Edward PS. Don't spend too much time trying to make StructuredText or some variation thereof work. In my experience with systems that use ST (like ZWiki), it sucks. There basically are two options I like: nicely laid out plain text, or a real markup language like Latex or DocBook. --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Sat Mar 24 19:52:39 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 24 Mar 2001 14:52:39 EST Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Sat, 24 Mar 2001 14:01:05 EST." <200103241901.OAA27542@cj20424-a.reston1.va.home.com> Message-ID: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> > Don't spend too much time trying to make StructuredText or some > variation thereof work. In my experience with systems that use ST > (like ZWiki), it sucks. There basically are two options I like: > nicely laid out plain text, or a real markup language like Latex or > DocBook. > > --Guido van Rossum (home page: http://www.python.org/~guido/) Hm. I guess I should have thought to ask the BDFL about all this before now. :) Makes me wonder if he'll like/accept *any* of the stuff we've been talking about. But it's interesting to hear that Guido is ok with a real markup language. So are there any vocal opponents of using a real markup language on doc-sig right now? (Assuming that Guido doesn't want us to use something like ST).. Of course, on the other hand, if we can clean ST up enough, and make it formal, maybe he'll be ok with it. I'm going to put my PEP on hold for now, until we figure this stuff out..
(if anyone wants to see what I've written so far, though, I'll be happy to send you a copy -- just email me). I'm also thinking of putting together a "minimal" ST-like language, that would include markup for: * lists * emph * literals (one type, probably using '#' as delimiters) * urls (using '<>' delimiters) * literal blocks But maybe we'd be better off just using XML.. :) or something like javadoc ('@param(x) foo..', etc.).. -Edward From edloper@gradient.cis.upenn.edu Sat Mar 24 20:07:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 24 Mar 2001 15:07:38 EST Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Sat, 24 Mar 2001 14:01:05 EST." <200103241901.OAA27542@cj20424-a.reston1.va.home.com> Message-ID: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> > PS. Don't spend too much time trying to make StructuredText or some > variation thereof work. In my experience with systems that use ST > (like ZWiki), it sucks. There basically are two options I like: > nicely laid out plain text, or a real markup language like Latex or > DocBook. Unfortunately, it's a bit late for that (there's been a lot of work this month put into trying to get a ST variant to work). I don't mind not using ST, but before we start working on other formatting conventions, I thought I should get a better idea of what you would or would not like, so we don't end up spending another month on something you won't like. ;) Here's the abstract from the PEP I've been putting together. It should give you a good idea of what we're trying to accomplish, at least: Python documentation strings provide a convenient way of associating documentation with many types of Python objects. However, there are currently no widespread conventions for how the information in documentation strings should be formatted. As a result, it is very difficult to write widely-applicable tools for processing documentation strings. 
Such tools would be useful for a variety of tasks, such as: * Converting documentation to HTML, LaTeX, or other formats. * Browsing documentation within python. * Ensuring that documentation meets specific requirements. This PEP proposes that the Python community adopt a well-defined set of conventions for writing "formatted documentation strings." These conventions can then be relied upon when writing tools to process formatted documentation strings. Note that some Python programs may choose not to use formatted documentation strings. For example, programs like Zope [1] have used documentation strings for purposes other than strict documentation, and it would be inappropriate to expect them to change how they use documentation strings. Also, some programmers may prefer to write plaintext documentation strings. Also, here are the "design goals" I defined: The following goals guided this PEP's design of conventions for writing formatted documentation strings. * Intuitiveness: The meaning of a well-formed formatted documentation string should be obvious to a reader, even if that reader is not familiar with the formatting conventions. * Ease of use: If the formatting conventions are to be accepted by the Python community, then it must be easy to write formatted documentation strings. * Formality: The formatting conventions must be formally specified. A formal specification allows different tools to interpret formatted documentation strings consistently, and allows for "competing, interoperable implementations," as specified in PEP 1 [5]. * Expressive Power: The formatting conventions must have enough expressive power to allow users to write the API documentation for any python object.
The following "secondary design goals" follow directly from the primary design goals, but are important enough to deserve separate mention: * Simplicity: The formatting conventions should be as simple as is practical, and there should be minimal interaction between different aspects of the formatting conventions. This goal derives from intuitiveness and ease of use. * Safety: No well-formed formatted documentation string should result in unexpected formatting. This goal derives from intuitiveness. So the question then is what sort of markup language we should define. I'd be quite happy to use something like Javadoc uses (but with a more restricted set of acceptable XML elements), but other people think that it's too hard to read/write... I'm also curious why you don't like ST-like markups. We've been putting a fair amount of work into formalizing it & making sure it's "safe" (e.g., it's an error to say *x**b*c**d*). If we can successfully do both, would that alleviate some of your concerns about ST? Any info you can give on what you would like to see come out of this project (or pointers to info) would be most appreciated. -Edward From guido@digicool.com Sat Mar 24 20:37:52 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 24 Mar 2001 15:37:52 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Sat, 24 Mar 2001 15:07:38 EST." <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> Message-ID: <200103242037.PAA28743@cj20424-a.reston1.va.home.com> > > PS. Don't spend too much time trying to make StructuredText or some > > variation thereof work. In my experience with systems that use ST > > (like ZWiki), it sucks. There basically are two options I like: > > nicely laid out plain text, or a real markup language like Latex or > > DocBook. > > Unfortunately, it's a bit late for that (there's been a lot of work > this month put into trying to get a ST variant to work). 
I don't > mind not using ST, but before we start working on other formatting > conventions, I thought I should get a better idea of what you would > or would not like, so we don't end up spending another month on > something you won't like. ;) > > Here's the abstract from the PEP I've been putting together. It > should give you a good idea of what we're trying to accomplish, at > least: > > Python documentation strings provide a convenient way of > associating documentation with many types of Python objects. > However, there are currently no widespread conventions for how the > information in documentation strings should be formatted. Not true. Most of the standard library uses the same convention, and even if it's not quite written down, it wouldn't be hard to figure out what it is. Also, my Python Style Guide (http://www.python.org/doc/essays/styleguide.html) has quite a bit of guidance. > As a > result, it is very difficult to write widely-applicable tools for > processing documentation strings. Again not true. Ping's pydoc does quite well second-guessing the existing conventions. > Such tools would be useful for > a variety of tasks, such as: > > * Converting documentation to HTML, LaTeX, or other formats. > * Browsing documentation within python. > * Ensuring that documentation meets specific requirements. > > This PEP proposes that the Python community adopt a well-defined > set of conventions for writing "formatted documentation strings." > These conventions can then be relied upon when writing tools to > process formatted documentation strings. > > Note that some Python programs may choose not to use formatted > documentation strings. For example, programs like Zope [1] have > used documentation strings for purposes other than strict > documentation, and it would be inappropriate to expect them to > change how they use documentation strings. Also, some programmers > may prefer to write plaintext documentation strings. 
Zope's a red herring (they are trying to get away from this Bobo-ism). Very often we read docstrings as part of the source code, and there plaintext is best, given the state of the art in text editors. > Also, here are the "design goals" I defined: > > The following goals guided this PEP's design of conventions for > writing formatted documentation strings. > > * Intuitiveness: The meaning of a well-formed formatted > documentation string should be obvious to a reader, even if > that reader is not familiar with the formatting conventions. > > * Ease of use: If the formatting conventions are to be > accepted by the Python community, then it must be easy to > write formatted documentation strings. Of course. This is all apple pie and motherhood. Nobody will want documentation that's unintuitive or hard to use! > * Formality: The formatting conventions must be formally > specified. A formal specification allows different tools to > interpret formatted documentation strings consistently, and > allows for "competing, interoperable implementations," as > specified in PEP 1 [5]. Yes, this is important. But when we choose plaintext, we don't need much of a formal specification! > * Expressive Power: The formatting conventions must have > enough expressive power to allow users to write the API > documentation for any python object. I've never found that plaintext got in the way of my expressiveness. > The following "secondary design goals" follow directly from the > primary design goals, but are important enough to deserve separate > mention: > > * Simplicity: The formatting conventions should be as simple > as is practical, and there should be minimal interaction > between different aspects of the formatting conventions. > This goal derives from intuitiveness and ease of use. More motherhood. > * Safety: No well-formed formatted documentation string should > result in unexpected formatting. This goal derives from > intuitiveness. This is a good one. ST loses big here!
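Guido's "safety" point, together with the *x**b*c**d* example from earlier in the thread, can be illustrated with a small checker that rejects ambiguous emphasis markup instead of guessing. The pairing rules below are illustrative assumptions, not STpy's actual ones:

```python
def check_emph(text):
    """Return True only if every '*' in text reads unambiguously as an
    opening or a closing emphasis marker.

    Sketch only: openers must follow whitespace, closers must precede
    whitespace or punctuation.  Markup like *x**b*c**d* is rejected
    rather than silently guessed at.
    """
    expecting_open = True
    for i, ch in enumerate(text):
        if ch != '*':
            continue
        before = text[i - 1] if i else ' '
        after = text[i + 1] if i + 1 < len(text) else ' '
        if expecting_open:
            # an opener needs whitespace before it and a word after it
            ok = before.isspace() and not after.isspace() and after != '*'
        else:
            # a closer needs a word before it, whitespace/punctuation after
            ok = not before.isspace() and (after.isspace() or after in '.,;:!?')
        if not ok:
            return False
        expecting_open = not expecting_open
    return expecting_open  # a dangling opener is also an error
```

A validator like this fails loudly on ambiguous input, which is exactly the property the "Safety" goal asks for.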
> So the question then is what sort of markup language we should define. > I'd be quite happy to use something like Javadoc uses (but with a more > restricted set of acceptable XML elements), but other people think > that it's too hard to read/write... I thought Javadoc was geared too much towards generating HTML; we should not focus too much on HTML. > I'm also curious why you don't like ST-like markups. We've been > putting a fair amount of work into formalizing it & making sure it's > "safe" (e.g., it's an error to say *x**b*c**d*). If we can > successfully do both, would that alleviate some of your concerns > about ST? The proponents of ST (that I've talked to) seem to believe that it's unnecessary to tell the users what the exact rules are. This, plus numerous bugs in the ST implementation and the context in which it is used, continuously bite me. E.g. if a paragraph starts with a word followed by a period, the word is replaced with "1.". If I use "--" anywhere in the first line of a paragraph it is turned into a ... style construct. There's no easy way to escape HTML keywords. In general, when you *don't* want something to have its special effect, there's no way to escape it. There's no way that I know of to create a bulleted item consisting of several paragraphs. The reliance on indentation levels to detect various levels of headings never works for me. > Any info you can give on what you would like to see come out > of this project (or pointers to info) would be most appreciated. A lot of effort has gone into creating a body of documentation for the standard library *outside* the source code. It is rich in mark-up, indexed, contains many examples, well maintained, and is generally considered high quality documentation. I don't want to have to redo the work that went into creating this. It should be easier to combine code and documentation for 3rd party modules, and there should be an easier way to integrate such documentation into (a version of) the standard documentation. But I disagree with the viewpoint that documentation should be maintained in the same file as source code. (If you believe the argument that it is easier to ensure that it stays up to date, think again. This never worked for comments.) --Guido van Rossum (home page: http://www.python.org/~guido/) From Edward Welbourne Sat Mar 24 16:08:21 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:08:21 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <001b01c0b2bf$d02b7320$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001b01c0b2bf$d02b7320$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > You really should read the doctest documentation (see the chapter in the > 2.1 docs for the best intro) - it *will* test broken examples as well erm. You've missed what I was trying to say.
I was considering the case of a piece of code which isn't consistent with what's actually implemented, given in order to explain *why* it's implemented the way it *is* rather than the way someone might think it *should be*; the illustrated code is showing what would happen if things were done the way the alleged *should be* would force them to be done. If the doctest tool can manage to run the illustrative fragment against an imaginary implementation, we can all retire and leave the industry to it - it's AI complete already. > But as you've presented it, that wouldn't naturally be presented as an > interactive session at all - one wouldn't write it as:: > > for example: > > >>> def __repr__(self): > and so on > > but rather as:: > > for example:: > > def __repr__(self): > and so on > > That's *why* the chosen "start of Python paragraph" thing is '>>>' - > because it *is* what it looks like. again, you've missed my point. I was in no way suggesting that my fragment be treated as part of an interactive session; I was, indeed, bemoaning the fact that if I try to supply that fragment, I must supply it wrongly marked up, in one of the two ways illustrated in your response: either as a test case, which is wrong, or as a verbatim block of *alien text*, which is wrong. At least, assuming '::' introduces a verbatim block of alien text. If it introduces a python verbatim block, then I need to know how to insert an alien verbatim block, because I believe I should be able to distinguish the two kinds of verbatim block.
Just to be clear: the test-case markup mechanism is a totally different beast from the four varieties of `literal' I was describing, which come in four varieties because there are two independent binary choices: * is the fragment python or alien * do we inline it or is it a block We have #python.code('inlines')#, we have 'alien inlines' but we have only one variety of verbatim block (assuming you'll let me ignore the test-case inline which isn't any of the cases being considered). Eddy. From Edward Welbourne Sat Mar 24 15:49:30 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 15:49:30 +0000 (GMT) Subject: [Doc-SIG] Tokens for labels & endnotes In-Reply-To: <001a01c0b2be$67095890$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001a01c0b2be$67095890$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: erm ... > For labels I want to exclude '-_', but yes, for labels I want to > include them. the second use of `labels' was meant to be `endnotes' or citations ? Eddy. From Edward Welbourne Sat Mar 24 16:27:46 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:27:46 +0000 (GMT) Subject: [Doc-SIG] documenting class attributes In-Reply-To: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Interesting. I have one variety of class where I manage to do this, but it's not generally helpful ... in a base class, Lazy, an idiom is introduced where, for any 'name' not ending in an _ or starting with more than one _, a method called _lazy_get_name_ will be used to supply the value of name (the first time it's asked for: it's then stored in __dict__); this has become my standard way to document attributes (because, of course, I make attributes lazy wherever possible). A few attributes get to be specified in __init__'s docs, because it's saying how it'll initialise them from its inputs. 
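The Lazy idiom Eddy describes above can be sketched roughly as follows. The _lazy_get_name_ naming and the caching in __dict__ come from the posting; the __getattr__ machinery and the Circle example are illustrative guesses, not Eddy's actual code:

```python
class Lazy:
    """Base class: a missing attribute 'name' is computed on first
    access by a method _lazy_get_name_ and then stored in the instance
    __dict__, so the getter's docstring doubles as documentation for
    the attribute.  (A sketch; Eddy's version also filters names by
    their leading/trailing underscores.)
    """
    def __getattr__(self, name):
        try:
            getter = getattr(type(self), '_lazy_get_%s_' % name)
        except AttributeError:
            raise AttributeError(name)
        value = getter(self)
        self.__dict__[name] = value   # cache: __getattr__ won't fire again
        return value

class Circle(Lazy):
    def __init__(self, radius):
        self.radius = radius

    def _lazy_get_area_(self):
        """The circle's area, computed lazily from its radius."""
        return 3.14159 * self.radius ** 2
```

After the first access to .area the value sits in the instance dictionary, so later reads are plain attribute lookups.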
But generally, the only way to document attributes is using a descriptive list in the class docstring ... which isn't > ... **adjacent to the entity documented** *and* user visible. but then I'm a bit skeptical about the line where it gets set being the right place to document it, if only because the attribute may get set in any method, so how do I know where to look for this line ... In some sense, `adjacent to the entity' is meaningless for a python object's attributes, the best you can do is `adjacent to a line of code which happens to set it' or similar. (If not, please illustrate.) The other `solution' I've used (in places) is to have the attribute actually be a sophisticated object carrying around a doc string but managing to pretend to be the value we wanted for the attribute; again, not generally workable. The right place for attribute description is in the typedef, and python doesn't make us do those; which only really leaves the class docstring. What's wrong with the class docstring ? Eddy. From Edward Welbourne Sat Mar 24 16:31:45 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:31:45 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103221347.f2MDlgp18142@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103221347.f2MDlgp18142@gradient.cis.upenn.edu> Message-ID: > I still don't see why x*y>z *has* to go in literals, > > Now, we have a bold "y>z ", and a mysterious '*' after has! Clearly I don't see why. The * doesn't have a magic meaning *unless* it appears at a word boundary with space or punctuator the other side of it. So the * in x*y>z isn't magic. Is it ? Likewise, x * y > z (which is what I'm far more likely to type) and, if the author *does* go typing x *y > z they've only themselves to blame, and the means to fix it is easy. Eddy. 
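Eddy's word-boundary rule above (the '*' in x*y>z isn't magic) could be captured with a regex along these lines; the pattern and the group name 'emph' are illustrative assumptions, not docutils' actual rules:

```python
import re

# '*' is only markup when it sits at a word boundary: the opener must
# follow whitespace (or start of text), the closer must precede
# whitespace or a punctuator.  So the '*' inside x*y>z stays plain text.
EMPH = re.compile(r"""
    (?:(?<=\s)|^)                    # opener only after whitespace
    [*](?P<emph>\S(?:[^*]*\S)?)[*]   # no space just inside the stars
    (?=$|[\s.,;:!?)])                # closer only before space/punctuation
""", re.VERBOSE)
```

Under this rule "so *this* works" is emphasized, while x*y>z and "x *y > z" simply fail to match, which is the behaviour Eddy argues for.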
From Edward Welbourne Sat Mar 24 16:39:02 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:39:02 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <002301c0b2dd$a708eb30$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <002301c0b2dd$a708eb30$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: >> Are we expecting people to *want* to link into a document from >> outside? I can't see ever having any use for that when writing >> API docs... > > I don't have a use for it, myself, directly. erm. The documentation of class Foo overrides a method of Foo's base, explains the difference in Foo's version's docstring, but needs to refer to the bit of the base's implementation in which is explained the hideous and hairy reason why certain bits have to behave the way they do. The base's implementation's docstring gave that portion a named anchor to which derived classes' docstrings could refer. Eddy. From Edward Welbourne Sat Mar 24 17:15:19 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:15:19 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <000101c0b388$f9778e20$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <000101c0b388$f9778e20$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > As such, docutils takes the "code it and see how it works" approach > (Python as formalism), whilst you're taking the "think about it hard > and see what it should do" approach (more traditional formalism). heh. The IETF approach and the IEEE/ISO one. Eddy. From Edward Welbourne Sat Mar 24 16:53:55 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:53:55 +0000 (GMT) Subject: [Doc-SIG] Re: docutils REs In-Reply-To: <002a01c0b383$3c18a670$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <002a01c0b383$3c18a670$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Edward, then Tony: >> think on it some more. (e.g., should it be ok to have a dash after >> an *emph region*-like this?) 
> That looks wrong to me - but then you can see how I use dashes in plain > text! oh, be more imaginative. After an explanation of the difficulties (in some context) of Doing The Right Thing when an empty list is supplied for some parameter (say):: However, for *non*-empty lists, none of the above matters. Eddy. From Edward Welbourne Sat Mar 24 17:02:18 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:02:18 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <002b01c0b384$d7153480$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <002b01c0b384$d7153480$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > Ah, but there's no reason you shouldn't be able to *say **this***, for > instance (it's quite unambiguous). I'm glad you're both only discussing this hypothetically, then, and both don't want to allow *** at all. If `unambiguous' was all it took, ***this*** would be unambiguous, too - this is emphasised *and* strong. > up being too confusing. I don't think it's unreasonable to require > that people *say **this** *. At the very least, it seems much ditto. And I see no sense in which the space makes it easier to read. and I'd argue that in '*this*', each ' is adjacent to a word-boundary, namely the end of the emphasised word this, and likewise that in ***this*** the outer *...* or **...** abuts the ends of the word obtained by colouring the word using the inner one. Eddy. 
From Edward Welbourne Sat Mar 24 17:07:21 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:07:21 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <002c01c0b385$80d1a8a0$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <002c01c0b385$80d1a8a0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Edward, then Tony: >> we have to have some way of distinguishing python literal blocks from >> vanilla literal blocks (so we'll have 5 different literalish types: >> literals; inlines; literal blocks; doctest blocks; and python literal >> blocks). > > That way lies madness, 'cos what about C code, oh, and maybe some > Haskell is very important, and... No, only python is special. All other literals are aliens and shall not be distinguished. Just as with 'verbatim alien' and #verbatim.python# It *does* matter to distinguish between verbatim python and verbatim alien text, partly because we might want to (after the fashion of doctest) verify that an alleged python verbatim *does* at least parse and partly because the renderer may well wish to make some identifiers in it be hyperlinks; whereas verbatim alien text should be simply echoed (subject to stripping its indentation down to the level of its context, naturally). Eddy. From Edward Welbourne Sat Mar 24 17:41:01 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:41:01 +0000 (GMT) Subject: [Doc-SIG] ST and DOM In-Reply-To: <200103231527.f2NFR5p19280@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103231527.f2NFR5p19280@gradient.cis.upenn.edu> Message-ID: > Docstrings:: > > > > ... needs: in some sense (and I'm assuming function's declaration includes parameter specs, which will look a lot like the attributes section of a class; each will be dlist in one guise or another). > I said that ordered list bullets are required.. is that > reasonable? Should they be '#IMPLIED' instead? 
If we want to let the renderer decide whether to use numbers, letters, etc. I imagine we'll need #IMPLIED but don't know DTDs well enough to be sure. Eddy. From Edward Welbourne Sat Mar 24 17:46:17 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:46:17 +0000 (GMT) Subject: [Doc-SIG] ST and DOM In-Reply-To: <001201c0b3b8$9cecc190$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001201c0b3b8$9cecc190$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > Technically, it is *not* necessarily the same information you get from > the Python code. e.g. >>> print range.__doc__ range([start,] stop[, step]) -> list of integers Return a list containing an arithmetic progression of integers. range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0. When step is given, it specifies the increment (or decrement). For example, range(4) returns [0, 1, 2, 3]. The end point is omitted! These are exactly the valid indices for a list of 4 elements. how, after all, would source code manage to express that ? One can't have an optional first argument, yet require the second: one can only do it by faking it in the way one interprets one's arguments. Eddy. From pf@artcom-gmbh.de Sun Mar 25 09:42:47 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Sun, 25 Mar 2001 11:42:47 +0200 (MEST) Subject: [Doc-SIG] Formalizing ST In-Reply-To: <200103241739.f2OHd0p13768@gradient.cis.upenn.edu> from "Edward D. Loper" at "Mar 24, 2001 12:39: 0 pm" Message-ID: Hi, Edward D. Loper schrieb: > Hmm.. So I'm starting to think that EBNF really isn't the best > formalism for capturing global formatting. Hmmmm..... I think I have to disagree. What is global formatting? Did you ever have a look at the Python/Grammar/Grammar file, which is basically EBNF and uses the special tokens INDENT and DEDENT? I was thinking of something like this for ST:

    structured_text:       (headed_section | colored_text)*
    colored_text:          (colored_text_line NEWLINE | bullet_list | table | ....)
    colored_text_line:     ....
    bullet_list:           bullet_item bullet_item*
    bullet_item:           ('*'|'-'|'o') colored_text_line [NEWLINE indented_colored_text]
    indented_colored_text: INDENT colored_text+ DEDENT
    headed_section:        headline section_body
    headline:              colored_text_line NEWLINE ('-'|'=')* NEWLINE

This is rather sketchy and not well thought out, but you might get the basic idea. Maybe even 'pgen' from the Python distribution can be reused for parsing formalized ST? For simple implementation experiments John Aycock's Spark might be an alternative. Look at the tokenize module from the Standard library. I wish I would have the time to do this myself. Unfortunately I have not. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mal@lemburg.com Sun Mar 25 13:22:34 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 25 Mar 2001 15:22:34 +0200 Subject: [Doc-SIG] documenting class attributes References: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <3ABDF11A.447718C5@lemburg.com> Edward Welbourne wrote: > > Interesting. I have one variety of class where I manage to do this, but > it's not generally helpful ... in a base class, Lazy, an idiom is > introduced where, for any 'name' not ending in an _ or starting with > more than one _, a method called _lazy_get_name_ will be used to supply > the value of name (the first time it's asked for: it's then stored in > __dict__); this has become my standard way to document attributes > (because, of course, I make attributes lazy wherever possible). A few > attributes get to be specified in __init__'s docs, because it's saying > how it'll initialise them from its inputs.
> but then I'm a bit skeptical about the line where it gets set being the > right place to document it, if only because the attribute may get set in > any method, so how do I know where to look for this line ... > > In some sense, `adjacent to the entity' is meaningless for a python > object's attributes, the best you can do is `adjacent to a line of code > which happens to set it' or similar. (If not, please illustrate.) > > The other `solution' I've used (in places) is to have the attribute > actually be a sophisticated object carrying around a doc string but > managing to pretend to be the value we wanted for the attribute; again, > not generally workable. > > The right place for attribute description is in the typedef, and python > doesn't make us do those; which only really leaves the class docstring. > > What's wrong with the class docstring ? It doesn't support class inheritance, that is, overriding attributes with new meanings does not work and you also have no chance to build a complete list of all interface attributes. PEP 224 tried to address this problem. The good thing about the solution proposed in PEP 224 is that it doesn't break any working code and uses the same intuitive syntax as class doc-strings themselves. Still, it was rejected, so I'm not trying to get that approach into the core anymore. If anybody has a better idea, please speak up... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sun Mar 25 13:35:21 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 25 Mar 2001 15:35:21 +0200 Subject: [Doc-SIG] Re: docstring signatures References: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> Message-ID: <3ABDF419.79566421@lemburg.com> "Edward D. Loper" wrote: > > > Don't spend too much time trying to make StructuredText or some > > variation thereof work.
In my experience with systems that use ST > > (like ZWiki), it sucks. There basically are two options I like: > > nicely laid out plain text, or a real markup language like Latex or > > DocBook. ST may suck, but it still provides a good compromise between readable source code level documentation and a machine parseable format. Just for reference, here's part of a javadoc-string with real markup:

    /**
     * Constructs a BigDecimal object from a
     * BigInteger, with scale 0.
     *
     * Constructs a BigDecimal which is the exact decimal
     * representation of the BigInteger, with a scale of
     * zero. The value of the BigDecimal is identical to the value
     * of the BigInteger. The parameter must not be null.
     *
     * The BigDecimal will contain only decimal digits,
     * prefixed with a leading minus sign (hyphen) if the
     * BigInteger is negative. A leading zero will be
     * present only if the BigInteger is zero.
     *
     * @param bi The BigInteger to be converted.
     */

Python doc-string should maintain the same level of elegance as the rest of the language, IMHO. > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > Hm. I guess I should have thought to ask the BDFL about all this > before now. :) Makes me wonder if he'll like/accept *any* of the > stuff we've been talking about. But it's interesting to hear > that Guido is ok with a real markup language. > > So are there any vocal opponents of using a real markup language > on doc-sig right now? (Assuming that Guido doesn't want us > to use something like ST).. Here's one ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From Edward Welbourne Sun Mar 25 10:04:13 2001 From: Edward Welbourne (Edward Welbourne) Date: Sun, 25 Mar 2001 11:04:13 +0100 (BST) Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> Message-ID: > So are there any vocal opponents of using a real markup language on > doc-sig right now? (Assuming that Guido doesn't want us to use > something like ST).. Yes (on that assumption). As Guido said: > ... There basically are two options I like: nicely laid out plain > text, or a real markup language like Latex or DocBook. and the first option is blatantly the correct one for doc strings (as in: you won't see enough folk using LaTeX systematically and you won't see *anyone* using DocBook). Oh, I think I'm meant to say `IMO' at about this point. A while ago Guido was reasonably emphatic against HTML (in doc-strings).
I suspect I'd written more HTML-based docstrings by then than everyone else put together, and when I turned them into something resembling a proto-ST, I was happier with the result and glad that Guido had rejected HTML. (Sorry, Tibs, I might not have converted all of them ...) > Of course, on the other hand, if we can clean ST up enough, and > make it formal, maybe he'll be ok with it. Yes. While I'm still giggling (largely due to released stress) about Guido's magnificent intervention, it doesn't explicitly rule out `were ST a real markup language it would be OK', only you *really* want to talk to Guido about what `real markup language' means in this context. Clearly he'll allow that indentation-based structure is real structure (since python *is* a real programming language), but equally clearly he's not enamoured of the ST family. This *might* just be because they're all so damnably ad hoc, in which case your clean-up project *might* be a winner; but please have a chat with Guido some how. What *are* his objections to the ST family ? Edward [tweaked] by Eddy in the process of echoing: > I'm also thinking of putting together a "minimal" ST-like language, > that would include markup for: > * lists [three flavours ?] > * emph [presumably no strong] > * urls (using '<>' delimiters) > * [inline] literals (one type, probably using '#' as delimiters) > * literal blocks [presumably one type, again] Some of us would use it if it were * very simple (you're doing fairly well above) * so nearly plain text that just printing it verbatim would work fine but then at least one of us is of debatable sanity ;^> Something in the spirit of ST but done properly would have a better chance than something striving to be ST without its warts, IMO. > But maybe we'd be better off just using XML.. :) IIRC, Guido's reasons against HTML in doc strings will take out XML also. But ask him. > or something like javadoc ('@param(x) foo..', etc.).. 
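The "minimal" ST-like inline markup echoed above ('*emph*', '#' inline literals, '<>' urls) is small enough to sketch a scanner for. Only the delimiter choices come from the thread; the exact matching rules below are illustrative assumptions:

```python
import re

# Alternation over the three proposed inline constructs.  Each branch
# captures into a differently named group so lastgroup tells us which
# construct matched.
INLINE = re.compile(r"""
      [*](?P<emph>[^*\s][^*]*)[*]       # *emphasized text*
    | [#](?P<literal>[^#]+)[#]          # #inline literal#
    | <(?P<url>[a-z]+://[^<>\s]+)>      # <http://...>
""", re.VERBOSE)

def scan(text):
    """Split text into ('text'|'emph'|'literal'|'url', content) pairs."""
    out, pos = [], 0
    for m in INLINE.finditer(text):
        if m.start() > pos:                  # plain text before the markup
            out.append(('text', text[pos:m.start()]))
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    if pos < len(text):
        out.append(('text', text[pos:]))
    return out
```

Printing the plain text of such a string verbatim still reads fine, which is the "so nearly plain text" property asked for above.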
OK, ignorance speaks: what's javadoc like, could it be classed as a `real markup language', are there compatibility issues (like it depending on the code it decorates being in a language which uses punctuation, rather than indentation, to delimit structure), where's a good URL for an idiot introduction, why aren't we using it already ? Don't feel that only Edward is allowed to answer those, folks ;^> I suspect he'll say `don't know' to at least the last, and several of you will give better answers than him on the rest. Clearly a markup language specified by someone we can't persuade to change has one humungous advantage: I'd never again spend an entire month's free time fretting about whether some proposed changes were a good idea and how to make them better. If, say, we used javadoc we'd just be stuck with whatever Sun have specified, so even if we don't like some bits of it we'd just knuckle down and get over it. There may be better things for us to channel our energy towards ... especially if no cousin of ST is ever going to find favour with Guido. Eddy. -- Not that I'm grumbling, just knackered, at least largely for other reasons. From edloper@gradient.cis.upenn.edu Sun Mar 25 15:45:40 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 25 Mar 2001 10:45:40 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Sat, 24 Mar 2001 16:31:45 GMT." Message-ID: <200103251545.f2PFjep11824@gradient.cis.upenn.edu> > I don't see why. The * doesn't have a magic meaning *unless* it appears > at a word boundary with space or punctuator the other side of it. So > the * in x*y>z isn't magic. Is it ? Well, that's not quite the environment that Tibs was checking for.. According to the STpy regexps, you *just* need a space to the left/right of the '*'.. so you can say * big * to mean *big*. It might be that we'd want to change that. 
But there's an argument to be made for having the environments in which emph can start/end be the same as the environments that literal can start/end in. So the question then is whether we want to allow things like ' x '. And actually, thinking about it now, I don't see why we would want to.. So maybe we *should* change to your rules.. something like:

    \s (?P
        [*]           # open delimiter
        (?! [\s\n])   # first char can't be sp
        [^*]*         # contents
        [^*\s\n]      # last char can't be sp
        [*])          # close delimiter

Or some cleaner version of that.. (I used the '(?!' so you can have emph regions with only 1 char in them.)

Another idea I've been toying with (in my more restricted version of ST) is to *only* allow a *single* word to be emphasized. If you want to emphasize multiple words you have to *do* *it* *like* *this*. That seems much safer/more local/etc.. And I can't think of the last time I tried to emphasize more than 2 words at once anyway. *It just looks weird, and is hard to read, if you try to emphasize a big region.*

-Edward

From edloper@gradient.cis.upenn.edu Sun Mar 25 15:49:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 25 Mar 2001 10:49:13 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Sat, 24 Mar 2001 17:02:18 GMT." Message-ID: <200103251549.f2PFnEp11969@gradient.cis.upenn.edu>

> I'm glad you're both only discussing this hypothetically, then, and both
> don't want to allow *** at all.

Um. *I* don't want to allow '***' at all, but I think Tibs does.

-Edward

From edloper@gradient.cis.upenn.edu Sun Mar 25 15:55:59 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 25 Mar 2001 10:55:59 EST Subject: [Doc-SIG] ST and DOM In-Reply-To: Your message of "Sat, 24 Mar 2001 17:46:17 GMT." Message-ID: <200103251555.f2PFtxp12310@gradient.cis.upenn.edu>

> > Technically, it is *not* necessarily the same information you get from
> > the Python code.
> > e.g.
> > >>> print range.__doc__ > range([start,] stop[, step]) -> list of integers Yeah.. that doc string always bothered me. Makes me think I'm using some language other than Python. :) When I first looked at that signature, I really couldn't tell whether calling it with 2 parameters would give it a stop & a step, or a start & a step... And it seems like you should be able to at *least* tell that type of info from the signature. But, that said, I see your point. -Edward From edloper@gradient.cis.upenn.edu Sun Mar 25 16:02:39 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 25 Mar 2001 11:02:39 EST Subject: [Doc-SIG] Re: Doc-SIG digest, Vol 1 #301 - 11 msgs In-Reply-To: Your message of "Sat, 24 Mar 2001 23:29:02 EST." Message-ID: <200103251602.f2PG2dp12575@gradient.cis.upenn.edu> > >> Are we expecting people to *want* to link into a document from > >> outside? I can't see ever having any use for that when writing > >> API docs... > > > > I don't have a use for it, myself, directly. > > erm. The documentation of class Foo overrides a method of Foo's base, > explains the difference in Foo's version's docstring, but needs to refer > to the bit of the base's implementation in which is explained the > hideous and hairy reason why certain bits have to behave the way they > do. The base's implementation's docstring gave that portion a named > anchor to which derived classes' docstrings could refer. It seems to me that this is asking for broken pointers, etc., within our docs, the next time someone updates the base implementation's docs.. API doc strings really shouldn't be that long anyway, so I don't feel so bad about referring someone to an entire docstring.. 
-Edward From Edward Welbourne Sun Mar 25 12:57:44 2001 From: Edward Welbourne (Edward Welbourne) Date: Sun, 25 Mar 2001 13:57:44 +0100 (BST) Subject: [Doc-SIG] documenting class attributes In-Reply-To: <3ABDF11A.447718C5@lemburg.com> (mal@lemburg.com) References: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> <3ABDF11A.447718C5@lemburg.com> Message-ID: >> What's wrong with the class docstring ? > It doesn't support class inheritance, that is overriding attributes > with new meanings does not work and you also have to chance to > build a complete list of all interface attributes. but, if Tony and Edward manage to formalise the structure of the class docstring enough that it has an attributes section, tools can auto-trawl base classes to achieve these desiderata. Each class effectively supplies a mapping from names of attributes it defines to attribute docstrings (extracted from the class' description of the attribute); judicious munging and mangling should then suffice to build up, for each class, a mapping from names of attributes it has (whether it defines them, redefines them or just inherits them) to their docstrings. Job Done. > If anybody has a better idea, please speak up... will the above do ? If not, why not ? I mean, aside from depending on a suitably well-formalised and widely used `Attributes:' section in the class doc-string, and the possibility of the whole ST project being torpedoed by Guido ... Eddy. From tony@lsl.co.uk Mon Mar 26 09:14:21 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 10:14:21 +0100 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103232342.f2NNgvp27539@gradient.cis.upenn.edu> Message-ID: <001f01c0b5d5$247ce400$f05aa8c0@lslp7o.int.lsl.co.uk> Gosh, it's been a busy weekend. Let's assume, for the moment, that this dicussion is still worth pursuing! Edward D. Loper wrote: > Fine by me, as long as we explicitly say that all spaces in text > (not in literals) are soft. 
> It seems like the parser *should* reduce sequences of multiple spaces,
> but I'll live if it doesn't (c.f., XML parsers are required to reduce
> sequences of multiple spaces in attribute strings like this: '').

I'm actually happy either way - I think STpy (in draft) currently says that trailing spaces may be lost and that spaces in (not literal) text may be conflated, which leaves it open. I would easily be convinced that those "may"s should be "shall"s... (thinks for 30 seconds) - OK, I shall make it so. Spaces in non-literal text shall be "reduced". They are already "soft".

> [Tibs discusses ***]
>
> Ok. So, on further thought, *** can be given consistent meaning
> (assuming a left-to-right-style parsing):
>
>   CURRENT CONDITION  | Meaning
>    Emph? | Strong?   |
>   -------+-----------+--------------------------
>    no    |   no      | start both strong & emph
>    no    |   yes     | end strong, start emph
>    yes   |   no      | end emph, start strong
>    yes   |   yes     | end both strong & emph
>
> If you do give '***', that is the meaning it should receive. Note
> that '****' shouldn't ever really have a meaning.

OK.

> I guess I'll just have to wait for your nested-coloring regexps. :)
> (But I still think that '***' is potentially confusing to readers,
> and that's a Bad Thing).

This is difficult for me, as I will type it with little-or-no thought (which we all know, of course, is not the same as reading it easily!). I think this is a "debate it later" topic (assuming we *have* a later).

(of course, Eddy may be unhappy, since he said:

> I'm glad you're both only discussing this hypothetically, then,
> and both don't want to allow *** at all. If `unambiguous' was
> all it took, ***this*** would be unambiguous, too - this is
> emphasised *and* strong.

because I can't see what is *wrong* with '***this***' myself - but I note that Edward agrees with Eddy on objecting to it.)
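Edward's truth table can be rendered directly as a small state function, parsing left to right (an illustration of the proposed rule only, not real parser code; the event names are made up):

```python
def triple_star(emph_open, strong_open):
    """What a '***' token does, given which regions are currently open,
    per the truth table under discussion (left-to-right parsing)."""
    if not emph_open and not strong_open:
        return ["start-strong", "start-emph"]
    if strong_open and not emph_open:
        return ["end-strong", "start-emph"]
    if emph_open and not strong_open:
        return ["end-emph", "start-strong"]
    return ["end-emph", "end-strong"]

# '***this***' comes out as emphasised *and* strong: the first token
# opens both regions, the second (both now open) closes both.
events = triple_star(False, False) + triple_star(True, True)
print(events)
```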
Meanwhile, Edward Loper continues to worrit at the "quoted text for emphasis" problem: > Another idea I've been toying with (in my more restricted version > of ST) is to *only* allow a *single* word to be emphasized. If > you want to emphasize multiple words you have to *do* *it* *like* > *this*. That seems much safer/more local/etc.. And I can't > think of the last time I tried to emphasize more than 2 words > at once anyway. *It just looks weird, and is hard to read, if > you try to emphasize a big region.* I agree that it is difficult to "see" a large text emphasised, but I also don't think I would be happy having to emphasis individual words in that style. I believe I *do* emphasise more than one word on occasions, but don't actually *know* (and certainly one word cases *do* predominate). Right - that's that thread, on to the next Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 26 09:19:00 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 10:19:00 +0100 Subject: [Doc-SIG] anchors and local references In-Reply-To: <200103240344.f2O3i3p09660@gradient.cis.upenn.edu> Message-ID: <002001c0b5d5$cb403c60$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper provided a summary with which I, for one, agree (can I steal the text, please?!) and then said, in answer to Eddy's: > > Hmm. Well, ideally we'd support standard citation forms, > which would > > include '[this, that, other]', to be treated like '[this], [that], > > [other]' > > Fine with me, if others also want it. Of course, I also wouldn't > feel bad about making people type [this], [that], [other]. 
I also agree that the comma-separated form is nice, but for the moment would prefer to leave it alone (just *too many things* to do) (heh - I emphasised more than one word, Edward!).

> > One might plausibly want to allow '&' in ref names
>
> I agree.

Me too, now it's been pointed out - I'll go with Edward's summary's list of valid characters, I think.

Tibs

-- 
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Mon Mar 26 09:24:13 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 10:24:13 +0100 Subject: [Doc-SIG] backslashing In-Reply-To: Message-ID: <002101c0b5d6$859de440$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward Welbourne wrote:
> Just to be clear: the test-case markup mechanism is a totally
> different beast from the four varieties of `literal' I was
> describing, which come in four varieties because there are two
> independent binary choices:
> * is the fragment python or alien
> * do we inline it or is it a block
> We have #python.code('inlines')#, we have 'alien inlines' but we have
> only one variety of verbatim block (assuming you'll let me ignore the
> test-case inline which isn't any of the cases being considered).

Then, in those terms, yes. Not for any deep philosophical reasons, but just because doctest exists.

Tibs

-- 
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From mal@lemburg.com Mon Mar 26 09:26:05 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Mon, 26 Mar 2001 11:26:05 +0200 Subject: [Doc-SIG] documenting class attributes References: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> <3ABDF11A.447718C5@lemburg.com> Message-ID: <3ABF0B2D.3AB8FD33@lemburg.com> Edward Welbourne wrote: > > >> What's wrong with the class docstring ? > > > It doesn't support class inheritance, that is overriding attributes > > with new meanings does not work and you also have to chance to > > build a complete list of all interface attributes. > > but, if Tony and Edward manage to formalise the structure of the class > docstring enough that it has an attributes section, tools can auto-trawl > base classes to achieve these desiderata. Each class effectively > supplies a mapping from names of attributes it defines to attribute > docstrings (extracted from the class' description of the attribute); > judicious munging and mangling should then suffice to build up, for each > class, a mapping from names of attributes it has (whether it defines > them, redefines them or just inherits them) to their docstrings. > Job Done. Some issues: 1. the documentation is separated from the attribute definition -- minor issue, but still important (methods are not documented in the class doc-string either) 2. you'll have to reformat all class doc-strings to make the new feature available (I think common use is to simply add hash comments just before or after the attribute definitions -- these should be easily reusable) 3. we'd need a runtime tool to extract the information from the class doc-string > > If anybody has a better idea, please speak up... > will the above do ? > If not, why not ? > > I mean, aside from depending on a suitably well-formalised and widely > used `Attributes:' section in the class doc-string, and the possibility > of the whole ST project being torpedoed by Guido ... I wasn't under the impression that Guido was trying to torpedo the ST approach. Where did you get that idea ? 
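Eddy's auto-trawling scheme can be sketched in a few lines. The helpers here are hypothetical: `own_attribute_docs` stands in for whatever tool parses an `Attributes:` section out of a class docstring, and the `_attribute_docs` dicts simulate its output:

```python
def own_attribute_docs(cls):
    # Hypothetical stand-in: pretend a docstring parser has already
    # turned each class's 'Attributes:' section into a dict on the class.
    return getattr(cls, "_attribute_docs", {})

def all_attribute_docs(cls):
    """Map every attribute the class has -- defined, redefined or merely
    inherited -- to its docstring, letting derived classes override."""
    docs = {}
    for base in reversed(cls.__mro__):  # base classes first, so the
        docs.update(own_attribute_docs(base))  # most-derived doc wins
    return docs

class Base:
    _attribute_docs = {"x": "why x behaves hideously", "y": "plain old y"}

class Derived(Base):
    _attribute_docs = {"x": "x, but different here"}

print(all_attribute_docs(Derived))
```

Derived's entry for "x" overrides Base's, while "y" is inherited untouched, which is exactly the mapping Eddy describes.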
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From tony@lsl.co.uk Mon Mar 26 10:14:51 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 11:14:51 +0100 Subject: [Doc-SIG] Re: docstring signatures (answer to Guido) In-Reply-To: <200103242037.PAA28743@cj20424-a.reston1.va.home.com> Message-ID: <002201c0b5dd$985856e0$f05aa8c0@lslp7o.int.lsl.co.uk> Oh dear. Well, thanks to Edward for finally getting the BDFL's opinion on the "top of the callable docstring" thing - wish I could find the reference where it was claimed to be needed. Maybe I was remembering a dream, or living in a parallel world, or something. I'll happily drop them from the spec (and stop using them!). I *am* a bit disturbed, though, as to whether Guido has decided against an ST approach *in all circumstances* (I'm sure he was less negative on previous rounds of the Doc-SIG, but I'm not entirely prepared to trust my memory at this stage). I do know that *I* want some "light formatting" solution, and so have other participants of the SIG - and given this is Python-land, I don't believe that's because we're trying to "nanny" people, I think it's because we want it *for ourselves*. I hope the rest of this email doesn't come across as ranting against Guido. But I *do* feel he's being a little bit unfair... In response to Edward Loper, Guido wrote: > Not true. Most of the standard library uses the same convention, and > even if it's not quite written down, it wouldn't be hard to figure out > what it is. Hmm. But this is where we *came* from, initially, surely - an attempt to figure out what people *actually* write down. Asking people to conform to a convention that is *not* evident explicitly somewhere is, well, a bit unfair (I include me in "people" here, by the way). 
> > As a result, it is very difficult to write widely-applicable > > tools for processing documentation strings. > > Again not true. Ping's pydoc does quite well second-guessing the > existing conventions. This has been a great source of argument on Doc-SIG in the past - "quite well" is not the goal that some of us wanted. But it's still not the only reason why many of us want ST - we actually want to have some markup in the text for all sorts of reasons. > > * Safety: No well-formed formatted documentation > > string should result in unexpected formatting. > > This is a good one. ST loses big here! I *do* feel that it is a *leetle bit* (excuse the sarcasm) unfair to judge the STpy and STminus works on the basis of a tool/specification that they are not. As far as I can tell, STClassic (the implementation) is *not* a very good example of how to do it (and that's meant to be english understatement). And that seems to be what Guido is basing this statement on. > The proponents of ST (that I've talked to) seem to believe that it's > unnecessary to tell the users what the exact rules are. Yes, but that wasn't *us* - we're proponents of (a form of) ST as well. But obviously just not *those* proponents. > This, plus numerous bugs in the ST implementation and the context > in which it is used, continuously bite me. Again, it's surely a bit unfair to say (as this does) "an implementation of an ancestor specification sucked, so what you're doing does as well". > E.g. if a paragraph starts with a word > followed by a period, the word is replaced with "1.". I agree that's loony. But it's not what is being proposed. > If I use "--" anywhere in the first line of a paragraph > it is turned into a
... style construct. Well, ' -- ' in our version - predicated surely on the idea that most people don't use double hyphens in plain text (which I happen to believe as well), whereas the: something -- some text about it style is fairly easy to spot. > There's no easy way to escape HTML keywords. A problem of *that* specification, not of STpy or STminus (and *aggressively* not so). We do *not* weld ourselves to HTML as an output format, nor indeed XML, and thus '<' and '>' are not treated specially at all. > In general, when you *don't* want something to have its > special effect, there's no way to escape it. A problem, agreed - but we've actively been worrying about this, and looking for *specific cases* where this causes a problem, to see if we can work around it. I'd be very interested to know which cases cause Guido problems (and if they're artefacts of the earlier specifications, or something we can use as examples of problems for ourselves). > There's no way that I know of to create a bulleted item > consisting of several paragraphs. This is a lunacy of the implementation Guido's been using, I would say. > The reliance on indentation levels to detect various levels of > headings never works for me. Well, I don't like it either. It won't stop me writing a PEP, though, which does the same thing (and is, of course, pretty close to being written in STpy/STminus). For what it's worth, there *are* proposals to fix that (the section = indentation thingy), but they're not worth pursuing until we have something available to talk about, which is what we've been trying to do. > > Any info you can give on what you would like to see come out > > of this project (or pointers to info) would be most appreciated. > > A lot of effort has gone into creating a body of documentation for the > standard library *outside* the source code. It is rich in mark-up, > indexed, contains many examples, well maintained, and is generally > considered high quality documentation. 
I don't want to have to redo > the work that went into creating this. Of course not. We're not attempting to change that (at least Edward and I are not). > It should be easier to combine code and documentation for 3rd party > modules, and there should be an easier way to integrate such > documentation into (a version of) the standard documentation. But I > disagree with the viewpoint that documentation should be maintained in > the same file as source code. (If you believe the argument that it > is easier to ensure that it stays up to date, think again. This never > worked for comments.) I'm afraid you (Guido) are conflating two different arguments. The argument by Ka-Ping Yee that the *whole* documentation for a module should live in the file for that module is (a) a different thread, and (b) one I argue strongly against. I hope that Guido isn't already decided on this issue, once and for all. The consensus of the Doc-SIG, over the years (many of whom have been people who Guido knows and has much more reason to respect the opinion of than me) has been that we need a way of formatting docstrings, and that it should be an etext derivative. I believe that we (on the Doc-SIG) have been producing a *better* etext derivative, whilst still trying to stay at least partially compatible with the sibling STNG project. *Should* we have been more radical and just broken with STNG entirely? It would have made life somewhat simpler... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
From tony@lsl.co.uk Mon Mar 26 10:14:53 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 11:14:53 +0100 Subject: [Doc-SIG] Re: docstring signatures (small implementation) In-Reply-To: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> Message-ID: <002301c0b5dd$99a26310$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > So are there any vocal opponents of using a real markup language > on doc-sig right now? (Assuming that Guido doesn't want us > to use something like ST).. I suspect they will have self-deselected. Actually, I don't remember seeing anyone against markup in Doc-SIG - people who want *heavyweight* markup, yes, but not people who want none. > Of course, on the other hand, if we can clean ST up enough, and > make it formal, maybe he'll be ok with it. I hope so - I am a bit worried. I still want it for myself, of course (and so have a lot of other people) - so we may just need to "rally the troops". > I'm going to put my PEP on hold for now, until we figure this stuff > out.. (if anyone wants to see what I've written so far, though, > I'll be happy to send you a copy -- just email me). I'd like a copy, of course (!) > I'm also thinking of putting together a "minimal" ST-like language, > that would include markup for: > * lists > * emph > * literals (one type, probably using '#' as delimiters) > * urls (using '<>' delimiters) > * literal blocks I think that would be a good thing. > But maybe we'd be better off just using XML.. :) or something like > javadoc ('@param(x) foo..', etc.).. We *went round* this loop at least twice before, and it doesn't fly. People won't do it. Eddy says: > Something in the spirit of ST but done properly would have a better > chance than something striving to be ST without its warts, IMO. I must admit I would have been happier in many ways if we could drop some of the inheritance from STClassic (and compatibility intents with STNG). 
Tibs

-- 
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From gward@mems-exchange.org Mon Mar 26 14:26:26 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 09:26:26 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu>; from edloper@gradient.cis.upenn.edu on Sat, Mar 24, 2001 at 02:52:39PM -0500 References: <200103241901.OAA27542@cj20424-a.reston1.va.home.com> <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> Message-ID: <20010326092625.C10678@mems-exchange.org>

On 24 March 2001, Edward D. Loper said:
> Hm. I guess I should have thought to ask the BDFL about all this
> before now. :) Makes me wonder if he'll like/accept *any* of the
> stuff we've been talking about. But it's interesting to hear
> that Guido is ok with a real markup language.
>
> So are there any vocal opponents of using a real markup language
> on doc-sig right now? (Assuming that Guido doesn't want us
> to use something like ST)..

I've been lurking throughout this whole thread (and occasionally leaning on the "D" key... sorry guys... ;-), mainly because it sounds like you're on the right track but you're doing the boring plodwork. Thank you, keep it up, etc. etc. However, I would just like to state for the record that I am not -0, or -1, but more like -1e6 on putting a "real" markup language in docstrings, assuming that the set of "real" markup languages is limited to {Tex-like languages, SGML-like languages}. I consider both to be misbegotten freaks that completely ignore the human factors of writeability and readability.

> Of course, on the other hand, if we can clean ST up enough, and
> make it formal, maybe he'll be ok with it.
>
> I'm going to put my PEP on hold for now, until we figure this stuff out..
(if anyone wants to see what I've written so far, though,
> I'll be happy to send you a copy -- just email me).
>
> I'm also thinking of putting together a "minimal" ST-like language,
> that would include markup for:
> * lists
> * emph
> * literals (one type, probably using '#' as delimiters)
> * urls (using '<>' delimiters)
> * literal blocks

Larry Wall has been there and done that: "man perlpod" if you're on a properly administered Unix system. ;-) POD is really easy to write, and pretty easy to read (human) and parse (software). The high-level POD syntax (where /\n\n=[A-Z]+ .*\n\n/ denotes a section delimiter) is closely tied to Perl and irrelevant to Python, since Python already has a way of saying "this text is documentation for module/class/function 'foo'". But the within-paragraph markup convention -- "this is a C<...>, this is B<...>" -- is pretty easy and useful. Like ST, it could stand a bit of formalization, although that has improved greatly in recent years with Brad Appleton's Pod::Parser family of modules.

Although I've never used ST, my understanding is that ST and POD are pretty semantically similar, and with very similar goals: easy-to-write, easy-to-read, minimal markup that's "good enough" for generating man pages, HTML documents, and plain text. They both suffer from lack of formalization, although I think POD is better nowadays.

> But maybe we'd be better off just using XML.. :) or something like
> javadoc ('@param(x) foo..', etc.)..

If I ever see XML in a Python docstring, I think I'll go running back to Perl. ;-) XML is many things to many people, but it most certainly is not fit for human consumption.

OTOH, I quite like Javadoc's "@param", "@return" syntax. It's easy to write, easy to read, easy to parse, and just formal enough so that doc tools can make sense of it. It might be more Pythonic to spell those "param:", "returns:", though.
;-) Greg From tony@lsl.co.uk Mon Mar 26 14:26:45 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 15:26:45 +0100 Subject: [Doc-SIG] Re: docstring signatures (and my memory) In-Reply-To: <002201c0b5dd$985856e0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <002701c0b600$c8e303f0$f05aa8c0@lslp7o.int.lsl.co.uk> I wrote: > Oh dear. Well, thanks to Edward for finally getting the BDFL's opinion > on the "top of the callable docstring" thing - wish I could find the > reference where it was claimed to be needed. Unfortunately, it's not trivial to search the doc-sig archives, but a download and a grep later, I find: > Date: Sun, 28 Nov 1999 16:57:03 -0800 (Pacific Standard Time) > From: David Ascher > To: doc-sig@python.org > Subject: [Doc-SIG] docstring grammar > > For compatibility with Guido, IDLE and Pythonwin (and increasing the > likelihood that the proposal will be accepted by GvR), the > docstrings of callables must follow the following convention > established in Python's builtins: > > >>> print len.__doc__ > len(object) -> integer ...rest of explanation omitted... Which was written last time round the Doc-SIG loop. So it wasn't me that was channeling Guido wrongly, after all. (it's very strange going back in Doc-SIG history so deeply - tempting to browse for too long...) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
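The convention David Ascher describes, with the first line of a callable's docstring holding a "name(args) -> result" signature, is mechanically easy to second-guess. A sketch (my own, not any real tool's code):

```python
import re

# "name(args) -> result", the first-line convention quoted above.
SIG = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s*->\s*(?P<result>.+)$")

def docstring_signature(obj):
    """Return the parsed signature from the docstring's first line,
    or None if the docstring doesn't follow the convention."""
    first = (obj.__doc__ or "").split("\n", 1)[0].strip()
    m = SIG.match(first)
    return m.groupdict() if m else None

def fake():
    "fake(x[, y]) -> int\n\nA hypothetical callable following the convention."

print(docstring_signature(fake))
# {'name': 'fake', 'args': 'x[, y]', 'result': 'int'}
```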
From gward@mems-exchange.org Mon Mar 26 14:39:06 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 09:39:06 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <200103242037.PAA28743@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Sat, Mar 24, 2001 at 03:37:52PM -0500 References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> <200103242037.PAA28743@cj20424-a.reston1.va.home.com> Message-ID: <20010326093905.D10678@mems-exchange.org>

On 24 March 2001, Guido van Rossum said:
> It should be easier to combine code and documentation for 3rd party
> modules, and there should be an easier way to integrate such
> documentation into (a version of) the standard documentation. But I
> disagree with the viewpoint that documentation should be maintained in
> the same file as source code. (If you believe the argument that it
> is easier to ensure that it stays up to date, think again. This never
> worked for comments.)

From direct personal experience (15 or so Perl modules, some on CPAN and some not, comprising ~10k LoC), I *do* think that intermingling code and documentation makes it easier to update them together. Note that it does not make it *absolutely* easy or painless or automatic; any of those are bogus arguments. But having code and docs in the same file definitely makes life *less* painful.

Greg

From tony@lsl.co.uk Mon Mar 26 14:53:17 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 15:53:17 +0100 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <20010326092625.C10678@mems-exchange.org> Message-ID: <002a01c0b604$7dd26a00$f05aa8c0@lslp7o.int.lsl.co.uk>

Greg Ward wrote:
> I've been lurking throughout this whole thread (and
> occasionally leaning on the "D" key... sorry guys... ;-)

no, sounds sensible to me - I hate to think what it must "sound" like to anyone else listening to our, erm, interchanges.
> However, I would just like to state for the record that I am > not -0, or -1, but more like -1e6 on putting a "real" markup > language in docstrings, assuming that the set of "real" markup > languages is limited to {Tex-like languages, SGML-like languages}. Well, I think that's taken to be the general meaning of "real" in this sort of context. These days, in this forum, I prefer the term "heavyweight". > I consider both to be misbegotten freaks > that completely ignore the human factors > of writeability and readability. Ah - but they are "misbegotten freaks" that *deliberately* ignore the human factors of etc. TeX because it was originally aimed at people with great motivation to use it for its original purposes (and when there was no alternative), but not *really* for "human beings" in the aggregate. SGML/XML/etc because they're not meant for humans to read/write. Despite the fact some of us do. Personally, I used to be +10 for formal markup, but am now (somewhat reluctantly) -1 against it (see, my response got more moderate!). > Larry Wall has been there and done that: "man perlpod" if you're on a > properly administered Unix system. ;-) Hmm. Actually, our sysadmin really likes perl. And he's a friend. > POD is really easy to write, and > pretty easy to read (human) and parse (software). Hmm. Personally I think it has all the disadvantages for reading that things like XML do, but with none of the advantages of *being* something like XML. Of course, reactions differ. > Althought I've never used ST, my understanding is that ST and > POD are pretty semantically similar, and with very similar goals: > easy-to-write, easy-to-read, minimal markup that's "good enough" for > generating man pages, HTML documents, and plain text. Probably so - they're doubtless as similar as Perl and Python (which is quite similar, of course, compared to many other languages). > They both suffer from lack of > formalization, although I think POD is better nowadays. 
And if Edward Loper finishes his task (heh, even if I finish mine) will be a lot better for STpy too. > XML is many things to many people, but it most > certainly is not fit for human consumption. It would be amazing if it were - I'm currently working with (well, thinking about, some of the time) *quite large* XML documents (well, actually, they represent geographic data) and one would be surprised if a human ever tried to look at it. > OTOH, I quite like Javadoc's "@param", "@return" syntax. > It's easy to write, easy to read, easy to parse, and just > formal enough so that doc tools can make sense of it. > It might be more Pythonic to spell those "param:", > "returns:", though. ;-) Erm, "Arguments:" and "Returns:" (the last is an "I think", 'cos I don't tend to use it) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From gward@mems-exchange.org Mon Mar 26 15:16:24 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 10:16:24 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <002a01c0b604$7dd26a00$f05aa8c0@lslp7o.int.lsl.co.uk>; from tony@lsl.co.uk on Mon, Mar 26, 2001 at 03:53:17PM +0100 References: <20010326092625.C10678@mems-exchange.org> <002a01c0b604$7dd26a00$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <20010326101624.B10802@mems-exchange.org> On 26 March 2001, Tony J Ibbs (Tibs) said: > > POD is really easy to write, and > > pretty easy to read (human) and parse (software). > > Hmm. Personally I think it has all the disadvantages for reading that > things like XML do, but with none of the advantages of *being* something > like XML. Of course, reactions differ. The differences are subtle, but they're enough: in POD, tags are shorter (one character), only used intra-paragraph (ie. 
there aren't tags for sections or indented code chunks -- that's more-or-less implicit), and there's far less tendency to nest them. That makes a *huge* difference for human readability/writeability. > Probably so - they're doubtless as similar as Perl and Python (which is > quite similar, of course, compared to many other languages). Yup. In my fairly uninformed opinion, it seems like the main difference between POD and ST is spelling. One uses B<bold> and C<code>, the other uses *bold* and 'code'. I prefer POD's slightly more in-your-face and less ambiguous markup, but that's mainly because I have experience with it and I know I like it. I'm sure I could come to like ST in time, too. ;-) > > OTOH, I quite like Javadoc's "@param", "@return" syntax. > > It's easy to write, easy to read, easy to parse, and just > > formal enough so that doc tools can make sense of it. > > It might be more Pythonic to spell those "param:", > > "returns:", though. ;-) > > Erm, "Arguments:" and "Returns:" (the last is an "I think", 'cos I don't > tend to use it) More trivial spelling differences. I don't much care how it's spelled, but I like the idea of a tiny bit of formal markup to say, "this is a function return value", "this is a function argument", "this is an instance attribute", "this is a class attribute", etc. Trailing colon is definitely more Pythonic than leading "@", though! Greg From guido@digicool.com Mon Mar 26 15:31:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 26 Mar 2001 10:31:02 -0500 Subject: [Doc-SIG] Re: docstring signatures (answer to Guido) In-Reply-To: Your message of "Mon, 26 Mar 2001 11:14:51 +0100." <002201c0b5dd$985856e0$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002201c0b5dd$985856e0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103261531.KAA06371@cj20424-a.reston1.va.home.com> > Oh dear.
Well, thanks to Edward for finally getting the BDFL's opinion > on the "top of the callable docstring" thing - wish I could find the > reference where it was claimed to be needed. Maybe I was remembering a > dream, or living in a parallel world, or something. I'll happily drop > them from the spec (and stop using them!). You probably saw the docstrings on some *extension* modules, where the signature is generally included. > I *am* a bit disturbed, though, as to whether Guido has decided against > an ST approach *in all circumstances* (I'm sure he was less negative on > previous rounds of the Doc-SIG, but I'm not entirely prepared to trust > my memory at this stage). That's because now I have actual experience using ST (in ZWiki). > I do know that *I* want some "light formatting" solution, and so have > other participants of the SIG - and given this is Python-land, I don't > believe that's because we're trying to "nanny" people, I think it's > because we want it *for ourselves*. > > I hope the rest of this email doesn't come across as ranting against > Guido. But I *do* feel he's being a little bit unfair... > > In response to Edward Loper, Guido wrote: > > Not true. Most of the standard library uses the same convention, and > > even if it's not quite written down, it wouldn't be hard to figure out > > what it is. > > Hmm. But this is where we *came* from, initially, surely - an attempt to > figure out what people *actually* write down. Asking people to conform > to a convention that is *not* evident explicitly somewhere is, well, a > bit unfair (I include me in "people" here, by the way). You cut out the part where I pointed out that it *is* explicit -- in the style guidelines, which haven't been challenged. > > > As a result, it is very difficult to write widely-applicable > > > tools for processing documentation strings. > > > > Again not true. Ping's pydoc does quite well second-guessing the > > existing conventions. 
> > This has been a great source of argument on Doc-SIG in the past - "quite > well" is not the goal that some of us wanted. But it's still not the > only reason why many of us want ST - we actually want to have > some markup in the text for all sorts of reasons. There are only two choices. Either you have markup or you don't. If you design a markup system, it should be complete, and allow full control over the lay-out -- including full control in cases where you *don't* want the special characters to have effect. ST is neither markup nor "not markup", and that's why it fails, in my view. > > > * Safety: No well-formed formatted documentation > > > string should result in unexpected formatting. > > > > This is a good one. ST loses big here! > > I *do* feel that it is a *leetle bit* (excuse the sarcasm) unfair to > judge the STpy and STminus works on the basis of a tool/specification > that they are not. As far as I can tell, STClassic (the implementation) > is *not* a very good example of how to do it (and that's meant to be > english understatement). And that seems to be what Guido is basing this > statement on. Sure. As I'm not a subscriber to this list, I was not aware of those, and nobody has bothered to forward me a pointer to a specification. (I typed them into Google but got no useful hits.) > > The proponents of ST (that I've talked to) seem to believe that it's > > unnecessary to tell the users what the exact rules are. > > Yes, but that wasn't *us* - we're proponents of (a form of) ST as well. > But obviously just not *those* proponents. Then you did a poor job of distinguishing yourself. Thanks for clarifying. > > This, plus numerous bugs in the ST implementation and the context > > in which it is used, continuously bite me. > > Again, it's surely a bit unfair to say (as this does) "an implementation > of an ancestor specification sucked, so what you're doing does as well". Well, you have associated yourself with it by choosing the same moniker. 
I see that you are trying to dissociate yourself now. OK, I'll give it a shot. But show me the specs please! > > E.g. if a paragraph starts with a word > > followed by a period, the word is replaced with "1.". > > I agree that's loony. But it's not what is being proposed. > > > If I use "--" anywhere in the first line of a paragraph > > it is turned into a > > ... > > ... style construct. > > Well, ' -- ' in our version - predicated surely on the idea that most people don't use double hyphens in plain text (which I happen to believe as well), whereas the: > > something -- some text about it > > style is fairly easy to spot. Then we disagree -- I use double hyphens in text *all the time* -- and I know I'm not alone. Unless I misunderstand what you propose. > > There's no easy way to escape HTML keywords. > > A problem of *that* specification, not of STpy or STminus (and > *aggressively* not so). We do *not* weld ourselves to HTML as an output > format, nor indeed XML, and thus '<' and '>' are not treated specially > at all. > > In general, when you *don't* want something to have its > > special effect, there's no way to escape it. > > A problem, agreed - but we've actively been worrying about this, and > looking for *specific cases* where this causes a problem, to see if we > can work around it. I'd be very interested to know which cases cause > Guido problems (and if they're artefacts of the earlier specifications, > or something we can use as examples of problems for ourselves). Without knowing your ruleset I can't know what the problems are, of course. > > There's no way that I know of to create a bulleted item > > consisting of several paragraphs. > > This is a lunacy of the implementation Guido's been using, I would say. I would hope so. > > The reliance on indentation levels to detect various levels of > > headings never works for me. > > Well, I don't like it either. It won't stop me writing a PEP, though, > which does the same thing (and is, of course, pretty close to being > written in STpy/STminus). Ah, but the "pretty close" is exactly what's wrong. PEPs are written in plain text, and there's not enough information to know when to interpret characters as markup and when not to. E.g.
a PEP describing ST would be filled with examples of ST markup -- if the PEP is written in plaintext, these examples don't need any special quoting, but if it is written in ST, they must be quoted. (And don't tell me to put all examples in literal blocks -- inline examples are essential.) The PEP-to-HTML processor uses only one strict rule, and a few heuristics: it uses unindented text for headings (but it doesn't have multiple heading levels), and it turns things looking like URLs into hyperlinks. But other than that it doesn't use any markup characters, and even line breaks in the original text are honored exactly in the HTML. > For what it's worth, there *are* proposals to fix that (the section = > indentation thingy), but they're not worth pursuing until we have > something available to talk about, which is what we've been trying to > do. Sorry, you've lost me here. > > > Any info you can give on what you would like to see come out > > > of this project (or pointers to info) would be most appreciated. > > > > A lot of effort has gone into creating a body of documentation for the > > standard library *outside* the source code. It is rich in mark-up, > > indexed, contains many examples, well maintained, and is generally > > considered high quality documentation. I don't want to have to redo > > the work that went into creating this. > > Of course not. We're not attempting to change that (at least Edward and > I are not). OK. The factions in the doc-sig are hard to keep apart for an outsider. At the conference I met some people who wanted to prescribe that all module documentation be maintained in the source code file. I find that insanity. > > It should be easier to combine code and documentation for 3rd party > > modules, and there should be an easier way to integrate such > > documentation into (a version of) the standard documentation. But I > > disagree with the viewpoint that documentation should be maintained in > > the same file as source code. 
(If you believe the argument that it > > is easier to ensure that it stays up to date, think again. This never > > worked for comments.) > > I'm afraid you (Guido) are conflating two different arguments. The > argument by Ka-Ping Yee that the *whole* documentation for a module > should live in the file for that module is (a) a different thread, and > (b) one I argue strongly against. I'm so glad to hear that. See above. :-) > I hope that Guido isn't already decided on this issue, once and for all. No, I'm always open to reasonable input. I'm waiting for your spec so I can form an opinion on it. > The consensus of the Doc-SIG, over the years (many of whom have been > people who Guido knows and has much more reason to respect the opinion > of than me) has been that we need a way of formatting docstrings, and > that it should be an etext derivative. (But I've never agreed so far, and my one prolonged experience with an ST-based system makes me hate every bit of it. It could be that implementation though.) > I believe that we (on the > Doc-SIG) have been producing a *better* etext derivative, whilst still > trying to stay at least partially compatible with the sibling STNG > project. I don't think that compatibility with a possibly broken alternative ought to be constraining you. > *Should* we have been more radical and just broken with STNG entirely? > It would have made life somewhat simpler... I don't know much about STNG either -- I always thought that it was an idea for a project to fix the problems with ST, but not anything concrete. I am not aware of any part of Zope actually using STNG, so I doubt that interoperability can be a big issue. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Mar 26 15:36:42 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 26 Mar 2001 10:36:42 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Mon, 26 Mar 2001 09:39:06 EST." 
<20010326093905.D10678@mems-exchange.org> References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> <200103242037.PAA28743@cj20424-a.reston1.va.home.com> <20010326093905.D10678@mems-exchange.org> Message-ID: <200103261536.KAA06434@cj20424-a.reston1.va.home.com> > From direct personal experience (15 or so Perl modules, some on CPAN and > some not, comprising ~10k LoC), I *do* think that intermingling code and > documentation makes it easier to update them together. > > Note that it does not make it *absolutely* easy or painless or automatic; > any of those are bogus arguments. But having code and doccs in the same > file definitely makes life *less* painful. What argues against this for me is the existence of highly tuned language-specific editing modes in Emacs and many other text editors; these rarely do a good job on hybrids. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@mems-exchange.org Mon Mar 26 15:42:37 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 10:42:37 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <200103261536.KAA06434@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Mar 26, 2001 at 10:36:42AM -0500 References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> <200103242037.PAA28743@cj20424-a.reston1.va.home.com> <20010326093905.D10678@mems-exchange.org> <200103261536.KAA06434@cj20424-a.reston1.va.home.com> Message-ID: <20010326104236.D10854@mems-exchange.org> On 26 March 2001, Guido van Rossum said: > > From direct personal experience (15 or so Perl modules, some on CPAN and > > some not, comprising ~10k LoC), I *do* think that intermingling code and > > documentation makes it easier to update them together. > > > > Note that it does not make it *absolutely* easy or painless or automatic; > > any of those are bogus arguments. But having code and doccs in the same > > file definitely makes life *less* painful. 
> > What argues against this for me is the existence of highly tuned > language-specific editing modes in Emacs and many other text editors; > these rarely do a good job on hybrids. Absolutely true. But python-mode already does a poor job of handling docstrings -- or in fact any interesting[1] triple-quoted string. Personally, I wouldn't mind my docstrings being entirely one colour (say, the "string" colour) and my Python code being colourized properly -- but even without any formal markup in docstrings, Emacs can't handle that, so how can adding a markup syntax make things worse? Greg [1] Eg. def foo (bar, baz): """Returns 'bar' if 'baz' is "foo", or 'baz' if 'bar' is "foo".""" This is legitimate plain-text, and probably pretty close to legit ST, but stuff like this throws python-mode for a loop. I do this kind of markup all the time myself. (Although not that kind of semantics, thankfully. ;-) From guido@digicool.com Mon Mar 26 16:08:01 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 26 Mar 2001 11:08:01 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Mon, 26 Mar 2001 10:42:37 EST." <20010326104236.D10854@mems-exchange.org> References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> <200103242037.PAA28743@cj20424-a.reston1.va.home.com> <20010326093905.D10678@mems-exchange.org> <200103261536.KAA06434@cj20424-a.reston1.va.home.com> <20010326104236.D10854@mems-exchange.org> Message-ID: <200103261608.LAA06633@cj20424-a.reston1.va.home.com> > > > Note that it does not make it *absolutely* easy or painless or automatic; > > > any of those are bogus arguments. But having code and doccs in the same > > > file definitely makes life *less* painful. > > > > What argues against this for me is the existence of highly tuned > > language-specific editing modes in Emacs and many other text editors; > > these rarely do a good job on hybrids. > > Absolutely true. 
But python-mode already does a poor job of handling > docstrings -- or in fact any interesting[1] triple-quoted string. Personally, > I wouldn't mind my docstrings being entirely one colour (say, the "string" > colour) and my Python code being colourized properly -- but even without any > formal markup in docstrings, Emacs can't handle that, so how can adding a > markup syntax make things worse? (IDLE does docstrings right, by the way.) Emacs also has sophisticated Latex and XML modes, which are however useless for Latex or XML embedded in docstrings. That's why I prefer to have the docs in a separate file -- so I can have a separate mode to help me edit it. --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Mon Mar 26 16:18:53 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 26 Mar 2001 11:18:53 EST Subject: [Doc-SIG] Where to go from here Message-ID: <200103261618.f2QGIrp12424@gradient.cis.upenn.edu> Guido's input has raised some questions about whether we're going in the right direction... But he's made it clear that he'll keep an open mind, and seriously consider any real specs we come up with. So I propose we do the following: - continue trying to come up with a concrete, formal spec - drop the idea of maintaining compatibility with STNG, for now. Once STNG sees how much cooler our markup language is, we can convert them. ;) Maybe we should even come up with a new name, so that other people who have become embittered with STclassic won't take it out on us. :) - STminus will focus purely on coming up with a formal description, and drop its goals of unifying STNG/STpy. - Focus on the goal of making a *real* markup language that is *lightweight* and simple to read/write. - Once we have a real specification (hopefully in a couple weeks), we can talk to Guido/others about whether it's acceptable. 
It's unreasonable to expect Guido to make judgements when the ST stuff is in the state of flux it's in now. Dropping STNG compatibility will allow us to consider a number of options that I hadn't brought up before.. For example, I think we might want to replace '--' with '---' as the description list indicator, since people *do* use '--' in text (I know I do, and apparently Guido does too). And I think we should drop 'o' as a bullet character. etc.. As for colorization, java-mode does just fine colorizing javadoc comments, so I don't see how it's a problem *in principle*, just a problem of someone figuring out what to tell emacs (I'm sure emacs could be told to colorize triple-quoted-strings correctly if someone really wanted to figure out how to.. I've just been using the work-around of backslashing all double quotes in triple-quoted-strings, which doesn't affect their value, and makes them colorize correctly) -Edward From pf@artcom-gmbh.de Mon Mar 26 17:22:45 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 26 Mar 2001 19:22:45 +0200 (MEST) Subject: [Doc-SIG] Where to go from here In-Reply-To: <200103261618.f2QGIrp12424@gradient.cis.upenn.edu> from "Edward D. Loper" at "Mar 26, 2001 11:18:53 am" Message-ID: Hi Edward, > - continue trying to come up with a concrete, formal spec That would be very nice. > - drop the idea of maintaining compatibility with STNG, for Yes. *Some* ideas from ST are good. Let's drop all the others. Especially heading recognition in ST sucks. > Dropping STNG compatibility will allow us to consider a number > of options that I hadn't brought up before.. For example, I think > we might want to replace '--' with '---' as the description list > indicator, since people *do* use '--' in text (I know I do, and > apparently Guido does too). And I think we should drop 'o' as > a bullet character. etc.. I think a description list can be dropped altogether. At least for the time being a bullet list will be enough.
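(An aside on Edward's colorization workaround above: it is safe because backslash-escaping a double quote never changes a Python string's value, only its spelling in source. A minimal check, with illustrative strings of my own:)

```python
# Edward's workaround, sketched: inside a Python string literal the
# escape \" denotes exactly the same character as a bare ", so
# backslashing every double quote in a triple-quoted string changes
# only how some editors colorize it, never the string's value.
plain = """Returns 'bar' if 'baz' is "foo"."""
escaped = """Returns 'bar' if 'baz' is \"foo\"."""
same = plain == escaped  # True
```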
enumerated lists: ...hmmm... I think we can also live without them for a try. I think we should aim for a *very* minimalistic set of features and people may then add other things later on: * emphasizing of *single* words. * section headings (marked up through underlining with a line of hyphens or '=' and preceded by a blank line). * bullet item lists (which may be nested through indentation). * References to URLs, to mail addresses and to Python objects. * preformatted paragraphs for code examples, tables and such: (every paragraph with mixed indentation or which starts with the patterns '>>>' or '+--' should be left alone. Only properly aligned normal text paragraphs should be allowed for reformatting.) Then let's try to implement this minimal set and plug this into Ping's pydoc and see what comes out when running this on existing sources. Of course this will never be able to replace external documentation written with a powerful markup system like LaTeX. But it would make Ping's marvelous pydoc an even more worthwhile tool for all this useful version 0.x.y stuff written in Python, which comes without documentation for prime time. Just my 2 pfennig, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From guido@digicool.com Mon Mar 26 06:34:09 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 26 Mar 2001 01:34:09 -0500 Subject: [Doc-SIG] Where to go from here In-Reply-To: Your message of "Mon, 26 Mar 2001 19:22:45 +0200." References: Message-ID: <200103260634.BAA00807@cj20424-a.reston1.va.home.com> > > Dropping STNG compatibility will allow us to consider a number > > of options that I hadn't brought up before.. For example, I think > > we might want to replace '--' with '---' as the description list > > indicator, since people *do* use '--' in text (I know I do, and > > apparently Guido does too.
And I think we should drop 'o' as > > a bullet character. etc.. > I think a description list can be dropped altogether. Yes! They are darn ugly in HTML anyway. > At least for the time being a bullet list will be enough. Agreed. > enumerated lists: ...hmmm... I think we can also live without > them for a try. In any case, ST++ shouldn't go and rewrite the item numbers. The requirement that the input is also readable without processing means that the author ought to put the proper numbers in there by hand anyway, so all ST++ needs to do is recognize them and give the paragraph the proper indent/spacing. > I think we should aim for a *very* minimalistic set of features > and people may then add other things later on: > * emphasizing of *single* words. > * section headings (marked up through underlining with a line of > hyphens or '=' and preceded by a blank line). > * bullet item lists (which may be nested through indentation). > * References to URLs, to mail addresses and to Python objects. > * preformatted paragraphs for code examples, tables and such: > (every paragraph with mixed indentation or which starts with > the patterns '>>>' or '+--' should be left alone. Only properly > aligned normal text paragraphs should be allowed for reformatting.) Please do look at the conventions in MoinMoin as another example! > Then let's try to implement this minimal set and plug this into Ping's > pydoc and see what comes out when running this on existing sources. > Of course this will never be able to replace external documentation > written with a powerful markup system like LaTeX. But it would > make Ping's marvelous pydoc an even more worthwhile tool for all this > useful version 0.x.y stuff written in Python, which comes without > documentation for prime time. > > Just my 2 pfennig, Peter Surely you mean 0.02 Euro.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@lfw.org Mon Mar 26 12:39:23 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 26 Mar 2001 04:39:23 -0800 (PST) Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <20010326104236.D10854@mems-exchange.org> Message-ID: This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. --0-752675476-985610362=:570 Content-Type: TEXT/PLAIN; charset=US-ASCII On Mon, 26 Mar 2001, Greg Ward wrote: > I wouldn't mind my docstrings being entirely one colour (say, the "string" > colour) and my Python code being colourized properly -- but even without any > formal markup in docstrings, Emacs can't handle that, so how can adding a > markup syntax make things worse? [...] > def foo (bar, baz): > """Returns 'bar' if 'baz' is "foo", or 'baz' if 'bar' is "foo".""" > > This is legitimate plain-text, and probably pretty close to legit ST, but > stuff like this throws python-mode for a loop. This is pretty surprising. I use Vim and it has never had any trouble with colourizing this kind of stuff. It was even very easy to classify docstrings (based on their position in the code) separately from ordinary literal strings. Python's syntax makes this easy; you just look for a colon at the end of the previous line. Surely it must be possible for Emacs to do the same, since elisp is so much more powerful than the pattern language Vim uses for configuring colourization modes -- it's annoying to have Python code littered with all of these font-lock hints. For reference i've attached my Python syntax-highlighting file for Vim. It has served me quite well over the years. 
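(Ping's rule of thumb above, "you just look for a colon at the end of the previous line", can be sketched as a tiny line-based check. This is an illustrative toy classifier of my own, not what the attached Vim syntax file literally does:)

```python
import re

def is_docstring_position(prev_line):
    # Ping's heuristic: a triple-quoted string is (probably) a docstring
    # when the previous line ends with a colon, i.e. it opens a
    # def/class/etc. block. A trailing comment after the colon is allowed.
    return bool(re.search(r":\s*(#.*)?$", prev_line))

# The heuristic separates docstrings from ordinary string literals:
assert is_docstring_position("def foo(bar, baz):")
assert is_docstring_position("class Widget:  # frobbable")
assert not is_docstring_position("x = 'just a string'")
```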
-- ?!ng --0-752675476-985610362=:570 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="python.vim" Content-Transfer-Encoding: BASE64 Content-Disposition: attachment; filename="python.vim"

[base64-encoded attachment "python.vim" (Ka-Ping Yee's Vim syntax file for Python) omitted]

--0-752675476-985610362=:570-- From gward@mems-exchange.org Mon Mar 26 21:10:49 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 16:10:49 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: ; from ping@lfw.org on Mon, Mar 26, 2001 at 04:39:23AM -0800 References: <20010326104236.D10854@mems-exchange.org> Message-ID: <20010326161048.C11145@mems-exchange.org> On 26 March 2001, Ka-Ping Yee said: > This is pretty surprising. I use Vim and it has never had any trouble > with colourizing this kind of stuff. It was even very easy to classify > docstrings (based on their position in the code) separately from ordinary > literal strings. Python's syntax makes this easy; you just look for a > colon at the end of the previous line. Surely it must be possible for > Emacs to do the same, since elisp is so much more powerful than the > pattern language Vim uses for configuring colourization modes -- it's > annoying to have Python code littered with all of these font-lock hints. [...off-topic and getting worse...] You know, the more I think about it, the more I think Emacs is the Perl or TeX of editors: hairy, overgrown, and too big and complicated for any ordinary mortal to grasp.
Probably the fact that Elisp is so much more powerful is part of the reason that most Emacs modes just can't seem to get it right -- probably Elisp is *too* powerful (or *too* complicated, take your pick). IOW, Elisp is the problem, not the solution. Amusing anecdote [even more off-topic]: one of the stated reasons that the package delimiter changed from ' to :: in Perl 5 was so that Emacs wouldn't get confused. (Although why $foo'bar was ever considered a good way to denote variable 'bar' in package 'foo' is beyond me...) Back to your ordinary doc-sig... and turn off your flamethrowers, I'll probably keep using Emacs until they pry the keyboard from my cold, dead fingers... (or until somebody invents Pymacs ;-) Greg From Juergen Hermann" Message-ID: On Mon, 26 Mar 2001 15:53:17 +0100, Tony J Ibbs (Tibs) wrote: >Erm, "Arguments:" and "Returns:" (the last is an "I think", 'cos I don't >tend to use it) Hmmm, I think you mean "Argumente:" and "Rückgabewert:". :> Ciao, Jürgen From tony@lsl.co.uk Tue Mar 27 08:41:37 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 27 Mar 2001 09:41:37 +0100 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Message-ID: <001001c0b699$bc997900$f05aa8c0@lslp7o.int.lsl.co.uk> Juergen Hermann wrote: > On Mon, 26 Mar 2001 15:53:17 +0100, Tony J Ibbs (Tibs) wrote: > >Erm, "Arguments:" and "Returns:" (the last is an "I think", 'cos I > > don't tend to use it) > > Hmmm, I think you mean "Argumente:" and "Rückgabewert:". :> Strangely enough, the reason docutils (my STpy implementation) uses a dictionary to "translate" these terms is so that non-English writers can have something sensible as an alternative (although probably not in the alpha). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
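(The dictionary-based "translation" of docstring section labels that Tibs describes could look roughly like the sketch below. The table contents and function name are hypothetical illustrations, not docutils' actual API:)

```python
# Hypothetical lookup table mapping localized section labels to the
# canonical English labels a docstring processor would act on.
LABEL_TRANSLATIONS = {
    "Argumente": "Arguments",
    "Rückgabewert": "Returns",
}

def canonical_label(label):
    # Unknown (including English) labels pass through unchanged.
    return LABEL_TRANSLATIONS.get(label, label)

assert canonical_label("Argumente") == "Arguments"
assert canonical_label("Rückgabewert") == "Returns"
assert canonical_label("Arguments") == "Arguments"
```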
From tony@lsl.co.uk Tue Mar 27 09:58:50 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 27 Mar 2001 10:58:50 +0100 Subject: [Doc-SIG] Where to go from here In-Reply-To: <200103261618.f2QGIrp12424@gradient.cis.upenn.edu> Message-ID: <001101c0b6a4$860c51e0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Loper wrote: > Guido's input has raised some questions about whether we're going > in the right direction... To say the least. One can't help feeling that he might have objected sooner if he were going to object so much (like, for instance, last time round the loop when ST was decided on). Grump, moan, whine. > But he's made it clear that he'll keep an open mind, > and seriously consider any real specs we come up with. So long as we gird our loins and don't just give up - I was feeling pretty dispirited about all of this yesterday (I'm sorry, Guido, but to be told "this must be bad because it shares part of the name of something else" is not polite - it's exactly like saying one had a bad experience with regexp so one doesn't like sre). Anyway, putting my rational head back on: > So I propose we do the following: > > - continue trying to come up with a concrete, formal spec Agreed. I *still* don't think we're far off one, Guido's pessimism despite (and he still hasn't had a chance to *look* at what we've been doing). > - drop the idea of maintaining compatibility with STNG, for > now. We'd all like that. It disturbs me a little that we can consider it so easily, though, given how powerful the arguments for keeping STClassic and STNG compatibility were in the past. > Once STNG sees how much cooler our markup language > is, we can convert them. ;) Maybe we should even come up > with a new name, so that other people who have become > embittered with STclassic won't take it out on us. :) and so Guido won't be prejudiced because of the name (sorry, I'll try to stop grumping). Of course, the obvious name would be "pydoc", but that's rather taken... 
Perhaps we should choose "pytext", by analogy with the grandfather format, setext. > - STminus will focus purely on coming up with a formal > description, and drop its goals of unifying STNG/STpy. Makes sense. So URIs are now delimited by '<..>', yes? > - Focus on the goal of making a *real* markup language that is > *lightweight* and simple to read/write. But you are not going to get a real markup language (for any sense of "real" that I understand) if your start and end delimiters are the same - and I don't see how we can compromise on that. We still have the problem that if we don't have *really* lightweight markup, people won't do it, and that something akin to what people do in email (i.e., something like ST/STpy, I'm afraid) is the best bet for that - unless you're proposing to start the discussion that started back in 1997 all over again. > - Once we have a real specification (hopefully in a couple > weeks), we can talk to Guido/others about whether it's > acceptable. It's unreasonable to expect Guido to make > judgements when the ST stuff is in the state of flux it's > in now. I agree we need to have a specification - that's what we've been working towards. But I think the correct route is still a PEP, and Guido is only one of the people who vote on PEPs. He's certainly the *most important* person, but I can't (I refuse to) believe that he doesn't change his mind on occasion. I am *very* scared that the last time round the Doc-SIG loop we got this close to having something that worked, and got kiboshed by Spam8. Let's (please) not let that happen again. > Dropping STNG compatibility will allow us to consider a number > of options that I hadn't brought up before.. For example, I think > we might want to replace '--' with '---' as the description list > indicator, since people *do* use '--' in text (I know I do, and > apparently Guido does too). And I think we should drop 'o' as > a bullet character. etc.. I agree with dropping 'o' - can we add '+' as an alternative? 
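[Editor's note: the "name -- description" description-list form under discussion can be recognized with a couple of lines of Python. This is only an illustrative sketch with invented names (`DESC_ITEM`, `parse_desc_item`), not code from docutils or STpy.]

```python
import re

# Illustrative sketch (invented names, not docutils/STpy code):
# recognize description-list items of the form "name -- description",
# e.g.  "real -- the real part (default 0.0)".
DESC_ITEM = re.compile(r"^\s*(?P<name>\S+)\s+--\s+(?P<desc>.+)$")

def parse_desc_item(line):
    """Return (name, description) for a ' -- ' item, or None."""
    m = DESC_ITEM.match(line)
    return (m.group("name"), m.group("desc")) if m else None
```

Switching the delimiter to ' --- ', as proposed, would only mean changing the literal in the pattern.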
Personally I dislike ' --- ' as the descriptive list delimiter, but not enough to jump up and down too much - and since you're also wanting to use '--' as a hyphen (presumably actually an m-dash), I'll go for it. Descriptive lists *are* meant to stand out, after all. One problem is that Guido's style guide suggests using ' -- ' for descriptive lists:: Keyword arguments: real -- the real part (default 0.0) imag -- the imaginary part (default 0.0) so we'd need to get him to change that (presumably not a problem if the PEP were accepted). > As for colorization, java-mode does just fine colorizing javadoc > comments, so I don't see how it's a problem *in principle*, just > a problem of someone figuring out what to tell emacs (I'm sure > emacs could be told to colorize triple-quoted-strings correctly > if someone really wanted to figure out how to.. See http://www.python.org/emacs/python-mode/faq.html for an explanation of the problems. > I've just been using the work-around of backslashing > all double quotes in triple-quoted-strings, which > doesn't affect their value, and makes them colorize > correctly) Hmm. Ugly, but worth mentioning as a tip (although single quotes can cause problems too). Peter Funk suggested: > Especially heading recognition in ST sucks. I dislike intensely headers in STClassic and STNG. We *might* be able to ignore the problem if we are simply addressing docstrings. Alternatively, there are two other ways to do it (one lightweight and hacky, the other heavyweight and, well, different): 1. Assume that a heading will be underlined. Text after a heading need not be indented any more than it normally would. I think this was suggested by David Goodger. For instance:: This is heading 1 ================= This text is within "heading 1"'s section. This is subheading 2! --------------------- Which introduces a subsection. This is subsubheading 3... ~~~~~~~~~~~~~~~~~~~~~~~~~~ And that is surely enough depth to satisfy anyone using docstrings... 2. 
Provide "proper" sectioning commands - for instance:: Section 1: Its title And some text Subsection: It can decide its number One would also provide other appropriate "names" for sections. Option 1 seems to me more appropriate for docstrings, option 2 for longer texts - so since we're working on docstrings, I'd go for option 1. Details to be worked out are whether one needs to get the number of underline characters right or not (!). Peter Funk also wrote: > I think, a description list can be dropped altogether. > At least for the time being a bullet list will be enough. > enumerated lists: ...hmmm... I think we can also live without > them for a try. No and no. I vehemently disagree. And Guido's suggestion that: > Yes! They are darn ugly in HTML anyway. is just plain silly - for a start, that's almost entirely down to using default settings with poor browsers (OK, IE and Netscape!), and secondly it's *definitely* controllable by writing extra HTML code, or (horrors) using style sheets. Let's not let the presentation of one format drive our whole effort. I *do* sort-of agree with Guido's point that it is a bad thing to lose the "number" from an enumeration, though - the reason for my making this optional in STpy was purely that I think it may be difficult in HTML (and that's somewhat more than a presentation issue). But I've always worried about it, because of one's wish to refer back to list items by sequence number in the surrounding text. I think this one needs thinking about. Guido said: > Please do look at the conventions in MoinMoin as another example! Hmm. Last time I looked at MoinMoin I got no further than the "traditional" use of multiple quotes to mean different things, and gave up (despite the fact I rather like how it looks through a browser). Hmm. It's a mishmash of odds and ends, not designed to be read *as text*. 
I'm a bit disturbed that Guido refers us to this as something worth following up, since it seems to miss the point of what we're trying to do (which is *not*, in the first instance, at least, to support a Wiki). (From a very quick scan: possibly good ideas: they allow internal indentation in a paragraph to have meaning. Not sure what *use* it is, but it's fun. good ideas for a Wiki: they allow one to "optimise" URIs that end in .jpg or .gif so that the image is included instead. Not so useful in docstrings) Peter Funk continued: > I think we should aim for *very* minimalistic set of features > and people may then add other things later on: > * emphasizing of *single* words. Edward had suggested that. I emphasise more than one word too often in my writing to be happy with that, and I don't see it as being a problem (if you're working from ideas of how STClassic does it, please don't!). > * section headings (marked up through underlining with a line of > hyphens or '=' and preceded by a blank line). Ah - I hadn't read that far - we agree, more or less. > * bullet item lists (which may be nested through indentation). Modulo that, I want all three list types. > * References to URLs, to Mailaddresses and to Python objects. "Mailaddresses"? I assume you mean "mailto:", which is just one form of URI. We had been leaving references to Python objects to later as slightly harder and a "pydoc" type issue (one of the reasons for marking up Python words inline is to make this easier to get right). > * pre formatted paragraphs for code examples, tables and such: > (every paragraph with mixed indentation or which starts with > the patterns '>>>' or '+--' should be left alone. Only properly > aligned normal text paragraphs should be allowed for reformatting. This is too complex. The concepts we already had in STpy were enough - i.e., using the '::' idea to introduce literal blocks and '>>>' to introduce doctest blocks. 
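[Editor's note: the rule Tibs restates here ('::' at the end of a paragraph introduces a literal block, and '>>>' opens a doctest block) is simple enough to sketch in a few lines of Python. This is a toy illustration, not the docutils implementation; `classify` is an invented name.]

```python
# Toy sketch of the classification rule discussed above: a paragraph
# starting with '>>>' is a doctest block, a paragraph whose
# predecessor ends in '::' is a literal block, everything else is
# ordinary text to be reflowed.
def classify(paragraphs):
    kinds = []
    literal_next = False
    for para in paragraphs:
        if para.lstrip().startswith(">>>"):
            kinds.append("doctest")
        elif literal_next:
            kinds.append("literal")
        else:
            kinds.append("text")
        # A trailing '::' marks the *next* paragraph as literal.
        literal_next = para.rstrip().endswith("::")
    return kinds
```

The point of the sketch is how little machinery the STpy rule needs, compared with guessing from mixed indentation.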
Trying to "guess" based on mixed indentation is too error prone (gods, is that error prone!), and I for one refuse to countenance needing to put some strange characters in front of literal text just to make it literal - yuck. > Then let's try to implement this minimal set and plug this > into Ping's pydoc and see what comes out, if running this > on existing sources. Well, we just about *have* an implementation of our format (if people didn't keep arguing/discussing it would probably have been out already!). And there *is* precedent for changing the existing source documentations, if we need to (although my own experience with a few modules so far is that this is not a problem). *I* would prefer to keep a more complex thing that includes '[localrefs]' ('cos they're useful - and if we can't handle a PEP we're not doing too well). The "labelled paragraphs" thing is too useful for me to want to give up immediately, as well. I'm undecided about whether we need two forms of emphasis - Edward, what is your feeling, should we drop '*..*'? I get left with: * Structuring is still available by indentation, as before, although its use is less important (and to those who've only just joined us, don't worry, it does come out in the wash, honest) * Blank lines delimit paragraphs, as before * List items start paragraphs, as before * Headings are done by underlining (three levels of heading) * doctest blocks are introduced by >>> * literal blocks are introduced by :: on the previous paragraph, as before * Emphasis is by *..* (dropping **..**, which makes life simpler). Obviously, one can't nest *..* inside *..* * Literals are '..' 
and #..# (as before) * URIs are delimited by <..> * URI "text" is done as "some text": - note that under the new scheme we can safely allow optional whitespace after the colon, which allows friendlier typing * Standalone URIs (e.g., ) get rendered as they look * Local references are like [this], and refer to '..[this]' anchors as before (which need to be at the start of a paragraph - an anchor at the start of a line should also start a new paragraph) * Labelled paragraphs are available - for instance:: Author: Guido Arguments: #fred# --- the fred'th dimension as before. I think these are too useful to drop. Edward - have I forgotten anything? What do *you* think? You propose continuing with STminus (under a new name - pytext-slim?) with proper formal definition and minimal markup. I see that as a continuation of what was being done, but without the ST constraints. I would propose amending docutils to implement my proposals above (modulo keeping compatibility with your work), and rewrite STpy.html as pytext.html (pytext-fat.html?), describing it. Mainly because it's just about there, and will provide a "test bed" to allow people to think about which features are actually wanted. That gives us two strings to our bow, which I think is still a good idea. I propose that we still maintain output to a DOM tree, as discussed elsewhere. Meanwhile, I think we should have hope that we can convince Guido that he (honestly) was using a bad tool, and that he shouldn't judge "superficially similar" things by said tool. [[[I'm still yea-close to an alpha release of docutils. It's not put much farther off by the new considerations, so I still propose to put together a PEP for an altered version. But it's not going to be this week. I may be able to take some time off next week, which would help.]]] Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! 
(Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 28 11:01:47 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 28 Mar 2001 12:01:47 +0100 Subject: [Doc-SIG] New document - pytext-fat In-Reply-To: <001101c0b6a4$860c51e0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <001101c0b776$7ba21e60$f05aa8c0@lslp7o.int.lsl.co.uk> Hmm - either our email reception has gone wrong, or everyone is quietly working away in the background (or catching up on sleep). I spent last night (sort of literally) and a bit of this morning writing a first draft of the "fat" pytext specification. It can be found at: I apologise for the mistakes that are bound to be therein - I haven't exactly had time to reread it, and it was written in two long sessions, mostly from memory. It gets a bit thin towards the end, in the colourisation section, due to time constraints. I definitely don't guarantee to have got the DOM definitions right - Edward and I were working towards agreement on that, and I just didn't have time to refer to what we'd done so far. I propose to amend docutils to support the format documented in fat.html, with command line options to allow "obvious" experimentation, so that people can play with different approaches, and also enable/disable things (so, for instance, one should be able to disable localrefs, anchors and labels, if one wishes). (The main thing to add is the proposed new header syntax). That seems a useful "niche" for docutils to fill, whilst Edward works on pytext-slim and the corresponding tool. I hope to expand on *reasons* for decisions at some point, and I have a sneaky feeling that an annotated "history" of the Doc-SIG might be a useful resource apres-PEP - I'll volunteer to try to get round to that as well, since I were a participant for much of it (it's basically a cut-viciously-and-paste-and-edit job on the Doc-SIG archive). 
Sleepily, Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Wed Mar 28 15:30:34 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 28 Mar 2001 10:30:34 EST Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Sat, 24 Mar 2001 15:37:52 EST." <200103242037.PAA28743@cj20424-a.reston1.va.home.com> Message-ID: <200103281530.f2SFUYp29461@gradient.cis.upenn.edu> > > * Intuitiveness: [...] > > * Ease of use: [...] > > Of course. This is all apple pie and motherhood. nobody will want > documentation that's unintuitive or hard to use! Not necessarily -- Many people use Javadoc, and it's not "easy to use"; and I would argue that LaTeX is not "intuitive," as I defined it.. These design goals have specific consequences, like ruling out "heavy-weight" formalisms... > > * Expressive Power: The formatting conventions must have > > enough expressive power to allow users to write the API > > documentation for any python object. > > I've never found that plaintext got in the way of my expressiveness. It depends on whether you want to express things to people or to computer tools. People are very good at reading plaintext docstrings, and getting the appropriate info out of them. But that doesn't mean it's easy to write a tool to do the same.. I believe that formatting conventions should have enough expressive power, for example, to distinguish between regions that can be rendered in non-monospaced font, & word wrapped, and those that should be rendered as "literal." > > * Simplicity: [...] > More motherhood. Again, I disagree. Neither HTML nor LaTeX is *simple*. > > * Safety: No well-formed formatted documentation string should > > result in unexpected formatting. This goal derives from > > intuitiveness. 
> This is a good one. ST loses big here! Well, at least STclassic and STNG. That's the reason why I would vehemently oppose using them. > I thought Javadoc was geared too much towards generating HTML; we > should not focus too much on HTML. It was initially geared towards generating HTML, although they have tools to render it in LaTeX, emacs info files, framemaker, etc. Most of these tools work by requiring that you not use arbitrary HTML tags in your docs, but just limit yourself to a limited set of tags (usually 15-50 tags, depending on the tool). Also, there are 2 orthogonal features of Javadoc: 1. their ability to use HTML in comments (which I don't think we should adopt) 2. their ability to mark special values, using forms like: @param(x) description of x... @throw(y) description of when y is thrown.. I think we should have something like (2), although it might be more pythonic to do something like: argument x: description of x throw y: description of y or: arguments: x -- description of x y -- description of y (incidentally, this use is the main reason that I support DLs in documentation strings..) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 28 15:34:54 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 28 Mar 2001 10:34:54 EST Subject: [Doc-SIG] Formalizing ST In-Reply-To: Your message of "Sun, 25 Mar 2001 11:42:47 +0200." Message-ID: <200103281534.f2SFYsp29847@gradient.cis.upenn.edu> > > Hmm.. So I'm starting to think that EBNF really isn't the best > > formalism for capturing global formatting. > > Hmmmm..... I think I have to disagree. What is global formatting? > Did you ever have a look at the Python/Grammar/Grammar file, which > is basically EBNF and uses the special Tokens INDENT and DEDENT? > [more info, pointers] Thanks for the pointers. Something like this might work, although I'm still not sure how it will work for literal blocks. But I'll figure that out. I was trying to express things in straight EBNF, not using magic tokens like INDENT and DEDENT, but maybe those will help. This may end up requiring that paragraphs be indented reasonably.. (currently, you can indent paragraphs like this if you want:) This paragraph is indented 8 spaces because its first line is indented 8 spaces, regardless of the subsequent indentations. Most of us think that this is not a useful feature. Of course, we may be ditching indentation-structure, anyway (at least for anything other than lists, and maybe literal blocks..) 
:) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 28 15:44:01 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 28 Mar 2001 10:44:01 EST Subject: [Doc-SIG] using the same delimiter on the left and right.. Message-ID: <200103281544.f2SFi2p01299@gradient.cis.upenn.edu> Tibs seems to have this strange notion that "real" markup languages don't use the same characters for left and right delimiters.. :) But almost any markup I can think of does... '$' or '$$' in LaTeX for math mode.. '"' in XML/HTML for attribute values.. etc. I think that what makes delimiters in ST seem like not-real-markup is that they are context-dependent. E.g., "'" in the middle of a word is different from "'" at the beginning of a word. So.. let's change that. Let's make all of our delimiters into real delimiters, that can only be used for delimiting (or maybe also for bullets, in the case of '*'). We could switch our "literal" delimiter to "`". So then we would have the following reserved characters, that may not appear in text without being quoted somehow: '<' left delimiter for URLs '>' right delimiter for URLs '#' delimiter for inlines '`' delimiter for literals '*' delimiter for emph, maybe for strong. '::' marker for literal regions Then the only context-dependent characters that remain would be start-list-item characters.. And if we wanted to, we could use '* ' at the beginning of any list item, since it's reserved anyway... something like: * this is an unordered list item *1. this is an ordered list item Well.. I'm not sure whether we'd want to do that or not.. We may be happy with just using '1.' and assuming that no one will start a line with a number that ends a sentence.. But I think that reserving the delimiter characters might still be a good idea.. Does this sound like a reasonable direction to go? It at least seems to me to be closer to a "real" markup language.. 
-Edward From edloper@gradient.cis.upenn.edu Wed Mar 28 15:55:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 28 Mar 2001 10:55:35 EST Subject: [Doc-SIG] What's important in a docstring markup language? Message-ID: <200103281555.f2SFtZp02258@gradient.cis.upenn.edu> So, I was thinking about how to make ST more formal, more like a real markup language, and possibly more lightweight.. And that led me to think about what I *really* want in a markup language for docstrings.. The following is an ordered list of the features I think it should have. I.e., I think features earlier on the list are more important, and ones later are less important. I would personally be happy to draw a line anywhere after 4, and forget about anything under the line. :) 1. The ability to distinguish text that can be rendered with soft spaces and word wrapped from text that should be rendered in monospace with whitespace preserved (i.e., the ability to distinguish natural language from everything else). This includes both inline literals and literal blocks 2. The ability to label the semantic content of parts of descriptions, eg., as the return value or as a description of an argument. 3. The ability to properly handle doctest blocks (this is a high priority, because these have become standard) 4. Unordered lists 5. Ordered lists 6. Sections 7. Hierarchical sections 8. The ability to mark a word as emphasized 9. URLs 10. The ability to mark regions as emphasized 11. The ability to mark regions as strong 12. Footnotes/endnotes 13. Internal anchors/references to parts of a docstring Does this correspond more-or-less with other people's priorities? What markup do people feel is essential for documenting the APIs of python objects? -Edward From guido@digicool.com Wed Mar 28 16:59:18 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 28 Mar 2001 11:59:18 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. 
In-Reply-To: Your message of "Wed, 28 Mar 2001 10:44:01 EST." <200103281544.f2SFi2p01299@gradient.cis.upenn.edu> References: <200103281544.f2SFi2p01299@gradient.cis.upenn.edu> Message-ID: <200103281659.LAA09792@cj20424-a.reston1.va.home.com> > So then we would have the following > reserved characters, that may not appear in text without > being quoted somehow: > '<' left delimiter for URLs > '>' right delimiter for URLs > '#' delimiter for inlines > '`' delimiter for literals > '*' delimiter for emph, maybe for strong. > '::' marker for literal regions Yuck. Most of these (except '::') are quite commonly used for other purposes, and occur frequently in examples. I prefer markup languages with very few special characters, e.g. a GNU doc standard whose name I don't recall, which only uses @; or Perl's POD, which seems to get away with making only a letter followed by '<' special. LaTeX has at least three special characters ('\', '{', '}'), and in some contexts more, and that's already a pain. XML with '<' and '&' is borderline for me. > Then the only context-dependent characters that remain would > be start-list-item characters.. And if we wanted to, we could > use '* ' at the beginning of any list item, since it's > reserved anyway... something like: > > * this is an unordered list item > *1. this is an ordered list item This is OK, although I like the single hyphen form better. > Well.. I'm not sure whether we'd want to do that or not.. We > may be happy with just using '1.' and assuming that no one will > start a line with a number that ends a sentence.. That was ST's original sin. > But I > think that reserving the delimiter characters might still be > a good idea.. > > Does this sound like a reasonable direction to go? It > at least seems to me to be closer to a "real" markup language.. I can't endorse this yet. --Guido van Rossum (home page: http://www.python.org/~guido/) From Juergen Hermann" Message-ID: On Wed, 28 Mar 2001 10:34:54 EST, Edward D. 
Loper wrote: > This paragraph is indented 8 spaces because its first > line is indented 8 spaces, regardless of the subsequent > indentations. Most of us think that this is not > a useful feature. > >Of course, we may be ditching indentation-structure, anyway (at >least for anything other than lists, and maybe literal blocks..) I know this comes a little late, but have you guys considered wiki markup and the python code that exists for it? Many things that are/were fishy in ST are clearly defined in common wiki markups. Ciao, Jürgen From akuchlin@mems-exchange.org Wed Mar 28 18:54:35 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 28 Mar 2001 13:54:35 -0500 Subject: [Doc-SIG] Graphics in the docs Message-ID: Two questions forwarded from someone who's working on a HOWTO: >How do I include graphics in the HOWTO? > ( I'd like to show the result of my hello, world, etc..., >I have managed to include that in the HTML output but not in PDF... ) >Can I boldface certain parts of python code ( to mark >differences...) >in the verbatim environment... ( I don't know how to do it though) Fred, any suggestions? (I certainly have no objection to doing either of those things in the HOWTOs. They'd make conversion to DocBook or other DTD complicated, but I don't think the HOWTOs are prime candidates for conversion to begin with.) --amk From pf@artcom-gmbh.de Wed Mar 28 18:51:45 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 28 Mar 2001 20:51:45 +0200 (MEST) Subject: [Doc-SIG] Formalizing ST In-Reply-To: from Juergen Hermann at "Mar 28, 2001 8:10:59 pm" Message-ID: Hi, Juergen Hermann points to MoinMoin/parser/wiki.py: > I know this comes a little late, but have you guys considered wiki > markup and the python code that exists for it? Many things that > are/were fishy in ST are clearly defined in common wiki markups. Last weekend I installed MoinMoin 0.8 here on a server in our company's intranet and played around with the markup. 
Wiki markup contains some clever ideas but IMO this is not really intuitive markup useful for Python inline doc strings. For example, headlines in MoinMoin wiki markup are entered as: = This is a very important H1 chapter headline = == This is a slightly less important H2 section headline == === This is a least of all important H3 subsection headline === This sucks IMO, since it emphasizes the unimportant headings in favour of the important ones, when viewing the text in an editor. I would prefer to use indentation to markup different levels (borrow this idea from ST) and use simple underlining for marking single lines as headings: This is a very important H1 chapter headline -------------------------------------------- This is a slightly less important H2 section headline ----------------------------------------------------- This is a least of all important H3 subsection headline ------------------------------------------------------- IMO the ''' and '' for mixing ''italics'' and '''bold''' are also unreadable in text editors. They conflict with Python's triple quotes used in docstrings, BTW. I like the *emphasize* proposed in this group and by ST better. However the url detection without requiring '<' and '>' delimiters around the http:// ... string is a nice feature of MoinMoin markup. Ping has implemented something similar in pydoc already and this works just fine. I have a similar feeling with the email address recognition (MoinMoin uses the regular expression [-\w._+]+\@[\w.-]+ for that). The use of `{{{' and `}}}' is also too heavy for the purpose. pydoc currently assumes that everything but URLs is preformatted literal material and uses a fixed font to display everything. I believe we can loosen this a bit and consider every paragraph with mixed indentation as literal material. So only a paragraph with properly indented lines will be recognized as a text paragraph and possibly reformatted. About lists and numbered lists I'm still not sure what I would like. 
A bullet item list (LaTeX itemize) seems to be enough for most cases. A few days ago Guido gave a similar statement. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From klm@digicool.com Wed Mar 28 20:35:45 2001 From: klm@digicool.com (Ken Manheimer) Date: Wed, 28 Mar 2001 15:35:45 -0500 (EST) Subject: [Doc-SIG] going awry Message-ID: Darn. We've had a number of occasions where the doc-sig has launched into an effort to formulate doc string conventions, and took a turn to invent a new language - which gets lost in the ether. You seemed to be getting some good progress on fixing the flaws in an existing language - structured text - but it sounds eerily like you're heading towards throwing that out the window, and inventing a new language. I think that's a shame. I think a large part of guido's objections to structured text have to do with battling painful implementation bugs, part to do with lack of predictability, and part to do with an expectation that rich-text markup style is going to take over the world, even day-to-day communications. I think the implementation-specific problems can be fixed by the efforts we were seeing. I think that's now in danger of being derailed, to be replaced by another (how many years has this happened?) invent-our-own. (Perhaps i'm overstating it - maybe what's happening now is more about trimming down from a successful example, which should not be near as prone to getting off track.) I think the expectation for use of rich-text markup style is misguided. There may be tons of day-to-day email out there in html format - but i'd lay high odds that, excluding marketing spam, the vast majority uses no markup at all. (When really in on-the-fly communication mode, regular people just type, they don't use menus. They may resort to *punctuation* to express formatting, but they'll rarely resort to codes. 
IE and netscape may package people's messages in delightful mime plain-text/html packages, but i expect that the vast majority of the time it's unnecessary.) I hope, if you do try to invent a new language, you'll exploit some of the economies and principles that structured text has demonstrated... Ken klm@digicool.com From Lucas.Bruand@ecl2002.ec-lyon.fr Wed Mar 28 21:35:50 2001 From: Lucas.Bruand@ecl2002.ec-lyon.fr (Lucas Bruand) Date: Wed, 28 Mar 2001 23:35:50 +0200 Subject: [Doc-SIG] (no subject) Message-ID: In Documenting Python, it is written on page 11: > \url{url} > A URL (or URN). The URL will be presented as text. In the HTML and PDF formatted versions, the URL will >also be a hyperlink. This can be used when referring to external resources. Note that many characters are special >to LaTeX and this macro does not always do the right thing. In particular, the tilde character ('~') is mis-handled; >encoding it as a hex-sequence does work, use '%7e' in place of the tilde character. I don't understand what I should exactly write instead of tilde (because %7e counts as a remark, nor does \symbol{"7e} work). Thanks in advance for helping a beginner in LaTeX, Lucas Bruand From mal@lemburg.com Thu Mar 29 08:21:33 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 29 Mar 2001 10:21:33 +0200 Subject: [Doc-SIG] going awry References: Message-ID: <3AC2F08C.6B1D0E3B@lemburg.com> Ken Manheimer wrote: > ... > I think a large part of guido's objections to structured text have to do > with battling painful implementation bugs, part to do with lack of > predictability, and part to do with an expectation that rich-text markup > style is going to take over the world, even day-to-day communications. Please let me get this straight: as far as I understood Guido's post, he only mentioned that rich text markup didn't work out in his projects -- he never ruled out rich text markup for general use, so I suspect all this confusion to be based on a misunderstanding. 
There are people out there who use rich text markup in doc-strings today, so I don't think that we should stop talking about a standard for a format. Ideally, there should be an interface for extracting information from single doc-strings or maybe even modules which then lets everybody plug in their own favourite doc-string parser. We already have tons of different auto-doc tools out there in the Python universe -- the problem with most of them is that they do not allow for parser plugins. This should change, IMHO. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From pf@artcom-gmbh.de Thu Mar 29 08:48:02 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 29 Mar 2001 10:48:02 +0200 (MEST) Subject: [Doc-SIG] f(...) vs. f (...) inconsistency Message-ID: Hi, this is a very tiny issue, but it has bugged me over the years: In his Styleguide Guido wrote: """I **hate** whitespace in the following places: [...] Immediately before the open parenthesis that starts the argument list of a function call, as in spam (1). Always write this as spam(1).""" I agree 100% with this. On the other hand I very often cut'n'paste between the library reference manual pages and an open editor window. Unfortunately in the library reference there are spaces between function names and the opening parenthesis, which I always have to remove manually. How come? Should the library documentation be fixed in this regard? Regards, Peter From tony@lsl.co.uk Thu Mar 29 09:08:48 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:08:48 +0100 Subject: [Doc-SIG] What's important in a docstring markup language? In-Reply-To: <200103281555.f2SFtZp02258@gradient.cis.upenn.edu> Message-ID: <001e01c0b82f$dd7af350$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > The following is an ordered list of the features I think it should > have.
I.e., I think features earlier on the list are more important, > and ones later are less important. I would personally be happy > to draw a line anywhere after 4, and forget about anything under > the line. :) > > 1. The ability to distinguish text that can be rendered with soft > spaces and word wrapped from text that should be rendered in > monospace with whitespace preserved (i.e., the ability to > distinguish natural language from everything else). > This includes both inline literals and literal blocks > 2. The ability to label the semantic content of parts of > descriptions, e.g., as the return value or as a description > of an argument. > 3. The ability to properly handle doctest blocks (this is a high > priority, because these have become standard) > 4. Unordered lists > 5. Ordered lists I think those are in my top six as well. > 6. Sections In docstrings, which are typically short, I think these are at the end of the list > 7. Hierarchical sections Almost no need for these in docstrings, and I think 2 (or at worst 3) levels is more than enough > 8. The ability to mark a word as emphasized I don't want 8 at all, I want 10. > 9. URLs Quite important if this is meant to be "joined up" documentation (to use a horrible buzz phrase our politicians seem addicted to). It is very important to be able to reference documentation elsewhere, and having them "clickable" in derived formats that support that ability is important too. > 10. The ability to mark regions as emphasized See above. I would place this in my top half a dozen. > 11. The ability to mark regions as strong Don't care - in docstrings, I think one form of emphasis is enough (ref fat.html) > 12. Footnotes/endnotes Useful, very useful, but not essential > 13. Internal anchors/references to parts of a docstring A frill, a frippery, something to forget until someone comes up with a use case. > Does this correspond more-or-less with other people's priorities?
> What markup do people feel is essential for documenting the APIs of > python objects? So I guess my list is: 1. "plain" text versus "literal" text (as you say, inline and by the block) 2. "Python" inline versus "other literal" inline. 3. Emphasis (so I don't have to SHOUT A LOT) 4. doctest blocks, 'cos they're easy and useful 5. Lists - all three types - I use all three a lot 6. Label blocks (those things like "Arguments" and so on, which contain a particular sort of information) Those six are my core. Their order isn't too important, 'cos I want all of them (I guess 6 is slightly less important) 7. URI detection. This lives by itself, and nearly makes the "prime" list. 8. Footnotes/endnotes 9. Headers and sections I'm not sure about the order of those last two. Mind you, I think this is a useful exercise (with luck, it won't tell us anything new, of course!). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 09:16:31 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:31 +0100 Subject: [Doc-SIG] going awry In-Reply-To: <3AC2F08C.6B1D0E3B@lemburg.com> Message-ID: <002101c0b830$f18af290$f05aa8c0@lslp7o.int.lsl.co.uk> M.-A. Lemburg wrote: > Please let me get this straight: as far as I understood Guido's > post, he only mentioned that rich text markup didn't work out > in his projects -- he never ruled out rich text markup for general > use, so I suspect all this confusion to be based on a > misunderstanding. Guido said: > PS. Don't spend too much time trying to make StructuredText or some > variation thereof work. In my experience with systems that use ST > (like ZWiki), it sucks. There basically are two options I like: > nicely laid out plain text, or a real markup language like Latex or > DocBook.
That seems to be a downer for anyone trying to produce a docstring markup format. Of course, I think he has been soured by one particular implementation of something that wasn't what we were proposing (and given he seems to like MoinMoin which has an even more ad-hoc approach to text and what it "means", I don't *quite* understand why he's so down on even STClassic). > Ideally, there should be an interface for extracting > information from single doc-strings or maybe even modules which > then lets everybody plug in their own favourite doc-string parser. > > We already have tons of different auto-doc tools out there in > the Python universe -- the problem with most of them is that they > do not allow for parser plugins. This should change, IMHO. HappyDoc seems to be the leader here - I keep mentioning it partly because it seems to be under active development, partly because the author is (at least in principle) interested in the result of what we're doing, and partly because it aims to use plugins for both parsing the text and producing the output. But there is a problem with *recognising* what markup is used in what docstring, if there isn't one standard! Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 09:16:48 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:48 +0100 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: <200103281544.f2SFi2p01299@gradient.cis.upenn.edu> Message-ID: <002201c0b830$fb538670$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > Tibs seems to have this strange notion that "real" markup > languages don't use the same characters for left and right > delimiters..
:) Erm, yes, now you point it out I clearly had my stupid hat on (I wish I could lose that damn thing, it's so embarrassing). > I think that what makes delimiters > in ST seem like not-real-markup is that they are > context-dependent. E.g., "'" in the middle of a word is different > from "'" at the beginning of a word. Well, personally (despite anything I might have said before now) I'm going to start declaring that the ST family *is* real markup (thus defining "real" appropriately, of course). I've begun to think that otherwise it sounds silly. (some of you may want to skip the rant that follows to see more interesting stuff - I'll put a '****** BACK TO NORMAL ******' delimiter at the end so you can scan down...) What I'm clearly striving after is some way of describing what makes ST, etc., different. Thinking about this, we have: * The SGML/XML family * The TeX family * The Runoff family * The Pod family OK. The SGML family originated in a need to mark up data as to its *meaning*, pure and simple. This later got spread to trying to use the meaning of a term to decide how to present it (which gives us HTML, sort of), and that becomes a slippery slope. The TeX family originates in the need to drive the precise typesetting of particular parts of the text, whilst producing good general, predictable typesetting for the rest of the text. It is important to remember that when using a TeX-related tool, the *intention* is that if it doesn't look good when formatted, then it should be rewritten (and indeed, that may mean writing different words to say the same thing). Because the meaning of a term often drives how it is to be typeset (especially in maths, its original target), the use of TeX for semantic markup arises. The Runoff family was a simpler variant on the TeX idea, which wanted to produce computer manuals, and so on. There's generally less control over meaning, more interest in presentation.
It's not clear to me if troff and so on belong to the TeX family or the Runoff family. The Pod family is, maybe, if it exists, the family of marking up docstrings. Edward Welbourne has talked about this in an earlier email. Basically, the aim is to produce something more useful than plain text (but not of a quality to stop a technical documentor wincing), leaving the original, marked up, text still useful *as such*. Eddy also comments that if someone using ST (in his comment) is spending too much time worrying about markup, then they're not spending enough time working on more important things. Both the TeX family and the SGML family care about formalisms, a lot. They each have their own elegances which they are striving for. The Runoff family hasn't *heard* of elegance. And the Pod family are after practicality. I think *we* are *not* in the TeX or SGML families. We are in the "pragmatic solution to a specific problem" space, and if formalism helps with that, then that's a Good Thing, but we shouldn't strain after theoretical purity lest we stray from practical usefulness (heh, I've been pulled up on the list in the past for exactly that). Sorry - back to the normal argument again... ****** BACK TO NORMAL ****** > Let's make all of our delimiters into real delimiters, > that can only be used for delimiting (or maybe also for > bullets, in the case of '*'). We could switch our "literal" > delimiter to "`". So then we would have the following > reserved characters, that may not appear in text without > being quoted somehow: > '<' left delimiter for URLs > '>' right delimiter for URLs > '#' delimiter for inlines > '`' delimiter for literals > '*' delimiter for emph, maybe for strong. > '::' marker for literal regions I hadn't thought of using backtick as a literal delimiter. In the context of docstrings, I can't see why it wouldn't work - hmm, this is a `literal` - yep, that works for me (does the resonance with Python backtick work?).
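[As a minimal sketch of the backtick-literal idea above (an illustration only, not code from docutils or any proposal on the list), recognition really is a one-line RE once the backtick is a dedicated delimiter that never appears in ordinary prose:]

```python
import re

# Illustrative sketch: treat `...` as an inline literal, as proposed
# above.  Because the backtick is reserved purely for delimiting, the
# RE needs no context rules - unlike "'", which also appears mid-word.
LITERAL = re.compile(r"`([^`]+)`")

def find_literals(text):
    """Return the inline literal spans found in a paragraph."""
    return LITERAL.findall(text)

print(find_literals("hmm, this is a `literal` - yep, and `another one`"))
```

[The simplicity is the point: a reserved delimiter makes the parser trivial, at the cost of requiring the character to be quoted when it is wanted literally.]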
It frees up both sorts of "normal" quote, which is good, and only inconveniences people like Eddy who insist on typing `both sorts of single quote' (TeX users, the lot of them). And it means I can type "'cos" without worrying (or 'plane or 'phone if I want to appear old-fashioned). And if those *are* the delimiters, then it *would* work to expect them to be quoted when they occurred - neat. Just goes to prove why we keep Edward around on the list (please add a <grin> here). Does that mean we allow things like "there is a hard-space` `here"? It would be quite a neat thing to allow... > Then the only context-dependent characters that remain would > be start-list-item characters.. And if we wanted to, we could > use '* ' at the beginning of any list item, since it's > reserved anyway... something like: > > * this is an unordered list item > *1. this is an ordered list item > > Well.. I'm not sure whether we'd want to do that or not.. As I say elsewhere, this was considered in an earlier round, and in the end dropped. Personally, I think we're doing OK with the list forms we already had. > Does this sound like a reasonable direction to go? Well, I like it. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 09:16:52 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:52 +0100 Subject: [Doc-SIG] using the same delimiter on the left and right..
In-Reply-To: <200103281659.LAA09792@cj20424-a.reston1.va.home.com> Message-ID: <002401c0b830$fdd85c90$f05aa8c0@lslp7o.int.lsl.co.uk> Guido van Rossum wrote (in response to Edward Loper): > > So then we would have the following > > reserved characters, that may not appear in text without > > being quoted somehow: > > '<' left delimiter for URLs > > '>' right delimiter for URLs > > '#' delimiter for inlines > > '`' delimiter for literals > > '*' delimiter for emph, maybe for strong. > > '::' marker for literal regions Hmm. Using backtick for literals might work quite well - what was ST's reason for not so doing, I wonder? > > Yuck. Most of these (except '::') are quite commonly used for other > purposes, and occur frequently in examples. And the problem with that is? > I prefer markup languages with very few special characters, > e.g. a GNU doc standard whose name I don't recall (texinfo) > which only uses @; or Perl's POD, which seems to get > away with making only a letter followed by '<' special. Yes, but they are not applicable by the "heavyweight markup won't fly" rule (which has been a principle of the Doc-SIG since 1997, and which you yourself defended, and which I used to oppose until it was explained Very Gently and Lots of Times to me why it was important). Texinfo (and there are other more modern examples) is still "formal markup to produce a document", where the markup has equal status with the text, and is expected to intrude. People will not want to write it in docstrings. So we'd lose. Pod is used successfully in the Perl world, and is a clear winner there. I find it intensely unreadable, as a lightweight format. One of the precepts of the whole Doc-SIG/docstring thing has been that "marked up" text must be readable *as text*. I'll say again what I seem to keep saying recently - that means that email is a sensible sort of model.
If we can successfully parse something close to what people type in email, then we're onto a winner, in terms of getting people to use it. > Latex has at least three special characters ('\', '{', '}'), > and in some contexts more, and that's already a pain. > XML with '<' and '&' is borderline for me. We already have existing dictatorial fiat (first in 1997, reiterated by you again recently) against LaTeX and SGML/HTML/XML. That's a Good Thing, since the Doc-SIG as a whole has (each time round the loop) agreed that all of these are non-flyers for docstring markup. Their individual deficiencies (if so they be - that's a matter for argument elsewhere) are thus not relevant. > > Then the only context-dependent characters that remain would > > be start-list-item characters.. And if we wanted to, we could > > use '* ' at the beginning of any list item, since it's > > reserved anyway... something like: > > > > * this is an unordered list item > > *1. this is an ordered list item > > This is OK, although I like the single hyphen form better. There was a proposal last time round the loop to start all list item "sequences" with a special character (debate obviously ensued on which). It was dropped as a proposal (I can't remember which side of the debate I started out on - Doc-SIG has had a history of changing my mind towards the consensus by reasoned debate - don't you just hate it when that happens?). On the whole, I oppose it now. It makes it easier for a parser, and much harder for a human being, to write text. > > Well.. I'm not sure whether we'd want to do that or not.. We > > may be happy with just using '1.' and assuming that no one will > > start a line with a number that ends a sentence.. > > That was ST's original sin. Is it a sin? I don't believe that you will get a markup system (*whatever* its conventions) that doesn't have *some* nooks and crannies where the user may not type.
And if we're worried about (important, yes) fringe cases like that, why not make the implementation (note, not the spec) able to give a warning if it looks like the user might have done that (after all, ending a sentence, in *most* cases, can be spotted due to punctuation, so it should, often, be feasible). > I can't endorse this yet. I am worried that you, Guido, are coming into a debate which you have not participated in (note - *that* is not a criticism - there are other important things I'd like you to have been spending your time on) and putting down some ground rules which *appear* to contradict group-wisdom, as derived over the years. I'm a bit uncomfortable with having to attempt to "channel" the results of that, given I tend to be opinionated anyway, but even so. The Doc-SIG has had a disturbing habit of getting *very close* to a product, and then just petering out. This seems to partially correlate to the aftermath of a Spam meeting (frustrating if one couldn't be there), although for entirely different reasons each time, I believe (i.e., that's hopefully a red herring). I'd be very interested to know what you consider your "sticking points" on this to be - it may be that they are nothing we would worry about, it may be that they are issues we've already argued around in the past. For one thing, I'd appreciate *someone* explaining to me, slowly and with illustrations, just what is wrong with having context-sensitive markup *in docstrings* (not in abstract large documents marked up for typesetting (a la TeX), not in data specifications marked up for detailed content retrieval (a la SGML), but in docstrings marked up for humans to read the markup as text, and for software to retrieve some extra information for slightly improved presentation and for slightly improved information extraction). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L.
Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 09:16:55 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:55 +0100 Subject: [Doc-SIG] going awry In-Reply-To: Message-ID: <002501c0b830$ff8ee630$f05aa8c0@lslp7o.int.lsl.co.uk> Ken Manheimer wrote: > Darn. We've had a number of occasions where the doc-sig has > launched into an effort to formulate doc string conventions, > and took a turn to invent a new language - which gets lost in > the ether. You seemed to be getting some good progress on > fixing the flaws in an existing language - structured text > - but it sounds eerily like you're heading towards throwing > that out the window, and inventing a new language. I think > that's a shame. I make this the third time round. It normally falls apart soon after a Spam meeting, which is *very* frustrating for those of us who can't get to them (and were involved in the debate that seemed to be so productive just before the meeting). > I think a large part of guido's objections to structured text > have to do with battling painful implementation bugs, part to > do with lack of predictability, and part to do with an expectation > that rich-text markup style is going to take over the world, > even day-to-day communications. I tend to agree. I'm also disturbed that we seem to have to rehash some of the same arguments each time round the loop - it gets rather wearing. Although Guido's points are undoubtedly sensible, they are *also* points that have been made at least twice before. It doesn't help that I don't understand (myself) why people object to context sensitivity in markup in something like ST - what on earth is wrong with a single quote in the middle of a word being different than a single quote at the start of a word, or just before punctuation?
We're all good at reading text - that means we don't even *see* such constructs AS SUCH - they're part of the "scanning interface" we run over lines. I mean, a person presented with 'this isn't difficult' doesn't have a particular problem with discerning that the middle quote is different than the others, and whilst I wouldn't propose *allowing* that as a quoted string in our format, it's nastier than the fringe cases people *are* worrying about. > I think the implementation-specific problems can be fixed by > the efforts we were seeing. There are some specific things about ST that *would* be nice to fix, and being free to do that (by dictatorial fiat) is a Good Thing. But I think throwing out the whole thing is not - it's been 5 years, dammit. > (Perhaps i'm overstating it - maybe what's happening now is more about > trimming down from a successful example, which should not be > near as prone to getting off track.) Maybe. I suspect this week will tell. > I think the expectation for use of rich-text markup style is > misguided. It's meant to look like the sort of thing that one already sees people typing in email, to my mind. That means that '*' is a natural character for emphasis, a quote (of some sort) needs using for quoting (and that really means single quote, since it's less used for speech), list items need to *look* like list items (although there's some freedom for playing with that), and so on. > I hope, if you do try to invent a new language, you'll > exploit some of the economies and principles that > structured text has demonstrated... Personally, I don't think ST is far off. I think that it has been trapped by some early assumptions - big deal. More in other messages... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
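[The context-sensitive rule being defended above is easy to state mechanically. Here is a toy sketch (my own, not ST's actual rules): a single quote only *opens* an inline span at the start of a word and only *closes* one at the end of a word, so the apostrophe in "isn't" is simply never a delimiter.]

```python
import re

# Illustrative sketch of context-sensitive quoting (not ST's real
# grammar): an opening "'" must follow whitespace or start-of-text,
# and a closing "'" must precede whitespace, punctuation, or
# end-of-text.  A quote in the middle of a word matches neither rule.
INLINE = re.compile(r"(?:(?<=\s)|^)'([^']+)'(?=[\s.,;:!?]|$)")

def find_inline(text):
    """Return the quoted spans a context-sensitive parser would see."""
    return INLINE.findall(text)

print(find_inline("he said 'hello' and left"))
print(find_inline("'this isn't difficult'"))
```

[Note that the second example yields no span at all: the mid-word apostrophe stops the naive match, which is exactly the kind of fringe case the discussion above is about.]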
From tony@lsl.co.uk Thu Mar 29 09:16:50 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:50 +0100 Subject: [Doc-SIG] Formalizing ST In-Reply-To: Message-ID: <002301c0b830$fcafe220$f05aa8c0@lslp7o.int.lsl.co.uk> Peter Funk wrote: > Last weekend I installed MoinMoin 0.8 here on a Server in our > company's intranet and played around with the markup. Wiki > markup contains some clever ideas but IMO this is not really > intuitive markup useful for Python inline doc strings. I've read the markup documentation on several Wikis (including CLisp, which is fascinating), and none of them are interested in human readable markup - they're all really interested in presenting web pages. > I would prefer to use indentation to markup different levels (borrow > this idea from ST) and use simple underlining for marking single > lines as headings: Indentation for structure is contentious with many people, and whilst it *sounds* like a good idea (especially to Python people) many object to ending up with the bulk of their text indented. > However the url detection without requiring '<' and '>' > delimiters around the http:// ... string is a nice feature > of MoinMoin markup. You haven't been following the me and Edward Loper (and Edward Welbourne) flurry of emails over recent weeks, have you? The trouble with finding *bare* URIs in a text document written by humans with punctuation is that, in the general case, you can't do it. For instance, a URI is allowed to end with a dot ('.'). So how do you cope with a sentence that ends with http://www.tibsnjoan.co.uk/. Is that last dot part of the URI or not? There are other issues about what can go inside the URI, as well. Yes, people can come up with ad-hoc solutions (docutils/stpy.py works reasonably well), but they are ad-hoc and not guaranteed to work. This disturbs some people (I'm not *too* fussed, but then I'd err on the side of detecting *too many* URIs, I think, which I know would upset some people).
The *only* safe way (and note that this is an option in MoinMoin also) is to delimit the URIs with some mechanism, and '<..>' is at least a fairly traditional solution. > Ping has implemented something similar in pydoc already > and this works just fine. See above - it's "modulo just fine" I'm afraid (Ping is happy with approximate solutions that find too many instances - somewhat more than myself - so *of course* pydoc does what it does (and of course it should)). > I have a similar feeling with the email address recognition Erm - email addresses should be presented as URIs, honest. > About lists and numbered lists I'm still not sure what I would like. > A bullet item list (LaTeX itemize) seems to be enough for most cases. No, that is not sufficient. There are too many of us who *want* (no, *need*) more sorts of list (believe me, I've been using a too-simple internal markup tool for C function header comments for years, and it has only one type of list, delimited by '@' - it's not sufficient - people end up writing lists out "by hand", which rather circumvents the point). > A few days ago Guido gave a similar statement. I'm not sure he exactly said that, but if he did, he was wrong (it *is* possible, he just normally uses the time machine to go back and alter the records after he changes his mind). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 11:08:37 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 12:08:37 +0100 Subject: [Doc-SIG] Formalizing ST In-Reply-To: Message-ID: <002c01c0b840$9a88acc0$f05aa8c0@lslp7o.int.lsl.co.uk> Peter Funk wrote: > Tony J Ibbs (Tibs) schrieb: > > You haven't been following the me and Edward Loper (and Edward > > Welbourne) flurry of emails over recent weeks, have you?
> > Yes, I did, but refused to jump in: Admirable restraint! (and brave man for following it all) > I believe it was somewhat theoretic. Well, a mixture of theory and pragmatism, jumbled up to be hard to distinguish. > In practice I have never ever seen a URL ending with a > period. Please give real world evidence of some useful URL. Oh, I'm not convinced that we couldn't manage with the ad-hoc use of REs that I, Ka-Ping Yee, and others have been managing with. But Edward *is* unhappy with it, and that has become significant to me as he has shown good "design sense" in other places. Obvious URIs that fail the test are "." and ".." (both perfectly legal "local" references within an HTML document, and certainly possible things for someone to want to use in a docutils context, I'd have thought - particularly in a package's __init__.py docstring). > IMO it wouldn't hurt, if detection fails in this case. The problem isn't with detection *failing*, it's partly to do with excessive detection (i.e., the pragmatic schemes generally try to over-identify URIs, just in case), but *mainly* due to a worry about explaining to a user what they can type that will work, before they type it. An explanation that goes: "type your URI, but if it ends in one of these characters, you'll have to escape it, or something, and by the way *this* ad-hoc list of characters inside your URI also needs escaping" doesn't seem to be attractive to Edward (put that way, who can blame him), whereas it's very easy to say: "if you want your URI to be recognised, highlighted as such, and with a link if the application supports it, just put '<' and '>' round it, like you're used to seeing in email headers" and expect people to remember it. We might even be able to allow *spaces* in a URI with the '<..>' scheme, which is seriously neat.
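[The '<..>' scheme argued for above really does dissolve the trailing-dot problem. A toy sketch (mine, not docutils code): everything between the angle brackets is taken verbatim as the URI, so a trailing '.' or even an embedded space is unambiguous, and sentence punctuation outside the brackets is never swallowed.]

```python
import re

# Illustrative sketch of '<..>'-delimited URI recognition (not from
# docutils or pydoc): the delimiters, not punctuation heuristics,
# decide where the URI ends.
DELIMITED_URI = re.compile(r"<([^<>]+)>")

def find_uris(text):
    """Return the URIs a '<..>'-based recogniser would extract."""
    return DELIMITED_URI.findall(text)

# The trailing dot stays inside the URI; the sentence's full stop
# stays outside - no heuristic needed.
print(find_uris("See <http://www.tibsnjoan.co.uk/.> for details."))
```

[Bare-URI recognisers have to guess whether that final '.' belongs to the URI; here the writer has already said so.]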
Edward and I will both grumble at it being optional - he for formal reasons, and I 'cos it's wasted mind space remembering optional things when you don't need to, and it gives you the worst of both worlds. > This will work just fine in at least 99.8 % of all cases. Hmm. I'd vote for either ad-hoc recognition or '<..>', and Edward makes a good case for using the latter if we're starting "from scratch". (Note that, despite your attempts to throw oars into our works (!), I'm glad to see that at least one of the disputative regulars from last time round the Doc-SIG loop is listening - please feel free to correct my "historical comments" if you think I'm getting them wrong.) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From pf@artcom-gmbh.de Thu Mar 29 11:13:21 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 29 Mar 2001 13:13:21 +0200 (MEST) Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) In-Reply-To: <002301c0b830$fcafe220$f05aa8c0@lslp7o.int.lsl.co.uk> from "Tony J Ibbs (Tibs)" at "Mar 29, 2001 10:16:50 am" Message-ID: Hi, I wrote: > > A bullet item list (LaTeX itemize) seems to be enough for most cases. [...] > > A few days ago Guido gave a similar statement. Tony J Ibbs (Tibs) replied: > I'm not sure he exactly said that, but if he did, he was wrong (it *is* > possible, he just normally uses the time machine to go back and alter > the records after he changes his mind). I would love to watch the time machine altering the doc-sig archive on python.org and make this email non-existent in a parallel universe. :-) I meant the following 2 emails written by Guido: In http://mail.python.org/pipermail/doc-sig/2001-March/001584.html Guido replied to an email from me: > > I think, a description list can be dropped altogether. > > Yes!
They are darn ugly in HTML anyway. > > > At least for the time being a bullet list will be enough. > > Agreed. Later in http://mail.python.org/pipermail/doc-sig/2001-March/001595.html he wrote as a reply to mailto:edloper%40gradient.cis.upenn.edu: > > Well.. I'm not sure whether we'd want to do that or not.. We > > may be happy with just using '1.' and assuming that no one will > > start a line with a number that ends a sentence.. > > That was ST's the original sin. IMO these are pretty clear statements. If INDENT and DETENT tokens are part of a upcoming EBNF docstring grammar, I think it might be possible to come up with rules for ordered and descriptive lists later on, which will not suffer from ST patterns which trigger in error. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From tony@lsl.co.uk Thu Mar 29 12:23:35 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 13:23:35 +0100 Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) In-Reply-To: Message-ID: <003101c0b84b$137b7590$f05aa8c0@lslp7o.int.lsl.co.uk> There seem to be a lot of follow-ups in the headers - I've left them intact just in case. Apologies to anyone who would prefer I hadn't... Peter Funk wrote: > I meant the following 2 EMails written by Guido: In > http://mail.python.org/pipermail/doc-sig/2001-March/001584.html > Guido replied on an email from me: > > > I think, a description list can be dropped alltogether. > > > > Yes! They are darn ugly in HTML anyway. Sigh. Judging a construct by how IE and Netscape present it is not a very good way to do it. Full stop. (analogy: Renoire artistically judging a subject by how my five-year-old renders it) (I *assume* that's what he meant - if he actually meant what he *said*, which is that `
<dl><dt>..<dd>..</dl>
      ` is ugly, then I really despair. 'cos, like, who could care - only people like me actually *read* HTML). Besides, we've no requirement *at all* to accept that presentation - the descriptive list is the *internal* construct, how that gets turned into (for instance) HTML is in our control (well, the tool writer's control), and it's only after that the browser gets its hands on it. Even before style sheets this was a valid way around the problem (one could, for instance, use tables, or bullet lists with the description formatted as the first paragraph - you get the idea). And with style sheets the document creator gets a *lot* of latitude, even if a standard construct like `
<dl>
      ` *is* used. As I believe I've said elsewhere, I think Guido must have been having a bad week - it doesn't sound like the BDFL I've learnt to trust to miss the abstraction and focus on the (particular) implementation. Dammit, even in his "style sheet" (why won't he finish that?) he uses a descriptive list! (if it looks like a fish and walks like a fish, it can ride a bicycle like a fish, or something like that.) > > > At least for the time being a bullet list will be enough. > > > > Agreed. No. And I will keep fighting this, as I'm sure will other people (other people, anyone, please). After all, that's why we have the SIG! Guido is allowed to be human. He is allowed to be wrong. He is allowed to be *misinformed*. And he is definitely allowed to be convinced of a different opinion. He just gets the final overriding vote (on a PEP - which we haven't produced yet), and it is an item of faith that he only uses that "in extremis". > Later in > http://mail.python.org/pipermail/doc-sig/2001-March/001595.html > he wrote as a reply to mailto:edloper%40gradient.cis.upenn.edu: > > > Well.. I'm not sure whether we'd want to do that or not.. We > > > may be happy with just using '1.' and assuming that no one will > > > start a line with a number that ends a sentence.. > > > > That was ST's original sin. Again, sigh. This is the same (beloved, of course) person who was proposing we look at MoinMoin for ideas, which (for good reasons in context) uses a *horrible* hodge podge of markup mechanisms. And he castigates the ST family for this. (and it's by far not the *worst* "sin" in ST's books, surely) Whatever markup scheme we adopt, I can guarantee you it will have infelicities - especially if it "reads" like more-or-less natural text. People will have to know about those infelicities. As I said, a bad week. 
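Tibs's point above - that the descriptive list is an *internal* construct, and how it gets turned into HTML is the tool writer's choice - can be sketched in a few lines of Python (the function names and the (term, description) node shape here are hypothetical, not from any docutils release): the same pairs can be emitted as a standard definition list or, for browsers that render `<dl>` poorly, as a bullet list with the term leading each item.

```python
def as_dl(pairs):
    """Render (term, description) pairs as a standard HTML definition list."""
    out = ["<dl>"]
    for term, desc in pairs:
        out.append("  <dt>%s</dt>" % term)
        out.append("  <dd>%s</dd>" % desc)
    out.append("</dl>")
    return "\n".join(out)

def as_bullets(pairs):
    """Alternative presentation of the *same* internal construct:
    a bullet list with the term in bold at the head of each item."""
    out = ["<ul>"]
    for term, desc in pairs:
        out.append("  <li><b>%s</b> -- %s</li>" % (term, desc))
    out.append("</ul>")
    return "\n".join(out)

pairs = [("spam", "a canned meat product"), ("eggs", "laid by chickens")]
```

Either renderer can be swapped in without touching the parser, which is exactly the separation being argued for.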
> If INDENT and DETENT tokens are part of a > upcoming EBNF docstring grammar, I think it might be > possible to come up with rules for ordered and descriptive > lists later on, which will not suffer from ST patterns > which trigger in error. I'm not convinced, myself, because we probably *can't* mandate exactly how people lay out paragraphs. For instance, consider the variation in:: Some text. This is more of the same. Some text. This is more of the same. Some text. This is more of the same. and (addressing lists themselves) in: Some text. 1. This is a list and I continue here. Some text. 1. This is a list and I continue here. Some text. 1. This is a list and I continue here. Some text. <> I don't see how we can stop people doing any of those (I bet if our format *tries* it will be either ignored or not used "properly"). That's one of the reasons I advocate ignoring indentation within paragraphs. The *only* way round that would be to require blank lines in front of list items, and that's a no-no for other reasons (well, we discussed that last time round the Doc-SIG loop). Besides, if people *really* find this a problem, we will just need to make sure that the tool implementing the spec looks out for possible problem cases, and that it can *warn* the document writer they may have a problem. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
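The warn-don't-forbid approach Tibs describes can be sketched as a small checker (hypothetical code, not part of any docutils release): treat "1." as starting an ordered list item only at the top of a block, i.e. after a blank line, and emit a warning when the same pattern appears mid-paragraph, where it is probably a sentence that happens to end in a number.

```python
import re

ITEM_RE = re.compile(r'^(\s*)(\d+)\.\s+')

def find_list_items(text):
    """Return probable ordered-list items as (line_no, number) pairs,
    plus warnings for ambiguous matches inside a paragraph."""
    items, warnings = [], []
    prev_blank = True  # the start of the text counts as a block boundary
    for i, line in enumerate(text.splitlines(), 1):
        m = ITEM_RE.match(line)
        if m:
            if prev_blank:
                # safe: the item starts a new block
                items.append((i, int(m.group(2))))
            else:
                warnings.append(
                    "line %d: %r looks like a list item inside a "
                    "paragraph; did a sentence end in a number?"
                    % (i, m.group(0)))
        prev_blank = not line.strip()
    return items, warnings

sample = """Some text.
1. This is a list
   and I continue here.

1. A real item after a blank line."""
items, warnings = find_list_items(sample)
```

Line 2 of the sample triggers a warning rather than silently becoming a list item, while line 5, which follows a blank line, is accepted.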
From fdrake@cj42289-a.reston1.va.home.com Thu Mar 29 13:01:26 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Thu, 29 Mar 2001 08:01:26 -0500 (EST) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010329130126.C3EED2888E@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ For Peter Funk: Removed space between function/method/class names and their parameter lists for easier cut & paste. This is a *tentative* change; feedback is appreciated at python-docs@python.org. Also added some new information on integrating with the cycle detector and some additional C APIs introduced in Python 2.1 (PyObject_IsInstance(), PyObject_IsSubclass()). From fdrake@acm.org Thu Mar 29 13:01:32 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 29 Mar 2001 08:01:32 -0500 (EST) Subject: [Doc-SIG] f(...) vs. f (...) inconsistency In-Reply-To: References: Message-ID: <15043.12844.679546.355978@cj42289-a.reston1.va.home.com> Peter Funk writes: > this is a very tiny issue, but it bugged over the years: > > In his Styleguide Guido wrote: > """I **hate** whitespace in the following places: [...] > Immediately before the open parenthesis that starts the argument > list of a function call, as in spam (1). Always write this as spam(1).""" > I agree to 100% with this. > On the other hand I very often cut'n'paste between the library reference manual > pages and an open editor window. Unfortunately in the library reference > there are spaces between function names and the opening parenthesis, which > I always have to remove manually. > > How come? Should the library documentation be fixed in this regard? I think that's a historical artefact, but it may make sense to keep it to ease readability. I'm publishing a version of the development documentation that makes this change, and am requesting feedback to python-docs. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From klm@digicool.com Thu Mar 29 16:10:24 2001 From: klm@digicool.com (Ken Manheimer) Date: Thu, 29 Mar 2001 11:10:24 -0500 (EST) Subject: [Doc-SIG] using the same delimiter on the left and right.. Message-ID: Tibs wrote: > Guido van Rossum wrote (in response to Edward Loper): > > > Well.. I'm not sure whether we'd want to do that or not.. We > > > may be happy with just using '1.' and assuming that no one will > > > start a line with a number that ends a sentence.. > > > > That was ST's the original sin. > > Is it a sin? I don't believe that you will get a markup system > (*whatever* its conventions) that doesn't have *some* nooks and crannies > where the user may not type. And if we're worried about (important, yes) This may have to do with an egregious wart in STclassic - the implementation is ridiculously too loose about what it accepts for the ordered list cue. It does *not* constrain to digits, nor to single characters! Here are two examples that are translated into ordered list elements::

  Huh. This isn't *supposed* to be an ordered list element!

  Mr. Ken Manheimer would proudly like to present another vapid example.

I can understand how unpleasant little bugs like this would inspire loathing and fear in the hearts of STclassic users. You actually have to change your sentence structure to work around it! (Perhaps i should have spent more time fixing STclassic bugs when i was working on WikiForNow (some revisions of ZWiki, which uses STclassic). However, time was extremely limited, and STNG is in the wings, so effort expended on STclassic seemed unworthwhile. *When* STNG is coming out of the wings is another story, though - with all the important things pending for Zope and for zope.org, it hasn't been high enough priority - the devil's bargain, etc. Sigh.) Still, some may just not consider punctuation-style cues for markup to be acceptable.
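Ken's complaint can be reproduced with two toy regexes (an approximation of the behaviour he describes, not STclassic's actual pattern): a cue that accepts any token before the dot swallows both of his example sentences, while a cue constrained to digits takes only the genuine item.

```python
import re

# Illustrative only: STclassic's real cue is implemented differently,
# but the effect Ken describes is the same.
LOOSE_CUE  = re.compile(r'^\s*\S+\.\s+')   # any token + '.' starts an item
STRICT_CUE = re.compile(r'^\s*\d+\.\s+')   # digits only

lines = [
    "Huh. This isn't *supposed* to be an ordered list element!",
    "Mr. Ken Manheimer would proudly like to present another vapid example.",
    "1. A genuine ordered list item.",
]

# the loose cue matches all three lines; the strict cue only the last
loose_hits  = [l for l in lines if LOOSE_CUE.match(l)]
strict_hits = [l for l in lines if STRICT_CUE.match(l)]
```

Constraining the cue to digits is a one-character change to the pattern, which is why the wart reads as an implementation bug rather than a flaw in punctuation-style cues as such.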
That would be a shame - i think for situations like docstrings and brief, day-to-day content, such limited-scope, dirt-simple markup is the right way to go, if implemented well... Ken klm@digicool.com From guido@digicool.com Thu Mar 29 16:28:03 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 11:28:03 -0500 Subject: [Doc-SIG] What's important in a docstring markup language? In-Reply-To: Your message of "Thu, 29 Mar 2001 10:08:48 +0100." <001e01c0b82f$dd7af350$f05aa8c0@lslp7o.int.lsl.co.uk> References: <001e01c0b82f$dd7af350$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291628.LAA18924@cj20424-a.reston1.va.home.com> > > 9. URLs > > Quite important if this is meant to be "joined up" documentation (to use > a horrible buzz phrase our politicians seem addicted to). It is very > important to be able to reference documentation elsewhere, and having > them "clickable" in derived formats that support that ability is > important too. IMO, URLs don't need any special markup. They can just be recognized in the text and automatically highlighted. Lots of tools processing plain text do this (including the FAQ wizard, which has a trick or two to make this work reliably even when there's punctuation following the URL). --Guido van Rossum (home page: http://www.python.org/~guido/) From dgoodger@atsautomation.com Thu Mar 29 16:46:24 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 29 Mar 2001 11:46:24 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. Message-ID: Tony J Ibbs (Tibs) wrote: > Edward D. Loper wrote: > > We could switch our "literal" delimiter to "`". ... > > I hadn't thought of using backtick as a literal delimiter. Just a little reminder here, guys. Although I haven't had time (or energy) to participate in the recent voluminous discussions (I've been lurking), I did present some carefully thought-out arguments about the above topic (and many others) back in November. Have you read them yet? 
(Based on several recent posts, including the ones referenced above, it seems you haven't. Nudge, nudge. :) See: - A Plan for Structured Text http://mail.python.org/pipermail/doc-sig/2000-November/001239.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2000-November/001240.html - reStructuredText: Revised Structured Text Specification http://mail.python.org/pipermail/doc-sig/2000-November/001241.html Specifically, backticks ("`") are good for inline literals. Single quotes ("'") are bad, because we use 'em too much in all contexts (apostrophes, Python strings, prose quotations and nested quotations [Bruce said, "'Hot enough to boil a monkey's bum,' Her Majesty said, and smiled quietly to herself."], and the British seem to prefer them over double-quotes [in novels at least]). Hash marks ("#") are unbearably ugly. Eddy W.'s statements notwithstanding, in an agile (new P.C. term for "lightweight"; see http://www.agilealliance.org) markup scheme, we don't need all four of (inline, block) x (alien text, Python code); I think that's up to the tool to deal with. /DG From guido@digicool.com Thu Mar 29 16:48:36 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 11:48:36 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: Your message of "Thu, 29 Mar 2001 10:16:52 +0100." <002401c0b830$fdd85c90$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002401c0b830$fdd85c90$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291648.LAA18948@cj20424-a.reston1.va.home.com> > > Yuck. Most of these (except '::') are quite commonly used for other > > purposes, and occur frequently in examples. > > And the problem with that is? That they will frequently need to be escaped in order to prevent special interpretation. Note that an escape character was absent from the list -- that's a big mistake, I think! > > I prefer markup languages with very few special characters, > > e.g. 
a GNU doc standard whose name I don't recall > > texinfo > > > which only uses @; or Perl's POD, which seems to get > > away with making only a letter followed by '<' special. > > Yes, but they are not applicable by the "heavyweight markup won't fly" > rule (which has been a principle of the Doc-SIG since 1997, and which > you yourself defended, and which I used to oppose until it was explained > Very Gently and Lots of Times to me why it was important). Well, after using ST, I'm not so sure I agree with that rule any more. I think HTML is too heavy, but I think reserving a dozen or so characters for special purposes is also wrong. > Texinfo (and there are other more modern examples) is still "formal > markup to produce a document", where the markup has equal status with > the text, and is expected to intrude. People will not want to write it > in docstrings. So we'd lose. But isn't this exactly what Javadoc does? > Pod is used successfully in the Perl world, and is a clear winner there. > I find it intensely unreadable, as a lightweight format. I haven't seen too much POD, so you may be right there. Is it worse than Latex? > One of the precepts of the whole Doc-SIG/docstring thing has been that > "marked up" text must be readable *as text*. I'll say again what I seem > to keep saying recently - that means that email is a sensible sort of > model. If we can successfully parse something close to what people type > in email, then we're onto a winner, in terms of getting people to use > it. Watch out though. As soon as you're getting into heuristics too much, our ways part. I want very clear, exact and predictable rules. > > Latex has at least three special characters ('\', '{', '}'), > > and in some contexts more, and that's already a pain. > > XML with '<' and '&' is borderline for me. > > We already have existing dictarotial fiat (first in 1997, reiterated by > you again recently) against LaTeX and SGML/HTML/XML. 
That's a Good > Thing, since the Doc-SIG as a whole has (each time round the loop) > agreed that all of these are non-flyers for docstring markup. Their > individual deficiencies (if so they be - that's a matter for argument > elsewhere) are thus not relevant. Sure. Though I've got a feeling that I'm disagreeing with "the doc-sig as a whole" a lot. Maybe I should just withdraw (again) from this whole discussion and let you all decide what you like, as long as it doesn't have to be used for the standard library? > > > Then the only context-dependant characters that remain would > > > be start-list-item characters.. And if we wanted to, we could > > > use '* ' at the beginning of any list item, since it's > > > reserved anyway... something like: > > > > > > * this is an unordered list item > > > *1. this is an ordered list item > > > > This is OK, although I like the single hyphen form better. > > There was a proposal last time round the loop to start all list item > "sequences" with a special character (debate obviously ensued on which). > It was dropped as a proposal (I can't remember which side of the debate > I started out on - Doc-SIG has had a history of changing my mind towards > the consensus by reasoned debate - don't you just hate it when that > happens?). > > On the whole, I oppose it now. It makes it easier for a parser, and much > harder for a human being, to write text. I think whitespace (a blank line and/or indentation) should be enough to recognize the start of a list. > > > Well.. I'm not sure whether we'd want to do that or not.. We > > > may be happy with just using '1.' and assuming that no one will > > > start a line with a number that ends a sentence.. > > > > That was ST's the original sin. > > Is it a sin? I don't believe that you will get a markup system > (*whatever* its conventions) that doesn't have *some* nooks and crannies > where the user may not type. 
And if we're worried about (important, yes) > fringe cases like that, why not make the implementation (note, not the > spec) able to give a warning if it looks like the user might have done > that (after all, ending a sentence, in *most* cases, can be spotted due > to punctuation, so it should, often, be feasible). Well, it would be all right if it only recognized numbers after a blank line. It's a pain if it latches on any "^\d+\." in the middle of a text block, because (in my experience) that's never a numbered item, it always just happens to be a sentence ending in a number. > > I can't endorse this yet. > > I am worried that you, Guido, are coming into a debate which you have > not participated in (note - *that* is not a criticism - there are other > important things I'd like you to have been spending your time on) and > putting down some ground rules which *appear* to contradict > group-wisdom, as derived over the years. I'm a bit uncomfortable with > having to attempt to "channel" the results of that, given I tend to be > opinionated anyway, but even so. Well, you (as a group) asked me my opinion, which I gave. If you don't like it, fine, I'll bail out again, I *do* have other things to do. Also note that I repeatedly requested to see the spec you (again as a group) had arrived at, and nobody has pointed me to it. Given that the doc-sig has been going around in circles since 1997, I worry that it's never going to reach a conclusion -- with or without my involvement. > The Doc-SIG has had a disturbing habit of getting *very close* to a > product, and then just petering out. This seems to partially correlate > to the aftermath of a Spam meeting (frustrating if one couldn't be > there), although for entirely different reasons each time, I believe > (i.e., that's hopefully a red herring). Lots of things get a jolt of energy at a Python conference (can we stop calling them spams?) and then peter out. The types-sig has seen this phenomenon too.
I guess it's because real life takes over after a while. > I'd be very interested to know what you consider your "sticking points" > on this to be - it may be that they are nothing we would worry about, it > may be that they are issues we've already argued around in the past. Show me your spec and I'll review it. You can't expect me to lay out ground rules without knowing where your thinking is going. > For one thing, I'd appreciate *someone* explaining to me, slowly and > with illustrations, just what is wrong with having context-sensitive > markup *in docstrings* (not in abstract large documents marked up for > typesetting (a la TeX), not in data specifications marked up for > detailed content retrieval (a la SGML), but in docstrings marked up for > humans to read the markup as text, and for software to retrieve some > extra information for slightly improved presentation and for slightly > improved information extraction). I believe the problem is with the required preciseness of docstrings. Docstrings are not like email, where the reader can usually guess what you meant despite typos and transmission glitches. Imagine a docstring describing a regular expression-like language. Can you see the damage that could be done by inadvertently changing all double backslashes into single backslashes, or interpreting *...* as bold (hence dropping the *s)? There are lots of situations like this. (E.g. I recently noticed that Ping made some docstring a raw string because it contained examples involving \r and \n.) Every character counts, and so does every bit of whitespace -- at least sometimes, and the docstring processor can't be smart enough to always know when. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:05:04 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:05:04 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. 
In-Reply-To: Your message of "Thu, 29 Mar 2001 10:16:48 +0100." <002201c0b830$fb538670$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002201c0b830$fb538670$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291705.MAA19008@cj20424-a.reston1.va.home.com> > The Runoff family was a simpler variant on the TeX idea, which wanted to > produce computer manuals, and so on. There's generally less control over > meaning, more interest in presentation. It's not clear to me if troff > and so on belong to the TeX family or the Runoff family. TeX and troff are both very old and have the same amount of control over the typesetter. However, the TeX *language* is more flexible (maybe too flexible) and hence could more easily beget LaTeX, which is a member of the SGML family: it cares about meaning, not markup. There are languages like that in the troff family too: the -man macros (generally try to) specify meaning, not markup. Troff lost to TeX because it was too concerned with machine efficiency in the 16-bit days, and restricted its language to 2-character identifiers. > The Pod family is, maybe, if it exists, the family of marking up > docstrings. Edward Welbourne has talked about this in an earlier email. > Basically, the aim is to produce something more useful than plain text > (but not of a quality to stop a technical documentor wincing), leaving > the original, marked up, text still useful *as such*. Eddy also comments > that if someone using (in his comment, ST) is spending too much time > worrying about markup, then they're not spending enough time working on > more important things. Objection. I'm not "worrying too much about markup". I'm worrying quite a lot, based upon hard-earned experience, that the heuristics of the ST family introduce unwanted markup and drop characters that are essential for the documentation. > > Let's make all of our delimiters into real delimiters, > > that can only be used for delimiting (or maybe also for > > bullets, in the case of '*'). 
We could switch our "literal" > > delimiter to "`". So then we would have the following > > reserved characters, that may not appear in text without > > being quoted somehow: > > '<' left delimiter for URLs > > '>' right delmiter for URLs Unneeded -- URLs can be recognized easily, and we don't have to drop any characters when we add a link. > > '#' delimiter for inlines > > '`' delimiter for literals > > '*' delimiter for emph, maybe for strong. We only need one of these (emph or strong). > > '::' marker for literal regions We need an escape too. > I hadn't thought of using backtick as a literal delimiter. In the > context of docstrings, I can't see why it wouldn't work - hmm, this is a > `literal` - yep, that works for me (does the resonance with Python > backtick work?). >From a readability perspective, I'd prefer `symmetric quotes'. > It frees up both sorts of "normal" quote, which is > good, and only inconveniences people like Eddy who insist on typing > `both sorts of single quote' (TeX users, the lot of them). And it mean I > can type "'cos" without worrying (or 'plane or 'phone if I want to > appear old-fashioned). Maybe it would work if the quotes were left in the output text? That way at least if a stray backtick is mistaken for markup, it's still clear that there was a backtick in the docstring source. > And if those *are* the delimiters, then it *would* work to expect them > to be quoted when they occurred - neat. Just goes to prove why we keep > Edward around on the list (please add a here). > > Does that mean we allow things like "there is a hard-space` `here"? It > would be quite a neat thing to allow... This looks like horrible abuse to me. --Guido van Rossum (home page: http://www.python.org/~guido/) From Juergen Hermann" Message-ID: On Thu, 29 Mar 2001 11:28:03 -0500, Guido van Rossum wrote: >IMO, URLs don't need any special markup. They can just be recognized >in the text and automatically highlighted. 
Lots of tools processing >plain text do this (including the FAQ wizard, which has a trick or two >to make this work reliably even when there's punctuation following the >URL). +1 (actually, you kicked me in the right direction to improve MoinMoin's code in that respect ;). I think we should go the plain text route, with _conservative_ regexes (i.e. a sane implementation) and not too fancy markup (Tony's list). The main thing to consider in a first implementation is that we do not paint ourselves into a corner (like using too much markup characters that'll make it hard to keep the "plain readable text" idea). If people want STNG in docstrings, plug in a parser for it. On the problem of deciding what parser to use, I propose to add some hint on a per module basis (mixing several docstring styles per module would be a silly, unsupported idea). Either by a magic variable in the module, or a magic comment, or some hint in the module's docstring. Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From guido@digicool.com Thu Mar 29 17:08:56 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:08:56 -0500 Subject: [Doc-SIG] going awry In-Reply-To: Your message of "Thu, 29 Mar 2001 10:16:55 +0100." <002501c0b830$ff8ee630$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002501c0b830$ff8ee630$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291708.MAA19032@cj20424-a.reston1.va.home.com> > It doesn't help that I don't understand (myself) why people object to > context sensitivity in markup in something like ST - what on earth is > wrong with a single quote in the middle of a word being different than a > single quote at the start of a word, or just before punctuation? We're > all good at reading text - that means we don't even *see* such > constructs AS SUCH - they're part of the "scanning interface" we run > over lines.
I mean, a person presented with 'this isn't difficult' > doesn't have a particular problem with discerning that the middle quote > is different than the others, and whilst I wouldn't propose *allowing* > that as a quoted string in our format, it's nastier than the fringe > cases people *are* worrying about. Have you tried to use ST to document a language that happens to place a special meaning on most of the ST special characters? (Like ST itself. :-) It's horrid unless the rules are very clear and simple, and there's a really easy way to turn ST's heuristics off -- and not just in literal blocks (which are only half the solution). > > I think the implementation-specific problems can be fixed by > > the efforts we were seeing. > > There are some specific things about ST that *would* be nice to fix, and > being free to do that (by dictatorial fiat) is a Good Thing. But I think > throwing out the whole thing is not - it's been 5 years, dammit. You know, that *could* mean that the problem is simply intractable, and that we'd all do better by admitting that the only two real options are real plain text or real markup... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:20:06 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:20:06 -0500 Subject: [Doc-SIG] Formalizing ST In-Reply-To: Your message of "Thu, 29 Mar 2001 10:16:50 +0100." <002301c0b830$fcafe220$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002301c0b830$fcafe220$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291720.MAA19114@cj20424-a.reston1.va.home.com> > Indentation for structure is contentious with many people, and whilst it > *sounds* like a good idea (especially to Python people) many object to > ending up with the bulk of their text indented. What worked against this specific ST feature is that in ZWikis, you end up editing sizeable documents in a text box in Netscape, which has no support for auto-indentation. 
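Earlier in the thread Guido argues that bare URLs need no delimiters because tools like the FAQ wizard simply recognize them, scanning up to whitespace and trimming punctuation from the back. A minimal sketch of that rule (a reimplementation of the behaviour as described, not the FAQ wizard's actual code):

```python
import re

# scan to whitespace: everything after http:// or https:// that isn't a space
URL_RE = re.compile(r'https?://\S+')

def find_urls(text):
    """Find bare URLs, then trim trailing punctuation that almost
    certainly belongs to the sentence rather than the URL."""
    urls = []
    for m in URL_RE.finditer(text):
        urls.append(m.group(0).rstrip(".,;:!?)'\""))
    return urls

text = ("See http://www.tibsnjoan.co.uk/. There is also "
        "http://www.python.org/~guido/, of course.")
```

On this input the trailing full stop and comma are correctly treated as sentence punctuation, which is the heuristic's whole bet: URLs *can* end in punctuation, but in practice they almost never do.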
> > However the url detection without requiring '<' and '>' > > delimiters around the http:// ... string is a nice feature > > of MoinMoin markup. > > You haven't been following the me and Edward Loper (and Edward > Welbourne) flurry of emails over recent weeks, have you? > > The trouble with finding *bare* URIs in a text document written by > humans with punctuation is that, in the general case, you can't do it. > For instance, a URI is allowed to end with a dot ('.'). So how do you > cope with a sentence that ends with http://www.tibsnjoan.co.uk/. Is that > last dot part of the URI or not? There are other issues about what can go > inside the URI, as well. Yes, people can come up with ad-hoc solutions > (docutils/stpy.py works reasonably well), but they are ad-hoc and not > guaranteed to work. This disturbs some people (I'm not *too* fussed, but > then I'd err on the side of detecting *too many* URIs, I think, which I > know would upset some people). The FAQ wizard uses a simple and sufficient rule, which almost never misfires: it scans up to whitespace, and then trims punctuation characters from the back. While URLs certainly *can* end in punctuation, I have never seen URLs that *did*. Invariably, a trailing period or comma is part of the sentence, not part of the URL. > The *only* safe way (and note that this is an option in MoinMoin also) > is to delimit the URIs with some mechanism, and '<..>' is at least a > fairly traditional solution. Which unfortunately means you would have to escape each < or > that was not meant to be a URL delimiter. These occur frequently in Python code samples (``if i < 10: print i'') but also, and I would say more frequently, in any documentation that describes XML or HTML samples. I find the ability to write "
<BR> and <br>
      are equivalent in HTML but not in XHTML" more important than the ability to mark URLs unambiguously, given the success rate of existing heuristics there. > > Ping has implemented something similar in pydoc already > > and this works just fine. > > See above - it's "modulo just fine" I'm afraid (Ping is happy with > approximate solutions that find too many instances - somewhat more than > myself - so *of course* pydoc does what it does (and of course it > should)). That's a new meaning of "modulo". :-) > > I have a similar feeling with the email address recognition > > Erm - email addresses should be presented as URIs, honest. Yeah, right. Tough luck getting people to add mailto: to their address. Be practical, and add a hyperlink to anything that looks like an email address -- if you don't eat any characters that were present in the source, soemwhat overzealous recognition won't hurt. > > About lists and numbered lists I'm still not sure what I would like. > > I bullet item list (LaTeX itemize) seems to be enough for most cases. > > No, that is not sufficient. There are too many of us who *want* (no, > *need*) more sorts of list (believe me, I've been using a too-simple > internal markup tool for C function header comments for years, and it > has only one type of list, delimited by '@' - it's not sufficient - > people end up writing lists out "by hand", which rather circumvents the > point). What exactly is lacking in that tool? Nested lists? We can do those. Numbered lists? We don't need autonumbered lists, so we can require that the numbers are already in the source. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:26:00 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:26:00 -0500 Subject: [Doc-SIG] Formalizing ST In-Reply-To: Your message of "Thu, 29 Mar 2001 12:08:37 +0100." 
<002c01c0b840$9a88acc0$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002c01c0b840$9a88acc0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291726.MAA19161@cj20424-a.reston1.va.home.com> > Obvious URIs that fail the test are "." and ".." (both perfectly legal > "local" references within an HTML document, and certainly possible > things for someone to want to use in a docutils context, I'd have > thought - particularly in a package's __init__.py docstring). You can always append a "/" to URLs ending in "." or "..". In fact that's recommended practice anyway -- otherwise you incur an extra server roundtrip since most servers give you a 301 or 302 redirect with an appended slash if you give a directory URL without trailing slash -- this is to make relative URLs work. > > IMO it wouldn't hurt, if detection fails in this case. > > The problem isn't with detection *failing*, it's partly to do with > excessive detection (i.e., the pragmatic schemes generally try to > over-identify URIs, just in case), but *mainly* due to a worry about > explaining to a user what they can type that will work, before they type > it. > > An explanation that goes: > > "type your URI, but if it ends in one of > these characters, you'll have to escape > it, or something, and by the way *this* > ad-hoc list of characters inside your > URI also needs escaping" Practical URLs don't end in punctuation. Show me a website whose URLs do and I'll change my mind, but I bet you can't find one. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:42:07 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:42:07 -0500 Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) In-Reply-To: Your message of "Thu, 29 Mar 2001 13:23:35 +0100." 
<003101c0b84b$137b7590$f05aa8c0@lslp7o.int.lsl.co.uk> References: <003101c0b84b$137b7590$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291742.MAA19253@cj20424-a.reston1.va.home.com> > > Guido replied on an email from me: > > > > I think, a description list can be dropped altogether. > > > > > > Yes! They are darn ugly in HTML anyway. > > Sigh. Judging a construct by how IE and Netscape present it is not a > very good way to do it. Full stop. (analogy: Renoire artistically > judging a subject by how my five-year-old renders it) Given that IE and Netscape are what 99% of our users use to view documentation, I don't see why this argument is ruled out. Remember, practicality beats purity. You seem to argue for the purity side of things. And I don't believe the Renoir (sp!) reference is relevant at all. IE and NS are not your five-year-old. They are the front page of your city's newspaper. > Besides, we've no requirement *at all* to accept that presentation - the > descriptive list is the *internal* construct, how that gets turned into > (for instance) HTML is in our control (well, the tool writer's control), > and it's only after that the browser gets its hands on it. Even before > style sheets this was a valid way around the problem (one could, for > instance, use tables, or bullet lists with the description formatted as > the first paragraph - you get the idea). And with style sheets the > document creator gets a *lot* of latitude, even if a standard construct > like `<dl>` *is* used. Except that the problem is that typically there's *no* decent-looking way to present such lists. > As I believe I've said elsewhere, I think Guido must have been having a > bad week - it doesn't sound like the BDFL I've learnt to trust to miss > the abstraction and focus on the (particular) implementation. I can do without this particular abstraction. > Dammit, even in his "style sheet" (why won't he finish that?) Lack of time, like so many other things. > he uses a > descriptive list! (if it looks like a fish and walks like a fish, it can > ride a bicycle like a fish, or something like that.) Note that I don't need semantic mark-up for my descriptive lists -- the English language, punctuation and bulleted lists do all that I want. An argument against too much abstraction in the current discussion: the core idea of ST is that the source looks sufficiently like the output to be readable without any processing. I'd rather not have a tool that tries to extract an abstraction and lays it out completely differently in another medium, because that means what I think looks right in plain text will suddenly look wrong on the user's screen. Giving the renderer too much freedom IMO makes it harder for the author to do the right thing. Really, I wish we could use WYSIWYG for docstrings -- that would be so much better! But program text editors don't allow that yet... :-( > No. And I will keep fighting this, as I'm sure will other people (other > people, anyone, please). After all, that's why we have the SIG! So please write down exactly which forms of lists you want. > Guido is allowed to be human. He is allowed to be wrong. He is allowed > to be *misinformed*. And he is definitely allowed to be convinced of a > different opinion. I am so glad you aren't telling me what to think or do. > He just gets the final overriding vote (on a PEP - which we haven't > produced yet), and it is an item of faith that he only uses that "in > extremis". 
Well, actually, if I vote your PEP down, that doesn't have to stop you from using it anyway in your own code. And if you can convince enough other users to follow your conventions, I may be convinced. This is different than a language change, where I really *do* have the last word! > Whatever markup scheme we adopt, I can guarantee you it will have > infelicities - especially if it "reads" like more-or-less natural text. > People will have to know about those infelicities. Except if it *is* plain text. > As I said, a bad week. For you, or for me? :-) > and (addressing lists themselves) in: > > Some text. > 1. This is a list > and I continue here. > > Some text. > 1. This is a list > and I continue here. > > Some text. > 1. This is a list > and I continue here. > > Some text. > < but indented a little bit>> Or my preference: Some text. 1. This is a list item continued here. 2. This is the second item. > I don't see how we can stop people doing any of those (I bet if our > format *tries* it will be either ignored or not used "properly"). That's > one of the reasons I advocate ignoring indentation within paragraphs. > The *only* way round that would be to require blank lines in front of > list items, and that's a no-no for other reasons (well, we discussed > that last time round the Doc-SIG loop). Hm, I must've missed that. It seems reasonable enough to me (I wrote the above before reading on). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:53:59 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:53:59 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: Your message of "Thu, 29 Mar 2001 11:10:24 EST." References: Message-ID: <200103291753.MAA19334@cj20424-a.reston1.va.home.com> > Still, some may just not consider punctuation-style cues for markup to be > acceptable. 
That would be a shame - i think for situations like > docstrings and brief, day-to-day content, such limited-scope, dirt-simple > markup is the right way to go, if implemented well... I'm concerned about the attitude that docstrings are stuff you shouldn't care too much about. Python docstrings are used to document constructs in a programming language. Precision is of the essence. We *need* to be able to control every single character of the output. I think the most important requirements are to be able to indicate what is free-flowing, normal text and what isn't (either inline example text or larger blocks of literal text). Everything else is secondary. --Guido van Rossum (home page: http://www.python.org/~guido/) From klm@digicool.com Thu Mar 29 17:59:52 2001 From: klm@digicool.com (Ken Manheimer) Date: Thu, 29 Mar 2001 12:59:52 -0500 (EST) Subject: [Doc-SIG] going awry In-Reply-To: <200103291708.MAA19032@cj20424-a.reston1.va.home.com> Message-ID: On Thu, 29 Mar 2001, Guido van Rossum wrote: > Have you tried to use ST to document a language that happens to place > a special meaning on most of the ST special characters? (Like ST > itself. :-) It's horrid unless the rules are very clear and simple, > and there's a really easy way to turn ST's heuristics off -- and not > just in literal blocks (which are only half the solution). I think it makes sense to have an easy way (a really easy way:) to turn ST interpretation off for arbitrary extents - something like shell hereis, perhaps. It's also interesting to focus on using STwhatever to describe STwhatever. There's a bit of a scope question in the latter, though - would such a document be larger/more comprehensive than the kinds of things we're concerned with in docstrings? I don't know. I think with reasonable escapes it could be easy, though. 
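A here-document-style off switch of the sort Ken suggests could be prototyped in a few lines. This is only a sketch: the ".. raw-on"/".. raw-off" delimiters are invented for illustration, since no actual syntax was agreed on the list.

```python
import re

# Hypothetical delimiters -- stand-ins for whatever "turn ST off"
# markers might eventually be chosen.
RAW_RE = re.compile(r'^\.\. raw-on\n(.*?)^\.\. raw-off\n',
                    re.MULTILINE | re.DOTALL)

def protect_raw_regions(text):
    """Pull delimited raw regions out of the text before any ST pass,
    replacing each with an opaque placeholder so no markup heuristics
    fire inside it.  Returns (processed_text, saved_regions)."""
    saved = []
    def stash(match):
        saved.append(match.group(1))
        return '\x00RAW%d\x00' % (len(saved) - 1)
    return RAW_RE.sub(stash, text), saved
```

After the ST pass runs over the processed text, the saved regions would be spliced back in verbatim, untouched by any interpretation.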
(Re escapes - i'd like to see such things done keeping jim's original intent that the motivations for structured text gestures make sense in the context of the raw text as well as for their interpretation. Eg, a hereis style delimiter that looks like: ... [Unformatted passage follows, until "End of unformatted passage"] *text fragment* indicates emphasis formatting [End of unformatted passage] For want of any insight on a double-duty formalism for single-character escapes, i'd be inclined to go with '\' or character doubling...) > > There are some specific things about ST that *would* be nice to fix, and > > being free to do that (by dictatorial fiat) is a Good Thing. But I think > > throwing out the whole thing is not - it's been 5 years, dammit. > > You know, that *could* mean that the problem is simply intractable, > and that we'd all do better by admitting that the only two real > options are real plain text or real markup... Look at the history. The problems have come up in coming to agreement about a reasonably scoped effort for a lightweight language - and then in avoiding the temptation to invent a new markup language from scratch. (It's lightweight, it must be easy to formulate, right?-) Recently we *did* seem to actually be making progress! There was some kind of agreement about where to start, with a leg-up on a viable though crufty language, and some genuine progress towards rectifying the problems! (Thanks, thanks, thanks, edward and tony!!) I hope those efforts keep on track. (I'm not sure what documentation you have and haven't had identified - i don't have the URL for edward's STminus EBNF specification, or tibs' stpy site - i'm hoping someone will chime in with them, in case those are what you need...) Ken klm@digicool.com From dgoodger@atsautomation.com Thu Mar 29 18:07:15 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 29 Mar 2001 13:07:15 -0500 Subject: [Doc-SIG] f(...) vs. f (...) 
inconsistency Message-ID: > and am requesting feedback to python-docs. I think fn() looks fine, no readability problems. For copy-and-paste it's a win vis-a-vis Python's styleguide. David Goodger, Systems Administrator & Programmer Automation Tooling Systems Inc., Advanced Systems 730 Fountain Street, Building 3, Cambridge, Ontario, Canada N3H 4R7 direct: +1-519-653-4483 ext. 7121 fax: +1-519-650-6695 e-mail: dgoodger@atsautomation.com From guido@digicool.com Thu Mar 29 18:07:32 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 13:07:32 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: Your message of "Thu, 29 Mar 2001 11:46:24 EST." References: Message-ID: <200103291807.NAA19426@cj20424-a.reston1.va.home.com> David Goodger writes: > - A Plan for Structured Text > http://mail.python.org/pipermail/doc-sig/2000-November/001239.html > > - Problems With StructuredText > http://mail.python.org/pipermail/doc-sig/2000-November/001240.html > > - reStructuredText: Revised Structured Text Specification > http://mail.python.org/pipermail/doc-sig/2000-November/001241.html What I like most about this is that it is a full specification! The first one I've seen that's exact enough to be criticized and to be understood. I think you may be going overboard with features, but I like many of your ideas, both about heuristics for implicit markup (e.g. sections) and about the tokens you use for explicit markup (using ".."). I also like that you define the escaping mechanism upfront. (Using \ to escape means that we're going to have to make our docstrings raw strings. Big deal. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 18:22:28 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 13:22:28 -0500 Subject: [Doc-SIG] New document - pytext-fat In-Reply-To: Your message of "Wed, 28 Mar 2001 12:01:47 +0100." 
<001101c0b776$7ba21e60$f05aa8c0@lslp7o.int.lsl.co.uk> References: <001101c0b776$7ba21e60$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291822.NAA19637@cj20424-a.reston1.va.home.com> I just noticed in yesterday's mail: > I'm reading this now. I take back my complaints that nobody sent a spec my way -- at least until I'm done reading. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From pf@artcom-gmbh.de Thu Mar 29 18:26:32 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 29 Mar 2001 20:26:32 +0200 (MEST) Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) In-Reply-To: <200103291742.MAA19253@cj20424-a.reston1.va.home.com> from Guido van Rossum at "Mar 29, 2001 12:42: 7 pm" Message-ID: Hi, Guido van Rossum: [...about writing ordered lists in plain text style...] > Or my preference: > > Some text. > > 1. This is a list item > continued here. > > 2. This is the second item. What do people (and Guido?) think of the following style for ordered lists? I've seen this used in EMail and News quite often: Some text. (1) This is an ordered list item continued here. (2) This is the second item. (3) Items may consist of several paragraphs. As long as the paragraphs keep proper indentation. This is text following the list. Of course this would require that indentation plays an important role in an upcoming text structure grammar. The parser should always fall back into literal (aka preformatted) paragraph mode on any material which violates the rather strict grammar rules. Only text paragraphs which are properly (equally) indented plain text and contain no hyphens at the end of lines should be allowed for reformatting in proportional fonts and with new line breaks. I disagree in this respect with Tony and maybe others in this SIG. Allowing free form paragraphs like this one here for reformatting is simply too dangerous. 
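Peter's parenthesized-number style is easy to recognize mechanically. A minimal sketch of just the label-matching step (the pattern and function names here are illustrative, not anything agreed on the list):

```python
import re

# Matches labels in the "(1) item text" style, capturing the indent,
# the number, and the first line of item text.
ITEM_RE = re.compile(r'^(\s*)\((\d+)\)\s+(.*)$')

def find_ordered_items(lines):
    """Yield (indent, number, text) for each line that looks like a
    '(n)'-labelled ordered-list item.  Continuation lines are left to
    the caller, which would apply the indentation rules sketched above."""
    for line in lines:
        m = ITEM_RE.match(line)
        if m:
            yield len(m.group(1)), int(m.group(2)), m.group(3)
```

A full parser would additionally check that the numbers are consecutive and the indents consistent before committing to a list interpretation.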
Mandating the rule that normal text paragraphs must be aligned on the left side seems to be a very reasonable restriction to me. Otherwise they should come out in fixed font and people will see what they are used to before the advent of clever text structure recognition tools. The IMO disclaimer applies here as well. It was not my intention to throw oars into Tony's and Edward's work. Best regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From dgoodger@atsautomation.com Thu Mar 29 18:42:03 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 29 Mar 2001 13:42:03 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. Message-ID: Thanks for the encouragement, Guido! Guido van Rossum writes: > I think you may be going overboard with features I agree, and I intend to pare it down to the bare essentials. For example, descriptive lists are problematic. In the past I *have* noticed your [and others'] use of ' -- ' for em-dashes. According to the Chicago Manual of Style, you're not supposed to use spaces on either side of em-dashes, but people do use this construct and we've gotta live with it. Trying to enforce rules on people for a supposedly 'transparent' markup system like ST is ass-backwards. The markup must abide by common usage, not the other way around. That's the strongest argument against using single-quotes for inline literals I know of. We can use `backticks` or `symmetric quotes' or *both*! (I see problems with symmetric, like: "string assignment: `s = 'this is a string''". Single-quotes are just too common in all contexts, IMHO.) > I also like that you define the escaping mechanism > upfront. (Using \ to escape means that we're going to have to make > our docstrings raw strings. Big deal. :-) Anti-escape-mechanism people claim that it's not needed. 
They say backslashes are hard to use because of overloading (ya gotta double 'em up sometimes). But if they're not needed, why complain about how difficult it is to use them? And the only people who will actually use them (in order to document REs or ST itself) ought to know about raw strings anyhow. /DG From dgoodger@atsautomation.com Thu Mar 29 19:15:08 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 29 Mar 2001 14:15:08 -0500 Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) Message-ID: People will write ordered/enumerated lists in all different ways. I think that it's folly to try to come up with the "perfect" format for an enumerated list, and limit the docstring parser's recognition to that one format only. Why limit ourselves to RE-processing here? We're writing software here! Using Python! Rather, recognize a variety of formats as potential enumerated lists, and decide based on the labels. If a "potential enumerated list item" (PELI) is labeled with a "1.", and is immediately followed by a PELI labeled with a "2.", we've got an enumerated list. Or "A" followed by "B". Or "i)" followed by "ii)", etc. We wouldn't have any problem with any of these, even without requiring blank lines: "That bird wouldn't *voom* if you put 10000 volts through it!" 1 is all I need. Mr. Creosote. Whatever gave you such an idea? A murderer? No, not I. I'd never hurt a fly! The chances of a PELI labeled with "2" after "1 is all I need", or "II." after the last example, are acceptably small. /DG From tim.one@home.com Thu Mar 29 19:40:53 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 29 Mar 2001 14:40:53 -0500 Subject: POD (was RE: [Doc-SIG] using the same delimiter on the left and right..) In-Reply-To: <200103291648.LAA18948@cj20424-a.reston1.va.home.com> Message-ID: [Tony J Ibbs (Tibs)] > Pod is used successfully in the Perl world, and is a clear winner there. > I find it intensely unreadable, as a lightweight format. 
[Guido] > I haven't seen too much POD, so you may be right there. Is it worse > than Latex? Well, you can include LaTeX sections in POD, so formally I guess it can't be better . Here's the POD spec: http://www.cpan.org/doc/manual/html/pod/perlpod.html It's smaller than half the msgs in this debate <0.9 wink -- but it's really not enough of "a spec" to answer all practical questions>. Do note that perlpod.html was generated from a POD doc, though. Short course: the input is broken into paragraphs (via blank-line separation). A paragraph then falls into one of three categories: verbatim, command or ordinary text. Verbatim is like a Python raw-string: *nothing* about it is altered. A paragraph is verbatim iff its first line begins with whitespace. If there isn't a space or tab in the first line, it's a command paragraph or plain text. A command paragraph begins with "=" immediately followed by an identifier. There are two commands (=head1 and =head2) for headings; three for dealing with lists (=item, =over, =back); three for embedding docs in formats other than POD (like HTML or LaTeX, or verbatim text that doesn't happen to begin with whitespace; =for, =begin, =end); and a couple for telling the Perl compiler where POD sections begin and end (=pod, =cut). That's it for commands. Everything else is ordinary text. There are 8 inline markup gimmicks, of the form "X<" text goes here ">" where X is a single character, covering italics, bold, text with non-breaking spaces, literal code, cross-reference links, filenames, index entries, and Z<> for a zero-width character. Also entity-like "&" escapes. In practice, I rarely see escapes other than C, and it's *nice* to have a wholly unambiguous way to include code snippets inline. 
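The three-way paragraph split Tim describes is simple enough to state directly in code. This is a minimal sketch of the classification rule only; real POD translators of course do far more (list nesting, inline X<...> markup, output generation):

```python
import re

def classify_pod_paragraphs(text):
    """Split POD source on blank lines and label each paragraph as
    'verbatim' (first line starts with whitespace), 'command' (starts
    with '=' plus an identifier), or 'ordinary' (everything else)."""
    result = []
    for para in re.split(r'\n\s*\n', text):
        if not para.strip():
            continue
        first_line = para.splitlines()[0]
        if first_line[:1] in (' ', '\t'):
            kind = 'verbatim'   # passed through untouched, like a raw string
        elif re.match(r'=\w+', first_line):
            kind = 'command'    # =head1, =head2, =over, =item, =back, =cut, ...
        else:
            kind = 'ordinary'   # prose, subject to inline markup
        result.append((kind, para))
    return result
```

The appeal of the scheme is visible even in this sketch: the paragraph type is decided by the first character or two, with no lookahead and no ambiguity.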
The list gimmicks (=over, =item, =back) are visually jarring the first times you see them; in return, you never get a list by mistake; OTOH, if you want numbered lists, you supply the numbers yourself; on the fourth hand, if you want unusual list item bullets or numbering, you just type what you want. It's easy to use, and is mostly idiot-proof. OTOH, it's not *obvious* at first glance, particularly not the list stuff. But it's a matter of no more than two minutes to *learn* the list conventions, and then they're easy too. pod-is-a-lot-more-pythonic-than-perl-ly y'rs - tim From gward@mems-exchange.org Thu Mar 29 19:55:51 2001 From: gward@mems-exchange.org (Greg Ward) Date: Thu, 29 Mar 2001 14:55:51 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: <200103291648.LAA18948@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Mar 29, 2001 at 11:48:36AM -0500 References: <002401c0b830$fdd85c90$f05aa8c0@lslp7o.int.lsl.co.uk> <200103291648.LAA18948@cj20424-a.reston1.va.home.com> Message-ID: <20010329145551.A13751@mems-exchange.org> On 29 March 2001, Guido van Rossum said: > > Texinfo (and there are other more modern examples) is still "formal > > markup to produce a document", where the markup has equal status with > > the text, and is expected to intrude. People will not want to write it > > in docstrings. So we'd lose. > > But isn't this exactly what Javadoc does? I dunno about everyone else, but my objection to Javadoc is that it's not really a markup language -- it just uses HTML and throws in the @returns/@throws/etc thingy because those are useful things when documenting Java code. (And would be in Python code, too.) IOW, Javadoc is easy to turn into HTML, but (I expect) difficult to turn into anything else, unless you restrict the set of tags allowed. It sounds like there's no One True Javadoc parser, which is probably a PITA. > > Pod is used successfully in the Perl world, and is a clear winner there. 
> > I find it intensely unreadable, as a lightweight format. > > I haven't seen too much POD, so you may be right there. Is it worse > than Latex? Dunno who you were quoting there, but I strongly disagree with "intensely unreadable". Judge for yourself; here's a snippet of POD documentation for a C library I wrote: """ =head1 NAME bt_input - input/parsing functions in B library =head1 SYNOPSIS [...] =head1 DESCRIPTION The functions described here are used to read and parse BibTeX data, converting it from raw text to abstract-syntax trees (ASTs). =over 4 =item bt_set_stringopts () void bt_set_stringopts (bt_metatype_t metatype, ushort options); Set the string-processing options for a particular entry metatype. This affects the entry post-processing done by C, C, and C. If C is never called, the four metatypes default to the following sets of string options: BTE_REGULAR BTO_CONVERT | BTO_EXPAND | BTO_PASTE | BTO_COLLAPSE BTE_COMMENT 0 BTE_PREAMBLE 0 BTE_MACRODEF BTO_CONVERT | BTO_EXPAND | BTO_PASTE For example, bt_set_stringopts (BTE_COMMENT, BTO_COLLAPSE); will cause the library to collapse whitespace in the value from all comment entries; the AST returned by one of the C functions will reflect this change. """ "man perlpod" for the rules. The main things to know: * indentation means verbatim * C<> is code, B<> is bold, I<> is italics If this keeps up, I'll write a proposal for a POD dialect for documenting Python. The "=foo" headers would disappear for sure -- they're ugly, and that syntax is part of both Perl's parser and every POD parser. Yech. Just for fun, here's some more POD, this time from a Perl module I wrote: """ =head1 DESCRIPTION F provides a handful of otherwise unclassifiable utility routines. Don't go looking for a common thread of purpose or operation---there isn't one! =over 4 =item timestamp ([TIME]) Formats TIME in a complete, unambiguous, ready-to-sort fashion: C. 
TIME defaults to the current time; if it is supplied, it should be a time in the standard C/Unix representation: seconds since 1970-01-01 00:00:00 UTC, as returned by Perl's built-in C