From fdrake@users.sourceforge.net Thu Mar 1 05:20:23 2001 From: fdrake@users.sourceforge.net (Fred L. Drake) Date: Wed, 28 Feb 2001 21:20:23 -0800 Subject: [Doc-SIG] [development doc updates] Message-ID: The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ From tony@lsl.co.uk Thu Mar 1 16:57:05 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 1 Mar 2001 16:57:05 -0000 Subject: [Doc-SIG] docutils 0.0.4 "do you really expect me to believe that?" In-Reply-To: <003301c0a0c9$5c577500$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <006001c0a270$a4f83260$f05aa8c0@lslp7o.int.lsl.co.uk> The first nearly useful release of docutils (0.0.4 - the next release might have a somewhat shorter number!) is now to be found described at: http://www.tibsnjoan.co.uk/docutils/status.html which is much the same as: http://homepage.ntlworld.com/tibsnjoan/docutils/status.html Please note that I have *not* updated the Demon site. New with this release: * stpy.py is back, with its oh-so-attractive command line interface * Python modules and StructuredText files can be read in * A DOM tree can be generated * XML can be output (thanks to minidom's writexml method) * The RE used for descriptive lists has been fixed (again) - hopefully it's all better now * Some refactoring of DocNodes.py was done - it may not be finished yet * The doctests in DocNodes.py and TextNodes.py still work (well, that's not *new*) My next task is to add HTML output, which will make it easier for me to decide if I've done the DOM trees right, and to document the DTD I think I'm using for the DOM tree (so other people have some hope of interfacing to it). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Abstraction is one of those notions that Python tosses out the window, yet expresses very well. - Gordon McMillan My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From fdrake@acm.org Thu Mar 1 20:17:18 2001 From: fdrake@acm.org (Fred L. 
Drake, Jr.) Date: Thu, 1 Mar 2001 15:17:18 -0500 (EST) Subject: [Doc-SIG] Doc build question In-Reply-To: <005401c09d38$f9685e70$c603a8c0@activestate.ca> References: <005401c09d38$f9685e70$c603a8c0@activestate.ca> Message-ID: <15006.44622.122765.728808@localhost.localdomain> Dan Milgram writes: > I'm having some difficulty simply building the html docs, and was hoping > someone could help me out as I am not familiar with LaTeX. All the docs > build ok, with the exception of the modindex.html files (they simply contain > the header and footer info but no actual index entries). I am using the > following relevant packages on a RedHat box: > latex2html99.2beta8 > ActivePerl5.6 > tetex-1.0.6-11 You don't say what version of Python you're using. Are you working with the CVS version, or a packaged 2.0 version? There was a problem building the modindex.html files, but I thought that was fixed before the 2.0 release. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Fri Mar 2 20:32:27 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 2 Mar 2001 15:32:27 -0500 (EST) Subject: [Doc-SIG] doc tree frozen for 2.1b1 Message-ID: <15008.859.4988.155789@localhost.localdomain> The documentation is frozen until the 2.1b1 announcement goes out. I have a couple of checkins to make, but the formatted HTML for the Windows installer has already been cut & shipped. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Fri Mar 2 20:49:09 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 2 Mar 2001 15:49:09 -0500 (EST) Subject: [Doc-SIG] Python 2.1 beta 1 documentation online Message-ID: <15008.1861.84677.687041@localhost.localdomain> The documentation for Python 2.1 beta 1 is now online: http://python.sourceforge.net/devel-docs/ This is the same as the documentation that will ship with the Windows installer. This is the online location of the development version of the documentation.
As I make updates to the documentation, this will be updated periodically; the "front page" will indicate the date of the most recent update. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@cj42289-a.reston1.va.home.com Sat Mar 3 19:47:49 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Sat, 3 Mar 2001 14:47:49 -0500 (EST) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010303194749.629AC28803@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Additional information on using non-Microsoft compilers on Windows when using the Distutils, contributed by Rene Liebscher. From edloper@gradient.cis.upenn.edu Sun Mar 4 23:00:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 04 Mar 2001 18:00:33 EST Subject: [Doc-SIG] Automated __doc__ string processing systems Message-ID: <200103042300.f24N0YD03695@gradient.cis.upenn.edu> I've recently been reading up on the status of automated documentation extraction from formatted inline comments in Python. It sounds like one day, a tool for extraction may become part of the standard library.. (PEP 224). I'm trying to find out more about what the current status of this is, who is actively working on such tools, and how I can help. I'm pretty new to the area, but as far as I can tell there are 3 such tools currently in active development: * Happydoc * Pydoc * Docutils Are all of these actually being actively developed? Is one more likely than the others to become the standard? I'd like to help work on this project, but I'm not sure who I should be contacting, and I'm pretty new to the area. Also, I wrote a short essay trying to list many of the issues that such doc tools must deal with in one place, and discuss those issues. I'd appreciate it if you could give me any feedback on it. It's available at: http://www.cis.upenn.edu/~edloper/pythondoc.html Thanks! 
-Edward From ping@lfw.org Mon Mar 5 00:10:02 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 4 Mar 2001 16:10:02 -0800 (PST) Subject: [Doc-SIG] Automated __doc__ string processing systems In-Reply-To: <200103042300.f24N0YD03695@gradient.cis.upenn.edu> Message-ID: Hi, Edward. On Sun, 4 Mar 2001, Edward D. Loper wrote: > I've recently been reading up on the status of automated documentation > extraction from formatted inline comments in Python. It sounds like > one day, a tool for extraction may become part of the standard library. Yes -- just this past week, pydoc became a part of the Python 2.1 beta distribution. Two new modules were added: inspect.py - Get useful information from live Python objects. pydoc.py - Generate Python documentation in HTML or text for interactive use. They are separately available at http://www.lfw.org/python/inspect.py http://www.lfw.org/python/pydoc.py and documented at http://www.lfw.org/python/inspect.html http://www.lfw.org/python/pydoc.html The Python modules are written to work with any version of Python from 1.5.2 up, so you should be able to just download the two files and run them. pydoc intentionally avoids the issues of syntax for formatting docstrings, as that has historically been a contentious issue. I looked at epydoc -- it's nice work, and your essay was a good survey of the issues and ideas. I hope that pydoc's inclusion doesn't disappoint you, as you have clearly done a lot of hard work on epydoc, and that pydoc proves worthy of its blessing. I hope to learn from what you've done as we take pydoc forward. -- ?!ng "The biggest cause of trouble in the world today is that the stupid people are so sure about things and the intelligent folk are so full of doubts." -- Bertrand Russell From edloper@gradient.cis.upenn.edu Mon Mar 5 02:22:29 2001 From: edloper@gradient.cis.upenn.edu (Edward D. 
Loper) Date: Sun, 04 Mar 2001 21:22:29 EST Subject: [Doc-SIG] Automated __doc__ string processing systems In-Reply-To: Your message of "Sun, 04 Mar 2001 16:10:02 PST." Message-ID: <200103050222.f252MTD18354@gradient.cis.upenn.edu> Ka-Ping, I'm glad to hear that something's getting added to the standard distribution. It seems like a really useful type of tool for keeping code interfaces well-documented. Do you think that it will be possible to eventually integrate the type of functionality I talked about in my essay into pydoc? (with some sort of special prefix or other mechanism, like separate formatted-doc strings) Or do you think that that type of functionality would be best put in a different tool? I guess that basically it doesn't seem reasonable for me to try to develop Epydoc into a full-fledged system, with at least 3 python-document-extraction systems already out there.. But I'd still like to work on the problem.. So I'm wondering where my efforts would be best spent. :) -Edward From tony@lsl.co.uk Mon Mar 5 10:30:06 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 5 Mar 2001 10:30:06 -0000 Subject: [Doc-SIG] Automated __doc__ string processing systems In-Reply-To: <200103042300.f24N0YD03695@gradient.cis.upenn.edu> Message-ID: <001601c0a55f$3f085240$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I've recently been reading up on the status of automated documentation > extraction from formatted inline comments in Python. It sounds like > one day, a tool for extraction may become part of the > standard library.. As Ka-Ping Yee says elsewhere, already a done deal. > (PEP 224). I'm trying to find out more about what the current status > of this is, who is actively working on such tools, and how I can > help. I'm pretty new to the area, but as far as I can tell there are > 3 such tools currently in active development: > * Happydoc > * Pydoc > * Docutils > > Are all of these actually being actively developed? 
Is one > more likely > than the others to become the standard? I'd like to help work on this > project, but I'm not sure who I should be contacting, and I'm pretty > new to the area. Well, all of the three are doing different things. Ka-Ping Yee's pydoc will be in the standard distribution. It provides a command line utility (similar to Unix man in functionality, but aimed at Python modules). It provides an interactive "help" command, which can be used from the Python prompt. And it provides the ability to generate HTML pages. It doesn't yet address the *formatting* of the insides of docstrings. Doug Hellman's HappyDoc is an independent effort. It does not provide the "interactive help" facility, and its aims are rather different in philosophy - I think it is rather more ambitious in some ways about what it wants to do. It already has an attempt at interpreting StructuredText within doc strings. It wants to support *lots* of output formats. It will look in comments as well as in docstrings. Whilst it isn't in the standard distribution, it has been going a while, and is being used by various people. docutils (by me) is an attempt to provide the interpretation of the *inside* of a docstring. It results from many discussions over the years on the Doc-SIG about what those insides should look like, and is basically an evolved form of StructuredText (similar to and hopefully compatible with StructuredTextNG, which is, maybe, being developed by the Zope people). Although it has a simple command line interface, and can produce HTML, its main aim is to be used by utilities like the other two. Oh, and it's not finished yet. There have been other players, as well - pythondoc, Marc Lemburg's doc.py, crystal, for instance - not all of whom have necessarily abandoned work just because of what we are doing. The innards of Zope use StructuredText, and they parse documentation strings as well. Personally, I don't see a problem with having multiple tools. 
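The pydoc behaviour described above (man-like text for a live object, plus HTML generation) can be exercised directly from the interpreter. A minimal sketch using the standard-library `pydoc` and `inspect` modules — with present-day signatures; the original 1.5.2-compatible versions may differ slightly:

```python
import inspect
import pydoc

def brief_api(obj):
    """Collect one-line summaries for the public functions of a module or class."""
    summaries = {}
    for name, member in inspect.getmembers(obj, inspect.isfunction):
        if name.startswith("_"):
            continue
        doc = inspect.getdoc(member) or ""
        # pydoc-style synopsis: keep only the first line of the docstring
        summaries[name] = doc.split("\n")[0]
    return summaries

# pydoc can render full plain-text documentation for any live object;
# pydoc.html.document(obj) is the HTML-producing counterpart.
text_docs = pydoc.render_doc(inspect, renderer=pydoc.plaintext)
```

Note that neither function looks at the *markup* inside the docstrings; as the message says, pydoc deliberately leaves docstring formatting alone.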
It is *essential* to have a tool in the standard distribution, and pydoc is just perfect for that purpose (it addresses so many of the needs all at once). It is slightly less essential, but still important, to have a common definition of how one *writes* (formats) a docstring, and some of the documentation for that has been written. Leveraging off something many people already (more or less) use is important here. My experiences of writing docutils will lead to a firmer explication of that, plus a tool that *works* with the explication. I'll be extremely happy if a second implementation of *that* appears, as well. > Also, I wrote a short essay trying to list many of the issues that > such doc tools must deal with in one place, and discuss those issues. > I'd appreciate it if you could give me any feedback on it. It's > available at: > http://www.cis.upenn.edu/~edloper/pythondoc.html It's Monday, I've just gotten into work, and I haven't had time to read your document properly yet. It looks at a brief scan as if it is working at a slight tangent to the ST initiative - not necessarily a bad thing. One of the things to bear in mind, though, is that experience shows that MOST PEOPLE will not use "heavy" markup in docstrings - so HTML, XML, TeX were right out - instead they want something easy to write, and easy to read *without processing* - this is why ST was adopted (I was initially a huge opponent of this, as I *like* markup, but I realised that practice wins over theory). I think that the javadoc type of information probably counts as "heavyweight" as well - it's certainly not readable. Hmm - looking at your example: """ @cvariable(v) The v field of the class @type(v) int @ivariable(i) The i field of the instance. Note that descriptions can continue onto the next line and can include *formatting*. 
@type(i) float @see(otherClass) that other class @author Edward Loper @author Another author @version $Id:$ """ I suspect that one would instead write it something like: """ Significant class values #v# - an integer, representing variability #i# - perhaps confusingly, a real value. (It's going to play merry hell if I ever want to search for this, with such a short name.) Also see ^#other_class# for similar ideas. Authors: * Edward Loper * Someone J. Else Version: """ (there are a couple of things in there that won't work in current STpy, at least one of which may be contentious, but the main point is that it is much more like a normal text than a piece of form-filling, which encourages explanation.) Sorry for being brief - I hope it doesn't come across as impolite. David Goodger also had a swathe of significant comments on the innards of docstrings, some while back, which one day I want to go through and comment on and maybe nick ideas from. I shall hang on to your document as well. (Hmm - that sounds a bit like I think *I'm* deciding how STpy works. I admit it! . Well, actually, when the first implementation gets a bit firmer, I'm hoping to prod people into commenting on some of the subtler minutiae of the way forwards - of course, if noone cares, *then* I get to rule the world... ) As to contributed work - playing with docutils as it advances, and more importantly comments on the "specification" at http://www.tibsnjoan.co.uk/docutils/STpy.html would be useful - unfortunately, the document is a little light on *why* features are as they are. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 5 15:01:57 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 5 Mar 2001 15:01:57 -0000 Subject: [Doc-SIG] docutils 0.0.5 "Earwig O!" 
In-Reply-To: <006001c0a270$a4f83260$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <001901c0a585$3966b680$f05aa8c0@lslp7o.int.lsl.co.uk> I have uploaded a new version of docutils, 0.0.5: http://www.tibsnjoan.co.uk/docutils/status.html which is much the same as: http://homepage.ntlworld.com/tibsnjoan/docutils/status.html Please note that I have *not* updated the Demon site. News for this release: * I have removed the sources for release 0.0.1 * I have provided .tgz and .zip files for this new release * HTML output is now provided - it's not very subtle (speaking as someone who *looks at* HTML code), but it is proving very useful in debugging/refining the code * the command line interface has changed (simplified) * various bugs have been fixed * literal quoting ('..') is partially broken, ho hum * on the other hand, indentation of lines in paragraph literals is now done correctly * it's a bit slow, ho hum (well, that's not new either) For a fun time (for some sense of fun) try running it on DocNodes.py So, this version of the utility is *actually* somewhere towards being useful. I would appreciate it if anyone out there would throw it at some python or stx files and see what they think. Comments are, thus, welcome. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Hmm - must get some new signatures... My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From gherman@darwin.in-berlin.de Mon Mar 5 19:41:20 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Mon, 05 Mar 2001 20:41:20 +0100 Subject: [Doc-SIG] Automated __doc__ string processing systems References: <200103042300.f24N0YD03695@gradient.cis.upenn.edu> Message-ID: <3AA3EBE0.D1376B5E@darwin.in-berlin.de> "Edward D. 
Loper" wrote: > > I'm pretty new to the area, but as far as I can tell there are > 3 such tools currently in active development: > * Happydoc > * Pydoc > * Docutils You may want to add one more effort to your list that, for the lack of a better name, so far is called "docpy". It is in something like an embryonal form (which is why most likely you don't know about it) and comes with the current release of the ReportLab package (inside the libs directory). See here if you want to give it a try (The current version is named docpy0.py. There is also one labelled docpy1.py, but this is *very* experimental and you probably don't want to touch that, yet.): http://www.reportlab.com/ ftp://ftp.reportlab.com/current.zip Naturally, all authors are handling partly different requirements. ReportLab's are to generate electronic/print documentation for modules/packages that is not necessarily interactive, but should serve to make customers understand what is being delivered and, ideally, how to use it. docpy builds on inspect and does generate several formats, like ASCII, HTML and PDF (with slightly varying quality). It seems to implement some of the functionality that pydoc now apparently has implemented as well, but in a rather different manner. It's too early to make grand announcements, but the focus of docpy is basically that of extensibility. A good example of what I mean is a module named graphicsdoc (also part of the current distribution in the lib directory) that is built on top of docpy and that serves as a tool to build a catalog of widgets and charts from the upcoming ReportLab charting subpackage. Well, this is just to name another effort, not to discourage you or anybody to continue working on her own. Ideally, one day some of these approaches will merge again where the philosophy and feature set allows to do that. Hopefully, there'll be some very informal meeting about this here in the conference hotel... Kind regards, Dinu -- Dinu C.
Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'insanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." (Kent Beck, "Extreme Programming Explained") From edloper@gradient.cis.upenn.edu Tue Mar 6 01:46:41 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 05 Mar 2001 20:46:41 EST Subject: [Doc-SIG] Structured Text Message-ID: <200103060146.f261kfD10944@gradient.cis.upenn.edu> I've been going over the definitions of structured text (and its various flavors), trying to see if I can formalize it even more than Tibs did (http://homepage.ntlworld.com/tibsnjoan/STNG-format.html and http://homepage.ntlworld.com/tibsnjoan/docutils/STpy.html)... And a number of questions came up. I'm not sure if this is the correct forum for such questions.. If not, I apologize, and would appreciate it if you can tell me who I should be asking. Anyway, my questions were: 1. Does every string value have an interpretation as a Structured Text? That seems to be the case. If so, is that a Good Thing? As an example of a string that we might not want to give a value, consider: || indent level 0 || || indent level 1 || || indent level 2 || || indent level ?? I'd really prefer not to have cases like this have "undefined semantics." It seems like we either need to specify what they mean, or say that they're illegal. 2. If it is true that every string value has an interpretation as a Structured Text, does it make sense to officially "discourage" certain types of strings, such as the example listed above? It might also make sense to discourage strings like: || this || is || one messed up || paragraph 3. Which types of "code coloring" (emph, inline, etc.) can "wrap" over lines, and which can't?
E.g., can I have an *emph statement that continues to the next line?* 4. Is there any official precedence ordering on the different types of "code coloring?" Will there be anytime soon? Any rules about what types of code coloring can be contained in what other types? 5. Does structural formatting or code coloring take precedence? For example, if a paragraph starts with "* foo *," will it be a normal paragraph with an emphasized first element, or a list item? (It'll be much easier for me to write formal rules if structure takes precedence. ;) ) 6. Among the list types, which take precedence? For example, if a paragraph starts with "1. foo -- bar", is it an ordered list item or a descriptive list item? 7. What is meant by saying that SGML text passes through? SGML isn't even a mark-up language, so I assume that the intent is something like "XML and HTML text passes through." But does that mean that in an expression like 'a*b*', the '*'s will be ignored? That seems unreasonably difficult to implement. What about an expression like ''? Does this mean I can't say things like if 'xz'? Is there strong support for the notion of letting "SGML" text pass through, or is it something that might be dropped? (I would certainly vote for dropping it. :) ) My eventual goal, to the extent that it's possible, is to write out a complete formal specification for StructuredText using something similar to BNF (Backus Naur Form). (I'm pretty sure that vanilla BNF is not powerful enough to capture StructuredText.) After I've done that, I'll start working on getting Emacs to colorize StructuredText strings. I'd also like to create a sort of test-suite set of strings to test how different implementations function on different "ambiguously defined" cases.. Any help and/or pointers are very much appreciated.
:) -Edward From gherman@darwin.in-berlin.de Tue Mar 6 05:49:18 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Tue, 06 Mar 2001 06:49:18 +0100 Subject: [Doc-SIG] Structured Text References: <200103060146.f261kfD10944@gradient.cis.upenn.edu> Message-ID: <3AA47A5E.54231ADB@darwin.in-berlin.de> "Edward D. Loper" wrote: > > [...] I'd also like to create a sort of test-suite set of > strings to test how different implementations function on > different "ambiguously defined" cases.. > > Any help and/or pointers are very much appreciated. :) You may want to try this for testing: http://pyunit.sourceforge.net/ Dinu -- Dinu C. Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'insanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." (Kent Beck, "Extreme Programming Explained") From tony@lsl.co.uk Tue Mar 6 10:05:01 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 6 Mar 2001 10:05:01 -0000 Subject: [Doc-SIG] Structured Text In-Reply-To: <200103060146.f261kfD10944@gradient.cis.upenn.edu> Message-ID: <001e01c0a624$e88e0970$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper came up with some useful questions about StructuredText, which are his and are delimited by ">"... > I've been going over the definitions of structured text (and its > various flavors), trying to see if I can formalize it even more than > Tibs did (http://homepage.ntlworld.com/tibsnjoan/STNG-format.html and > http://homepage.ntlworld.com/tibsnjoan/docutils/STpy.html)... And a > number of questions came up. I'm not sure if this is the correct > forum for such questions.. So far as I'm concerned, it is. (Historically, actual implementation of a tool used to founder on such discussion because it used to occur *before* implementation.
However, since the implementation is proceeding, questions are, by me, now a good thing. Also, in practice, certain points only become clear to one when one has to implement them!). BTW - if you can, look back in the list archive for comments by David Goodger, dated around November last year. They're on my "to comment on" list (actually, they're in the bundled TGZ or ZIP files of the docutils distro as well, now I think of it), and (from memory) had some interesting points to make. > 1. Does every string value have an interpretation as a Structured > Text? That seems to be the case. If so, is that a Good Thing? > As an example of a string that we might not want to give a value, > consider: > || indent level 0 > || > || indent level 1 > || > || indent level 2 > || > || indent level ?? > > I'd really prefer not to have cases like this have "undefined > semantics." It seems like we either need to specify what they > mean, or say that they're illegal. As with many things about ST, the answer is "that depends" - it's really partly an implementation issue, conditioned by the required audience. In the case of Zope's use of ST, HTML pages have to be generated on the fly from ST, and thus it is not acceptable to have unrecoverable errors which prevent that (it is better to produce a possibly-slightly-wrong rendering). In the case of something like pydoc, this is probably true as well. In the case of a developer "testing" their docstrings, they want to *know* about errors. So. In all cases the example given is illegal, and the result is formally undefined (both 'cos it's not written down anywhere, and 'cos it *should* be undefined). However, STClassic (the "original" Zope tool, also used by HappyDoc, for instance) will make a "best guess" about what is meant - I can't remember any details (I only skimmed that code), but I don't think it went to *too* great lengths. 
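One plausible form such a "best guess" could take — and this is a hypothetical sketch of the idea, not STClassic's actual code — is to snap an inconsistent indentation to the nearest level already seen on the open-paragraph stack:

```python
def snap_indent(indent, known_levels):
    """Map an unexpected indentation to the nearest previously seen level.

    known_levels is the stack of indentations of currently open
    paragraphs, e.g. [0, 4, 8]. An indent of 6 matches none of them;
    a lenient parser picks the closest level rather than giving up.
    """
    return min(known_levels, key=lambda level: abs(level - indent))

# snap_indent(6, [0, 4, 8]) picks 4: ties resolve to the earlier level
# because min() keeps the first minimum it encounters.
```

A "pedantic" mode would instead raise an error whenever `indent not in known_levels`.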
Currently, my tool will flag an error and either give up or ignore the text (I think the former) - this is clearly unacceptable in many cases, and is on the "fix" list. My feeling is that the user should be able to select "pedantic" mode, in which it complains and gives up, but in non-pedantic mode (the default) the "nearest" (in some sense) correct indentation should be selected. On the other hand, I don't actually believe that *every* string has an ST representation - just that this is not the example (I don't off-hand *have* an example, just a gut feeling). Hmm - there are, though, surprising things one cannot represent as ST - here's an awkward case:: 1. This is a list item:: This is some literal text (see the '::' above) I can't make this paragraph a "child" of the list item. My intent is to make the third para a child of the first (so it won't terminate the list). But I can't do that, because it can either be indented so that it becomes part of the literal text (not what I want), or it can be indented as it is. This is so regardless of how one ends the literal text (one can choose "when a paragraph at the parent paragraph's indentation is found" (in which case the third paragraph *has* to be at that indentation to be "seen" as a separate paragraph), or "when a paragraph with indentation less than the first line of literal text is found" (in which case the third paragraph *could* be indented more, but it would be illegal by the indentation rules)). There *is* a way round it, I've just realised, which *may* not work yet in my code, but will do so eventually: 1. This is a list item: :: This is some literal text (see the ':: above, which will be optimised away/become invisible) And now this paragraph *is* a child of the list item. (by following the ST rules blindly, a paragraph containing just '::' should be legal - one then just has to decide what it *means* and how it is rendered. 
I choose to take it as invisible - but I can't remember if I've implemented it yet - currently it probably renders as a single ':'). It's a hack, but it *does* follow the rules. (((hmm - time to refine terms. ST == StructuredText - I use this as a generic term for the "family" STClassic == StructuredText as implemented by Zope, and used by, for example, HappyDoc STNG == the new version of ST that Zope has been working on - I don't know its current status, as I haven't been following it recently. STpy == ST with extensions from STNG, added "Python" extensions, and possibly other extensions as well (as documented in the STpy.html document) There is a widely available module to implement STClassic. STNG was being worked on, and is available through CVS via the Zope website. STpy is currently partially implemented by my code.))) > 2. If it is true that every string value has an interpretation as a > Structured Text, does it make sense to officially "discourage" > certain types of strings, such as the example listed above? It > might also make sense to discourage strings like: > || this > || is > || one messed up > || paragraph Again, it depends if one is a document user or producer. For a document *producer*, we should discourage (1) aggressively. For a user, we can't. As to (2), I can't speak for the Zope codebase, but in mine it is only the first line of a (non-literal) paragraph that counts, and so the messed up paragraph will render predictably (and I choose to think that is the sensible choice). Note that it is a Good Thing to make only the first line count - it means that people who prefer paragraph styles like:: I start my paragraphs with an indentation 'cos my teachers told me to... can get away with it (well, they're mad, but they can get away with it). > 3. Which types of "code coloring" (emph, inline, etc.) can "wrap" over > lines, and which can't? E.g., can I have an *emph statement that > continues to the next line?* In STClassic, the '..'
markup (literal) cannot contain line boundaries. emphasis and strong can. Descriptive list item titles can't. In my code (at the moment) in non-literal paragraphs all line boundaries have disappeared before the colourising code gets at the text, so all markup can span line boundaries (descriptive list items still can't, but that's a different issue). It was trivial to choose either option in my code, and I prefer the ability to ignore line ends - I like to use (more or less) ST in my emails, but I'm using Outlook which doesn't make it easy to tell where it will wrap lines, which makes it hard to use STClassic type literals. Note that a newline-plus-leading-whitespace is exactly equivalent to a single space. > 4. Is there any official precedence ordering on the different > types of "code coloring?" Yes. It is determined by the innards of the software. In my code, there is a list that the user can get access to and modify, which can change the order of markup (actually, the order of markup and also the RE used to detect each markup, and the tags generated therefrom). But I wouldn't expect this to be documented as such for ST itself - it's an implementation detail of a particular tool. > Any rules about what > types of code coloring can be contained in what other types? In a system using REs to recognise markup it's, well, tricky to nest markup. I don't *think* STClassic does. My code deliberately refrains from doing it at the moment - it's something I intend to implement eventually, but since it's fiddly and non-essential I'm leaving it for now (*how* to do it is fairly obvious, but it's fiddly and the sort of thing I have to draw a lot of pictures for, so I'd rather do more important things first, even though I myself *like* to nest markup). > 5. Does structural formatting or code coloring take precedence? For > example, if a paragraph starts with "* foo *," will it be a normal > paragraph with an emphasized first element, or a list > item?
>    (It'll
>    be much easier for me to write formal rules if structure takes
>    precedence. ;) )

Hah - the devil is in the details. Which have never been formally documented. So the answer is, it depends. In STClassic, I can't remember, so you'll have to experiment. In my code, list item recognition is done in two places - first at paragraph splitting time, and secondly when the nodes are "combed" for various purposes. The same REs are used both times. Colourisation is done later (when the doctree is already in shape). Thus, a paragraph starting "* spam *" will be recognised as a list item, and the initial "*" will be removed well before colourising happens (mind, I'm saying that, but I haven't tried it!).

(Hmm - like much of ST, this just tends to "fall out" of the design, which is one of the reasons I have immense respect for the people who came up with the original ideas - it manages to retain great economy of markup use, whilst generally "doing the right thing" in most normal situations)

> 6. Among the list types, which take precedence? For example, if a
>    paragraph starts with "1. foo -- bar", is it an ordered list item
>    or a descriptive list item?

Depends on the implementation. I can't offhand remember (although it's another thing a user of the module can fiddle with in my version). The order *does* matter - OK, I've looked it up, and the relevant code is:

    # Note that we want the descriptive list type to come
    # before the others, since a title might be a valid
    # bullet or sequence
    RE_list = [(RE_DESCRIPTIVE, "ditem", 1),
               (RE_UNORDERED,   "uitem", 0),
               (RE_ORDERED,     "oitem", 0)]

I'd stand by that - to make sense, descriptive comes first, and the order of the other two doesn't matter. So your example should be descriptive (title == "1. foo", text == "bar").
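The "descriptive comes first" ordering can be sketched in miniature. The regular expressions below are hypothetical stand-ins of my own, not the actual docutils patterns:

```python
import re

# Hypothetical stand-in patterns -- loosely modelled on the RE_list
# idea quoted above, NOT the real docutils regular expressions.
RE_DESCRIPTIVE = re.compile(r"^(?P<title>.+?) -- (?P<text>.*)$")
RE_UNORDERED = re.compile(r"^[o*-] +(?P<text>.*)$")
RE_ORDERED = re.compile(r"^\d+\. +(?P<text>.*)$")

# Descriptive must come first: "1. foo -- bar" matches both the
# ordered and the descriptive pattern, and descriptive should win.
RE_list = [(RE_DESCRIPTIVE, "ditem"),
           (RE_UNORDERED, "uitem"),
           (RE_ORDERED, "oitem")]

def classify(first_line):
    """Classify a paragraph by its first line: first matching RE wins."""
    for regexp, kind in RE_list:
        match = regexp.match(first_line)
        if match:
            return kind, match.groupdict()
    return "para", {}
```

With this ordering, "1. foo -- bar" comes out as a descriptive item (title "1. foo", text "bar"), and a paragraph starting "* spam *" is picked up as an unordered list item before any colourising code could see the "*" - matching the behaviour described above.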
Do beware of the " -- " trap for descriptive list items - one wants to be able to do::

    A list of delimiters:

     'o'    -- starts an unordered list item
     ' -- ' -- separates the parts of a descriptive list item

and have it produce the "obvious" results (I keep hoping I've got that bit right!)

> 7. What is meant by saying that SGML text passes through? SGML isn't
>    even a mark-up language, so I assume that the intent is something
>    like "XML and HTML text passes through." But does that mean that
>    in an expression like 'a*b*', the '*'s will be ignored?
>    That seems unreasonably difficult to implement. What about an
>    expression like ''? Does this mean I can't say things
>    like if 'xz'? Is there strong support for the
>    notion of letting "SGML" text pass through, or is it something that
>    might be dropped? (I would certainly vote for dropping it. :) )

In STClassic, there was an embedded assumption that HTML was being produced (since it was designed mainly for support of Zope web pages). Thus what was meant was that anything that looked vaguely HTML-like would be passed through untouched (so I think it just triggered on the < and >, but I wouldn't swear to it).

In my code, and STpy, this assumption is thrown away - first of all because one cannot assume that HTML (or some other SGML child) is the target, and secondly because experience with LaTeX and other markup languages shows that allowing this sort of latitude to document writers is a Bad Thing. Of course, this does mean that one has to start to consider things like line breaks that STClassic would handle with an explicit '<br>
' - but that can wait... > My eventual goal, to the extend that it's possible, is to write out a > complete formal specification for StructuredText using something > similar to BNF (Backus Naur Form). (I'm pretty sure that vanilla BNF > is not powerful enough to capture StructuredText.) That would be good. If it helps, I should be working on a DTD for STpy in the next week or so, which should relate, and will define some names for things... And, of course, given a BNF, presumably we can get someone else to write another parser (I am resolutely sticking with REs for docutils because (a) they're always present with Python and (b) they provide Unicode support, but a "proper" parser driven approach would be a Nice Tool to have out there - of course, given my druthers, I'd be using mxTextTools, which is a Very Nice Tool, but that's *definitely* not going out in the standard Python package!) > After I've done > that, I'll start working on getting Emacs to colorize StructuredText > strings. Yeh! Now that would be a truly good thing to do. But not as important as: > I'd also like to create a sort of test-suite set of strings > to test how different implementations function on different > "ambiguously defined" cases.. Oh, yes please - that's even more useful - stress cases are quite difficult to come up with. Can I suggest searching through the docutils sources for occurrences of the string "hmm" (case insensitive), since that will show up some of my own comments on oddities of the form? (Also big on my list of future needs is a "howto" document that has a section on heffalump traps - things that are difficult to do in ST, and wayward things that produce unexpected results. Your list of test cases would also be a Big Help in such a thing.) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! 
(Unless Laser-Scan ask nicely to borrow them.) From dgoodger@atsautomation.com Tue Mar 6 19:39:40 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Tue, 6 Mar 2001 14:39:40 -0500 Subject: [Doc-SIG] Structured Text Message-ID: Hi Edward, As Tibbs mentioned (thanks for the plugs, Tony! ;-), I wrote some articles in November on the very subject you're researching. You can find the articles here: - A Plan for Structured Text http://mail.python.org/pipermail/doc-sig/2000-November/001239.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2000-November/001240.html - reStructuredText: Revised Structured Text Specification http://mail.python.org/pipermail/doc-sig/2000-November/001241.html Unfortunately, since then I haven't had much time to work on the issues. I have refined some of the issues in my head, but not on "paper". Right now my computer at home is still in a box a week after we moved, six weeks after we began a major renovation project in our new house. The renovations continue, and take priority, so the computer will remain boxed for a few more days at least. I am writing from work, therefore I must be brief. > I'm not sure if this is the correct > forum for such questions. I know of no better forum. Another place might be the StructuredTextNG ZWiki or mailing lists over on Zope's pages. In addition, I'd highly recommend scanning over the archives of Doc-SIG, since these issues have come up many times in the past. Most of the principles have given up and moved on to less controversial pastures. > 1. Does every string value have an interpretation as a Structured > Text? ... > 2. If it is true that every string value has an interpretation as a > Structued Text, does it make sense to officially "discourage" > certain types of strings [...] ? One way you can think of a StructuredText interpreter is like a computer language compiler/interpreter: illegal input generates warnings (if you're lucky :) and errors. 
I like Tibbs' reply to this one. Ideally, a tool would make a best guess and generate warnings, without crapping out (unless explicitly told to do so). > 3. Which types of "code coloring" (emph, inline, etc.) can "wrap" over > lines, and which can't? E.g., can I have an *emph statement that > continues to the next line?* I don't see why not. > 4. Is there any official precedance ordering on the different > types of > "code coloring?" Will there be anytime soon? Any rules > about what > types of code coloring can be contained in what other types? Nothing official other than the existing codebase, which is not complete. See my block diagram in "reStructuredText: Revised Structured Text Specification" for my take on the (high-level) hierarchy. > 5. Does structural formatting or code coloring take precedance? For > example, if a paragraph starts with "* foo *," will it be a normal > paragraph with an emphasized first element, or a list > item? (It'll > be much easier for me to write formal rules if structure takes > precedence. ;) ) I think structure has to take precedence. You've provided one of an infinite number of conceivable edge-cases where it'll be tricky to get a program to process its input correctly 100% of the time. > 6. Among the list types, which take precedence? For example, if a > paragraph starts with "1. foo -- bar", is it an ordered list item > or a descriptive list item? I would consider it an ordered list item *containing* a descriptive list item. But I'm certain others would differ. > 7. What is meant by saying that SGML text passes through? SGML isn't > even a mark-up language, so I assume that the intent is something > like "XML and HTML text passes through." But does that mean that > in an expression like 'a*b*', the '*'s will be ignored? > That seems unreasonably difficult to implement. What about an > expression like ''? Does this mean I can't say things > like if 'xz'? 
Is there strong support for the
>    notion of letting "SGML" text pass through, or is it something that
>    might be dropped? (I would certainly vote for dropping it. :) )

SGML *is* a markup language (that's what the ML stands for), but it's a meta-markup language. XML is too. Only HTML (of the three) is a specific markup language. What they meant was simply that "text like this" would pass through. This is one place where the original StructuredText definition is sorely lacking, IMHO.

Hopefully I can get the renovations done before too long and get back to more cerebral pursuits.

David Goodger
Systems Administrator & Programmer, Advanced Systems
Automation Tooling Systems Inc., Automation Systems Division
direct: (519) 653-4483 ext. 7121 fax: (519) 650-6695
e-mail: dgoodger@atsautomation.com

From dgoodger@atsautomation.com Tue Mar 6 20:11:45 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Tue, 6 Mar 2001 15:11:45 -0500 Subject: [Doc-SIG] Structured Text Message-ID:

> here's an awkward case::
>
>     1. This is a list item::
>
>          This is some literal text (see the '::' above)
>
>        I can't make this paragraph a "child" of the list item.
>
> My intent is to make the third para a child of the first (so it won't
> terminate the list). But I can't do that

My approach is to think about it two-dimensionally, in X-Y space:

    1. This is a list item::

         This is some literal text (see the '::' above)

       I can't make this paragraph a "child" of the list item.

The "I" lines up with the "T" in the first line. A block diagram helps:

    +----+------------------------------------+
    | 1. | (list item)                        |
    +----| +----------------------+           |
    |    | | (paragraph)          |           |
    |    | | This is a list item: |           |
    |    | +----------------------+           |
    |    | +------------------------------+   |
    |    | | (code block)                 |   |
    |    | | This is some literal text... |   |
    |    | +------------------------------+   |
    |    | +----------------------+           |
    |    | | (paragraph)          |           |
    |    | | I can't make this... |           |
    |    | +----------------------+           |
    +----+------------------------------------+

The indentation of the code block is just for emphasis (unless you want paragraphs to contain body elements, a subject for intense debate ;-).

I once wrote a (huge, ugly, Perl) program which parsed syntax diagrams drawn using ASCII with line-drawing extensions (similar to the diagram above), and translated them into valid SGML. StructuredText is similar: it's ordinary text with a horizontal dimension. Basically, it's graphical text. Like a bitmap, but characters instead of pixels. Thinking in terms of dissection into blocks is very useful.

HTH,
/DG

P.S. I haven't disappeared; just incredibly busy. I'm still lurking on the list.

From dgoodger@atsautomation.com Tue Mar 6 21:58:19 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Tue, 6 Mar 2001 16:58:19 -0500 Subject: [Doc-SIG] and a silent 'Q', the famous Dutch author Message-ID:

I misspoke,

> As Tibbs mentioned [...]

Sorry, that's 'Tibs' with one 'b' not two. My apologies.

From tony@lsl.co.uk Wed Mar 7 10:22:42 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 7 Mar 2001 10:22:42 -0000 Subject: [Doc-SIG] Structured Text In-Reply-To: Message-ID: <002901c0a6f0$8b4a2d30$f05aa8c0@lslp7o.int.lsl.co.uk>

Elsewhere David apologised for spelling my name Tibbs rather than Tibs - that's OK, I'm used to it, given my surname (the reason it has a single "b" is only that the original cat tag had "Tibs" and not "Tibbs").

David Goodger also wrested (wrost?) time from house moving recovery to say:

> My approach is to think about it two-dimensionally, in X-Y space:
>
>     1. This is a list item::
>
>          This is some literal text (see the '::' above)
>
>        I can't make this paragraph a "child" of the list item.
>
> The "I" lines up with the "T" in the first line. A block
> diagram helps:
>
>     +----+------------------------------------+
>     | 1. | (list item)                        |
>     +----| +----------------------+           |
>     |    | | (paragraph)          |           |
>     |    | | This is a list item: |           |
>     |    | +----------------------+           |
>     |    | +------------------------------+   |
>     |    | | (code block)                 |   |
>     |    | | This is some literal text... |   |
>     |    | +------------------------------+   |
>     |    | +----------------------+           |
>     |    | | (paragraph)          |           |
>     |    | | I can't make this... |           |
>     |    | +----------------------+           |
>     +----+------------------------------------+
>
> The indentation of the code block is just for emphasis
> (unless you want paragraphs to contain body elements,
> a subject for intense debate ;-).

Hmm. I think of the indentation as Python-esque, and just "pretend" I'm blind to anything but the first line (for non-literal paragraphs, anyway). I'm afraid I've been exposed to document tree structures for too long to want to think of it as block diagrams.

Anyway, I have very strong views on how correct markup should work (heh, I have a TeX background, and I'm a pedant). HTML contravenes some of them (sections in list items! - hah! "some" people may consider one shouldn't have an H4 immediately following an H2 - hah!), and StructuredText a whole different set. But that's *my* problem - ST is also intensely pragmatic, and cleverly designed so that many of the things people initially worry about never actually happen in practice (given a close reading of the spec, even the "informal" STClassic spec). Within the limitations of what one can do with a "nearly plain text" approach that *is* trying to mix markup and presentation, it does amazingly well (and I'm currently having a real problem with remembering to type <em>...</em> when I'm writing HTML, instead of *...*(!)).

Unfortunately, of course, the block diagram is wrong, for all forms of current ST - the correct diagram is::

    +-------------------------+
    | (list item)             |
    | 1. This is a list item: |
    +-------------------------+
    +------------------------------+
    | (literal block)              |
    | This is some literal text... |
    +------------------------------+
    +----------------------+
    | (paragraph)          |
    | I can't make this... |
    +----------------------+

since indentation starts at the start of a paragraph, which is calculated *before* the list sequence number is removed.

What I would call the "traditional" branch of ST (STClassic, STNG - the Zope strand) aims to separate paragraph generation from anything else - it aggressively regards paragraphs as separated by blank lines, and the document structure is built up only using paragraphs generated by such a method (this, I think, is even more so in STNG, where they are trying for a very "clean" structure of classes, with each phase of parsing separated strongly from each other phase - the aim being to allow subclassing for customisation.)

Anyone reading *my* code will see I have abandoned such an approach - basically because I wanted to allow list items to start paragraphs (so they need to be detected early), and (later on) because if one is going to handle literal "paragraphs" properly, one needs to handle that specially as well - a simple "detect paragraphs and then markup" won't do it, literalising of paragraphs needs to be an intermediate stage. Basically, in my view, processing ST-style texts well *requires* a hybrid approach - I would assert that the results provided by a more "theoretical" (in some sense) approach are not as satisfying. This is probably, of course, a "religious" disagreement, and I will be interested to see what Jim Fulton's people at Zope manage to do with STNG (I have a feeling that they didn't throw away enough code before starting the project, but that's a comment from fairly strong ignorance).

Back to the example. As I was saying, in both branches of ST development (NG and py), the *very start* of the paragraph determines its indentation.
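Tibs's point about where indentation is measured can be pinned down with a tiny sketch of my own (not docutils code): the indentation that drives nesting is the column of the paragraph's very first character, taken before any list marker is stripped.

```python
def para_indent(paragraph):
    """Indentation of a paragraph: the leading spaces on its *first*
    line, measured before any list marker ("1.", "o", "*") is removed."""
    first_line = paragraph.splitlines()[0]
    return len(first_line) - len(first_line.lstrip(" "))

# The thread's example: the list item's indent is 0 (the column of the
# "1", not of the word "This"), so a later paragraph indented to line
# up with "This" sits at indent 3, which lines up with nothing.
assert para_indent("1. This is a list item::") == 0
assert para_indent("   I can't make this paragraph a 'child'...") == 3
```

That is why David's block diagram doesn't describe current ST: the following paragraph lines up with the item's *text*, but the item itself starts at column 0.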
*Because* this is such a fundamental structuring decision, I would be very cautious about changing it (technically, in STpy/docutils, it wouldn't be too hard, off the top of my head, to do what you'd like). My experience, as I say further above, is that the basic ideas of ST are *very* solid in their pragmatic usefulness, and I would need to think about a lot of text examples before wanting to change something like that. Also, I am already worried a little about the incompatibilities between STpy and STNG (since STNG is happy to be somewhat incompatible with STClassic, I shall take the same stance on that). Luckily there aren't many (extra features such as '#...#' will ultimately be selectable anyway - this makes sense for processing .stx files that have nothing to do with Python) - I think the *main* one may well be the alllowance of list items starting new paragraphs, which was something STNG was also considering at one stage. (Of course, if the incompatibilities are few and abstruse, one *could* argue that STNG isn't taking such a wrong approach, after all, couldn't one!) > P.S. I haven't disappeared; just incredibly busy. I'm still > lurking on the list. And I still intend to comment on your documents at some stage. Tibs (hmm - the exclamation count in this email is too high) (there, I've removed some - perhaps I should insert some more parenthesised clauses to compensate...) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 8 08:08:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 08 Mar 2001 03:08:13 EST Subject: [Doc-SIG] Formalizing StructuredText Message-ID: <200103080808.f2888Dp27950@gradient.cis.upenn.edu> I've started working on a project to formalize StructuredText (in particular, STNG and STpy). 
I am using a slightly extended version of EBNF to write the formal descriptions. See the project proposal for more information: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html A preliminary formalization is available at: http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html I would appreciate any feedback. :) -Edward From tony@lsl.co.uk Thu Mar 8 10:37:16 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 8 Mar 2001 10:37:16 -0000 Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: <200103080808.f2888Dp27950@gradient.cis.upenn.edu> Message-ID: <003b01c0a7bb$be8712a0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I've started working on a project to formalize StructuredText (in > particular, STNG and STpy). I am using a slightly extended > version of EBNF to write the formal descriptions. See the project > proposal for more information: > http://www.cis.upenn.edu/~edloper/pydoc/stminus.html > > A preliminary formalization is available at: > http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html > > I would appreciate any feedback. :) Well, first off, I'm mega-impressed - I hadn't dreamed that someone would actually get round to doing something like this. This is truly neat. Secondly, I'll add a link from the next "status" document for docutils - do you prefer to be "Edward Loper" or "Edward D. Loper"? Thirdly, could I ask you to look at: http://www.zope.org/Members/jim/StructuredTextWiki/DocumentationStrings - I've added a reference to STminus at the end of the page. It might be an idea to get a Zope Wiki account (if you don't already have one) and add a reference somewhere further up the hierarchy as well - I don't know if Jim Fulton and co. have the time or inclination to watch the Doc-SIG... Fourthly, I haven't had time to read the whole STminus document (heh, I only just saw your email!), but I did note the large red box halfway down the actual STminus definition page. 
Just above it you say:: Note that the empty literal ('') is a valid literal. I have a sneaky feeling that this is not so in the current version of STpy (it *may* have been earlier - it's something I'm ambivalent about, since I can't see much *use* for an empty literal string). I'll have a look at what the current REs do (although I broke them yesterday whilst "tidying up", so that's not terribly reliable). Of course, if STNG allows an empty literal, that would be a case for STpy doing so as well (but what about an empty ##? Hmm. I'll think about this whole thing in more detail.) Oh - one last point - please, please, please put
<p>
at the start of paragraphs - HTML really does mandate it, despite the fact that IE seems not to care (I've come across browsers in the past that *did* treat the absence of
<p>
as meaning pure whitespace, causing all "paragraphs" to run together...). Personally, I recommend HTML Tidy as a tool for checking/reformatting HTML - not that I always do what it *says*, but at least I then know when I'm being naughty... Damn, now I'm going to have to learn EBNF. Hard when I've got my stupid-hat on - it took me quite a while to realise why "'" was called APOS... I do like the assumption of a dialogue between STNG and STpy - although at the moment you're it! (mind you, that's a damn good start, so far as I'm concerned). I'll try to go through the text and look for "intentional differences" that I know about. (oh - the link [5] at the bottom of the intro page, to http://www.cis.upenn.edu/~edloper/pydoc/ebnfla_proof.html gives a Not Found error) All the best, Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 8 20:12:58 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 08 Mar 2001 15:12:58 EST Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: Your message of "Thu, 08 Mar 2001 10:37:16 GMT." <003b01c0a7bb$be8712a0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103082012.f28KCwp15503@gradient.cis.upenn.edu> Tibs said: > Secondly, I'll add a link from the next "status" document for docutils - > do you prefer to be "Edward Loper" or "Edward D. Loper"? Thanks. Let's call me Edward Loper for now. I also have a page at: http://www.cis.upenn.edu/~edloper/pydoc that will contain pointers to all my essays/work on python documentation and StructuredText. > - I've added a reference to STminus at the end of the page. 
It might be
> an idea to get a Zope Wiki account (if you don't already have one) and
> add a reference somewhere further up the hierarchy as well - I don't
> know if Jim Fulton and co. have the time or inclination to watch the
> Doc-SIG...

I'll try to get an account sometime. I've never used Zope before, but it doesn't look that difficult to find out. Is there someone I need to talk to to get an account, or do I just register on some web page somewhere? If Jim (and co.) don't read Doc-SIG, should I send email to someone at STNG telling them what I'm working on? If so, who?

> Fourthly, I haven't had time to read the whole STminus document (heh, I
> only just saw your email!), but I did note the large red box halfway
> down the actual STminus definition page. Just above it you say::
>
>     Note that the empty literal ('') is a valid literal.
>
> I have a sneaky feeling that this is not so in the current version of
> STpy (it *may* have been earlier - it's something I'm ambivalent about,
> since I can't see much *use* for an empty literal string).

I have a suspicion that STminus's current definition does *not* actually provide a subset of the intersection of STNG and STpy. This should hopefully become more clear once I make lots of test cases, and can run them through the parsers for all three languages (although, just because one parses something one way, doesn't mean that that's the intended behavior, but..)

> Oh - one last point - please, please, please put
> <p>
at the start of > paragraphs - HTML really does mandate it, despite the fact that IE seems > not to care (I've come across browsers in the past that *did* treat the > absence of
> <p>
as meaning pure whitespace, causing all "paragraphs" to
> run together...). Personally, I recommend HTML Tidy as a tool for
> checking/reformatting HTML - not that I always do what it *says*, but at
> least I then know when I'm being naughty...

Fixed. Sorry about that; over time, I've come to respect HTML so little that I don't bother to use tags properly sometimes. :)

> Damn, now I'm going to have to learn EBNF. Hard when I've got my
> stupid-hat on - it took me quite a while to realise why "'" was called
> APOS...

I added text descriptions for all the terminal productions. :) In my references, I list the following page on EBNF:

http://www.augustana.ab.ca/~mohrj/courses/2000.fall/csc370/lecture_notes/ebnf.html

However, I'm *sure* that there are better pages out there that describe EBNF; if someone knows of a good one, tell me, and I'll add it to my references.

> I do like the assumption of a dialogue between STNG and STpy - although
> at the moment you're it! (mind you, that's a damn good start, so far as
> I'm concerned).

I'm hoping the people working on STNG will like the idea too. :)

> I'll try to go through the text and look for "intentional differences"
> that I know about.

Thanks.

> (oh - the link [5] at the bottom of the intro page, to
> http://www.cis.upenn.edu/~edloper/pydoc/ebnfla_proof.html
> gives a Not Found error)

Oops, forgot to copy that page over. Fixed now.

-Edward

From klm@digicool.com Fri Mar 9 00:40:39 2001 From: klm@digicool.com (Ken Manheimer) Date: Thu, 8 Mar 2001 19:40:39 -0500 (EST) Subject: [Doc-SIG] Re: Doc-SIG digest, Vol 1 #271 - 2 msgs In-Reply-To: Message-ID:

> To: doc-sig@python.org
> Date: Thu, 08 Mar 2001 03:08:13 EST
> From: "Edward D. Loper"
> Subject: [Doc-SIG] Formalizing StructuredText
>
> I've started working on a project to formalize StructuredText (in
> particular, STNG and STpy). I am using a slightly extended
> version of EBNF to write the formal descriptions.
See the project > proposal for more information: > http://www.cis.upenn.edu/~edloper/pydoc/stminus.html > > A preliminary formalization is available at: > http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html > > I would appreciate any feedback. :) Most excellent!! There's been some recent dialogue within digital creations pro and con ST, and i think one indisputable objection was ST's lack of a clear specification - this is just what the doctor ordered! Not only does it address the problem - it is also very encouraging to see energy being invested from outside the company, in this and tony's efforts. (I believe ST has a nice, solid place in our ongoing plans, by the way. Zope will probably go to storing textish documents in neutral, DOMish data structures, with the option to present as ST, among other options, for editing, etc.) > - I've added a reference to STminus at the end of the page. It might be > an idea to get a Zope Wiki account (if you don't already have one) and > add a reference somewhere further up the hierarchy as well - I don't > know if Jim Fulton and co. have the time or inclination to watch the > Doc-SIG... Some of us do in a sketchy fashion. (I subscribe via digest, and sometimes miss issues.) But tony, those pages are in a wiki precisely to enable the kind of thing you're suggesting! Bravo! Ken klm@digicool.com From tony@lsl.co.uk Fri Mar 9 10:44:32 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 9 Mar 2001 10:44:32 -0000 Subject: [Doc-SIG] Re: Doc-SIG digest, Vol 1 #271 - 2 msgs In-Reply-To: Message-ID: <004501c0a885$ecee97b0$f05aa8c0@lslp7o.int.lsl.co.uk> Ken Manheimer wrote: > (I believe ST has a nice, solid place in our ongoing plans, > by the way. Zope will probably go to storing textish documents > in neutral, DOMish data structures, with the option to present > as ST, among other options, for editing, etc.) 
Of course, STNG and STpy both produce DOM trees (more or less) - so there is always the option of writing ST outputters, and translating via DOM tree manipulations (excuse me - I'll go off in a corner and gibber quietly now).

More seriously, I would be *very* interested to see more feedback (somewhere - to Edward Loper seems an obvious route) between the different ST developers, especially with explanations of *why* particular choices are made (accident, of course, being a useful one!)

> But tony, those pages are in a wiki precisely to
> enable the kind of thing you're suggesting! Bravo!

The problem with the Wiki is that it requires a significant investment of "mind space" to use it - both for monitoring it, and for working out what one should do to amend it so that the results one wants become more likely. It's a heck of a lot easier to dump stuff onto a mailing list... (of course, that *may* be a reason for using a Wiki!)

Anyway, it's good to see the Doc-SIG with new participants, and all this new terminology is fun too (and STminus may yet be one of the more significant things the Doc-SIG has done).

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!"
BARNYARD DANCE! by Sandra Boynton
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From klm@digicool.com Fri Mar 9 17:36:31 2001 From: klm@digicool.com (Ken Manheimer) Date: Fri, 9 Mar 2001 12:36:31 -0500 (EST) Subject: [Doc-SIG] Re: Doc-SIG digest, Vol 1 #271 - 2 msgs In-Reply-To: <004501c0a885$ecee97b0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID:

On Fri, 9 Mar 2001, Tony J Ibbs (Tibs) wrote:

> Ken Manheimer wrote:
> [...]
> More seriously, I would be *very* interested to see more feedback
> (somewhere - to Edward Loper seems an obvious route) between the
> different ST developers, especially with explanations of *why*
> particular choices are made (accident, of course, being a useful one!)
We'll try to chime in! > > But tony, those pages are in a wiki precisely to > > enable the kind of thing you're suggesting! Bravo! > > The problem with the Wiki is that it requires a significant investment > of "mind space" to use it - both for monitoring it, and for working out > what one should to to amend it so that the results one want become more > likely. It's a heck of a lot easier to dump stuff onto a mailing list... I didn't mean to say the wiki was a *good* way to do it, but rather that it's in the wiki so you and others could introduce and change stuff. (In fact, i consider wiki woefully imperfect. WikiForNow was something i worked on as a stopgap, while the Zope CMF is growing. We're hoping there to develop some wiki-type benefits while remeding some wiki shortcomings. For instance, see http://cmf.zope.org/Members/klm/OrganizationObjects .) Ken From ping@lfw.org Fri Mar 9 23:08:21 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 9 Mar 2001 15:08:21 -0800 (PST) Subject: [Doc-SIG] Docstring markup process Message-ID: Paul Prescod made an excellent meta-proposal for docstrings at the recent conference: rather than arguing endlessly about various markup formats, anyone who wants to propose a particular markup format should write a PEP describing that format in detail. Only formats described in a PEP will be under serious consideration. By an agreed deadline, we can vote on the PEPs, and then be done with it. Of course we can discuss the various proposals here, but it's a big step forward to get them written up and all in one place for comparison. I strongly support this process; let's pick a deadline. -- ?!ng "If I have not seen as far as others, it is because giants were standing on my shoulders." 
-- Hal Abelson From ping@lfw.org Sun Mar 11 10:13:38 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 11 Mar 2001 02:13:38 -0800 (PST) Subject: [Doc-SIG] Evolution of library documentation Message-ID: [resent with individual cc addresses, since mail.python.org is down] Hi everyone! The introduction of pydoc places more emphasis on docstrings in the source code. I think this is generally good, since keeping the documentation close to the source makes it more likely to be kept up to date. However, it also produces the potential for duplication of effort in maintaining both the docstrings and the LaTeX file for the library reference. The LaTeX documentation seems to be motivated by the richer metadata, the greater control over formatting, and the ability to present a long tutorial or detailed explanation. At the Python conference, a small group of us discussed the possibility of merging the external and internal documentation; that is, moving the library reference into the module source files. It would no longer be written in TeX so that you wouldn't have to have TeX in order to produce documentation. This would address the duplication problem and also keep all of a module's documentation in one place together with the module. To avoid forcing you to page through a huge docstring before getting to the source code, we would allow a long docstring to go at the end of the file (or maybe collect docstrings from anywhere in the file). To implement this convention, we wouldn't need to change the core because the compiler already throws out string constants if they aren't used for anything. So a big docstring at the end of the file would not appear in the .pyc or occupy any memory on import; it would only be obtainable from the parse tree, and tools like pydoc could use the compiler module to do that. That leaves the metadata and formatting issues. When i suggested this idea (of merging in the external documentation) to Guido, he was initially against it. 
He was very concerned about the loss of information in the TeX markup. In order to even consider switching formats, he requires that we preserve as much metadata as possible from the TeX docs (so that, for example, we can still generate a useful index). But i still think that getting all the docs together in one place is a goal worth at least investigating. So i have gone through the TeX files in the Doc/lib directory and extracted a list of all the TeX markup tags that are used there. Here follows my list; i have attempted to categorize the purpose of the tags by hand. Fred, would you mind looking over this list to see if i have classified the meanings of the tags correctly? Each tag name appears with the number of times that it occurs as a measure of how important it is. This should give us a starting point for evaluating and discussing what kind of metadata and formatting control we have, what is worth preserving, and what we would need to consider supporting in a structured-text-style markup if we were to merge the documentation. After i've had a while to study the list, i will probably post my own annotated list of which ones i would support and which ones i would toss. I encourage you to look at it and do the same. -- ?!ng "If I have not seen as far as others, it is because giants were standing on my shoulders." 
-- Hal Abelson

# ------------------------------------------------------------- BLOCK TAGS

block formatting markup:
    abstract 1        description 28    displaymath 1     document 1
    enumerate 6       flushleft 1       fulllineitems 1   itemize 35
    list 2            seealso 73        sloppypar 3       verbatim 274
    math 4

table formatting:
    longtableii 2     tableii 34        tableiii 24       tableiv 1

descriptive sections for Python objects:
    classdesc 132     datadesc 399      datadescni 29     excclassdesc 4
    excdesc 124       funcdesc 1122     funcdescni 1      memberdesc 170
    methoddesc 1152   methoddescni 4    opcodedesc 104

# ------------------------------------------------------------ INLINE TAGS

special words, symbols, and math:
    ABC 3             ASCII 58          C 12              Cpp 2
    EOF 19            Large 2           NULL 3            POSIX 30
    UNIX 226          copyright 1       e 3               frac 1
    ldots 2           sqrt 1            sum 1

inline formatting markup:
    cdata 11          cfunction 84      character 163     code 2485
    ctype 40          dfn 63            email 3           emph 163
    envvar 47         file 174          footnote 24       kbd 14
    keyword 98        longprogramopt 7  manpage 23        mbox 1
    mimetype 14       platform 44       program 65        programopt 9
    regexp 63         rfc 43            samp 347          strong 85
    textrm 9          url 21            var 4234

metadata fields:
    declaremodule 211 deprecated 15     moduleauthor 57   modulesynopsis 220
    sectionauthor 114 versionadded 82

TeX processing macros:
    documentclass 1   input 226         label 242         nodename 20
    renewcommand 3

table cells:
    lineii 386        lineiii 279       lineiv 15

tagging indexable words:
    bifuncindex 35    index 180         indexii 92        indexiii 17
    indexiv 1         obindex 26        opindex 12        setindexsubitem 13
    stindex 12        stmodindex 1      ttindex 50        withsubitem 28

cross-references:
    citetitle 11      ref 29            refbimodindex 31  refmodindex 2
    refmodule 203     refstmodindex 60  seemodule 84      seepep 1
    seerfc 9          seetext 12        seetitle 1        seeurl 3

Python identifiers:
    class 639         constant 348      dataline 67       exception 310
    funcline 61       funclineni 1      function 954      member 159
    memberline 2      method 866        module 635        pytype 1
    optional 734

headings:
    chapter 25        section 237       subsection 227    subsubsection 44
    title 1

From Edward Welbourne Sun Mar 11 12:28:50 2001 From: Edward Welbourne (Edward Welbourne) Date: Sun, 11 Mar 2001
12:28:50 +0000 (GMT) Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: (message from Ka-Ping Yee on Sun, 11 Mar 2001 02:13:38 -0800 (PST)) References: Message-ID: > At the Python conference, a small group of us discussed the possibility > of merging the external and internal documentation; that is, moving > the library reference into the module source files. good. > the compiler already throws out string constants if they aren't > used for anything. cool. > Guido ... initially against it ... concerned about the loss of > information in the TeX markup. bah. OK, so it'll pressure ST* into being a bit richer ... big deal. > But i still think that getting all the docs together in one place is > a goal worth at least investigating. hey, understatement is meant to be a British thing - what're you doing invading our turf, Ka-Ping ;^? All programmers know: if the code and the docs disagree, mistrust both. Those involved in databases know: if data is duplicated, the copies get out of step with one another. Corollary: if the code and the docs aren't in the same place, you can't trust either. `A goal worth at least investigating' ? Try: A fundamental omission in most existing software management systems. I trust IPC9 went well, Eddy. -- Experienced software engineers know that perhaps 30% of the cost of a software product goes into specifying it, 10% into coding, and the remaining 60% on maintenance. -- Ross Anderson. From tony@lsl.co.uk Mon Mar 12 10:03:10 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 12 Mar 2001 10:03:10 -0000 Subject: [Doc-SIG] Docstring markup process In-Reply-To: Message-ID: <005d01c0aadb$a4c8b2b0$f05aa8c0@lslp7o.int.lsl.co.uk> > Paul Prescod made an excellent meta-proposal for docstrings at the > recent conference: rather than arguing endlessly about various > markup formats, anyone who wants to propose a particular markup format > should write a PEP describing that format in detail. 
Only formats > described in a PEP will be under serious consideration. By an agreed > deadline, we can vote on the PEPs, and then be done with it. Sounds OK by me - although I haven't *heard* any argument for a long time (was there argument at the conference? Why aren't they arguing here where I can hear them!). I rather fondly thought we were working on STNG with minor extensions, specifically '>>>' paragraphs and '#...#' markup, with other stuff to be decided later on when that was working (sounds like what a PEP could say!), but maybe I was mistaken. Of course, writing that down *formally* somewhere is not a bad idea... (it seemed somewhat Pythonic to me to work on the thing as one was de/refining it) with Edward Loper producing STminus to allow us to understand relationships and maybe be able to produce interoperability. > Of course we can discuss the various proposals here, but it's a big > step forward to get them written up and all in one place for > comparison. Do we *have* various proposals? I guess this is one way of finding out... > I strongly support this process; let's pick a deadline. OK, but please make it at least a month away or I'm unlikely to have time to write anything - are we at least allowed to have a quick stab at agreeing something here or thinking of what goes on, or are PEPs to be submitted any old how? (reason for asking is that I was fondly hoping to tidy up docutils a bit, rewriting docstrings where necessary, redo the STpy documentation somewhat, and alpha release within the next fortnight, thus making STpy the 'de-facto' standard for people to organise grumbles at. If PEPs are being written, then that has to go on hold, which is a pain - unless STpy documentation *becomes* a PEP.) Or is this related to Ka-Ping Yee's mega-documentation scheme, addressed elsewhere? Suggestion for meat-and-bones of PEP: 1.
STNG plus '>>>' plus '#...#', maybe plus "tagged paragraphs" - basically what docutils supports now ('cos I *know* it hangs together coherently - working out what ST variants work is an ad-hoc business) 2. Future enhancements to include: - these are also discussed in the STpy document, but need deciding which ones the community wants. The most important is references within a document, and the next most important references to a Python object. 3. Whether ST gets vastly expanded to meet Ka-Ping Yee's latest proposal (discussed in another email). I'm certainly intending to try to produce (1), I guess... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 12 10:44:14 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 12 Mar 2001 10:44:14 -0000 Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: Message-ID: <005e01c0aae1$61c8d890$f05aa8c0@lslp7o.int.lsl.co.uk> Ka-Ping Yee wrote: > [resent with individual cc addresses, since mail.python.org is down] Is it? OK - everyone's going to get two copies... > The introduction of pydoc places more emphasis on docstrings in the > source code. I think this is generally good, since keeping the > documentation close to the source makes it more likely to be kept > up to date. Agreed in so far as it goes. > However, it also produces the potential for duplication > of effort in maintaining both the docstrings and the LaTeX file for > the library reference. Hmm. I've had this argument before. Maintaining two different things is maintaining two different things. If the docstrings are sufficient (and we now have tools to extract them and format them), then well and good. But if they are not, then a different sort of document is just that - different. 
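(A quick check of Ka-Ping's earlier claim - that the compiler throws out unused string constants, while the parse tree keeps them - is easy to make. A minimal sketch follows, using today's ast module in place of the old compiler package he mentioned; the module source and names in it are invented purely for illustration:)

```python
import ast

# Hypothetical module source, used only for illustration.
SOURCE = '''\
"""Module docstring."""

def add(a, b):
    "Return a + b."
    return a + b

"""A long reference document could live here, at the end of the file."""
'''

# The byte-compiler discards a bare string constant that is not a
# docstring, so the trailing text never reaches the code object:
code = compile(SOURCE, "<demo>", "exec")
assert not any(isinstance(c, str) and "long reference" in c.lower()
               for c in code.co_consts)

# ...but it survives in the parse tree, where a tool like pydoc could
# collect it:
tree = ast.parse(SOURCE)
strings = [node.value.value
           for node in tree.body
           if isinstance(node, ast.Expr)
           and isinstance(node.value, ast.Constant)
           and isinstance(node.value.value, str)]
print(strings[-1])  # the trailing documentation string
```

So a trailing "documentation string" costs nothing in the compiled .pyc, yet remains extractable from source.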
> The LaTeX documentation seems to be motivated by the richer metadata, > the greater control over formatting, and the ability to present a > long tutorial or detailed explanation. Yes. Although I'm not worried personally if it's LaTeX or (for instance) DocBook XML. > At the Python conference, a small group of us Ah, the Spanish Inquisition. Which is why I didn't expect it (sorry - not *really* getting at people - well, maybe just a little) > discussed the possibility of merging the external and internal > documentation; that is, moving the library reference into the > module source files. Hmm. I'll rant about this a little later on. > It would no longer be written in TeX so that you wouldn't have > to have TeX in order to produce documentation. Not *necessarily* a bad goal (although I would point out it's *significantly* easier to "have TeX" than, for instance, to "have CVS", which one is also required to have to do development with modern Python (a *serious* problem for some of us)). > This would address the duplication problem and > also keep all of a module's documentation in one place together with > the module. Now, if you said "package" I'd be happy, but since it's "module", I'll gripe. > To avoid forcing you to page through a huge docstring > before getting to the source code, we would allow a long docstring to > go at the end of the file (or maybe collect docstrings from anywhere > in the file). Aagh! No, sorry, my problem wouldn't be with paging (although that *is* a problem - and why is the end of the file so different than the front? - I page from both ends, depending on context!). Source files are for source code. I want to be able to *treat* them as such. It is quite possible for a two page source to have ten or more pages of documentation associated with it. That does *not* belong in the same *file* as the source - if someone *wants* to associate them closely, the correct way to do it is with a *package*. Let's see if I can explain this a bit better.
Files are a useful way of organising data, but good practice doesn't stuff things into one file when they are better organised as two or more. That's why we split source code up into multiple files - a good language like Python allows and encourages this, so that even if one only has one entry point into a package, the writer can still choose to split it up logically into multiple "internal" files. Keeping file size down also has advantages - it makes it easier to navigate the file both "physically" (with an editor) and "conceptually" (remembering what is in the file and why). It's related to the "don't let functions/methods get too big" idea. Files are also, in many filesystems, *typed*. That is, the file "name" has an indication of what is *in* the file. Using this information can be a big win. Docstrings are for inserting "point" documentation, targeted documentation that relates to the particular object the docstring is attached to. This is a Good Thing, and one of the most important additions to Python over the last few years. The key idea here is that targeting - the documentation is in the docstring (and thus in the file) because it belongs *with* what it is documenting. Tutorial, reference and other "grander scope" documentation relates to the source code as a whole. So the "object" it belongs to is the module or package (or perhaps part of it). As such, one *might* argue that, for a single module package, it belongs in that module's docstring. But one then has to decide which of the sorts of documentation "belongs" there, since there is only one slot. I argue for it being whatever the module writer wants (!), but normally/notionally an overview to allow a source code reader (or person browsing with an IDE) to get a handle on what is going on. (That's an important point - docstrings must be suitable for browsing with an IDE.) *Because* "grander scope" sorts of documentation relate to the package or module as a whole, I think they deserve a separate file. 
OK, so if you want it closely coupled, that makes more things packages. Tough. A package can "look like" a module if it wants. Also, *because* one might have more than one sort of "grander scope" documentation for a module/package, you will have to consider *supporting* more than one. Difficult if it is "just" a string tacked on the end. > That leaves the metadata and formatting issues. When i suggested this > idea (of merging in the external documentation) to Guido, he was > initially against it. He was very concerned about the loss > of information > in the TeX markup. In order to even consider switching formats, he > requires that we preserve as much metadata as possible from the TeX > docs (so that, for example, we can still generate a useful index). I agree with Guido (gosh!) on this. My reasons are based on long term use of documentation tools, and also on good programming practice, as well as gut instinct (which is, of course, also based on those things!). The reason for adopting ST (or some variant) for markup in docstrings is, basically, because it is acknowledged that many people will not create docstrings with more markup than that, or with more obtrusive markup than that. I'll say that again slower - there are two reasons for ST (or similar) in docstrings: 1. People *will not* markup heavily (we cannot make them do it, they will not do it), so we need to specify a markup that doesn't have a high learning curve, and that doesn't have many *ways* of marking up 2. People will not use an "obtrusive" markup, like TeX, XML or HTML, because they perceive it as "difficult to read". Those two are, of course, different faces of the same thing. Now, *if* we are to retain all of the markup meaning that the TeX documentation has, we will *have* to have more complex markup. ST is predicated on the idea that it is very simple to read (it is not accidental that it looks very much like email). 
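(By way of illustration, here is a sketch of the sort of lightly marked-up docstring under discussion - a '>>>' paragraph that doubles as a runnable example. The function and its wording are invented for illustration, and the '>>>' paragraph can be checked mechanically with the standard doctest module:)

```python
import doctest

def double(value):
    """Return 'value' *doubled*.

    The '>>>' paragraph below is lightweight documentation that is
    also a runnable, checkable example:

    >>> double(21)
    42
    """
    return value * 2

# Collect and run the examples found in the docstring.
# (When run as a script, doctest.testmod() does the same job.)
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(double, "double", globs={"double": double}):
    runner.run(test)
# runner.failures counts failed examples; 0 means the '>>>' paragraph
# checks out against the code it documents.
```

The point being that the markup stays readable as plain email-like text, while still carrying machine-verifiable meaning.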
STpy is already straining at that a bit by introducing '#...#' (which we think we need). And I am not convinced that there is an ST-natural way of quoting a single quote as a literal character (which is the sort of thing one *has* to be able to do for proper markup of a detailed text on some issues). Thus, despite the ability to write a book (the Zope book, for instance) in ST, it is required to stay not much more complex than it is, or people won't use it. Worse, if one tries to continue using "simple" markup in ST, one is going to end up with strained analogies, and with almost any non-alphanumeric character having a special meaning. Yuck (can we say Perl?). The obvious way round that is to start doing, well, markup - for instance, '@class(..)' or somesuch (like Pod, I think? - or GNU texinfo). In which case we're inventing our own little markup language again, with none of the reasons for doing it that went into ST. And I for one reckon that we probably don't have a Guido-of-the-markup-languages hanging around on our list (it's statistically unlikely). Indeed, if one *needs* markup, then the obvious thing to do is to steal someone else's (I for one don't care much *which* markup language one steals - TeX, Pod, texinfo and DocBook XML all have their advantages and disadvantages. I thought we'd delegated that decision to Fred Drake). > But i still think that getting all the docs together in one place is > a goal worth at least investigating. Depends on how tightly coupled "one place" is - the same *directory*, maybe. The same *file* - naff idea. > So i have gone through the TeX > files in the Doc/lib directory and extracted a list of all the TeX > markup tags that are used there. Here follows my list; i > have attempted > to categorize the purpose of the tags by hand. At which point I think I rest my case - there are *lots* of these. I sincerely hope that we don't adopt this proposal as stated.
I *wouldn't* object to a proposal that said that documentation source files should (maybe) live with source code files (although people who don't want to download the documentation might well object!). And I am open to the formatting language that is used for such "grander scope" documentation, although I think we should not be trying to invent our own (I suspect that a DocBook XML variant is probably what we want, since it is a skill that seems to have application elsewhere). I've *got* to go and do paid work now. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From ping@lfw.org Mon Mar 12 11:30:42 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 12 Mar 2001 03:30:42 -0800 (PST) Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: <005e01c0aae1$61c8d890$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: On Mon, 12 Mar 2001, Tony J Ibbs (Tibs) wrote: > > Hmm. I've had this argument before. Okay. Well, i think it's good to have this particular debate. It's worth discussing, so please bear with me as i argue it out with you. > Not *necessarily* a bad goal (although I would point out it's > *significantly* easier to "have TeX" than, for instance, to "have CVS", This really surprised me. CVS is installed by default, i believe, on all modern Linux distributions, and i have yet to install TeX, which is much bigger and more complex. How is it that you perceive the opposite? > > This would address the duplication problem and > > also keep all of a module's documentation in one place together with > > the module. > > Now, if you said "package" I'd be happy, but since it's "module", I'll > gripe. But the library reference manual is arranged by module, and there is a chapter of documentation on each individual module.
It also makes sense since the modules are the organizational units that you import and name in your code. > Aagh! No, sorry, my problem wouldn't be with paging (although that *is* > a problem - and why is the end of the file so different than the > front? - I page from both ends, depending on context!). I do think the inconvenience is mitigated by putting the docs at the end -- but i acknowledge that having bigger files is a concern. I don't see this as a 100% win myself -- it just seems that keeping the code and docs in the same file has advantages large enough to outweigh the inconvenience. > Tutorial, reference and other "grander scope" documentation relates to > the source code as a whole. Can you delineate clearly what you consider "grander scope" documentation as opposed to "point" documentation on a particular module? I'd like to better understand what you mean by "different" in the sense of different enough that something should be in a separate file. > (That's an important point - docstrings must be suitable > for browsing with an IDE.) Agreed. > Also, *because* one might have more than one sort of "grander scope" > documentation for a module/package, you will have to consider > *supporting* more than one. Could you give an example? > And I for one reckon that we probably don't have a > Guido-of-the-markup-languages hanging around on our list (it's > statistically unlikely). By Guido-of-the-markup-languages did you mean "benevolent dictator" or "good designer" or "long-term keeper of the faith" or something else? > 1. People *will not* markup heavily (we cannot make them do it, they > will not do it), so we need to specify a markup that doesn't have a high > learning curve, and that doesn't have many *ways* of marking up [...] > > markup tags that are used there. Here follows my list; i > > have attempted to categorize the purpose of the tags by hand. > > At which point I think I rest my case - there are *lots* of these.
Although it may seem surprising, i don't immediately conclude that there are so many tags that we can't possibly design a reasonably useful markup syntax. Many of the tags are redundant or produce shades of meaning finer than i consider really necessary. I'm not claiming it's possible until i really give it a try, but i do think it's worth a serious attempt. -- ?!ng "Computers are useless. They can only give you answers." -- Pablo Picasso From tony@lsl.co.uk Mon Mar 12 13:35:23 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 12 Mar 2001 13:35:23 -0000 Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: Message-ID: <006201c0aaf9$4a59c670$f05aa8c0@lslp7o.int.lsl.co.uk> (do we still need to spam everyone individually? hmm - just once more) Ka-Ping Yee wrote, in response to my earlier missage (I'm thus the bit double-chevroned): > > Hmm. I've had this argument before. > > Okay. Well, i think it's good to have this particular debate. It's > worth discussing, so please bear with me as i argue it out with you. No, debates are good - in particular, in the context of Doc-SIG, I've changed my mind at least once because of debate (and significantly so, since I was initially vehemently against ST). And I understand that you're trying to improve things (I just worry that it will turn into improoooovement) - if I get too argumentative below, please understand that I tend to talk too loud when excited (and documentation issues have been getting me worried/excited for well-on 20 years now, I'm afraid). > > Not *necessarily* a bad goal (although I would point out it's > > *significantly* easier to "have TeX" than, for instance, to > > "have CVS", > > This really surprised me. CVS is installed by default, i believe, > on all modern Linux distributions, and i have yet to install TeX, > which is much bigger and more complex. How is it that you perceive > the opposite? Well. 
CVS is not present by default on Windows, and it is not present at all where I work (from where I type). We use RCS (with in-house jacketing - don't ask Eddie!) for Unix software, and a Microsoft thingy for NT work, at the moment. Moreover, I can't just install new software on a machine 'cos I want to. At home, whilst I have Linux, I have a crappy modem connection (so CVS is a poor choice 'cos it assumes good connectivity), and also when I briefly tried installing CVS (my Debian setup did *not* initially have it installed, 'cos I didn't ask for it), I cocked up what I asked it to do, I *think*. Regardless, I don't have the *time* to learn CVS, or the time to install it at home - those hours could be spent doing other things (and hours are *very* precious - yes, I know I keep harping on about that, sorry, but it's true). TeX, now. You download the package (on Linux you can probably omit this step, he says, throwing back the comment about what is already present on "modern" distributions). You tell it to install itself. It uses up lots of space on your system. You run the appropriate thingy on the files. You use dvi2ps or whatever and presto bongo. As I understand it, nowadays *installing* TeX and friends on a PC is a doddle, and it's pretty easy on Unix - one probably doesn't even need to compile stuff. And *using* TeX and friends is fairly easy too. *Understanding* what is going on when something goes wrong may be another matter, but I've already been there and done that, so the learning curve is pretty flat. So basically we've got a "my package is harder to install than your package" argument here, and I expect we'll both lose to someone using something less popular. The advantage TeX and friends have is that they're designed (nowadays) to be installed relatively easily by people who are not system admin types, and don't want to be. 
> > > This would address the duplication problem and > > > also keep all of a module's documentation in one place > > > together with the module. > > > > Now, if you said "package" I'd be happy, but since it's > > "module", I'll gripe. > > But the library reference manual is arranged by module, and there > is a chapter of documentation on each individual module. It also > makes sense since the modules are the organizational units that you > import and name in your code. Accident of history, that, surely? We didn't used to have packages, so all of the existing documentation more or less had to be by the module. Now that packages are around, that constraint is no longer true, and indeed we begin to get documentation for the XML package and so on (and if there *isn't* grand scope documentation for these, then that's the fault of normal lack-of-volunteer-itis, surely?). Hmm. "the duplication problem". Eddie notwithstanding, I'm not convinced it always *is* a problem. I don't always *want* my documentation to reflect truth-in-implementation - sometimes the documentation is deliberately behind (or even ahead) of the code. > > Aagh! No, sorry, my problem wouldn't be with paging > > (although that *is* a problem ... > > I do think the inconvenience is mitigated by putting the docs at the > end -- but i acknowledge that having bigger files is a concern. > I don't see this as a 100% win myself -- it just seems that keeping > the code and docs in the same file has advantages large enough to > outweigh the inconvenience. I suspect that we have two major differences, and this is one. I believe that putting the "full" documentation at the end of the file is bad, and not a win - I don't believe that this "same file" idea is either a win, or even a Good Idea, particularly. But I said that. Philosophically, I *want* to be able to point to a different file and say "that's documentation". 
(hmm - on the type-SIG some while back all sorts of people seemed happy with a separate interface file (mind you, *there* I disagreed, for the same reason I wouldn't like to put the *docstrings* in a separate file).) Incidentally, for a *package*, where do you stand? Are you more willing for a separate file there, or do you want one file to be magically decided on as "special" and to have the documentation therein? > > Tutorial, reference and other "grander scope" documentation > > relates to the source code as a whole. > > Can you delineate clearly what you consider "grander scope" > documentation as opposed to "point" documenation on a particular > module? I'd like to better understand what you mean by "different" > in the sense of different enough that something should be in a > separate file. Well, at the moment we have the language reference manual, the tutorial and the library reference manual. Oh, and HOW-TOs. All overlap each other (heh, so they should share common source!!! - erm, no). I would hope that in future, for a package we might have the following: * docstrings (at least) - this serves source code readers/IDEs, and *may* provide input for other things in the absence of anything else. But people like me are going to write stuff in docstrings you don't *want* in other documents. Regardless. * for a "standard" module/package, its entry in the library manual * for a non-standard module/package, equivalent (one hopes!) * for many packages (and some modules), a HOW-TO document - regular expressions are an example here, where AMK has written such. * for some packages, a tutorial document, perhaps a subsection *for* the tutorial The docstrings I termed "point" documentation because each docstring refers to a particular point in the source code - a particular object or whatever (heck, pydoc uses this to find them!). The "grander scope" documentation is anything that looks at a package as a whole. 
Such things should be written in a different mood, and quite possibly (if one can) by different people (as the HOW-TOs are frequently written by someone who didn't write the code). I mean, would you want *me* to write tutorial user documentation for STpy? Wouldn't it tend to be a bit too long? Anyway, back to the point. If a module *does* have all of those, which one do you choose to put at the back of the source file? And if it only has one, what happens when it gains another? > > Also, *because* one might have more than one sort of "grander scope" > > documentation for a module/package, you will have to consider > > *supporting* more than one. > > Could you give an example? Hmm. distutils is one. docutils/STpy/pydoc/whatever will be another (surely they deserve integrated documentation for *some* purposes, and docutils alone already has at least two documentation files, close on three, all for different purposes). HOW-TOs are another, as a class (and *they* sometimes span packages, even - a "string" HOW-TO will need to talk about Unicode and string.py and maybe buffer.py or whatever it is called, and so on). > By Guido-of-the-markup-languages did you mean "benevolent dictator" or > "good designer" or "long-term keeper of the faith" or something else? Oh, sorry, I meant the "good designer" sense (although the others are needed after that, but they weren't what I meant). > Although it may seem surprising, i don't immediately conclude that > there are so many tags that we can't possibly design a reasonably > useful markup syntax. Many of the tags are redundant or produce > shades of meaning finer than i consider really necessary. Hmm. Unfortunately, losing shades of meaning early on means you can never regain them, and one person's shade of meaning is another's heartfelt "but they're not the same". But this, I think, is the second place of disagreement (the "new markup language" disease).
Strangely enough, I worry less about this than the other one - I trust
Fred and co. to ensure that we can *produce* documentation for printing
out that is worth using, and I've used too many different methods of
markup to worry overmuch about what good or rubbish system I'm required
to use - it's unlikely to be as bad as DSR (Digital Standard Runoff).
Although if we *want* to do proper markup, I wish I could convince
people that you actually *do need a proper markup language* (heck, we
already all know that if you want to do proper programming you need a
proper programming language, so why is this so hard to convey?)

Interestingly, I've seen this game (of reducing tags because they're
"not needed") played out in an entirely different arena - Great
Britain's national mapping agency (OS(GB) - that is, Ordnance
Survey(GB) - Northern Ireland has its own) reduced the number of
feature codes they use to distinguish map objects drastically some
years back, mainly to enable cost effective digitising of the
non-digital map base (and they *did* have rather an over-presence of
railway related codes - shows who *used* to be important in the
country!). I still believe that's going to cause them grief (trans.
"money"), and in the not so distant future either - but that's related
to work...

> I'm not claiming it's possible until i really give it a try, but
> i do think it's worth a serious attempt.

Please do bear in mind the "philosophy" of ST then, whilst trying.

Hmm - best stop fiddling with this, and send it - my tummy is
rumbling...

[[[Last thought: I wonder if the ability of pydoc to reproduce
(something that *looks* like) the library documentation for some (maybe
even many) modules - case in point, string - is an influence on your
stance? My second thought on that is, beware of thinking that the
appearance is all. To those who love/want markup, the appearance may be
important, but the *meaning* is what they want to be able to extract
from the text.
My first thought is that, actually, I *don't* particularly have an
objection to a module's library documentation being generated directly
from its docstrings if a. that's all there is, b. Fred et al say that
is sufficient (which I suspect they'd rather not, if they want better
markup, but I'll let them decide), and c. no one volunteers to write
more (more is generally not a bad thing, mind, even for the string
module).

Hmm. Must stop thinking and eat.]]]

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm
(once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Mon Mar 12 14:34:33 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Mon, 12 Mar 2001 14:34:33 -0000
Subject: [Doc-SIG] Docstring markup process
In-Reply-To:
Message-ID: <006501c0ab01$8df5d830$f05aa8c0@lslp7o.int.lsl.co.uk>

> anyone who wants to propose a particular markup format
> should write a PEP describing that format in detail.
> Only formats described in a PEP will be under serious
> consideration. By an agreed deadline, we can vote on
> the PEPs, and then be done with it.
>
> Of course we can discuss the various proposals here, but it's a big
> step forward to get them written up and all in one place for
> comparison.

I have printed out PEP 1, and notice that it seems to say that only
someone with access to the python-dev list can propose a PEP (or
rather, it heavily implies that generation of the PEP happens there).

I *know* life can't be that simple, as I'm sure some of the existing
PEPs have started on other lists. So what is the situation? Does one
just put something together and send it to Barry Warsaw with some
plausible excuse?

Ka-Ping - are you proposing anyone in particular to start a PEP for
usage of ST in docstrings, or shall I divert sideways and wrest
something out of the STpy documentation?
Since I'm sporadically trying to update the ZWiki entry for this stuff,
and aiming to update the STpy document *anyway*, I guess this would be
an opportunity to convert it into STpy (nowt like the documentation
being self-referencing).

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
Well we're safe now....thank God we're in a bowling alley.
- Big Bob (J.T. Walsh) in "Pleasantville"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tavis@calrudd.com Mon Mar 12 18:03:24 2001
From: tavis@calrudd.com (Tavis Rudd)
Date: Mon, 12 Mar 2001 10:03:24 -0800
Subject: [Doc-SIG] suggestions for a PEP
Message-ID: <01031209305701.18635@lucy>

--------------Boundary-00=_OHJ3M3E2ZQZ6GAK7DZGH
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit

Good morning,
I like Ka-Ping's suggestion of a formal PEP and have made some notes
summarizing the ideas I like from recent postings to doc-sig. They
could be written up as a PEP, along with a formal spec for Structured
Text. There's nothing truly original here, just a synthesis of other
people's ideas:

1- module API documentation should be in the same file as source

2- a FORMALIZED version of structured text should be used for inline
   formatting. There's no need to repeat the justifications here. The
   final version of structured text should include a facility for
   storing meta-data in a field format that is easily identifiable to
   both the human eye and the parsing tool. (e.g. authors, version,
   keywords, spam)

3- no changes should be required to the python parser

4- the module's namespace should not be polluted and its memory
   requirements should not be inflated by use of inline documentation

5- therefore, the existing __doc__ docstrings should be used for very
   short synopses, and extended documentation that is discarded at the
   byte-compile stage should be written in string literals that appear
   immediately after the existing docstrings.
These extra string literals would be written in ST, while the
__doc__strings would be in plain text. These two forms of API docs
should complement and not duplicate each other. See the example module
attached to this message.

6- the documentation parsing tools should be capable of producing
   output in many formats (manpages, plain text, html, latex, for a
   start),

7- the doc parsing tools should not need to import and run the module
   to produce its documentation (for security reasons alone)

8- module Library Reference documentation should also be kept in the
   same file as the module source. It should complement the API docs
   with examples, extended discussions of usage, tutorials, test code,
   etc., but should not duplicate the API reference material.

9- the Library Reference docs should be written in string literals, as
   with the extended API docs proposed in pt. 5, but there should be a
   prefix token such as """LIBREF: at the start of each chunk to signal
   to the doc tools that the following text is not part of the API ref.
   The token would allow this documentation to be split up into chunks
   that can appear anywhere in the source file (a la perl's POD).

10- the Library Reference documentation should also be written in ST
    as using LaTeX here would force the module author to learn yet
    another mark-up language, require the documentation user to install
    yet another processing tool (although this isn't an issue on
    Linux), and would place too much emphasis on the separation between
    the API and library reference docs and discourage synchronization
    as the module evolves! The same argument applies to maintaining the
    status quo of external doc files.

Any extra meta-data that is needed for proper indexing, etc. (to meet
Guido's concerns) should be included as fields in the string literals
as is done in JavaDoc (but not necessarily with that syntax).

What do you think?
Cheers,
Tavis Rudd

p.s.
Other issues to consider: - caching of documentation so it doesn't have to be regenerated every time it's used - documenting Packages - inheriting documentation (Edward Loper's idea) - hiding API docs for __privateInternals (ditto) - documenting extensions in other languages - comments within the markup language --------------Boundary-00=_OHJ3M3E2ZQZ6GAK7DZGH Content-Type: text/plain; charset="iso-8859-1"; name="test.py" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="test.py" IwojIE11bHRpbGluZSBjb21tZW50IGF0IHN0YXJ0IG9mIG1vZHVsZSAKIyAtIGNvdWxkIGJlIGlu Y2x1ZGVkIGFzIHRoZSBtb2R1bGUgZGVzY3JpcHRpb24gaWYKIyAgIHRoZXJlIGlzIG5vIGRvY3N0 cmluZyAtIGUuZy4gYXMgcHlkb2MgY3VycmVudGx5IGRvZXMKIwoiIiJNb2R1bGUgRG9jc3RyaW5n IiIiCiIiIgpleHRyYURvY3N0cmluZyBmb3IgdGhlIG1vZHVsZQoKTXkgcHJvcG9zYWwgaXMgdG8g dXNlIHRoZSBleGlzdGluZyBfX2RvY19fIGxvY2F0aW9uIGZvciBhIG9uZS1saW5lIHN5bm9wc2lz LCBhbmQKd2hlcmUgbmVlZGVkIHBsYWNlIGEgc2Vjb25kIHN0cmluZyBsaXRlcmFsIGFmdGVyIHRo ZSBfX2RvY19fIGxvY2F0aW9uIGZvciBtb3JlCmRldGFpbGVkIEFQSSBkb2N1bWVudGF0aW9uIHRo YXQgaXMgZm9ybWF0dGVkIGluIFNUcHkuICBSYXRoZXIgdGhhbiBpbnRyb2R1Y2luZyBhIG5ldwpf X2Zkb2NfXyBhdHRyaWJ1dGUgaW50byBweXRob24sIGFzIEVkd2FyZCBMb3BlciBzdWdnZXN0ZWQs IHRoZXNlIHN0cmluZyBsaXRlcmFscwpzaG91bGQgYmUgZGlzY2FyZGVkIGF0IHRoZSBweXRob24g Ynl0ZS1jb21waWxlIHN0YWdlLiAgVGhlcmUncyBubyBuZWVkIGZvciB0aGlzCmRvY3VtZW50YXRp b24gYXQgcnVuLXRpbWUgYW5kIHRoZXJlJ3MgY2VydGFpbmx5IG5vIG5lZWQgdG8gcmVxdWlyZSBj aGFuZ2VzIHRvCnB5dGhvbidzIHBhcnNlci4KCkFzIHlvdSdsbCBzZWUgYmVsb3csIHRoaXMgd291 bGQgYWxzbyBhbGxvdyBmb3IgYXR0cmlidXRlIGRvY3N0cmluZ3MsIGEgbGEKTWFyYy1BbmRyZSdz IFBFUCAoSSd2ZSBmb3Jnb3R0ZW4gdGhlIG51bWJlcikuCgoiIiIKIyBkb24ndCB1c2UgX19hdXRo b3JfXywgX192ZXJzaW9uX18sIGV0Yy4gYXMgbW9kdWxlIGF0dHJpYnV0ZXMuCiMgVGhlcmUncyBu byBuZWVkIHRvIHBvbGx1dGUgdGhlIG5hbWVzcGFjZS4gUmF0aGVyIGluY2x1ZGUgdGhlbQojIGFz IGZpZWxkcyBpbiB0aGUgZXh0cmFEb2NzdHJpbmcKCgpNT0RVTEVfQ09OU1RBTlRfMSA9IDAKIiIi ZXh0cmFEb2NzdHJpbmcgZm9yIGEgbW9kdWxlIGNvbnN0YW50IiIiCgpNT0RVTEVfQ09OU1RBTlRf 
MiA9IDEKIiIiZXh0cmFEb2NzdHJpbmcgZm9yIGFub3RoZXIgbW9kdWxlIGNvbnN0YW50IiIiCgoK ZGVmIGdsb2JhbEZ1bmMoYXJnKToKICAgICIiIkZ1bmN0aW9uIERvY3N0cmluZyIiIgogICAgIiIi ZXh0cmFEb2NzdHJpbmcgZm9yIGdsb2JhbCBmdW5jdGlvbiIiIgoKICAgIGFWYXIgPSBhcmcqMiAK ICAgICIiImV4dHJhRG9jc3RyaW5nIGZvciBhbiBleHByZXNzaW9uIHdpdGhpbiBhIGdsb2JhbCBm dW5jdGlvbiAoaXMgdGhpcyByZWFsbHkgbmVlZGVkPz8pIiIiCiAgICBwcmludCBhVmFyIAogICAg CgpjbGFzcyBTYW1wbGVDbGFzczoKICAgICIiIkNsYXNzIERvY3N0cmluZyIiIgogICAgIiIiZXh0 cmFEb2NzdHJpbmcgZm9yIGEgY2xhc3MKCiAgICAtIHRlc3Qgc3RyaW5nIHdpdGggYSB3aG9sZSBi dW5jaCBvZiB0ZXh0LCB3aGljaCBjb3VsZCBiZSBmb3JtYXR0ZWQgd2l0aCBTVHB5LCBldGMuCiAg ICAKICAgIFNOSVAgLS0tIFNldmVyYWwgdG9vbHMgaGF2ZSBiZWVuIHdyaXR0ZW4gb3IgcHJvcG9z ZWQgZm9yIHByb2Nlc3NpbmcgUHl0aG9uIGRvY3VtZW50YXRpb24KICAgIHN0cmluZ3Mgd2l0aCBz cGVjaWZpYyBmb3JtYXR0aW5nIGNvbnZlbnRpb25zIFsxXSwgWzJdLCBbM10sIFs0XS4gRXZlbnR1 YWxseSwgaXQKICAgIHdvdWxkIGJlIG5pY2UgaWYgc3VjaCBhIHRvb2wgYmVjYW1lIHBhcnQgb2Yg dGhlIFB5dGhvbiBiYXNlIGxpYnJhcnkuIEhvd2V2ZXIsIGZvcgogICAgdGhpcyB0byBoYXBwZW4s IHRoZSBjb21tdW5pdHkgbmVlZHMgdG8gY29tZSB0byBhIGNvbnNlbnN1cyBvbiB3aGF0IGZvcm1h dHRpbmcKICAgIGNvbnZlbnRpb25zIGRvY3VtZW50YXRpb24gc3RyaW5ncyBzaG91bGQgdXNlLiBJ biB0aGlzIGRvY3VtZW50LCBJIGRpc2N1c3Mgc29tZSBvZgogICAgdGhlIGlzc3VlcyB0aGF0IHN1 Y2ggY29udmVudGlvbnMgbXVzdCBkZWFsIHdpdGguIEkgYXR0ZW1wdCB0byBwcmVzZW50IGEgZmFp cmx5CiAgICBjb21wcmVoZW5zaXZlIGxpc3Qgb2YgaXNzdWVzLCBidXQgSSB3aWxsIGJlIGdsYWQg dG8gYW1lbmQgdGhpcyBkb2N1bWVudCBhcyBtb3JlCiAgICBpc3N1ZXMgYXJlIHBvaW50ZWQgb3V0 IHRvIG1lLiAgQmVjYXVzZSBJIHdhcyBpbnRlcmVzdGVkIGluIGV4cGxvcmluZyBzb21lIG9mCiAg ICB0aGVzZSBpc3N1ZXMsIEkgd3JvdGUgbXkgb3duIGRvY3VtZW50YXRpb24gZXh0cmFjdGlvbiB0 b29sLCBFcHlkb2MgKGVkbG9wZXIncwogICAgcHlkb2MpLiBUaGlzIHRvb2wgd2FzIHByaW1hcmls eSBpbnRlbmRlZCBhcyBhIHByb3RvdHlwZSwgdG8gbGV0IG1lIHBsYXkgd2l0aAogICAgZGlmZmVy ZW50IHdheXMgb2YgcHJvY2Vzc2luZyBkb2N1bWVudGF0aW9uIHN0cmluZ3MuIEZvciBtb3JlIGlu Zm9ybWF0aW9uLCBvciB0bwogICAgc2VlIHdoYXQgdHlwZSBvZiBvdXRwdXQgRXB5ZG9jIHByb2R1 
Y2VzLCBzZWUgdGhlIGRvY3VtZW50YXRpb24gZm9yIEVweWRvYwogICAgKGtlZXBpbmcgaW4gbWlu ZCB0aGF0IGl0J3MganVzdCBhIHByb3RvdHlwZSA7LSkgKS4gTWFueSBvZiB0aGUgaWRlYXMgY29u dGFpbmVkIGluCiAgICB0aGlzIGRvY3VtZW50IGFyZSBub3QgbWluZS4gSSBkb24ndCB0YWtlIGNy ZWRpdCBmb3IgYW55IG9mIHRoZW0uIEhvd2V2ZXIsIEknbSBub3QKICAgIG9yZ2FuaXplZCBlbm91 Z2ggdG8gZmlndXJlIG91dCB3aGVyZSBtb3N0IG9mIHRoZSBpZGVhcyBkaWQgY29tZSBmcm9tLgog ICAgIiIiCgogICAgCiAgICBhbkF0dHJpYnV0ZSA9IDEyMzQKICAgICIiImV4dHJhRG9jc3RyaW5n IGZvciBhIGNsYXNzIGRhdGEgYXR0cmlidXRlIiIiCiAgICBhbm90aGVyQXR0cmlidXRlID0gJ2Fi Y2RlZmcnCiAgICAiIiJleHRyYURvY3N0cmluZyBmb3IgYW5vdGhlciBjbGFzcyBkYXRhIGF0dHJp YnV0ZSIiIgogICAgCiAgICBkZWYgbWV0aG9kKHNlbGYsIGFyZyk6CiAgICAgICAgIiIiQSB0ZXN0 IGZ1bmMgZG9jc3RyaW5nICIiIgogICAgICAgICIiImV4dHJhRG9jc3RyaW5nIGZvciBhIGNsYXNz IG1lbWJlciBmdW5jdGlvbiIiIgogICAgICAgIHByaW50IGFyZwogICAgICAgIHJldHVybiAxCgoK CiMgYW5kIG9mIGNvdXJzZSB1c2UgY29tbWVudHMgdG8gZG9jdW1lbnQgYW55dGhpbmcgdGhhdCBv bmx5IG5lZWRzIHRvIGJlCiMgcG9pbnRlZCBvdXQgdG8gdGhvc2Ugd29ya2luZyB3aXRoIG1vZHVs ZSdzIHNvdXJjZSBkaXJlY3RseSBlLmcuCmdsb2JhbEZ1bmMoJ0JpZ2dsZXMgRGljdGF0ZXMgYSBM ZXR0ZXIuICcpICMgdGhpcyBjb3VsZCBkb2VzIGJ1Z2dlciBhbGwgLSBkb24ndCBib3RoZXIgbWFp bnRhaW5pbmcgaXQuCmVyaWMgPSBTYW1wbGVDbGFzcygpCmVyaWMubWV0aG9kKCdzcGFtJykKCgoK IyBBbmQgZmluYWxseSBhIHN0cmluZyBsaXRlcmFsIGZvciB0aGUgZm9ybWFsIGRvY3VtZW50YXRp b24gdG8KIyBpbmNsdWRlIGluIHRoZSBweXRob24gbGlicmFyeSByZWZlcmVuY2UgYXMgS2EtUGlu ZyBzdWdnZXN0ZWQgaW4gaGlzCiMgcG9zdCBvbiBNYXJjaCAxMS4gIFRoaXMgaXMgY29uY2VwdHVh bGx5IHNlcGFyYXRlIGZyb20gdGhlCiMgQVBJIHJlZmVyZW5jZSBkb2N1bWVudGF0aW9uIHN0b3Jl ZCBpbiB0aGUgZG9jc3RyaW5ncyBhbmQgZXh0cmFEb2NzdHJpbmdzIGFib3ZlLgojIEl0IHNob3Vs ZCBjb21wbGltZW50LCBub3QgZHVwbGljYXRlLCB0aGUgQVBJIHJlZmVyZW5jZSBhbmQgcHJvdmlk ZSBhbnkKIyBtZXRhLWRhdGEgbmVjZXNzYXJ5IGZvciBwcm9wZXIgaW5kZXhpbmcgaW4gdGhlIHB5 dGhvbiBsaWJyYXJ5IHJlZmVyZW5jZSBwYWdlcy4KCiIiIkxJQlJBUllfUkVGOgoKU3RhcnQgd2l0 aCBhIHRva2VuIHRvIHRlbGwgdGhlIGRvY3VtZW50YXRpb24gcGFyc2luZyB0b29sIHRoYXQgd2hh 
dCBmb2xsb3dzIGlzIGZvcgp0aGUgbGlicmFyeSByZWZlcmVuY2UgYW5kIG5vdCB0aGUgQVBJIHJl Zi4gIFRoaXMgd291bGQgcmVtb3ZlIHRoZSBuZWVkIGZvciB0aGlzCnN0cmluZyBsaXRlcmFsIHRv IGJlIGluY2x1ZGVkIGFzIHRoZSB2ZXJ5IGxhc3QgdGhpbmcgaW4gdGhlIHNvdXJjZSBmaWxlLCBh bmQgd291bGQKYWxzbyBhbGxvdyB0aGUgbGlicmFyeSByZWZlcmVuY2UgY29kZSB0byBiZSBzcGxp dCB1cCBpbnRvIGNodW5rcyB0aHJvdWdob3V0CnRoZSBzb3VyY2UgY29kZSAoYSBsYSBwZXJsJ3Mg UE9EKS4gIEkgdGhpbmsgdGhpcyBpcyB3aGF0IEthLVBpbmcgbWVhbnQgYnkKIihvciBtYXliZSBj b2xsZWN0IGRvY3N0cmluZ3MgZnJvbSBhbnl3aGVyZSBpbiB0aGUgZmlsZSkuIgoKVGhpcyBkb2N1 bWVudGF0aW9uIHN0cmluZyBjb3VsZCBiZSB3cml0dGVuIGluIExhVGVYLCBvciBzdHJ1Y3R1cmVk IHRleHQuICBNeSBwZXJzb25hbApwcmVmZXJlbmNlIGlzIGZvciBhIGZvcm1hbGl6ZWQgU3RydWN0 dXJlZCBUZXh0IGxhbmd1YWdlLiAgSSB1c2UgTGFUZVggZXh0ZW5zaXZlbHkKZm9yIGxhcmdlciBk b2N1bWVudHMgKGUuZy4gbXkgdGhlc2lzKSwgYnV0IHRoaW5rIGl0J3MgaW5hcHByb3ByaWF0ZSBm b3IgbW9kdWxlIGRvY3MKZm9yIHRoZXNlIHJlYXNvbnM6Ci0gaXQgZGVtYW5kcyB0aGUgcHJlc2Vu Y2Ugb2YgZXh0cmEgcHJvY2Vzc2luZyB0b29scwotIGl0IHJlcXVpcmVzIHRoZSBtb2R1bGUgY3Jl YXRvciB0byBsZWFybiB5ZXQgYW5vdGhlciBtYXJrdXAgbGFuZ3VhZ2UKLSBpdCBvZmZlcnMgbm8g cmVhbCBhZHZhbnRhZ2Ugb3ZlciBTVCBmb3Igc21hbGwgZG9jdW1lbnRzICg8MTAgcGFnZXMpCi0g aGF2aW5nIHR3byBtYXJrLXVwIGxhbmd1YWdlcyBpbiB1c2UgZm9yIG9uZSBtb2R1bGUgcGxhY2Vz IHRvbyBtdWNoIGVtcGhhc2lzCiAgb24gdGhlIHNlcGFyYXRpb24gYmV0d2VlbiB0aGUgQVBJIGFu ZCBsaWJyYXJ5IHJlZmVyZW5jZSBkb2NzIGFuZCBkaXNjb3VyYWdlcwogIHN5bmNocm9uaXphdGlv biBhcyB0aGUgbW9kdWxlIGV2b2x2ZXMhCgoiIiIKCgo= --------------Boundary-00=_OHJ3M3E2ZQZ6GAK7DZGH-- From edloper@gradient.cis.upenn.edu Mon Mar 12 19:36:23 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 12 Mar 2001 14:36:23 EST Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: Your message of "Fri, 09 Mar 2001 10:35:57 GMT." 
<004401c0a884$b9ea1890$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <200103121936.f2CJaNp18187@gradient.cis.upenn.edu>

> BUT I think that it would be better to strategically insert references
> to STminus at "the appropriate places" in the Zwiki (mind you, I've
> still to figure out how the people using Wikis target new stuff that
> they should be looking at).

I added a page to Zope (under CurrentIssues), and I'll try to actually
put more content there when I can. :)

> > I have a suspicion that STminus's current definition does *not*
> > actually provide a subset of the intersection of STNG and STpy..
>
> I think it's very close, actually.

I guess it depends on how much the implementations diverge from the
intentions.. At least for STNG, there are a number of current
differences. I've been writing a large test set, and plan to post a
link to it, and to the results of running STminus on it, later today..
(still needs a little more work). At that point, I'm hoping we can get
a better idea of whether STNG and STpy really act like STminus. (My
guess is that most differences are unintentional ones)

> > (although, just because one parses something one way,
> > doesn't mean that that's the intended behavior, but..)
>
> Well...
>
> In the case of STminus, I think we could say that it does (!)

Definitely, since STminus is implemented directly from its formal
definition.

> In the case of STpy, if it does something surprising then either that's
> a bug, or an unforeseen consequence of something that *isn't* a bug, in
> which case it either needs designing around or explaining.
> I doubt that STNG is too much different (although I suspect they prefer
> the "explain around" to the "change" mechanism - stability (of some
> sort) over perceived complexity).

I'm hoping that STNG will be willing to make at least a few changes..
For example, changing 'x*y' and 'y*z' to be 2 literals rather than one
emph area.
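[The 'x*y' / 'y*z' problem is easy to reproduce with a naive emphasis
rule - a hypothetical sketch, not the actual STNG, STpy, or STminus
parser:]

```python
import re

# Naive rule: *...* marks emphasis, matched non-greedily.
naive_emph = re.compile(r"\*(.+?)\*")

text = "compute x*y and then y*z"
marked = naive_emph.sub(r"<em>\1</em>", text)
# The two unrelated asterisks pair up into one emphasis span:
print(marked)  # compute x<em>y and then y</em>z
```

[Making a lone '*' undefined, or treating both as literals, avoids
exactly this trap.]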
> (right - so STpy/docutils is going to be the Common Lisp of ST, STminus
> is clearly Scheme, so what is STNG? I feel it is likely to have more
> "unexpected results" because of a wish for a faster engine - Tcl? no,
> that's unfair)

Well, I'm hoping that all 3 at least have well-defined results *where
they do define the results*. I certainly hope that STNG will only give
"unexpected results" for a small subclass of strings.. :)

> One comment - when defining *StructuredText*, you say "all paragraphs
> and list items be separated by at least two blank lines" - shouldn't
> that be "one blank line" which is "two newlines".

The text was wrong, the production was (I think) correct. It should
have said that paragraphs are separated by "at least one blank line."
I'll fix it..

> I tend to think of the STNG document structure as being:
>
> BlankLine        = S* NL
> TextLine         = <> NL
> Paragraph        = TextLine+
> StructuredTextNG = BlankLine* Paragraph
>                    (BlankLine+ Paragraph)*
>                    BlankLine*

Almost.. But since I define paragraphs *not* to include their trailing
newline, you need at least two *newlines* between paragraphs. Also, the
way you wrote it, there's an ambiguity as to whether trailing spaces
belong in the paragraph and the blank line.. (I wrote my implementation
of ebnfla so it checks for all possible ambiguities, so this type of
thing is easier to detect when you're actually playing with rules).
Finally, your rule doesn't allow for strings that contain a final blank
line that isn't terminated by a newline. I'll try to give a better
explanation of the rule in
http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html when I get a
chance.

> This assumes that the empty document (only consisting of zero or more
> blank lines) isn't allowed, which makes the production simpler - I
> suppose that first "Blankline* Paragraph" could become a "?" group (0 or
> 1 occurrences) if needs be...

I assumed that the empty document was acceptable.. Unless there's a
reason to make it unacceptable.
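[The "one blank line = two newlines" separator rule under discussion
can be sketched in a few lines of Python - a hypothetical helper for
illustration, not how any of the real parsers work:]

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs separated by at least one blank line.

    A blank line may carry trailing whitespace, so the separator is a
    newline followed by one or more whitespace-only lines.
    """
    chunks = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    # An empty document yields no paragraphs at all.
    return [c for c in chunks if c.strip()]

doc = "First paragraph,\nstill first.\n\nSecond.\n   \nThird."
print(split_paragraphs(doc))
```

[Note how the whitespace-only line between "Second." and "Third." still
separates paragraphs - the trailing-spaces ambiguity mentioned above is
resolved here by assigning them to the separator.]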
> It's that first paragraph that's the problem - if it had to have a
> starting blank line life would be a lot simpler (indeed, I think STNG
> solves this problem by pretending it does!).

If we had a starting break, we would still need::

    StructuredText = (S* NL)*
                     (NL (S* NL)+ Paragraph | NL (S* NL)+ ListItem)*
                     (NL S*)*

instead of::

    StructuredText = (S* NL)* (Paragraph | ListItem)?
                     (NL (S* NL)+ Paragraph | NL (S* NL)+ ListItem)*
                     (NL S*)*

> Surely for the common-denominator, you don't need to separate out list
> items from paragraphs in this production?

I separated out list items from paragraphs to make it easier to replace
the rule for STpy. We could very easily say instead::

    StructuredText = (S* NL)* Entity? (NL (S* NL)+ Entity)* (NL S*)*
    Entity = Paragraph | ListItem

If you think that's easier to read. It defines the same language, so it
doesn't really matter to me. :)

> Or, if you do:
>
> Paragraph = ( ListItem | TextLine )
>             TextLine*

I tried to make all of my productions correspond to their actual
entities.. So you shouldn't need to do (much) postprocessing on the
output of STminus. For example, the Paragraph production should give an
entire paragraph, not just its first line. I think I may add ULItem,
OLItem, and DLItem productions for similar reasons (without changing
the language defined by the productions)

> Hmm. Anyway, I'm still very impressed - keep up the good work!

Thanks! You've done some impressive work, yourself.

From edloper@gradient.cis.upenn.edu Mon Mar 12 19:50:17 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Mon, 12 Mar 2001 14:50:17 EST
Subject: [Doc-SIG] Docstring markup process
Message-ID: <200103121950.f2CJoHp19946@gradient.cis.upenn.edu>

> Paul Prescod made an excellent meta-proposal for docstrings at the
> recent conference: rather than arguing endlessly about various
> markup formats, anyone who wants to propose a particular markup format
> should write a PEP describing that format in detail.
> Only formats described in a PEP will be under serious consideration.
> By an agreed deadline, we can vote on the PEPs, and then be done with
> it.

As far as I can tell, the current problems lie less in not being able
to agree on a markup format, and more in not being able to define one.
STminus is working to fix that, but it will take several iterations
before we have something that would be worthy of putting in a PEP.

I myself am remaining as neutral as I can as to what the actual format
should be. I figure that there are enough other people out there to
make sure that it's simple, easy, etc.. But my two main concerns are
that the markup format be:

1. Well defined -- i.e., there should be a "correct" parse for most
   strings. The remaining strings should have *explicitly* "undefined"
   results.

2. Safe -- there should be *no* "unexpected results," except where the
   results are "undefined." This means, for example, that the results
   of using a single '*' in a string should be undefined, because if
   it's defined to produce an asterisk, then people will assume they
   can write 'x*y' that way, and later in the same document write 'y*z'
   and get very counter-intuitive results.

> Of course we can discuss the various proposals here, but it's a big
> step forward to get them written up and all in one place for comparison.
>
> I strongly support this process; let's pick a deadline.

I would appreciate it if we can make the deadline far away enough that
we can have a real formal definition for whatever it is we're
proposing.

-Edward

From edloper@gradient.cis.upenn.edu Mon Mar 12 19:53:42 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Mon, 12 Mar 2001 14:53:42 EST
Subject: [Doc-SIG] reserved characters
Message-ID: <200103121953.f2CJrgp20338@gradient.cis.upenn.edu>

I'm thinking of adding a number of "reserved characters" to STminus,
that specific versions of ST may decide to use or not use. They would
redefine their own ReservedCharacter production accordingly..
So:

1. Is this a Good Idea?

2. What characters should be included? Clearly at least '#' for STpy.
   Other possibilities are backquote, at sign, exclamation mark, etc.

I'd like to reserve as few as we can get away with, since the more
characters are reserved, the less useful a program written to work on
*any* StructuredText document can be.

-Edward

From ping@lfw.org Mon Mar 12 17:50:37 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Mon, 12 Mar 2001 09:50:37 -0800 (PST)
Subject: [Doc-SIG] Docstring markup process
In-Reply-To: <006501c0ab01$8df5d830$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID:

On Mon, 12 Mar 2001, Tony J Ibbs (Tibs) wrote:
> I *know* life can't be that simple, as I'm sure some of the existing
> PEPs have started on other lists. So what is the situation? Does one
> just put something together and send it to Barry Warsaw with some
> plausible excuse?

Pretty much, yes. If you're serious enough about a proposal to write a
PEP for it, and it's clear that the goals are at least reasonable, it
should be easy to ask Barry to assign you a PEP number. Or if the
process has to be formalized, i'm sure one of us will be happy to vouch
for any docstring proposal that looks reasonable and to submit it on
the author's behalf.

> Ka-Ping - are you proposing anyone in particular to start a PEP for
> usage of ST in docstrings,

No. Indeed, i think the whole point of Paul's suggestion was to remove
the expectation that we would somehow all *first* agree on something
and then delegate someone to do the work of describing it in a PEP;
rather, the onus is now on whoever wants to champion a proposal.

> Suggestion for meat-and-bones of PEP:
>
> 1. STNG plus '>>>' plus '#...#', maybe plus "tagged paragraphs" [...]
> 2. Future enhancements to include: [...]
> 3. Whether ST gets vastly expanded [...]

So don't think of it as "the PEP". Write *your* PEP and do the best
you can to cover the bases.
You can address things like "the library reference docs should be moved
into the module docstrings for the following reasons, and this is how
we would do it" or "the library reference docs should not be moved for
the following reasons..." etc.

So yes, please write!

-- ?!ng

"Computers are useless. They can only give you answers."
    -- Pablo Picasso

From edloper@gradient.cis.upenn.edu Mon Mar 12 20:38:03 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Mon, 12 Mar 2001 15:38:03 EST
Subject: [Doc-SIG] Evolution of library documentation
In-Reply-To: Your message of "Mon, 12 Mar 2001 10:44:14 GMT."
    <005e01c0aae1$61c8d890$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <200103122038.f2CKc3p26323@gradient.cis.upenn.edu>

I think that for the most part, I tend to side with Tibs on this..

From my perspective, docstrings should include clear, concise, and
unambiguous *definitions* of python entities. In other words, by
reading a docstring, you should know *exactly* what a
function/method/class/etc. is guaranteed to do. Note that they can
leave that underspecified. For example, a function that is defined to
"return a list containing the prime numbers between 1 and n, exclusive"
is free to order that list however it wants. This helps enormously when
trying to read source code, because it lets you know what parts of a
function are there because they're part of the function's definition,
and what parts are just an implementation decision.

Note that that is generally exactly the type of documentation you want
in a reference manual. I think that the idea of building the reference
manuals from docstrings makes a lot of sense. I do *not* think that
including tutorials, howtos, big examples with explanations, etc.
belong in the docstrings.

> (although I would point out it's
> *significantly* easier to "have TeX" than, for instance, to "have CVS",
> which one is also required to have to do development with modern Python
> (a *serious* problem for some of us).
This is pretty much completely a digression, but I figured I'd chime
in. Regardless of how easy it is to download CVS or LaTeX, it is much
easier to learn how to *use* CVS than to learn how to use LaTeX. Given
a knowledgeable teacher, you can learn everything you need to know
about CVS in an hour. I'm not sure anyone ever learns everything they
need to know about LaTeX. :) (This is coming from many years of
experience using both pieces of software -- and I personally think that
they're *both* great, and everyone should learn them both. :) )

> > This would address the duplication problem and
> > also keep all of a module's documentation in one place together with
> > the module.
>
> Now, if you said "package" I'd be happy, but since it's "module", I'll
> gripe.

The module's definition should be kept with the module (or enough
background explanation to allow one to define every
class/function/method/var in the module). It *may* make sense to keep
other types of documentation in the module (??) or the package, but
it's less clear. But if they are given a place, I don't think it should
be in docstrings.

> > To avoid forcing you to page through a huge docstring
> > before getting to the source code, we would allow a long docstring to
> > go at the end of the file (or maybe collect docstrings from anywhere
> > in the file).
>
> Aagh! No, sorry, my problem wouldn't be with paging (although that *is*
> a problem - and why is the end of the file so different than the
> front? - I page from both ends, depending on context!).

In general, well-written definitions should be fairly short, so this
shouldn't be TOO much of a problem. But one issue that I do remember
people having is that docstrings are kept at runtime (which I think is
great for what I'm saying they should do), and people are concerned
that they will eat up too many resources.. Is this really a problem or
am I just misremembering something?

> Source files are for source code.
> I want to be able to *treat* them as such. It is quite possible for a
> two page source to have ten or more pages of documentation associated
> with it. That does *not* belong in the same *file* as the source - if
> someone *wants* to associate them closely, the correct way to do it is
> with a *package*.

I think that the definitions, however big, should be kept with the
source code. (Of course, if someone needs 10 pages to define the
behavior of 2 pages of source, something's wrong). But I agree that
everything else should either be kept in the package (where
appropriate) or at a higher level (for docs that span packages).

> Also, *because* one might have more than one sort of "grander scope"
> documentation for a module/package, you will have to consider
> *supporting* more than one. Difficult if it is "just" a string tacked on
> the end.

I think that we should leave the organization of "grander scope"
documentation to a different project.. (Of course, it's still an
important project.)

> The reason for adopting ST (or some variant) for markup in docstrings
> is, basically, because it is acknowledged that many people will not
> create docstrings with more markup than that, or with more obtrusive
> markup than that.

I think it might make sense to reserve one character, or maybe 2, for
advanced markup. (We would also want to be able to backquote it
somehow, but we'll leave discussion of that for later..). So, for
example, we could say that '@' is reserved for advanced markup, and
then it can be used by people who:

1. want more advanced features

2. are willing to use a "real" markup, which is more "obtrusive" and
   difficult to read/write.

I think that we should limit how much more complex "basic" ST gets, as
much as possible..

> Worse, if one tries to continue using "simple" markup in ST, one is
> going to end up with strained analogies, and with almost any
> non-alphanumeric character having a special meaning. Yuck (can we say
> Perl?).
I will be very upset if ST takes that type of turn.

> The obvious way round that is to start doing, well, markup - for
> instance, '@class(..)'
> or somesuch (like Pod, I think? - or GNU texinfo). In which case we're
> inventing our own little markup language again, with none of the reasons
> for doing it that went into ST.

I agree that it doesn't make sense to make up our own markup language..
So maybe reserve (non-backslashed) '<' and '>' and use XML for advanced
markup? Of course, depending on what we decide we need, we may be able
to get away with much less. For example, define things like::

    Parameters:
        p1 -- foo
        p2 -- bar

To have special meaning in the context of docstrings. Any standard
(STpy-like) ST will just render them as a heading with a list. You can
also define special forms like::

    author -- Edward Loper
    version -- 2.71828

That takes care of most of the structural-type markup, and still looks
ok if you just read it, or parse it with non-docstring-specific ST
tools.. Then you just have to worry about syntax for inline markup. I
don't have any great ideas there, other than having a whole class of
"advanced inline markup" tags like @somethingorother(...)

Anyway, I have a meeting to get to.. :)

-Edward

From barry@digicool.com Mon Mar 12 21:37:01 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Mon, 12 Mar 2001 16:37:01 -0500
Subject: [Doc-SIG] Docstring markup process
References: <006501c0ab01$8df5d830$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <15021.16765.317180.679742@anthem.wooz.org>

>>>>> "TJI" == Tony J Ibbs writes:

    TJI> I have printed out PEP 1, and notice that it seems to say
    TJI> that only someone with access to the python-dev list can
    TJI> propose a PEP (or rather, it heavily implies that generation
    TJI> of the PEP happens there).

No. PEPs can be written and proposed by anybody. Discussion on
python-dev and/or python-list is encouraged to see if the idea is
actually PEP-able. Remember that anybody can post to python-dev.
If you're not a member, you get the added "convenience" of being able to read the list only via the web archives. python-list should be sufficient for all the initial discussion on a PEP though. I'd wager that successful shepherding of a PEP through to acceptance is probably a good way to earn the brownie points needed to be welcomed onto python-dev (and hence, although not strictly tied, to the PSF). TJI> I *know* life can't be that simple, as I'm sure some of the TJI> existing PEPs have started on other lists. So what is the TJI> situation? Does one just put something together and send it TJI> to Barry Warsaw with some plausible excuse? Actually, you should write a draft (using the style described in PEP 1 and looking at existing examples), and post it to python-list and/or python-dev. If there's general consensus that the idea is appropriate for PEP status, send me the latest version of the draft and I'll assign it a number. I'll make the necessary changes to PEP 0 and, if you don't have checkin permissions, I'll check in your initial draft. -Barry From klm@digicool.com Mon Mar 12 21:58:06 2001 From: klm@digicool.com (Ken Manheimer) Date: Mon, 12 Mar 2001 16:58:06 -0500 (EST) Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: Message-ID: Edward Loper: > Tony Ibbs?: > > "Edward D. Loper" > > BUT I think that it would be better to strategically insert references > > to STminus at "the appropriate places" in the Zwiki (mind you, I've > > still to figure out how the people using Wikis target new stuff that > > they should be looking at). Typically, people check the wiki's RecentChanges page, looking for page names of interest near the top - "near" getting bigger the less recent your last visit. (Automated notifications would be nicer, but our energy for this kind of thing is going elsewhere...) > I added a page to Zope (under CurrentIssues), and I'll try to actually > put more content there when I can. :) Very nice to see - thanks! 
> > > I have a suspicion that STminus's current definition does *not*
> > > actually provide a subset of the intersection of STNG and STpy..
> >
> > I think it's very close, actually.
>
> I guess it depends on how much the implementations diverge from
> the intentions.. At least for STNG, there are a number of current
> differences. I've been writing a large test set, and plan to
> post a link to it, and to the results of running STminus on it,
> later today.. (still needs a little more work). At that point, I'm

Great boon that you're making tests - sounds like you're encountering bugs or unwanted looseness in STNG. We'll want to fix such.

> hoping we can get a better idea of whether STNG and STpy really
> act like STminus. (My guess is that most differences are unintentional
> ones)
> [...]
> > In the case of STpy, if it does something surprising then either that's
> > a bug, or an unforeseen consequence of something that *isn't* a bug, in
> > which case it either needs designing around or explaining.
> > I doubt that STNG is too much different (although I suspect they prefer
> > the "explain around" to the "change" mechanism - stability (of some
> > sort) over perceived complexity).
>
> I'm hoping that STNG will be willing to make at least a few changes..
> For example, changing 'x*y' and 'y*z' to be 2 literals rather than
> one emph area.

I'm pretty sure we will want to fix bugs and track a good base standard (which is where STminus seems to be heading) in STNG. The problem is going to be finding time to do so - at least for the next few weeks, all the likely suspects are inundated - but we are interested, and will eventually make time to track. (In case it needs saying, patches would be welcome...-)

Ken Manheimer
klm@digicool.com

From edloper@gradient.cis.upenn.edu Tue Mar 13 01:49:08 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Mon, 12 Mar 2001 20:49:08 EST
Subject: [Doc-SIG] suggestions for a PEP
Message-ID: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu>

I'd be happy to write up a PEP, but I'm curious what things people think it should include. In particular, which of the following should it include:

1. The definition of a particular markup language?
2. Specification of the *semantics* of a markup language (as opposed
   to just its syntax)
3. Specification of what is "appropriate" to put in a docstring?
4. Specification of the "intended semantic content" of various "slots"
   like docstrings and strings immediately following docstrings?
5. Specifications of tools for docstrings? Possibly of tools that
   would eventually be included in the standard library?
6. whatever else you can think of..

Some of these questions require further work to answer (such as 1, where no one can currently give a good definition of any version of StructuredText).. Others will probably involve some agreement by the community..

I think it's important to distinguish the tools that process docstrings from any attempt to define what should go *in* docstrings (or similar places), whether that means what type of information or what type of markup. The PEP should address what goes *in* docstrings, but shouldn't necessarily have much to say about the tools that process the docstrings. (Leave those for a standard library extension - do those get PEPs?)

Anyway, I wanted to comment on some of what Tavis Rudd wrote:

>> 1- module API documentation should be in the same file as source

I will assume that "API" documentation is what I was advocating earlier. I.e., a clear, concise, unambiguous *definition* of a Python object, such that you can tell exactly what is guaranteed by the object just by reading the definition. If that's *not* what you mean, please say what you do mean.

>> 2- a FORMALIZED version of structured text should be used for inline
>> formatting. There's no need to repeat the justifications here.
Yay! :) I would add "safe" as well, but I'll argue that another day..

>> The final version of structured text should include a facility for
>> storing meta-data in a field format that is easily identifiable to both
>> the human eye and the parsing tool.
>> (e.g. authors, version, keywords, spam)

My favorite suggestion is to just use top-level description list items and top-level headings with a single level of description list items to store meta-data.. and to have "reserved" keys for description lists (in these contexts). E.g.::

    def primes(n):
        """
        Return a list of all the prime numbers from 1 to n, exclusive.

        Parameters:
          n -- The upper limit for the prime numbers to return. A prime
               number will be returned if and only if it is less than 'n'.
        Types:
          n -- int
        ReturnType -- 'list' of 'int'
        Version -- 1.0
        Author -- Edward Loper
        See -- #other_primes#
        """
        ...

Also, I think that *all* of these special marked meta-data fields must be optional. (Of course, if the user wants to use a program that checks to make sure that ReturnType is defined for every method, they can.. but it's not required in general.)

>> 3- no changes should be required to the python parser
>>
>> 4- the module's namespace should not be polluted and its memory
>> requirements should not be inflated by use of inline documentation
>>
>> 5- therefore, the existing __doc__ docstrings should be used for very
>> short synopses, and extended documentation that is discarded at the
>> byte-compile stage should be written in string literals that appear
>> immediately after the existing docstrings. These extra string literals
>> would be written in ST, while the __doc__strings would be in plain
>> text. These two forms of API docs should complement and not duplicate
>> each other.

I think that there definitely *are* instances where you want to get at these strings from within python, esp. if you're using the interpreter.
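The reserved-key convention sketched above ('Key -- value' description-list items) is simple enough that a field extractor fits in a few lines. The following is only an illustration of the idea, not part of any tool discussed here; the function name and the particular set of reserved keys are invented for the example:

```python
import re

# Hypothetical set of reserved meta-data keys, chosen for this example only.
RESERVED_KEYS = {"Version", "Author", "ReturnType", "See"}

def extract_fields(docstring):
    """Return a dict of reserved 'Key -- value' fields found in a docstring."""
    fields = {}
    for line in docstring.splitlines():
        # A description-list item looks like:  Key -- value text
        match = re.match(r"\s*(\w+)\s+--\s+(.*)", line)
        if match and match.group(1) in RESERVED_KEYS:
            fields[match.group(1)] = match.group(2).strip()
    return fields

doc = """
Return a list of primes.

Version -- 1.0
Author -- Edward Loper
"""
print(extract_fields(doc))  # → {'Version': '1.0', 'Author': 'Edward Loper'}
```

Non-reserved keys (like the parameter name 'n' in the example above) are simply ignored, which keeps all meta-data fields optional, as proposed.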
One thing I really loved about python when I was learning it was that I could get decent help on just about anything very easily. I therefore propose the following:

1. STdoc strings appear after __doc__ strings, as I said before.
2. For now, these strings are thrown away by the compiler.
3. At some future date, the compiler could be modified so that, at user
   option, it would produce ".pyd" files as well as ".pyc" files. These
   contain all the STdoc strings from the file, and can be accessed via
   the interpreter somehow. Python would *not* format them, it would
   just copy them. Maybe create a dictionary from identifier name to
   string, and pickle it.

>> 6- the documentation parsing tools should be capable of producing
>> output in many formats (manpages, plain text, html, latex, for a
>> start),

definitely, although this may take some time to implement. I would add "dynamic navigation of docs from within the python interpreter" to the list of possible "outputs."

>> 7- the doc parsing tools should not need to import and run the module
>> to produce its documentation (for security reasons alone)

Leaves open the question of what to do with C extensions, etc...

>> 8- module Library Reference documentation should also be kept in the
>> same file as the module source. It should complement the API docs
>> with examples, extended discussions of usage, tutorials, test code,
>> etc., but should not duplicate the API reference material.

Tutorials, examples, etc. do *not* belong in the same source file. That makes it much too hard to work with the source file. How much performance hit would we take if we turned *every* "standard" module into a package? That way we could have string.__tutorial__ etc or whatever, if we wanted to.. Among other (very good) reasons, this would mean that only the writer(s) of a module can write tutorials, discussions of usage, etc. for it..

>> 9- the Library Reference docs should be written in string literals, as
>> with the extended API docs proposed in pt. 5, but there should be a
>> prefix token such as """LIBREF: at the start of each chunk to signal
>> to the doc tools that the following text is not part of the API ref.
>> The token would allow this documentation to be split up into chunks
>> that can appear anywhere in the source file (a la perl's POD).

Possibly in a different file.. I find Tibs's arguments pretty convincing..

>> 10- the Library Reference documentation should also be written in ST
>> as using LaTeX here would force the module author to learn yet another
>> mark-up language, require the documentation user to install yet
>> another processing tool (although this isn't an issue on Linux), and
>> would place too much emphasis on the separation between the API and
>> library reference docs and discourage synchronization as the module
>> evolves! The same argument applies to maintaining the status quo of
>> external doc files.

Sounds reasonable to me.. Anyone want to try taking a few modules and converting their docs to ST, just to see what issues come up?

>> Any extra meta-data that is needed for proper indexing, etc.
>> (to meet Guido's concerns) should be included as fields in the string
>> literals as is done in JavaDoc (but not necessarily with that syntax).

I think all this information should appear in description list items with reserved keys, or under reserved heading keys, as I mentioned above..

>> - caching of documentation so it doesn't have to be regenerated
>>   every time it's used

Seems like an implementation detail of tools, not part of a PEP describing what should go into docstrings.

>> - documenting Packages

There definitely need to be provisions for that.

>> - inheriting documentation (Edward Loper's idea)

Well, really javadoc's or someone before them..

>> - hiding API docs for __privateInternals (ditto)

This seems like an implementation detail for tools..
(But I agree it *should* be implemented on most tools :) )

>> - documenting extensions in other languages

Much easier if we can import modules. But I guess safety's important. Oh well.

So I'll try to write up a PEP when I get a chance. It sounds like Tibs might write a proposal too. I think that Tibs and I seem to have similar views on a lot of issues, so if we want diversity in our PEPs, maybe someone else should work on one too. :) (Of course, this sort of seems like redundant work, but I guess it's for the best or something)

-Edward

From tavis@calrudd.com Tue Mar 13 05:14:17 2001
From: tavis@calrudd.com (Tavis Rudd)
Date: Mon, 12 Mar 2001 21:14:17 -0800
Subject: [Doc-SIG] two simple, but important questions
In-Reply-To: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu>
References: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu>
Message-ID: <01031220543800.20079@lucy>

I'm new to this SIG. After reading the archives I can't find any consensus about two critical issues that I feel should be at the forefront of any documentation PEP: issues that need to be addressed before the syntax of the markup language. It seems like excellent progress is being made on the Structured Text and tools fronts, but what do members of Doc-SIG feel about these issues?

ONE:
----
Should the API documentation be written solely in docstrings that are accessible at runtime through __doc__? (option a) Note to Edward: by API documentation I mean exactly what you said.

Or should more detailed documentation (i.e. the stuff in structured text) be written in a form that is not compiled into the byte-code, thus sparing the memory and processing overhead and keeping the namespace clean? (option b) One way of doing this is to place a second """string literal""" after the __doc__ docstring.

Or, should it be kept separate from the __doc__ docstring but still be imported into the runtime environment (under a name like __fdoc__), at the cost of having to change the way python works internally? (option c)

TWO:
----
Should the documentation tools (a) import and run module code in order to produce the documentation or (b) should they follow the safer route of parsing the documentation out of the raw source code (.py)? I suppose if your perspective on issue one is to go for option b, then you'd have to get the extended docs from the raw source code.

MY PERSPECTIVE:
--------------
ONE: I'd argue for option b, because:

- it requires absolutely no changes to python's internals and is
  backwards compatible --->> therefore it can be implemented today, not
  when python 2.? is released as in option c.
- it keeps the namespace clean
- it maintains all the advantages of having a short synopsis of a
  function, class or object available through __doc__ at runtime, but
  also allows for more extensive documentation to be provided, and
  marked up with Structured Text
- *** all of the documentation can still be accessed at the interactive
  prompt (all you need is a help function as a wrapper for the doctool,
  and maybe a pager like in pydoc)
- it keeps module loading time minimal, once the .pyc file has been
  generated initially, and it keeps memory use low

TWO: For security and performance issues alone I argue for option b. Inheritance, etc. can be figured out by parse-the-source-code doctools just as easily as with import-the-module doctools. Edward, I don't see how import-the-module doctools would have any advantage for documenting extensions to python in other languages. The standard __doc__ strings would still be available, and parse-the-source-code doctools would be able to handle c, c++, java source without too much effort, plus numerous excellent tools already exist for this..
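The parse-the-source route argued for above (option b of issue TWO) can be illustrated with today's standard library. This sketch uses the `ast` module, which did not exist in 2001; it shows the general approach, not any tool discussed on this list:

```python
import ast

def docstrings_from_source(source):
    """Collect docstrings from source text without importing the module."""
    tree = ast.parse(source)
    # The module docstring, plus one entry per class/function definition.
    docs = {"<module>": ast.get_docstring(tree)}
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            docs[node.name] = ast.get_docstring(node)
    return docs

SOURCE = '''
"""Module synopsis."""

def primes(n):
    """Return a list of primes below n."""
'''
print(docstrings_from_source(SOURCE))
```

Because nothing is executed, the tool is safe to run on untrusted code, exactly the security property option b is after.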
Cheers,
Tavis

From tony@lsl.co.uk Tue Mar 13 10:43:09 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Tue, 13 Mar 2001 10:43:09 -0000
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: <01031209305701.18635@lucy>
Message-ID: <006c01c0abaa$64f790f0$f05aa8c0@lslp7o.int.lsl.co.uk>

Tavis Rudd wrote:
> 2- a FORMALIZED version of structured text should be used for inline
> formatting. There's no need to repeat the justifications here.
> The final version of structured text should include a
> facility for storing meta-data in a field format that is easily
> identifiable to both the human eye and the parsing tool.
> (e.g. authors, version, keywords, spam)

OK. Historically (yes, Doc-SIG has been around long enough to have history), this is where efforts start to founder. The cycle goes:

* long quiet period
* flurry of agreement that we want an ST variant and some near
  agreement on what we want
* someone says they're starting an implementation
* people (ahem - in the past including me) start to discuss the
  *formalisms* that need to be enforced on people to allow information
  to be automatically extracted from docstrings
* flurry of argument
* list goes quiet for long period

Do you see the problem? What I am working towards (and I shall have to add this to the PEP I've just started work on) is phasing this carefully:

1. Decide on STpy (or an ST variant), with minimal extensions
2. Produce an application that parses it
3. Get it accepted, and get people using it
4. THEN, once we've changed the culture, spring our wonderful scheme
   for semi-formal markup on them, that allows them to extract special
   information.

Two quick comments before you jump up and down at that:

a. I don't actually believe that you are going to get most Python
   programmers to DO semi-formal markup for you. I *do* believe
   there's a good chance we'll get many people to write at least a
   little human readable text. So guess what I'm after...
b. If we provide ST to markup said text, and it is easy to use, then
   most people will use it. And then, hey presto, we'll magically get
   *some* added value towards data extraction (at least, for instance,
   the sort of function signatures that IDLE and Pythonwin will
   present in a tool-tip).
c. (ok - three things) you're more likely to win the "and please add
   more markup" battle *after* people have gotten used to the markup.
d. (oh, I give up) see also
   http://www.tibsnjoan.co.uk/STpy.html#taggedparas which describes
   things like::

       Author: Guido van Rossum

   and::

       Arguments:
         fred -- this is a useless name for an argument

Although it doesn't go into detail, the idea *is* that there should be some "requirement of structure" for such tags - that is, that arguments should be followed by a descriptive list, and so on. This is unlikely to be enforced early on, and may only be so via the DTD for the final DOM tree, but the idea *is* there. And although it says there that it should be left for second phase implementation, the start of support is already in docutils.

> 3- no changes should be required to the python parser

of course not.

> 4- the module's namespace should not be polluted and its memory
> requirements should not be inflated by use of inline documentation

erm - sorry? if you want your inline docs to be available to a browser (and I *do*) then they've pretty well got to be around somewhere!

> 5- therefore, the existing __doc__ docstrings should be used for very
> short synopses, and extended documentation that is discarded at the
> byte-compile stage should be written in string literals that appear
> immediately after the existing docstrings. These extra string
> literals would be written in ST, while the __doc__strings would be in
> plain text. These two forms of API docs should complement and not
> duplicate each other.

Sorry - I can't be bothered to reformat that - blame Outlook if you like. I disagree strongly. Keep it simple.
A docstring is a docstring is a docstring. And in it goes the documentation for the entity. Using extra, magical, string literals is uncool. And anyway, I've been writing docstrings in something close to ST (without thinking about it) for donkey's years - we WANT markup in docstrings! That's why we're doing this - not to impose on other people, but for ourselves!

> See the example module attached to this message.
>
> 6- the documentation parsing tools should be capable of producing
> output in many formats (manpages, plain text, html, latex, for a
> start),

See HappyDoc - that's not a problem. But the *parsing* tools don't produce output - the output tools do (docutils is a parsing tool - it parses docstrings. It happens to have a not-very-sophisticated docstring-finder built in (but both pydoc and HappyDoc do better jobs, in different ways, 'cos that's what they're for), and it happens to have a not-very-clever HTML outputter (but see the previous comment), but those are just for testing and proof-of-concept and examplitude...)

> 7- the doc parsing tools should not need to import and run the module
> to produce its documentation (for security reasons alone)

Debatable, and depends on what you're after. pydoc does (it's part of its requirements if it's to be used as a "help" facility) and HappyDoc doesn't. That's to do with the *other* tools, not to do with the docstring tool. Think modules (or packages if you prefer) - a package to understand docstring contents, and render them in a form other modules can use, but that package need not know how to find them or what to do with them after unpicking them.

> 8- module Library Reference documentation should also be kept in the
> same file as the module source. It should complement the API docs
> with examples, extended discussions of usage, tutorials, test code,
> etc., but should not duplicate the API reference material.

I've had that argument.
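The parsing-tool-versus-output-tool split described above can be caricatured in a few lines: one function builds a neutral tree, and any number of independent renderers consume it. The function names and the tree shape here are invented for illustration only; this is not the docutils API:

```python
def parse(docstring):
    """Parsing tool: turn a docstring into a neutral tree of paragraphs."""
    paragraphs = [p.strip() for p in docstring.split("\n\n") if p.strip()]
    return {"type": "document", "children": paragraphs}

def render_html(tree):
    """Output tool: one of many possible renderers over the same tree."""
    return "\n".join("<p>%s</p>" % p for p in tree["children"])

tree = parse("A docstring.\n\nWith two paragraphs.")
print(render_html(tree))
```

A manpage or LaTeX renderer would be another function over the same tree, which is the point: the docstring-finder, the parser, and the outputters need know nothing about each other.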
> 9- the Library Reference docs should be written in string literals,
> as with the extended API docs proposed in pt. 5, but there should be
> a prefix token such as """LIBREF: at the start of each chunk to
> signal to the doc tools that the following text is not part of the
> API ref. The token would allow this documentation to be split up into
> chunks that can appear anywhere in the source file (a la perl's POD).

Erm - yuck. BTW, if we *want* POD, and that's how POD does it, then we should *adopt* POD (in which case, after adoption, it is no longer yuck). But I'm agin it, for reasons argued in the past.

> 10- the Library Reference documentation should also be written in ST
> as using LaTeX here would force the module author to learn yet
> another mark-up language, require the documentation user to install
> yet another processing tool (although this isn't an issue on Linux),
> and would place too much emphasis on the separation between the API
> and library reference docs and discourage synchronization as the
> module evolves! The same argument applies to maintaining the status
> quo of external doc files.

Markup languages (actually, LaTeX is a TeX macro language, so is *technically* still a typesetting language, with added markup potential - this is important to understanding it properly) are just languages. It does the soul good to learn another one, just as it does with programming languages. I feel I need an Alex Martelli argument here, so please imagine one for me.

> Any extra meta-data that is needed for proper indexing, etc.
> (to meet Guido's concerns) should be included as fields in the string
> literals as is done in JavaDoc (but not necessarily with that syntax).

The question of referring to Python data (e.g., #module.fred#) has been, shall we say, heavily deferred for the moment, 'cos it's contentious.
Ka-Ping has *very* clever approaches to semi-automating it, and I have a sneaky feeling that if he keys it off '#..#' strings, many of my own past objections would go away (erm, have you bothered to read this far, Ka-Ping?). But there are also proposals for extra markup (such as '^#..#'). It's another of those things best left to second-phase, in my opinion.

> What do you think?

I think that your comments make sense in context, but (not your fault) are often going over ground that has been trodden before. I *heavily* feel that we need to get a *useful* (but not too ambitious) candidate implementation out the door before we get bogged down again, and if people hadn't started up all this discussion this week I'd be well on the way to it by now (although, actually, writing a PEP *is* a very good idea, and should have been done earlier).

> p.s. Other issues to consider:
> - caching of documentation so it doesn't have to be regenerated
>   every time it's used

Tool issue.

> - documenting Packages

Goes in __init__.py - this sort of stuff just falls out naturally, I'm afraid, which is why we know Guido is a DGLD (damn good language designer)

> - inheriting documentation (Edward Loper's idea)

docstrings inherit like any other value, surely?

> - hiding API docs for __privateInternals (ditto)

Tool issue - given a DOM tree one can prune it, and anyway adding a qualifier to docutils to say "don't collect *these* sorts of thing" is trivial (but there are *so many* trivial things, and they are so *obviously* trivial, that one has to leave some of them until later on)

> - documenting extensions in other languages

A big issue - leave it alone for now!

> - comments within the markup language

Why? I used to argue for these, Eddy convinced me not to. You'll have to come up with a convincing argument of *exactly* why you need these...

Tibs - I'm sorry if any of that appears brusque, but I've got urgent paid work to do as well, so have to type fast...
-- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 10:49:31 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 10:49:31 -0000 Subject: [Doc-SIG] Formalizing StructuredText (yeh!) In-Reply-To: <200103121936.f2CJaNp18187@gradient.cis.upenn.edu> Message-ID: <006d01c0abab$48c21490$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I added a page to Zope (under CurrentIssues), and I'll try to actually > put more content there when I can. :) Thanks > I guess it depends on how much the implementations diverge from > the intensions.. My aim is to make the documentation and implementation fit. This isn't as dishonest as it sounds, because I have found that only by *doing* the implementation does one discover all of the implications of the form of ST one has chosen, and it's only fair to then document them... > I've been writing a large test set, and plan to > post a link to it, and to the results of running STminus on it, > later today.. (still needs a little more work). Great. > At that point, I'm > hoping we can get a better idea of whether STNG and STpy really > act like STminus. (My guess is that most differences are > unintentional ones) I agree. As you may have gathered, there *should* be a docutils 0.0.5, or maybe even 0.1alpha, soon, but unfortunately not this week, because of PEP writing, etc. So it may be worth waiting on that before testing docutils (but then again, maybe not). > Definitely, since STminus is implemented directly from its formal > definition. I like that in a tool. > Well, I'm hoping that all 3 at least have well-defined results *where > they do define the results*. I certainly hope that STNG will only > give "unexpected results" for a small subclass of strings.. 
:) There's unexpected (oh - I didn't want it to do that) and unexpected (ah - I see that that *does* follow from what I asked it to do - interesting). <<>> No, I defer to your expertise - I need to read what you wrote in more detail, but the case is compelling, so I leave it to you. > I tried to make all of my productions correspond to their actual > entities.. So you shouldn't need to do (much) postprocessing on > the output of STminus. For example, the Paragraph production > should give an entire paragraph, not just its first line. I think > I may add ULItem, OLItem, and DLItem productions for similar > reasons (without changing the language defined by the productions) OK - I see. That *is* a good reason. > Thanks! You've done some impressive work, yourself. Hmm - no, what I'm doing is much simpler, in basis - if I had real time for it it wouldn't take more than a couple of weeks, and there's nothing complicated in there. Anyway, so long as we're both having (some sort of) fun! Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 10:52:54 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 10:52:54 -0000 Subject: [Doc-SIG] Docststring markup process In-Reply-To: <200103121950.f2CJoHp19946@gradient.cis.upenn.edu> Message-ID: <006e01c0abab$c1beacf0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > As far as I can tell, the current problems lie less in not being able > to agree on a markup format, and more in not being able to define > one. STminus is working to fix that, but it will take > several iterations before we have something that would be worthy > of putting in a PEP. 
I actually would like to see the PEP for STpy I'm working on and the (eventual) PEP for STminus *both* accepted - that is, it's useful to have the "full blown Common Lisp" approach for doing Python specific work, but it's also useful to have the "tight and well defined Scheme" approach for writing stuff that you *know* will "compile" under both systems... > > Of course we can discuss the various proposals here, but it's a big > > step forward to get them written up and all in one place > for comparison. Which is why I have changed my mind (gosh, Doc-SIG does that to me) and decided the PEP thing *is* a good idea (don't you just hate those Python people who change your mind?) > I would appreciate it if we can make the deadline far away enough > that we can have a real formal definition for whatever it is > we're proposing. We're not aiming at Python 2.1 with any of this anyway. I say let things slide for at least another fortnight before thinking of even thinking of a deadline... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 11:03:44 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 11:03:44 -0000 Subject: [Doc-SIG] reserved characters In-Reply-To: <200103121953.f2CJrgp20338@gradient.cis.upenn.edu> Message-ID: <006f01c0abad$456f2e20$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I'm thinking of adding a number of "reserved characters" to STminus, > that specific versions of ST may decide to use or not use. They > would redefine their own ReservedCharacter production accordingly.. > So: > 1. Is this a Good Idea? Well, ST's *do* have reserved characters, sort of, but the problem is that they're not reserved in all circumstances. 
So, in STpy, '#' is special, but only in the context of after a space or beginning of line and before a non-space that isn't '#', and so on (i.e., when it is acting as a quotation character). I think that for STminus's purposes, it might make sense to make characters reserved, *perhaps*, but for the "full fledged" ST's it doesn't (they're much more Perl like in this respect, whether Perl works like that or not (I don't know) since they assume that people can cope with the meaning of a character changing depending on its environment/use). > 2. What characters should be included? Clearly at least '#' for > STpy. Other possibilities are backquote, at sign, exclamation > mark, etc. I'd like to reserve as few as we can get away with, > since the more characters are reserved, the less useful a > program written to work on *any* StructuredText document can > be. ZWiki (and other Wikis?) use an initial '!' to mean "not a Zwiki reference" - so for instance StructuredText would be a reference, but !StructuredText would not. '[' and ']' are "force a reference" characters in Zwiki, and will be used for similar purpose in STpy. But again, it depends on context. Hmm - on the whole I'm not sure if this helps. The only thing I *might* give away is '@', since it is rarely used in text, and not meaningful in Python (but you discuss that in another email) Tibs typing-as-we-go -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 11:18:28 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 11:18:28 -0000 Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: <200103122038.f2CKc3p26323@gradient.cis.upenn.edu> Message-ID: <007001c0abaf$53ef1da0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. 
Loper wrote: > I think that for the most part, I tend to side with Tibbs on this.. although I think you may have expressed it better... > Note that that is generally exactly the type of documentation you > want in a reference manual. I think that the idea of building > the reference manuals from docstrings makes a lot of sense. I do > *not* think that including tutorials, howtos, big examples with > explanations, etc. belong in the docstrings. Well, a reference manual is sometimes at a higher level than what I put in docstrings. But maybe I should be prefixing more methods with '_'. > This is pretty much completely a digression, but I figured I'd chime > in. Regardless of how easy it is to download CVS or LaTeX, it is > much easier to learn how to *use* CVS than to learn how to use > LaTeX. Given a knowledgable teacher, you can learn everything you > need to know about CVS in an hour. I'm not sure anyone ever learns > everything they need to know about LaTeX. :) Well, point taken, although *not* given a knowledgable teacher, I found getting started with CVS to be a non-starter, whereas I got up to speed with TeX and LaTeX rather fast without any help 15 or 18 years ago (when it was a lot more difficult). Maybe it depends on one's mind and interests... (and I shan't pretend that one ever learns all one needs to know about LaTeX until one implements bits of it, which *I* certainly haven't done - personally I prefer to use TeX neat if I want to do my own stuff) > I think that we should leave the organization of "grander scope" > documentation to a different project.. (Of course, it's still > an important project.) Yes, yes, yes (indeed, I thought that's what was already being done, quietly in the background, by our Mr Drake) > I think it might make sense to reserve one character, or maybe 2, for > advanced markup. (We would also want to be able to backquote it > somehow, but we'll leave discussion that for later..). 
So, > for example, > we could say that '@' is reserved for advanced markup, and then it > can be used by people who: > 1. want more advanced features > 2. are willing to use a "real" markup, which is more "obtrusive" > and difficult to read/write. I'm willing to reserve '@' for later use (the whole issue of how one quotes things in ST is a difficult one, single quote itself is enough of a problem. The "true ST" approach, I think, is to say "borderline case - punt it - we're not that complex", which *may* well be the correct answer). BUT whilst I don't mind reserving it, I'll probably oppose using it! (but allowing you to reserve it postpones the argument, which is a Very Good Thing, and it gives us a *marker* for having postponed the argument) > I think that we should limit how much more complex "basic" ST gets, > as much as possible.. STpy is almost as complex as I want it to get already - I want to add in [references] and then feature freeze. Which would have become clear in the alpha (he whines) > I agree that it doesn't make sense to make up our own markup > language.. > So maybe reserve (non-backslashed) '<' and '>' and use XML for > advanced markup? I don't want to reserve '<' and '>'. If you want to placemark that in your head, then place hold it as "I'll propose, some time in the indefinite future, that we markup an XML style entity using something like, for instance, just an idea:: @fred ... @/fred being equivalent to:: <fred> ... </fred> and we'll sort it out later on". This leaves the heated arguments about whether we *really* want XML, or DocBook, or what, and whether those spaces matter, and whether it should *actually* be '@', until the future, where they belong. Keeping a marker for them (so we don't forget to discuss them in the future) is a good thing. Worrying about them now isn't. (and also, lots of things "fall out in the wash" from things like ST - one sometimes finds that, later on, one already *has* what one wanted, anyway).
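The '@fred ... @/fred' notation floated above is explicitly "just an idea", but it is mechanical enough to sketch. Below is a hypothetical illustration (the tag name, regex, and function name are all invented for this sketch and belong to no ST implementation) of how such paired at-sign tags could be rewritten as XML-style elements:

```python
import re

# Hypothetical sketch only: '@tag body @/tag' is a proposed placeholder
# notation, not an agreed STpy feature. The backreference \1 ensures the
# closing '@/tag' matches the opening tag name.
TAG_RE = re.compile(r'@(\w+)\s(.*?)\s@/\1', re.DOTALL)

def at_tags_to_xml(text):
    """Rewrite '@tag body @/tag' spans as '<tag>body</tag>'."""
    return TAG_RE.sub(
        lambda m: '<%s>%s</%s>' % (m.group(1), m.group(2), m.group(1)),
        text)
```

Whether the target notation should actually be XML, DocBook, or something else is exactly the argument being postponed here.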
> Of course, depending on what we decide we need, we may be able to get > away with much less. For example, define things like:: > > Parameters: > p1 -- foo > p2 -- bar > > To have special meaning in the context of docstrings. Well, actually it's "Arguments". And the special meaning isn't *enforced* in the current implementation or documentation, but it may be in the future (so much to do, so little whatever). > Any standard > (STpy-like) ST will just render them as a heading with a list. > You can also define special forms like:: > > author -- Edward Loper > version -- 2.71828 That's:: Author: Edward Loper Version: 2.71828 and it doesn't work yet (because ':' doesn't yet start a paragraph) - it may or may not work in the alpha release. > That takes care of most of the structural-type markup, and still looks > ok if you just read it, or parse it with non-docstring-specific ST > tools.. Strangely enough, this was David Ascher's reason for coming up with the idea. Look in the doc-sig archives around November/December 1999 for talk about these issues. > Then you just have to worry about syntax for inline markup. I don't > have any great ideas there, other than having a whole class of > "advanced inline markup" tags like @somethingorother(...) Given the '@' reserved, we can leave that... Tibs (two more to go) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 11:27:34 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 11:27:34 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu> Message-ID: <007101c0abb0$99519bb0$f05aa8c0@lslp7o.int.lsl.co.uk> > I'd be happy to write up a PEP, but I'm curious what things people > think it should include. 
In particular, which of the following > should it include: > 1. The definition of a particular markup language? > 2. Specification of the *semantics* of a markup language (as > opposed to just its syntax) > 3. Specification of what is "appropriate" to put in a docstring? > 4. Specification of the "intended semantic content" of various > "slots" like docstrings and strings immediately following > docstrings? > 5. Specifications of tools for docstrings? Possibly of tools > that would eventually be included in the standard library? > 6. whatever else you can think of.. I'm tempted to answer, but time is running out, and in fact the easiest answer is probably to write my own PEP. > Some of these questions require further work to answer (such as 1, > where no one can currently give a good definition of any version > of StructuredText).. Others will probably involve some agreement > by the community.. I'm sure the community will love to argue... (We do realise, don't we, that when we put a PEP on the main python-list, all the arguments from the last 5 years are going to be endlessly rehashed? Yes, I thought we did.) > I think it's important to distinguish the tools that process > docstrings from any attempt to define what should go *in* > docstrings (or similar places), whether that means what type > of information or what type of markup. The PEP should address > what goes *in* docstrings, but shouldn't necessarily have much > to say about the tools that process the docstrings. (Leave > those for a standard library extension (do those get PEPs?).) Actually, my experience is that one can't do that separation until one has started on the tool - otherwise the tool never gets written (because of the community arguing referred to above). Been there, done that. Also, ST is sort of difficult to think about without an implementation to throw strings at and see how they behave.
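The point about needing an implementation to throw strings at can be made concrete with doctests, the same mechanism the head of this thread says DocNodes.py and TextNodes.py already use. The renderer below is a toy stand-in invented for this sketch (not docutils or STNG code); it shows how each markup edge case becomes an executable example:

```python
import doctest
import re

def emphasise(text):
    """Toy stand-in for an ST renderer: wrap *word* spans in <em>.

    A hypothetical sketch, not any real implementation. Edge cases
    become executable examples:

    >>> emphasise('this *is* emphasis')
    'this <em>is</em> emphasis'
    >>> emphasise('2 * 3 * 4 is just arithmetic')
    '2 * 3 * 4 is just arithmetic'
    """
    # '*' only opens emphasis when followed by a non-space and closes
    # when preceded by one -- the same kind of context rule discussed
    # for '#' above.
    return re.sub(r'(?<![\w*])\*(\S(?:[^*]*\S)?)\*(?![\w*])',
                  r'<em>\1</em>', text)

if __name__ == '__main__':
    doctest.testmod()
```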
> My favorite suggestion is to just use top-level description list items > and top-level headings with a single level of description list items > to store meta-data.. and to have "reserved" keys for description lists > (in these contexts). E.g.:: > > def primes(n): > """ > Return a list of all the prime numbers from 1 to n, exclusive. > > Parameters: > n -- The upper limit for the prime numbers to return. A > prime number will be returned if and only if it is > less than 'n'. > > Types: > n -- int > > ReturnType -- 'list' of 'int' > Version -- 1.0 > Author -- Edward Loper > See -- #other_primes# > """ > ... > > Also, I think that *all* of these special marked meta-data fields must > be optional. (Of course, if the user wants to use a program that > checks to make sure that ReturnType is defined for every method, they > can.. but it's not required in general). What he said. Twice (although he's got some of the markup wrong, but that's my fault, not his). > I therefore propose the following: > 1. STdoc strings appear after __doc__ strings, as I said before. I've disagreed with this elsewhere - we'll have to differ for the moment. > 2. For now, these strings are thrown away by the compiler Hmm. > 3. At some future date, the compiler could be modified so that, at > user option, it would produce ".pyd" files as well as ".pyc" > files. These contain all the STdoc strings from the file, and > can be accessed via the interpreter somehow. Python would *not* > format them, it would just copy them. Maybe create a dictionary > from identifier name to string, and pickle it. pyo files throw away docstrings already. It's this time machine thing, you see. > >> 6- the documentation parsing tools should be capable of > producing output in > >> many formats (manpages, plain text, html, latex, for > a start), > > definitely, although this may take some time to implement. 
I > would add > "dynamic navigation of docs from within the python interpreter" to > the list of possible "outputs." Time machines again. HappyDoc and pydoc do lots of what is wanted, and seem both to be expanding (in wonderfully different directions - btw, have you *looked* at inspect.py - it's ever so neat). > Possibly in a different file.. I find Tibb's arguments > pretty convincing.. You see why I'm willing to differ with him on other issues? Can't but help liking someone with that much sense (now, I just need to convince him of all my other views...) > So I'll try to write up a PEP when I get a chance. It sounds like > Tibbs might write a proposal too. I think that Tibbs and I seem > to have similar views on a lot of issues, so if we want diversity > in our PEPs, maybe someone else should work on one too. :) (Of > course, this sort of seems like redundant work, but I guess it's > for the best or something) Actually, I think our PEPs will be sufficiently different in aim to both be useful (and we *do* disagree on some things, flippancy aside). But, as I said elsewhere, I also suspect that I will want *both* our PEPs to be adopted. (different aims can produce different useful things. That's why I keep referring to HappyDoc and pydoc - superficially similar, but actually rather different in aim and implementation) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Mar 13 11:36:10 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 13 Mar 2001 11:36:10 -0000 Subject: [Doc-SIG] two simple, but important questions In-Reply-To: <01031220543800.20079@lucy> Message-ID: <007201c0abb1$cd4312e0$f05aa8c0@lslp7o.int.lsl.co.uk> Tavis Rudd wrote: > I'm new to this SIG. 'Sok - more people means more brains to think useful thoughts. 
> After reading the archives and you're already coming across well! > I can't find any consensus > about two critical issues that I feel should be at the > forefront of any documentation PEP: There's probably a good reason for that.. > issues that need to be addressed before > the syntax of the markup language. the most obvious of which *might* be a disagreement with that statement, of course... > ONE: > ----- > Should the API documentation be written solely in docstrings > that are accessible at runtime through __doc__? (option a) Well, this one is definitely one to leave for later arguments (see, I said) > Or should more detailed documentation (i.e. the stuff in > structured text) be > written in a form that is not compiled into the byte-code, > thus sparing the > memory and processing overhead and keeping the namespace > clean? (option b) I think this has been answered elsewhere. > One way of doing this is to place as second """string > literal""" after the > __doc__ docstring. An idea I revile (sorry, I'm in a hurry) > Or, should it be kept separate from the __doc__ docstring > but still be imported into the runtime environment (under a name like > __fdoc__), at the cost of having to change the way python > works internally? > (option c) Heck, I hate that too. > TWO: > ---- > Should the documentation tools (a) import and run module code > in order to > produce the documentation or (b) should they follow the safer > route of > parsing the documentation out of the raw source code (.py)? That's easy, the answer is yes. More precisely, it depends on what you want to do. pydoc, to meet its mandate, has to do one. HappyDoc (and tools like it) have to do the other. 
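The pydoc/HappyDoc split above corresponds to two extraction routes, sketched here in present-day Python (the `ast` module postdates this discussion, and all names in the sketch are illustrative): route (a) imports the object and reads its docs at runtime; route (b) parses the raw source without ever executing it.

```python
import ast
import inspect

# Route (a), pydoc-style: import the object, then read __doc__ at runtime.
def runtime_doc(obj):
    return inspect.getdoc(obj)

# Route (b), HappyDoc-style: parse the raw source text, so untrusted
# module-level code never runs.
def source_docs(source):
    """Map names to docstrings parsed from source, without importing."""
    tree = ast.parse(source)
    docs = {'<module>': ast.get_docstring(tree)}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            docs[node.name] = ast.get_docstring(node)
    return docs

SOURCE = '"""Module docs."""\n\ndef f():\n    """Docs for f."""\n'
```

Route (b) also gives the "default args as written" benefit mentioned below, since the source text is available verbatim.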
> MY PERSPECTIVE: > -------------- > ONE: > I'd argue for option b, because: My arguments are elsewhere, but in summary: docstrings are good - put documentation in them - whatever you *want* to put in them (heh, getting *any* documentation is a win) documentation is good - if you have more than will fit in a docstring, put it in a text file. See, I think I'm a traditionalist. Of course, I won't moan if you want to do literate programming instead, but that's a different kettle of fish. > - it requires absolutely no changes to python's internals and > is backwards compatible --->> therefore it can be implemented today not > when python 2.? is released as in option c. Heh - all the software we've been writing works in 1.5.2... > - it keeps the namespace clean us too > TWO: > For security and performance issues alone I argue for option b. Unfortunately (well, not really), you've already got both options, and option (a) is BDFL approved.
(option b) >> One way of doing this is to place as second """string literal""" after the >> __doc__ docstring. >> >> Or, should it be kept separate from the __doc__ docstring >> but still be imported into the runtime environment (under a name like >> __fdoc__), at the cost of having to change the way python works internally? >> (option c) Given the option of producing .pyo files, I would lean towards option (a). The problem with your solution here (using a tool to read the .py file) is that it requires that everyone have the .py files for all the modules they're using. But sometimes it's useful to distribute just the .pyc file.. The main advantage I see of putting formatted docs anywhere other than the docstring is that some existing programs (Zope?) already use the docstrings for specific purposes, that are incompatible with what we would propose... >> Should the documentation tools (a) import and run module code in order to >> produce the documentation or (b) should they follow the safer route of >> parsing the documentation out of the raw source code (.py)? In general, if you want the docs on a module, then you should trust it enough to import it anyway. Presumably if you're trying to get its docs, it's because you want to *use* it... That said, there are some definite advantages of processing the source code.. e.g., you can list default args as what they really are, instead of something like <function <lambda> at 0xabcdef>. and you can tell what's imported and what's not. But I would say that both options should be available to tools. It may turn out that tools that implement one option will be more successful, and so will become widely accepted and everyone will forget about the other option. Or it might be that the best approach is to use both, when possible.. --- As a final note, I think that if this PEP is going to have any chance of succeeding, it needs to have a relatively small and well-defined scope.
So maybe we can divide it in two: PEP A: a standard markup/string format for "formatted documentation strings" PEP B: a proposal on how "formatted documentation strings" should be accessed, where they should be stored, etc. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 16:55:43 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 11:55:43 EST Subject: [Doc-SIG] PEP 216 Message-ID: <200103131655.f2DGthp23889@gradient.cis.upenn.edu> Just out of curiosity, how is this new proposed PEP going to relate to PEP 216 ("docstring format")? Will it replace it? If not, then maybe we should just work on extending PEP 216? If you're interested in this PEP stuff, and haven't read PEP 216, you should probably go read it..: http://python.sourceforge.net/peps/pep-0216.html Does anyone know if Moshe Zadka is still actively working on this? -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:17:18 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:17:18 EST Subject: [Doc-SIG] suggestions for a PEP Message-ID: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> Tibs wrote: >> 1. Decide on STpy (or an ST variant), with minimal >> extensions >> 2. Produce an application that parses it >> 3. Get it accepted, and get people using it >> 4. THEN, once we've changed the culture, spring >> our wonderful scheme for semi-formal markup >> on them, that allows them to extract special >> information. What do you mean by semi-formal markup? If it's just fields (i.e., global structure markup) then somewhere around 2, I would also add: * Tell people what the semi-formal markup scheme will most likely look like, but say that it's not implemented yet. That way, people can add semi-formal markup, and when it gets implemented, they magically get prettier outputs. If you mean coloring (i.e., local structure markup), then I'd say definitely leave it for later. :) >> a.
I don't actually believe that you are going to get most Python >> programmers to DO semi-formal markup for you. I *do* believe there's a >> good chance we'll get many people to write at least a little human >> readable text. So guess what I'm after... I think that a bunch of programmers would be willing to do global markup (special places to describe arguments, etc). I don't think they would do formal coloring markup.. >> http://www.tibsnjoan.co.uk/STpy.html#taggedparas >> which describes things like:: >> >> Author: Guido van Rossum >> >> and:: >> >> Arguments: >> fred -- this is a useless name for an argument I agree with the idea, but I strongly think that things like author should be description list items. Advantages: * it requires no changes to current STpy * it's more compatible with other STs * it seems conceptually "cleaner" * we don't need to give special meaning to yet another character (":") * it's *much* easier to describe a general mechanism, so other people can extend it for their own tools. * if they do something wrong (e.g., put unordered list items below "Arguments:"), then we can just format it like normal ST. (and give a warning, if we're nice) Problems will come up with your formalism with lines like: Author: Mr. bob frank # does the . mean it's got a sentence? Things to do: eat, sleep # how does this get treated? Unless you can come up with a compelling reason for using:: Author: Guido instead of:: Author -- Guido I think we should go with the latter. >> And although it says there that >> it should be left for second phase implementation, the start of support >> is already in docutils. I think it will be helpful to people if we can get this straight now. That way, they can use it while documenting code in the interim period. But it shouldn't get in the way of getting everything else done, if we can't manage to figure it out now. >> erm - sorry?
if you want your inline docs to be available to a browser >> (and I *do*) then they've pretty well got to be around somewhere! (I assume) he means he doesn't want things like __author__ and __version__ at the top level of a module. >> docstrings inherit like any other value, surely? Um.. yes, but that's not always what you want. Basically, if I define:: class A: def f(x): "this is f(x)" return 1 class B(A): def f(x): return 2 Then I might want B.f to inherit its docs from A.f. This would be especially nice for things like UserList, so my classes will have docs for their methods without me having to duplicate explanations. But in a sense, this is a tool issue.. Although, it will affect whether people tend to copy comments for overridden functions.. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:18:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:18:33 EST Subject: [Doc-SIG] Formalizing StructuredText (yeh!) Message-ID: <200103131718.f2DHIXp27257@gradient.cis.upenn.edu> Ken Manheimer said: >> Great boon that you're making tests - sounds like you're encountering bugs >> or unwanted looseness in STNG. We'll want to fix such. I've got about 160 test cases so far, and expect that number to grow. In playing with STminus, I've often been surprised how many of the test cases actually catch errors that I make. :) But I'd welcome more test cases from other people.. More on that when I post the sttest module (later today?)... >> I'm pretty sure we will want to fix bugs and track a good base standard >> (which is where STminus seems to be heading) in STNG. The problem is >> going to be finding time to do so - at least for the next few weeks, all >> the likely suspects are inundated - but we are interested, and will >> eventually make time to track. (In case it needs saying, patches would be >> welcome...-) Unfortunately, time is always a problem.
But I'm very glad to hear that STNG is interested in working with us to tighten up the definitions of StructuredTexts.. I don't think I'll have time to make any patches though (much less read the STNG source code to figure out where they would even go)... I plan to spend most of my time on STminus and the docstring PEP.. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:29:39 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:29:39 EST Subject: [Doc-SIG] Docststring markup process In-Reply-To: Your message of "Tue, 13 Mar 2001 10:52:54 GMT." <006e01c0abab$c1beacf0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103131729.f2DHTdp28610@gradient.cis.upenn.edu> > I actually would like to see the PEP for STpy I'm working on and the > (eventual) PEP for STminus *both* accepted - that is, it's useful to > have the "full blown Common Lisp" approach for doing Python specific > work, but it's also useful to have the "tight and well defined Scheme" > approach for writing stuff that you *know* will "compile" under both > systems... Of course, even clisp has a formal definition.. ;) I'd like STminus to expand to eventually be able to provide a formal definition of STpy. Of course, it would be an underspecified definition, so it wouldn't say what to do with things like:: This **is a *pretty **messed *up **docstring. But it should still describe every *reasonable* use of STpy... Now, that may not be an entirely reasonable goal.. but only time will tell. :) But, at any rate, the idea of doing 2 PEPs is to see which one people like better. I'm not planning on proposing STminus001 as a formatting convention. I think that's not very reasonable. But maybe STpyminus099 (i.e., STminus with py extensions (such as #..# and list items without blank lines), version 99). Well, I guess I'll have to wait to see what you write up for your PEP to decide..
-Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:35:44 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:35:44 EST Subject: [Doc-SIG] reserved characters In-Reply-To: Your message of "Tue, 13 Mar 2001 11:03:44 GMT." <006f01c0abad$456f2e20$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103131735.f2DHZip29411@gradient.cis.upenn.edu> > Well, ST's *do* have reserved characters, sort of, but the problem is > that they're not reserved in all circumstances. So, in STpy, '#' is > special, but only in the context of after a space or beginning of line > and before a non-space that isn't '#', and so on (i.e., when it is > acting as a quotation character). The idea would be that specific ST's would be allowed to use reserved characters if they want, or remove them from the reserved character list if they want. So in STminus, we would have:: ReservedChars = # | @ And in STpy, we would have:: ReservedChars = # This production would take care of making sure that those characters don't appear in normal text. > I think that for STminus's purposes, it might make sense to make > characters reserved, *perhaps*, but for the "full fledged" ST's it > doesn't (they're much more Perl like in this respect, whether Perl works > like that or not (I don't know) since they assume that people can cope > with the meaning of a character changing depending on its > environment/use). Certainly. Each full-fledged ST would probably un-reserve the characters they don't use. The main idea here would be to add a "hook" for us to add in future things like advanced markup, if we later decide we want to. The potential problem is: 1. we make our markup language 2. everyone loves it, it gets used everywhere 3. everyone decides that advanced markup is a good idea after all 4. no one wants to make changes that will mess up all the docs that they've already written. 
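The `ReservedChars` production above could be mirrored in a simple checker. This is an illustrative sketch only (the dialect names, the character sets, and the function are all hypothetical, not part of STminus or STpy): each dialect declares its own reserved set, and the checker flags reserved characters appearing in ordinary text.

```python
# Hypothetical per-dialect reserved-character sets, mirroring
# productions like "ReservedChars = # | @" from the message above.
RESERVED = {
    'STminus': set('#@'),
    'STpy': set('#'),
}

def find_reserved(text, dialect):
    """Return (position, char) pairs for reserved characters in text."""
    reserved = RESERVED[dialect]
    return [(i, c) for i, c in enumerate(text) if c in reserved]
```

A real implementation would also have to honour the context rules discussed earlier (a character being special only after whitespace, and so on); this sketch deliberately ignores that.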
> '[' and ']' are "force a reference" characters in Zwiki, and will be > used for similar purpose in STpy. But again, it depends on context. Hm. I'm not sure I like the sound of that. Care to elaborate? > The only thing I *might* give away is '@', since it is rarely used in > text, and not meaningful in Python (but you discuss that in another > email) So if we decide we want reserved characters, we might limit it to this. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 13 17:37:17 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 12:37:17 EST Subject: [Doc-SIG] quoting Message-ID: <200103131737.f2DHbHp29600@gradient.cis.upenn.edu> Could someone point me to an explanation of why we *don't* want to use backslashes for backslashing characters? :) e.g., \* for a literal asterisk. It seems so much like the natural thing to do, and surely someone who's coding in python will be familiar with the convention of backslashing characters.. -Edward From tavis@calrudd.com Tue Mar 13 17:58:38 2001 From: tavis@calrudd.com (Tavis Rudd) Date: Tue, 13 Mar 2001 09:58:38 -0800 Subject: [Doc-SIG] two simple, but important questions In-Reply-To: <200103131652.f2DGqQp23534@gradient.cis.upenn.edu> References: <200103131652.f2DGqQp23534@gradient.cis.upenn.edu> Message-ID: <01031309583800.00505@lucy> Edward suggested: > scope. So maybe we can divide it in two: > PEP A: a standard markup/string format for "formatted documentation > strings" > PEP B: a proposal on how "formatted documentation strings" should > be accessed, where they should be stored, etc. Or maybe it would be best to just have one PEP, with Edward's B (my questions, etc.) kept very small (a paragraph should do it once there's agreement here). After reading Tibb's and Edward's responses to my questions, I'm starting to lean towards option a (keeping everything in the docstring) for the simple reason that it's the path of least resistance (/debate).
However, option b (having the option of a second string literal that follows the docstring and is discarded by python at compile-time) does provide the following things that option a can't: - the ability to gracefully document module and class data attributes (constants etc.) as Marc-Andre Lemburg proposed in PEP 224 - a means for the verbose to document to their heart's content without worrying about increasing memory usage and load-time (but is this ever really going to be significant??) - a means to include the Library reference documentation in the same file, as Ping proposed Of course, if the tools were to give module creators the option of putting extended documentation in like this, it doesn't mean that everyone must do it this way. If you, as a coder, don't like the idea of discardable string literals after the __doc__s, stick with the standard docstring. > In general, if you want the docs on a module, then you should trust it > enough to import it anyway. Presumably if you're trying to get its > docs, it's because you want to *use* it... My concern here was related to indexes/searches of all available modules on a system, and to cases where module creators haven't followed the if __name__ == '__main__' convention. But after thinking a bit harder, I realize that this is really an implementation issue for the tools and that security risks can be kept minimal with either route anyway. Maybe the PEP should explicitly mention this as being an implementation detail for the tool makers. Cheers, Tavis p.s. Edward there's no Tr in my name, just Ta From klm@digicool.com Tue Mar 13 18:20:08 2001 From: klm@digicool.com (Ken Manheimer) Date: Tue, 13 Mar 2001 13:20:08 -0500 (EST) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Message-ID: > To: doc-sig@python.org > Subject: Re: [Doc-SIG] suggestions for a PEP > Date: Mon, 12 Mar 2001 20:49:08 EST > From: "Edward D. Loper" > > [...] > So I'll try to write up a PEP when I get a chance.
It sounds like > Tibbs might write a proposal too. I think that Tibbs and I seem > to have similar views on a lot of issues, so if we want diversity > in our PEPs, maybe someone else should work on one too. :) (Of > course, this sort of seems like redundant work, but I guess it's > for the best or something) I may be misguided here, but my impression is that it's not a goal to have more PEPs. Rather, the idea is to allow for multiple PEPs that convey competing viewpoints when competing viewpoints exist. If you diverge just in marginal particulars, you can present the particulars as alternatives, with explanation about their relative merits. In general, if it's suitable for you and tibbs to do a joint PEP, and you'd be comfy with it, it sounds like the PEP might be the stronger for it. Ken Manheimer klm@digicool.com From edloper@gradient.cis.upenn.edu Tue Mar 13 23:07:45 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 18:07:45 EST Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: Your message of "Tue, 13 Mar 2001 11:18:28 GMT." <007001c0abaf$53ef1da0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103132307.f2DN7kp12393@gradient.cis.upenn.edu> > Well, a reference manual is sometimes at a higher level than what I put > in docstrings. But maybe I should be prefixing more methods with '_'. It may make sense to have some place other than docstrings for reference manual docs, if someone chooses to use it. But I think that docstrings could give a very good "default" reference manual entry. Of course, I tend to be very thorough in my commenting & documenting. :) > > I think that we should leave the organization of "grander scope" > > documentation to a different project.. (Of course, it's still > > an important project.) > > Yes, yes, yes (indeed, I thought that's what was already being done, > quietly in the background, by our Mr Drake) Good to hear. 
> (the whole issue of how one quotes things in ST is a difficult one,
> single quote itself is enough of a problem. The "true ST" approach, I
> think, is to say "borderline case - punt it - we're not that complex",
> which *may* well be the correct answer).

I don't see why it has to be a difficult issue, but hopefully someone will explain it to me. :) But I disagree that the "true ST" approach is to say "borderline case - punt it." I think that's the approach of most *current implementations* of ST, not of ST itself. If ST is to become accepted and useful, we *do* need to define borderline cases, or at least make them explicitly undefined. It really sucks to try to use a language/markup that keeps changing under your feet, and is inconsistent between tools. :)

> BUT whilst I don't mind reserving it, I'll probably oppose using it!
> (but allowing you to reserve it postpones the argument, which is a Very
> Good Thing, and it gives us a *marker* for having postponed the
> argument)

Hm.. so unless anyone objects, we'll reserve it for now, with the understanding that it may become unreserved later, or it may become meaningful..

> STpy is almost as complex as I want it to get already - I want to add in
> [references] and then feature freeze. Which would have become clear in
> the alpha (he whines)

I'm still unclear about this [references] thing; explain it? But I do agree that ST is complex enough. (btw, why do we need 3 different unordered list bullets? That seemed like enormous overkill to me, and getting rid of 'o' as a bullet would solve some problems with foreign languages.. Any chance that we could just standardize on *either* '*' or '-'? I guess the STNG people won't like that though...)

> Well, actually it's "Arguments". And the special meaning isn't
> *enforced* in the current implementation or documentation, but it may be
> in the future (so much to do, so little whatever).
I think of arguments as values you give a function, and parameters as the slots to receive them. But I mainly chose "parameters" to be consistent with javadoc etc. Doesn't really matter to me what we call it.

> > author -- Edward Loper
> > version -- 2.71828
>
> That's::
>
>     Author: Edward Loper
>     Version: 2.71828
>
> and it doesn't work yet (because ':' doesn't yet start a
> paragraph) - it may or may not work in the alpha release.

As I argued in a previous email, I really think it should be description list items, but I'll wait for you to argue your case.. :)

> > Then you just have to worry about syntax for inline markup. I don't
> > have any great ideas there, other than having a whole class of
> > "advanced inline markup" tags like @somethingorother(...)
>
> Given the '@' reserved, we can leave that...

Agreed. Any further inline markup will be left for another day.

-Edward

From edloper@gradient.cis.upenn.edu Tue Mar 13 23:14:59 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 13 Mar 2001 18:14:59 EST Subject: [Doc-SIG] formalizing StructuredText: issues Message-ID: <200103132314.f2DNExp13135@gradient.cis.upenn.edu>

2 issues have come up recently in my attempts at formalizing ST:

1. what is the most elegant way to capture indentation constraints? EBNF+la can't do it very well. This is important when I try to write productions for literal blocks (ones preceded by '::'), since where they end depends on indentation. Perhaps use rules like::

       literal_block = (?P<indent> S+) line NL ((?P=indent) S+ line NL)+

   ?

2. Is there an elegant way to let people say things like 'TestSet's for the plural of 'TestSet' (instead of 'TestSets')? Currently, the second "'" in 'TestSet's would be ignored, as it would be with what's or it's. This isn't a problem if we use '#': #TestSet#s. But I could see it confusing people..

Any ideas appreciated.

-Edward

From edloper@gradient.cis.upenn.edu Wed Mar 14 05:33:37 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 00:33:37 EST Subject: [Doc-SIG] Test set! Message-ID: <200103140533.f2E5Xcp24096@gradient.cis.upenn.edu>

I've finished the first version of my StructuredText test suite module. Excerpted from the module's documentation:

    This module defines an organized, comprehensive set of test cases
    for STminus, STNG, and STpy. Each test case consists of a
    self-documenting StructuredText string. The test cases are
    hierarchically organized into 'TestSets'.

To just look at what tests are currently contained in the test set, look at STminus002's test results, at:

    http://www.cis.upenn.edu/~edloper/pydoc/stminus002-test.html

It may seem like I've included too many test cases, but I've surprised myself at how often test cases will unexpectedly stop working when I make minor changes to STminus. Any suggestions for additional test cases are most welcome. I'll be adding test cases as time goes by.

If you want to actually use the test set, download its module from:

    http://www.cis.upenn.edu/~edloper/pydoc/sttest.py

The module defines 2 useful variables:

    everything -- a 'TestSet' containing all tests.
    test_hierarchy -- a hierarchy containing all tests.

The easiest way to use it is to evaluate::

    everything.tests('+STNG')

or::

    everything.tests('+STpy')

which will return a list of strings, containing the test cases for each variant (currently these will return the same thing).

If you want to do regression testing, you should:

1. run the test suite, output to a format of your choice, and check all the answers by hand.
2. save said output to a file
3. to do regression testing after you change your implementation, run the test suite again, and diff it against your saved "known-correct" results.

-Edward

From edloper@gradient.cis.upenn.edu Wed Mar 14 07:06:57 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 02:06:57 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Tue, 13 Mar 2001 13:20:08 EST."
Message-ID: <200103140706.f2E76vp05314@gradient.cis.upenn.edu>

> I may be misguided here, but my impression is that it's not a goal to have
> more PEPs. Rather, the idea is to allow for multiple PEPs that convey
> competing viewpoints when competing viewpoints exist.

I agree. Unfortunately, I don't think there are too many people out there who have competing viewpoints and are willing to take the time to write a PEP right now.

> If you diverge just
> in marginal particulars, you can present the particulars as alternatives,
> with explanation about their relative merits. In general, if it's
> suitable for you and tibbs to do a joint PEP, and you'd be comfy with it,
> it sounds like the PEP might be the stronger for it.

For now, Tibs and I will do separate PEPs. I think that working alone, we're likely to cover different aspects of the problem. For that reason, I don't want to see a draft of Tibs' PEP until I'm done (and I probably won't show him mine either. :) ) If our drafts are compatible, and we feel like it, we may then merge our drafts into one PEP. At the very least, we'll probably steal ideas from each other's PEPs. At the end of all that, we'll have either 1 or 2 PEPs (or more if other people are motivated).. We can then try to marshal acceptance and go forward with it/them. That's my plan, anyway..

-Edward

From tony@lsl.co.uk Wed Mar 14 10:08:04 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:08:04 -0000 Subject: [Doc-SIG] PEP 216 In-Reply-To: <200103131655.f2DGthp23889@gradient.cis.upenn.edu> Message-ID: <007d01c0ac6e$a91c0620$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward Loper asked:

> Just out of curiosity, how is this new proposed PEP going to relate
> to PEP 216 ("docstring format")? Will it replace it? If not, then
> maybe we should just work on extending PEP 216?
> If you're interested
> in this PEP stuff, and haven't read PEP 216, you should probably go
> read it..:
>
> http://python.sourceforge.net/peps/pep-0216.html

As I see it, PEP 216 currently says that the docstring format adopted will be STNG (with an implicit assumption that it will be a variant of that). What we are now talking about, and what was asked for, is to define that variant (or several different variants to be voted on) via the PEP mechanism. So PEP 216 stands, albeit with very slightly amended wording, and we gain one or more new PEPs. Which makes organisational sense to me.

> Does anyone know if Moshe Zadka is still actively working on this?

I assume he is keeping at least an occasional eye on it, and if he doesn't chime in when I've got my PEP (and alpha release) ready, I'll actually email him to ask him to change the text of PEP 216 slightly, and add a reference to the new PEP (and then, to any others in the future).

(don't forget the "Python help" PEP is relevant too, as well as the "attribute docstrings" PEP)

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Wed Mar 14 10:21:12 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:21:12 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> Message-ID: <007e01c0ac70$7e66e100$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote: (well, actually some of it is me)

> >> a. I don't actually believe that you are going to get most Python
> >> programmers to DO semi-formal markup for you. I *do* believe there's a
> >> good chance we'll get many people to write at least a little human
> >> readable text. So guess what I'm after...
> > I think that a bunch of programmers would be willing to do global > markup (special places to describe arguments, etc). I don't think > they would do formal coloring markup.. No, no - I meant in *new* code, rather than in old. I want documentation in new code as well! (as for the standard library - we have precedent for farming out bits of the code to be worked on by volunteers in the original addition of docstrings. I bet much of it will be pretty close to OK anyway). > I agree with the idea, but I strongly think that things like author > should be description list items. Advantages: > * it requires no changes to current STpy > * it's more compatible with other STs > * it seems conceptually "cleaner" > * we don't need to give special meaning to yet another > character (":") > * it's *much* easier to describe a general mechanism, so > other people can extend it for their own tools. > * if they do something wrong, (e.g., put unordered list items > below "Arguments:"), then we can just format it like normal > ST. (and give a warning, if we're nice) > > Problems will come up with your formalism with lines like: > > Author: Mr. bob frank # does the . mean it's a got a sentence? > Things to do: eat, sleep # how does this get treated? > > Unless you can come up with a compelling reason for using:: > > Author: Guido > > instead of:: > > Author -- Guido > > I think we should go with the latter. Aagh! Quick, do a "from __future__ import docutils" 'cos that's all being addressed in the alpha release and *it's already coded*. OK. The rules for "paragraph labels" (I changed the term) are fairly simple. 
Basically, there are two dictionaries::

    label_dict = { "Label" : "xml-label" }

and::

    label_validate = { "Label" : ["para","dlist"] }

The first says that we will recognise text like::

    Label: this is a single line labelled paragraph

and::

    Label: This is the labelled paragraph

and::

    Label: Key -- the first labelled item

The second dictionary says that all of those examples are legitimate (when validation is switched on). I decided *not* to complicate matters by allowing::

    Label: Only indentation makes this look like a paragraph.

because there are already enough rules about when paragraphs start. This fits in reasonably well with David Ascher's original ideas (thrashed out at the end of 1999), and to my eye the use of a colon is more natural than overloading descriptive lists. More explanation can (please) wait until the alpha release, which I *hope* will be end of next week, with the PEP.

> >> And although it says there that
> >> it should be left for second phase implementation, the start of support
> >> is already in docutils.
>
> I think it will be helpful to people if we can get this straight now.

Yes, but "now" can be next week, 'cos then I won't have to explain things *quite* so many times, please?

> >> docstrings inherit like any other value, surely?
>
> Um.. yes, but that's not always what you want. Basically, if
> I define::
>
>     class A:
>         def f(x):
>             "this is f(x)"
>             return 1
>
>     class B(A):
>         def f(x):
>             return 2
>
> Then I might want B.f to inherit its docs from A.f. This would be
> especially nice for things like UserList, so my classes will have
> docs for its methods without me having to duplicate explanations.

Then assign them. __doc__ is a perfectly valid "slot" to assign to (and is the precursor to the whole idea of function values - see the relevant PEP). See, as I said, time machines...
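[Editorial note: Tibs' "then assign them" suggestion can be made concrete. A minimal sketch in modern Python (explicit 'self' added; the class and docstring text are illustrative, not from the original thread): overriding a method does not carry the parent's docstring across, but the attribute can simply be copied.]

```python
class A:
    def f(self, x):
        "Return a constant; this text lives on A.f."
        return 1

class B(A):
    def f(self, x):        # overrides A.f, so B.f.__doc__ starts as None
        return 2

# Tibs' point: __doc__ is an ordinary, assignable slot on the function.
B.f.__doc__ = A.f.__doc__
```

After the assignment, `help(B.f)` shows the inherited text, while `B().f(0)` still runs the overriding body and returns 2.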
Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 10:23:18 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:23:18 -0000 Subject: [Doc-SIG] Docststring markup process In-Reply-To: <200103131729.f2DHTdp28610@gradient.cis.upenn.edu> Message-ID: <007f01c0ac70$c9eab250$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I'd like STminus > to expand to eventually be able to provide a formal definition of > STpy. I'm glad you say that, 'cos I'd like it to be so as well. > But, at any rate, the idea of doing 2 PEPs is to see which one people > like better. I'm not planning on proposing STminus001 as a formatting > convention. I think that's not very reasonable. But Maybe > STpyminus099 (i.e., STminus with py extensions (such as #..# and list > items without blank lines), version 99). I think that it is reasonable to have "sibling" PEPs, which don't compete, if they help elucidate things. And I think we *should* be adopting the STminus work "officially" - i.e., it merits a PEP on those grounds alone. But just my opinion (!!!) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 10:28:33 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:28:33 -0000 Subject: [Doc-SIG] reserved characters In-Reply-To: <200103131735.f2DHZip29411@gradient.cis.upenn.edu> Message-ID: <008001c0ac71$85352b30$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. 
Loper wrote: > The idea would be that specific ST's would be allowed to use > reserved characters if they want, or remove them from the reserved > character list if they want. So in STminus, we would have:: > > ReservedChars = # | @ > > And in STpy, we would have:: > > ReservedChars = # Hmm - I think I'd be happier calling them "special" characters, because they *aren't* "globally" reserved - they can always be used freely in contexts where they aren't keying their special action - for instance:: First we have a #Python literal#, but # this looks like some sort of comment - that third '#' is NOT treated specially. > This production would take care of making sure that those characters > don't appear in normal text. But see above - they (sort of) can. > > I think that for STminus's purposes, it might make sense to make > > characters reserved, *perhaps*, but for the "full fledged" ST's it > > doesn't Ah, well, that *does* make more sense - but I'd still prefer to call them "special" so you can, erm, unreserve them slightly in future versions if you wish. And also so they are clearly related in use to other special characters, like '*'. > > '[' and ']' are "force a reference" characters in Zwiki, and will be > > used for similar purpose in STpy. But again, it depends on context. > > Hm. I'm not sure I like the sound of that. Care to elaborate? It's much the same as the way PEPs work, with '[fred]' keying a reference to an item "labelled" 'fred' (in some way), but being a Wiki page, references off-page are meant to be *very* easy to generate, and the way one does that is by the CapitalLettersInWords convention. However, sometimes that needs circumventing (for instance, 'Tibs' won't trigger such an event, so I would need to type '[Tibs]'). It's OK, honest, it works well. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! 
(Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Wed Mar 14 10:36:07 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:36:07 -0000 Subject: [Doc-SIG] quoting In-Reply-To: <200103131737.f2DHbHp29600@gradient.cis.upenn.edu> Message-ID: <008101c0ac72$94322f60$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper asked:

> Could someone point me to an explanation of why we *don't* want to use
> backslashes for backslashing characters? :) e.g., \* for a literal
> asterisk. It seems so much like the natural thing to do, and surely
> someone who's coding in python will be familiar with the convention
> of backslashing characters..

I'll have a quick try, although it is actually the intersection of several reasons.

First, practical. In a docstring, you can't just type '\', you'll have to type at least '\\', and maybe even more. This is a pain.

Secondly, in ST *text* (and all of the ST family are intended for writing plain text as well, including STpy), it doesn't read well - it's not something one would naturally type, unlike '...' which *is*, well, quoting.

Thirdly, I actually have a gut feeling that using the same escape character for ST text as one uses for strings is going to be awkward anyway (it *does* tend to lead to the "four backslashes means one" phenomenon, and it's just rather awkward to handle mentally).

Fourthly, I think it is instructive to note that there has never been much demand in the STClassic world to solve this, and even STNG isn't worrying about it too hard - that means that most people, most of the time, have managed either not to care or to work around the problem.

And fifthly, note that there *is* only one truly awkward case, which is how to quote a single quote. In STpy one can force use of #'# (gosh, I need to add that to my test cases, but it *should* work), or give up and use "'", or just talk around the issue.
(a literal asterisk is, of course, just '*')

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Wed Mar 14 10:43:01 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:43:01 -0000 Subject: [Doc-SIG] two simple, but important questions In-Reply-To: <01031309583800.00505@lucy> Message-ID: <008201c0ac73$8af67cc0$f05aa8c0@lslp7o.int.lsl.co.uk>

Tavis Rudd (who is being very patient with us jumping up and down on his arguments) wrote:

> After reading Tibb's and Edward's responses to my questions,
> I'm starting to lean towards option a (keeping everything in the
> docstring) for the simple reason that it's the path of least
> resistance (/debate). However, option b (having the option of a
> second string literal that follows the docstring and is discarded
> by python at compile-time) does provide the following things that
> option a can't:
> - the ability to gracefully document module and class data attributes
> (constants etc.) as Marc-Andre Lemburg proposed in PEP 224

Ah - not a problem. In the current world, you just explain them in the class/module docstring (not a bad thing). In MAL's proposal, you add new docstrings AFTER each value's default setting. So old scheme would have::

    class Useful:
        """A really useful class.

        There are two class variables defined:

        #name#   -- the name of something or other
        #number# -- its quantity
        """

        # Let's have some default values for those
        name = "Fred"
        number = 9

and the new scheme would (I believe) have::

    class Useful:
        """A really useful class.

        (and I'd maybe *still* have some reference in the class
        docstring to the class values)
        """

        name = "Fred"
        """#name# is the name of something or other"""

        number = 9
        """#number# is its quantity"""

> - a means for the verbose to document to their heart's content without
> worrying about increasing memory usage and load-time (but is this
> ever really going to be significant??)

Let's assume not - people can always strip stuff out with -OO if they want.

> - a means to include the Library reference documentation in the same
> file, as Ping proposed

Which is, let us say, hotly debated.

> Of course, if the tools were to give module creators the option of
> putting extended documentation in like this, it doesn't mean that
> everyone must do it this way. If you, as a coder, don't like the idea
> of discardable string literals after the __doc__s, stick with the
> standard docstring.

Ooh, I saw you do that. No, allowing people to do something I don't want them to be able to do, on the grounds they might not do it, isn't what I want (see, at heart I want to be part of the PSU (erm, I didn't say that)).

Tibs (who also has odd names) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Wed Mar 14 10:50:09 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:50:09 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Message-ID: <008301c0ac74$89a4da00$f05aa8c0@lslp7o.int.lsl.co.uk>

Ken Manheimer wrote:

> I may be misguided here, but my impression is that it's not a
> goal to have more PEPs. Rather, the idea is to allow for multiple
> PEPs that convey competing viewpoints when competing viewpoints
> exist. If you diverge just
> in marginal particulars, you can present the particulars as
> alternatives,

oh, I give up - it's going to have to stay formatted funnily.
I hate Outlook > with explanation about their relative merits. In general, if it's > suitable for you and tibbs to do a joint PEP, and you'd be > comfy with it, > it sounds like the PEP might be the stronger for it. I agree in principle, but maybe not in particular. The STpy PEP is going to have to be rather long if it is going to give even an informal account of the rules. Not to mention trying to forestall *some* of the inevitable arguments about why those rules won't work (since it isn't always obvious how sub-tle (pronounced "sub","tl") ST actually is in its workings. And I think STminus is a different sort of thing (or two things) - it's an attempt to codify a true formal definition of an ST variant (a first), it's an attempt to produce a common subset (which is a *very* useful idea, both for pedagogy and also for interoperability), and (ok, three) it's an attempt to produce an alternative implementation (but this is the least of the three, so far as I'm concerned - it just happens to be necessary to do the work of the first two). So I think STminus deserves a PEP *on those terms* - assuming PEPs are also meant to be placeholders for "something interesting is happening which could have large effects on the future". Whereas the STpy PEP is just a traditional "I intend to ask people to work like this, and here is an example implementation that does it". Does that make sense? Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 10:57:11 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 10:57:11 -0000 Subject: [Doc-SIG] Evolution of library documentation In-Reply-To: <200103132307.f2DN7kp12393@gradient.cis.upenn.edu> Message-ID: <008401c0ac75$858cc440$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. 
Loper wrote:

> > (the whole issue of how one quotes things in ST is a difficult one,
> > single quote itself is enough of a problem. The "true ST" approach,
> > I think, is to say "borderline case - punt it - we're not that
> > complex", which *may* well be the correct answer).
>
> I don't see why it has to be a difficult issue, but hopefully someone
> will explain it to me. :) But I disagree that the "true ST" approach
> is to say "borderline case - punt it." I think that's the approach
> of most *current implementations* of ST, not of ST itself. If ST is
> to become accepted and useful, we *do* need to define borderline cases,
> or at least make them explicitly undefined. It really sucks to try
> to use a language/markup that keeps changing under your feet, and is
> inconsistent between tools. :)

I *think* we're talking at slightly different angles, and actually agreeing. By "punt it" I mean "defer providing *any way at all* of doing this thing (that is, specifically, making a single quote character literal, which *is* the problem), on the grounds that *in practice* it may be that no one actually needs it". And that is a very ST way of doing it.

> But I do agree that ST is complex enough. (btw, why do we need
> 3 different unordered list bullets? That seemed like enormous
> overkill to me, and getting rid of 'o' as a bullet would solve
> some problems with foreign languages.. Any chance that we could
> just standardize on *either* '*' or '-'? I guess the STNG people
> won't like that though...)

Two reasons, I expect. First, it is useful for people reading the ST text itself to be able to use multiple bullets::

    * first item

      - second item

may be easier to read than::

    * first item

      * second item

particularly when lists get complex. This is certainly something that most document presentation tools will do for you. And secondly (less importantly) the bullet style *might* be used as a hint to the renderer/formatter of what the user wants to see.
But the first is the "real" reason, I think. (just keep remembering that ST is meant to be read "bare", as well as post-processed and formatted.) > I think of arguments as values you give a function, and parameters > as the slots to receive them. But I mainly chose "parameters" to > be consistant with javadoc etc. Doesn't really matter to me what > we call it. Ah - "Arguments" was suggested some while back, that's all - otherwise I don't much care either. > > > author -- Edward Loper > > > version -- 2.71828 > > > > That's:: > > > > Author: Edward Loper > > Version: 2.71828 > > > > and it doesn't work yet (because ':' doesn't yet start a > > paragraph) - it may or may not work in the alpha release. > > As I argued in a previous email, I really think it should be > description > list items, but I'll wait for you to argue your case.. :) I did a bit elsewhere, but it's mostly in the archives, I'm afraid, and it wasn't actually my case at all. But there was a consensus that this worked well with a colon, and that it made sense as a way of introducing "XML" structures. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 11:02:36 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 11:02:36 -0000 Subject: [Doc-SIG] formalizing StructuredText: issues In-Reply-To: <200103132314.f2DNExp13135@gradient.cis.upenn.edu> Message-ID: <008501c0ac76$46fe8e60$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > 2 issues have come up recently in my attempts at formalizing ST: > 1. what is the most elegant way to capture indentation constraints? > EBNF+la can't do it very well. This is important when I try to > write productions for literal blocks (ones preceeded by '::'), > since where they end depends on indentation. 
Perhaps use rules
> like::
>
>     literal_block = (?P<indent> S+) line NL ((?P=indent) S+ line NL)+
>
> ?

Erm - that requires thinking too hard for me to answer at the moment, so I'll avoid it.

> 2. Is there an elegant way to let people say things like 'TestSet's for
> the plural of 'TestSet' (instead of 'TestSets')? Currently,
> the second "'" in 'TestSet's would be ignored, as it would be with
> what's or it's. This isn't a problem if we use '#': #TestSet#s.
> But I could see it confusing people..

If I understand you correctly, you want to be able to type::

    'Word'text

and have it work as if you had a literal "Word" followed by non-literal text, without any intervening space. So as a general case, the answer is "no" (as you might expect when it is explained like that). Now, if one narrows it down to "non-literal text which is constrained to be a single lower case 's'", then of course it would be possible to do, but I would worry that it is getting a bit overcomplex. On the other hand, it may well be a useful thing to do. I would avoid it for now, and consider it as an enhancement for later (btw, I would also ask the same question on the ZWiki, to see what they say - I'd probably be heavily influenced by their reply).

(oh - and STpy shouldn't "ignore" the second apostrophe, it should decide that the whole thing doesn't contain a literal string - the literal string can't continue past the second apostrophe since they're not allowed inside literal strings).

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
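[Editorial note: Edward's named-group idea can be tried out directly with Python's re module. This is a sketch with an assumed pattern, not the actual STminus grammar: the named group fixes the first line's indentation, and the backreference forces every later line of the block to repeat it exactly.]

```python
import re

# Hypothetical literal-block matcher: <indent> captures the first line's
# leading whitespace; (?P=indent) requires later lines to match it.
literal_block = re.compile(
    r"(?P<indent>[ \t]+)\S.*\n"    # first indented line sets the indent
    r"(?:(?P=indent)\S.*\n)*"      # following lines must repeat it exactly
)

text = "    line one\n    line two\n  shallower line\n"
block = literal_block.match(text)
# The match stops before the third, more shallowly indented line, so the
# block's extent is determined purely by indentation, as Edward wanted.
```

Here `block.group("indent")` is the four-space indent and `block.group(0)` covers only the first two lines.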
From tony@lsl.co.uk Wed Mar 14 11:11:21 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 11:11:21 -0000 Subject: [Doc-SIG] suggestions for a PEP (and new schedule) In-Reply-To: <200103140706.f2E76vp05314@gradient.cis.upenn.edu> Message-ID: <008601c0ac77$80386880$f05aa8c0@lslp7o.int.lsl.co.uk>

There are significant disadvantages to working through one's email without any lookahead (even if it is faster). Edward D. Loper wrote:

> For now, Tibs and I will do separate PEPs. I think that
> working alone,...

I agree with Edward, not for the first time. By the way, flagging future events:

1. I'll be away from the office tomorrow (Thursday), so I'd appreciate it if no one did anything interesting until Friday (!!!)

2. I am aiming to try to finish both my PEP and the alpha release of docutils by the end of next week, or failing that (at the very worst) by the end of the week after. docutils will include:

   * proper docstrings, with nice doctesting

   * better support for labelled paragraphs (things like::

         Author: Tibs

     and::

         Arguments:
         #fred# -- an argument

   * the option of choosing validation or not, and some useful validation if it is chosen

   * a sensible decision on what to do with badly indented paragraphs (including complaining if validation is on, of course)

   * support for in-document references (this is new new code, as opposed to amended code)

   * an example of how the PEP itself can be done as a STpy variant (including the necessary Python code) - this is a good way of showing how to customise docutils, and should require minimal change to the PEP text.

   * a rewritten STpy.html document (i.e., the STpy spec) to reflect current status and thoughts, with at least the starts of "curvy warning" bits to talk about the nooks and crannies of STpy

There *may* be a 0.0.5 release before it's all finished - depends how long it takes and how much sleep I can do without.

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 11:14:38 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 11:14:38 -0000 Subject: [Doc-SIG] Test set! In-Reply-To: <200103140533.f2E5Xcp24096@gradient.cis.upenn.edu> Message-ID: <008701c0ac77$f5987930$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I've finished the first version of my StructuredText test > suite module. Grand stuff - I shan't have time to look at it until later (busy preparing for tomorrow), but "yeh!!". And I wouldn't worry about too many test cases - my experience is also that one needs lots of little tests with minor variations to catch the niggles. Tibs (and now to work) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From gherman@darwin.in-berlin.de Wed Mar 14 13:04:47 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 14 Mar 2001 14:04:47 +0100 Subject: [Doc-SIG] Re: Evolution of library documentation References: Message-ID: <3AAF6C6F.5FA1B2B2@darwin.in-berlin.de> Ka-Ping Yee wrote: > > At the Python conference, a small group of us discussed the possibility > of merging the external and internal documentation; that is, moving > the library reference into the module source files. It would no longer > be written in TeX so that you wouldn't have to have TeX in order to > produce documentation. This would address the duplication problem and > also keep all of a module's documentation in one place together with > the module. To avoid forcing you to page through a huge docstring > before getting to the source code, we would allow a long docstring to > go at the end of the file (or maybe collect docstrings from anywhere > in the file). 
> > To implement this convention, we wouldn't need to change the core > because the compiler already throws out string constants if they aren't > used for anything. So a big docstring at the end of the file would not > appear in the .pyc or occupy any memory on import; it would only be > obtainable from the parse tree, and tools like pydoc could use the > compiler module to do that. I know I'm a bit late to jump in on this topic (I guess a few days' delay can be considered late in a mailing list thread), but nevertheless I would like to make one point that I feel has not been adequately addressed yet. Following Ping's thoughts, quickly as they move, he is proposing nothing else but an equivalent of Don Knuth's well-known literate programming scheme in Python. Ping, am I right? I believe the number of literate programming folks who followed their master in the syntactical challenge of writing code using what was called the Web system (a combination of TeX with other languages like C and Pascal) is rather low, precisely because the syntax to mangle both was maybe ok for Knuth but far from easy for most others. Obviously, Python has something of a promise here... ... but apart from keeping the syntax of docstrings easy to understand there is one issue that Web solved that Python doesn't (this is where I have to disagree with Ping), at least not right out of the box. While it is possible today to write docstrings like this and also execute the code below as expected:

    def step1(): print '1!'
    def step2a(): print '2a!'
    def step2b(): print '2b!'

    def foo(bar):
        'step 1'
        step1()
        'step 2'
        if bar > 0:
            'step 2a'
            step2a()
        else:
            'step 2b'
            step2b()

    >>> foo(9)
    1!
    2a!
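The parse-tree extraction Ping describes is exactly what the stdlib ast module makes easy today (ast postdates this thread; in 2001 the compiler module played this role). A minimal sketch in modern Python, using an inline copy of the example above:

```python
import ast

# Dinu's example as a source string; ast.parse never executes it,
# so the undefined step1/step2a/step2b helpers are harmless here.
SOURCE = """
def foo(bar):
    'step 1'
    step1()
    'step 2'
    if bar > 0:
        'step 2a'
        step2a()
    else:
        'step 2b'
        step2b()
"""

def bare_strings(source):
    # A bare string statement parses as an Expr node wrapping a string
    # Constant; the compiler discards these from the bytecode, but the
    # parse tree still carries them.
    return [node.value.value
            for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)]

print(bare_strings(SOURCE))  # ['step 1', 'step 2', 'step 2a', 'step 2b']
```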
I think without some excellent code parsing/analysing support it will be quite some challenge to implement something to get hold of all these 'additional' docstrings in order to finally get at something close to the books (or some equivalent hypertext system) rendered with Web/TeX in former times (don't know of any very recent one) or with Mathematica, the most recent one to appear shortly: http://www.wolfram-science.com Fortunately, at IPC9 there were tools announced to analyse Python code much better/easier than ever before, like the compiler module (I think Jeremy gave that presentation). And I'm really putting some hope into that. But finally, we'll probably need to know how far we'd like to go the way of Web/Mathematica? Ping, any ideas? Regards, Dinu -- Dinu C. Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'insanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." (Kent Beck, "Extreme Programming Explained") From edloper@gradient.cis.upenn.edu Wed Mar 14 15:07:00 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 10:07:00 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Wed, 14 Mar 2001 10:21:12 GMT." <007e01c0ac70$7e66e100$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103141507.f2EF70p27334@gradient.cis.upenn.edu> > > I think that a bunch of programmers would be willing to do global > > markup (special places to describe arguments, etc). I don't think > > they would do formal coloring markup.. > > No, no - I meant in *new* code, rather than in old. I want documentation > in new code as well! Um. So did I. I'm not sure where you got the impression that I didn't. Hopefully having special places to describe arguments is useful in new code?
I wasn't even thinking about the std library.. > Aagh! Quick, do a "from __future__ import docutils" 'cos that's all > being addressed in the alpha release and *it's already coded*. So, my main objection to this format (given that I'm the one that has to formalize it) is that it means we're giving meaning to yet another character: ':'. I read through the list archives, but didn't seem to find eventual consensus. But if we're going to have it, here's how I would *like* it to be defined: 1. *Any* paragraph starting with '\w+:' is a "label paragraph." 2. Any ':' in any other location is treated as a normal ':' 3. It is illegal for a label paragraph to contain a newline. 4. If a label paragraph has (non-space) content after the ':', then that is parsed as the only subparagraph of the label paragraph. 5. If a label paragraph does not have content after the ':', then following indented paragraphs are subparagraphs. 6. If a label paragraph has content after ':' *and* is followed by indented paragraphs, then... I'm not sure. Could be illegal. Could be that the following paras are children of the para that's on the same line.. Ideas? > > OK. The rules for "paragraph labels" (I changed the term) are fairly > simple. Basically, there are two dictionaries:: > > label_dict = { "Label" : "xml-label" } If we're going to be xml-like in specifying contents, then we might want to consider using actual xml DTD-like strings to specify this (so "Label" : "ANY"). :) > I decided *not* to complicate matters > by allowing:: > > Label: > Only indentation makes this look like a paragraph. > > because there are already enough rules about when paragraphs start. Agreed. Although this is bound to confuse people.. :) But according to my rules, it will produce an error if they have error checking turned on.. > This fits in reasonably well with David Ascher's original ideas > (thrashed out at the end of 1999), and to my eye the use of a colon is > more natural than overloading descriptive lists.
Again, I don't like giving even more meaning to ':' when we could just give '--' the same meaning, and say it's a "tool issue" to decide whether a key has special meaning.. But as long as we give ':' special meaning *only* and *always* when it appears directly after the first word in a paragraph, I can accept it. > More explanation can (please) wait until the alpha release, which I > *hope* will be end of next week, with the PEP. Looking forward to it. If my emails take too much time to answer, just punt them for now. :) I've been trying to get some answers out of your source code, but I don't have the time to figure out everything that's going on in there.. > > I think it will be helpful to people if we can get this straight now. > > Yes, but can now be next week, 'cos then I won't have to explain things > *quite* so many times, please? Yes, I was thinking of "now" as "before the PEP is accepted." :) > > >> docstrings inherit like any other value, surely? > > Um.. yes, but that's not always what you want. Basically, if > > I define:: > > > > class A: > > def f(x): > > "this is f(x)" > > return 1 > > > > class B(A): > > def f(x): > > return 2 > > > > Then I might want B.f to inherit its docs from A.f. This would be > > especially nice for things like UserList, so my classes will have > > docs for its methods without me having to duplicate explanations. > > Then assign them. __doc__ is a perfectly valid "slot" to assign to (and > is the precursor to the whole idea of function values - see the relevant > PEP). I didn't think this was possible, because the following fails:: B.f.__doc__ = A.f.__doc__ But really you have to do this: B.f.im_func.__doc__ = A.f.__doc__ It might be worth mentioning this in documentation, because I think that the whole idea is not quite intuitive, and it's something that the code writers would have to do.. So we should suggest that they do it when applicable! :) > See, as I said, time machines... Time travel's fun!
:) But I'm sure you've been through all this too many times to count. I'm a bit new to all these issues, so if I seem dense sometimes, please forgive.. :) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 14 15:18:47 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 10:18:47 EST Subject: [Doc-SIG] reserved characters In-Reply-To: Your message of "Wed, 14 Mar 2001 10:28:33 GMT." <008001c0ac71$85352b30$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103141518.f2EFIlp28692@gradient.cis.upenn.edu> > Edward D. Loper wrote: > > The idea would be that specific ST's would be allowed to use > > reserved characters if they want, or remove them from the reserved > > character list if they want. So in STminus, we would have:: > > > > ReservedChars = # | @ > > > > And in STpy, we would have:: > > > > ReservedChars = # > > Hmm - I think I'd be happier calling them "special" characters, because > they *aren't* "globally" reserved I don't mind calling them special. > - they can always be used freely in > contexts where they aren't keying their special action - for instance:: > > First we have a #Python literal#, but # this looks like some sort of > comment > > - that third '#' is NOT treated specially. Ack, no! :) This is potentially the type of "non-safe" behavior I want to avoid.. This leads to people saying:: x = y*z in their comments, and then later adding:: I *like* multiplication to the same comment, and getting everything from z...I in bold, and an asterisk after like. I think that the use of mismatched delimiters should be undefined. In particular, it should *not* be defined to produce those delimiters as plain text. So in any parser I would write, it would give an error for the string you gave.. Currently, the *only* character that I think should count "only in special circumstances" is the single quote. I don't see any way around that, because of words like "don't." 
But in my formalization, I'm planning to make "non-markup" use of (most?) other special characters undefined (except when they appear in 'literals' or #inlines#). (Well, "...":.. is also "special," but I'm strongly considering requiring that all double quotes be matched (except in literals and inlines) > Ah, well, that *does* make more sense - but I'd still prefer to call > them "special" so you can, erm, unreserve them slightly in future > versions if you wish. And also so they are clearly related in use to > other special characters, like '*'. > > > > '[' and ']' are "force a reference" characters in Zwiki, and will be > > > used for similar purpose in STpy. But again, it depends on context. > > > > Hm. I'm not sure I like the sound of that. Care to elaborate? > > It's much the same as the way PEPs work, with '[fred]' keying a > reference to an item "labelled" 'fred' (in some way), but being a Wiki > page, references off-page are meant to be *very* easy to generate, and > the way one does that is by the CapitalLettersInWords convention. > However, sometimes that needs circumventing (for instance, 'Tibs' won't > trigger such an event, so I would need to type '[Tibs]'). It's OK, > honest, it works well. This means that we can no longer say '[a, b, c]' without quotes. What types of things would you refer to in this way? Why can't we just use #fred#, when referring to docstrings of other objects? This is yet another 2 characters that are given meaning, and taken away from the documenter as possible text characters.. I want to make sure that whatever we're getting is worth that. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 14 15:31:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 10:31:35 EST Subject: [Doc-SIG] quoting In-Reply-To: Your message of "Wed, 14 Mar 2001 10:36:07 GMT." 
<008101c0ac72$94322f60$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103141531.f2EFVZp00445@gradient.cis.upenn.edu> > I'll have a quick try, although it is actually the intersection of > several reasons. > > First, practical. In a docstring, you can't just type '\', you'll have > to type at least '\\', and maybe even more. This is a pain. You can always use r""...""" > Secondly, in ST *text* (and all of the ST family are intended for > writing plain text as well, including STpy), it doesn't read well - it's > not something one would naturally type, unlike '...' which *is*, well, > quoting. So basically it seems like the question comes down to "can we get away with quoting." For most things, we can. But if we're proposing a standard markup for documenting Python, I think it would be very bad if we designed it such that some programs could not be reasonably documented with it.. You can quote most things with '...'. Disadvantages: 1. everything in '...' is monospace, so there's no way to produce a "normal" '*'. This may not be a problem. 2. '...' must be preceded and followed by whitespace, so the entire ws-delimited token containing the symbol we want to quote must be quoted. In turn, this means that the entire symbol containing a special character must be rendered in monospace, and can't use any coloring (emph, strong, hrefs). You can't quote "'" with '...'. But you can with #'#. Disadvantages: 1. This seems to get very confusing and non-intuitive, to me anyway. 2. There are strange interactions.. For example, you can't quote the symbol "#'", and a number of others. But if a python program happens to use that, they can't reasonably use STfoo for documenting their code.. Backslashing is more powerful in some sense than quoting.. A single backslash character will let you say anything.. A quoting mechanism will always have cases left out.
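Edward's raw-string point can be made concrete: in an ordinary docstring a literal backslash must be typed twice, while a raw docstring takes it exactly as typed. A minimal modern-Python sketch (the backslash-star "escape" itself is only hypothetical markup):

```python
def doc_plain():
    "escape the star like this: \\*"

def doc_raw():
    r"escape the star like this: \*"

# Both docstrings contain exactly one backslash character; the raw
# form simply lets the author type what the reader will see.
assert doc_plain.__doc__ == doc_raw.__doc__ == "escape the star like this: \\*"
```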
> Thirdly, I actually have a gut feeling that using the same escape > character for ST text as one uses for strings is going to be awkward > anyway (it *does* tend to lead to the "four backslashes means one" > phenomenon, and it's just rather awkward to handle mentally). We shouldn't ever get more than 2 backslashes for this problem, and it should just be one if people use r""...""", which they can do unless they're planning on inserting strange characters in their documentation string (presumably a bad idea anyway?). > Fourthly, I think it is instructive to note that there has never been > much demand in the STClassic world to solve this, and even STNG isn't > worrying about it too hard - that means that most people, most of the > time, have managed either not to care or to work around the problem. They're in a very different world than we are. When you try to describe an algorithm that processes text, you sometimes really need to be able to use "'" and '#' and '*', and maybe even "#'". I'm still not convinced either way (that we definitely need backslashing or that we definitely don't). So I'll keep brooding on it.. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 14 15:44:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 14 Mar 2001 10:44:33 EST Subject: [Doc-SIG] suggestions for a PEP (and new schedule) In-Reply-To: Your message of "Wed, 14 Mar 2001 11:11:21 GMT." <008601c0ac77$80386880$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103141544.f2EFiXp01992@gradient.cis.upenn.edu> > 1. I'll be away from the office tomorrow (Thursday), so I'd appreciate > it if no one did anything interesting until Friday (!!!) Muahaha. Now I can sneak in my PEP, propose it, and get it accepted all before he gets back!! ;) > * the option of choosing validation or not, and some useful validation > if it is chosen Is this validating "label paragraphs" or checking for "bad things" like un-matched delimiters? Assuming the latter, yay!!
:) > * a sensible decision on what to do with badly indented paragraphs > (including complaining if validation is on, of course) Yay! > * support for in-document references (this is new new code, as opposed > to amended code) Still not sure about these, but I'll reserve judgement until I see them and their docs. > * a rewritten STpy.html document (i.e., the STpy spec) to reflect > current status and thoughts, with at least the starts of "curvy warning" > bits to talk about the nooks and crannies of STpy sounds useful. > There *may* be a 0.0.5 release before it's all finished - depends how > long it takes and how much sleep I can do without. Bah. Sleep is for the weak! Oh, and maybe for the tired, too. -Edward From tony@lsl.co.uk Wed Mar 14 15:52:46 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 15:52:46 -0000 Subject: [Doc-SIG] suggestions for a PEP (and new schedule) In-Reply-To: <200103141544.f2EFiXp01992@gradient.cis.upenn.edu> Message-ID: <008c01c0ac9e$d06b7f50$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote (well, so did I, in part): > > 1. I'll be away from the office tomorrow (Thursday), so I'd > > appreciate it if no one did anything interesting until Friday (!!!) > > Muahaha. Now I can sneak in my PEP, propose it, and get it accepted > all before he gets back!! ;) Hmm. Now, is that a case of "oh no, I shouldn't have said anything, he's going to wreck everything" or a case of "damn, he's sussed my secret plan"... > Bah. Sleep is for the weak! Oh, and maybe for the tired, too. I've got two small children. This means that I am/will be sleep deprived for ooh, at least a few years more (it's not that they wake you up, it's just that children are another way of using up time). But it must be fun or we wouldn't be doing it (now, where's that definition of "fun" gone, again?)
Tibs (who *programs* in his spare time - but luckily knows other weirdos who don't think that's odd) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 14 15:53:32 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 14 Mar 2001 15:53:32 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103141507.f2EF70p27334@gradient.cis.upenn.edu> Message-ID: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > in new code as well! > > Um. So did I. I'm not sure where you got the impression that I > didn't. Hopefully having special places to describe arguments is > useful in new code? I wasn't even thinking about the std library.. Erm - sorry, probably "too early in the morning"-itis > > Aagh! Quick, do a "from __future__ import docutils" 'cos that's all > > being addressed in the alpha release and *it's already coded*. > > So, my main objection to this format (given that I'm the one that > has to formalize it) is that it means we're giving meaning to > yet another character: ':'. I read through the list archives, but > didn't seem to find eventual consensus. But if we're going to have > it, here's how I would *like* it to be defined: > > 1. *Any* paragraph starting with '\w+:' is a "label paragraph." > 2. Any ':' in any other location is treated as a normal ':' > 3. It is illegal for a label paragraph to contain a newline. > 4. If a label paragraph has (non-space) content after the ':', > then that is parsed as the only subparagraph of the label > paragraph. > 5. If a label paragraph does not have content after the ':', > then following indented paragraphs are subparagraphs. > 6. If a label paragraph has content after ':' *and* is followed > by indented paragraphs, then... I'm not sure. Could be illegal. 
> Could be that the following paras are children of the para > that's on the same line.. Ideas? Off the top of my head, I *think* the rules it works by are remarkably similar. I *don't* think it enforces the "any paragraph with label-chars then a colon must be a label paragraph to be valid" rule (although in validation mode it maybe should). I'm fairly sure that if it is a one-line paragraph with text after the colon, then it *will* grumble (again, in validation mode) if there are subparagraphs. [beware that in non-validation (passthrough) mode docutils will try to "do something sensible", since it is very frustrating for an end-user to be unable to get documentation out because the documentation writer made mistakes, and because there are applications (like, maybe, Wiki pages) where there is nowhere for the validation messages to go.] STpy, on the other hand, should define non-valid occurrences *as* non-valid (whilst not forbidding the existence of "passthrough" tools). > > OK. The rules for "paragraph labels" (I changed the term) are fairly > > simple. Basically, there are two dictionaries:: > > > > label_dict = { "Label" : "xml-label" } > > If we're going to be xml-like in specifying contents, then we > might want to consider using actual xml DTD-like strings to > specify this (so "Label" : "ANY"). :) I had thought about something like that, and dithered... Basically, allowing absence to mean "anything" means that someone defining a new profile who honestly doesn't care about validation (in this manner) need do nothing, which swayed it for me. > > I decided *not* to complicate matters > > by allowing:: > > > > Label: > > Only indentation makes this look like a paragraph. > > > > because there are already enough rules about when paragraphs start. > > Agreed. Although this is bound to confuse people.. :) But > according to my rules, it will produce an error if they have > error checking turned on..
I can't see a way out of confusing people in some circumstances, unfortunately. It *is* possible that future releases (of STpy and docutils) might relax this, but I'm not convinced it's worth the think-space. > > This fits in reasonably well with David Ascher's original ideas > > (thrashed out at the end of 1999), and to my eye the use of > a colon is > > more natural than overloading descriptive lists. > > Again, I don't like giving even more meaning to ':' when we could > just give '--' the same meaning, and say it's a "tool issue" to > decide whether a key has special meaning.. But as long as we > give ':' special meaning *only* and *always* when it appears directly > after the first word in a paragraph, I can accept it. Whereas I don't like reusing '--' for something that *isn't* a descriptive list (these things fit in a different place in my head, but maybe not in yours). On the other hand, the use of the colon *was* always specified as being so restricted. > > More explanation can (please) wait until the alpha release, which I > > *hope* will be end of next week, with the PEP. > > Looking forward to it. If my emails take too much time to answer, > just punt them for now. :) I've been trying to get some answers > out of your source code, but I don't have the time to figure out > everything that's going on in there.. So far, all the emails I've been answering (no matter how fast and carelessly) have been useful for clarifying stuff in my head. And I've printed out this last one, because I want to check my implementation of the labeled paragraphs against your list. > > > I think it will be helpful to people if we can get this > straight now. > > > > Yes, but can now be next week, 'cos then I won't have to > explain things > > *quite* so many times, please? > > Yes, I was thinking of "now" as "before the PEP is accepted." :) Oops - sorry. Another artefact of being in a hurry.
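Read concretely, rules 1, 3 and 4 of Edward's label-paragraph list might come out as the following sketch; it is an illustration of the proposed spec, not Tibs's actual docutils code:

```python
import re

# Rule 1: a paragraph starting "word:" is a label paragraph.
LABEL = re.compile(r"^(\w+):\s*(.*)$")

def classify(paragraph):
    if "\n" in paragraph:              # rule 3: labels are one line only
        return ("plain", paragraph)
    m = LABEL.match(paragraph)
    if not m:                          # rule 2: ':' elsewhere is literal
        return ("plain", paragraph)
    label, rest = m.group(1), m.group(2)
    # Rule 4: trailing text becomes the sole subparagraph; rule 5's
    # indented subparagraphs would be attached by the caller.
    return ("label", label, rest or None)

assert classify("Author: Tibs") == ("label", "Author", "Tibs")
assert classify("Arguments:") == ("label", "Arguments", None)
assert classify("We meet at 10:30") == ("plain", "We meet at 10:30")
```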
> I didn't think this was possible, because the following fails:: > > B.f.__doc__ = A.f.__doc__ > > But really you have to do this: > > B.f.im_func.__doc__ = A.f.__doc__ Well, I didn't realise that, but (a) that counts as "works" (for an odd value of "works", admittedly) and (b) it may change with function attributes - don't know. > It might be worth mentioning this in documentation, because I > think that the whole idea is not quite intuitive, and it's something that the > code writers would have to do.. So we should suggest that they do it > when applicable! :) Oh definitely - can I leave you to remember it, though? > Time travel's fun! :) But I'm sure you've been through all > this too many times to count. I'm a bit new to all these issues, > so if I seem dense sometimes, please forgive.. :) Well, it's only twice at most, and new ideas *do* keep coming up all the time, and besides, all the newcomers this time round are *very valuable*. Hmm. The next two emails actually require some coherent thought to answer (stuff about quoting and '#...#' and so on), so I'll defer them for now. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 15 21:37:46 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 16:37:46 EST Subject: [Doc-SIG] formalizing StructuredText Message-ID: <200103152137.f2FLblp17063@gradient.cis.upenn.edu> I've been working on expanding the domain of STminus (a formalized version of StructuredText, expressed in an EBNF variant).. And the following questions came up. (Some of them may not make much sense if you're not familiar with StructuredText.) These are generally not questions that have "correct" answers, so I'm wondering what people think I should make STminus do. 
(Of course I'm interested in what STpy and STNG have to say about these things too). * Are list items required to have contents? I.e., can a list item be just a bullet? This only makes sense to me if you used it in an environment like:: 1. text... 2. text... * Apostrophes can appear in the middle of a word or at the end of a word, like "isn't" and "dogs'". Is it illegal to have multiple apostrophes in the same word? There are no English words that use multiple apostrophes, but I'm not sure about other languages (although there are probably some languages that have words with apostrophes at the beginning of a word, ("'til"?) and StructuredText clearly won't deal with those..) * When parsing various structures, like paragraphs and list items and bold items, what whitespace is kept? E.g., if I were to export to XML, would the trailing whitespace on paragraphs be included? Or the whitespace between a description list key and the hyphen? * Can #inline# expressions contain newlines? I assume not ('literal' expressions can't.) * What are valid expressions for starting an ordered list item? Currently STNG uses "([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)" i.e., a series of letters followed by a dot, a series of numbers followed by a dot, or a number followed by space. This seems wrong to me, because it implies that the following are ordered list items:: Hi. This is a list item. 12 is a fun number. And it does not allow for expressions like: 1.2. This is a list item. Also, note that since in STpy variants (which will include my proposed markup for formatted docstrings), list items can begin without an intervening space.. So we would get:: The first line is a paragraph but the second line is a list item. (Since it starts with letters followed by a dot) Even if we restrict ourselves to Roman numerals, we have problems:: Hopefully someone who is smarter than I can figure this out. But I don't see a way to use roman numerals safely.. So maybe we could just use "([0-9]+\.)+"?
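Edward's closing suggestion is easy to check with Python's re module; a quick sketch (the required trailing space is an added assumption, not part of the quoted pattern):

```python
import re

# "([0-9]+\.)+" accepts dotted numeric labels such as "1." and "1.2."
# while rejecting the ambiguous cases quoted above.  Requiring a
# following space is an extra assumption made here for illustration.
ORDERED_ITEM = re.compile(r"([0-9]+\.)+\s")

assert ORDERED_ITEM.match("1. This is a list item.")
assert ORDERED_ITEM.match("1.2. This is a list item.")
assert ORDERED_ITEM.match("Hi. This is a list item.") is None   # letters rejected
assert ORDERED_ITEM.match("12 is a fun number.") is None        # no dot, rejected
```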
* What restrictions are there on hrefs ("name":http://some.url)? According to STNG, they can use relative URLs ("name":whatever). These end up being pretty tricky to formalize.. * Can href names span multiple lines? * Can href names contain coloring? (I'd like to say no) * Should the string '":' only be allowed for hrefs? Or maybe '":(?!\s)', so you can say "this": that? * What do you do with things like:: This *is "too* confusing":http://some.url (Keeping in mind that things like this should be ok):: Normally *quotes " don't have* any special meaning," so they don't have to nest properly.. Well, that's all for now. I'll post more issues as they come up. :) -Edward From Edward Welbourne Thu Mar 15 18:54:14 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 18:54:14 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <007101c0abb0$99519bb0$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <007101c0abb0$99519bb0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > (We do realise, don't we, that when we put a PEP on the main > python-list, all the arguments from the last 5 years are going to be > endlessly rehashed? Yes, I thought we did.) yes, but doc-sig regulars have had so much practice at these arguments (at most a year between iterations) you should be able to moderate the general list-population's discussion quite tightly ;^> Eddy. From Edward Welbourne Thu Mar 15 19:48:36 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 19:48:36 +0000 (GMT) Subject: [Doc-SIG] reserved characters In-Reply-To: <200103141518.f2EFIlp28692@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103141518.f2EFIlp28692@gradient.cis.upenn.edu> Message-ID: >> - that third '#' is NOT treated specially. > > Ack, no! :) This is potentially the type of "non-safe" behavior > I want to avoid..
This leads to people saying:: > > x = y*z > > in their comments, and then later adding:: > > I *like* multiplication well, the first one `should' be inside some sort of mark-up which will tell your parser that it's a chunk of python code; which suppresses the magic meaning of * so this shouldn't present a problem. Likewise, > This means that we can no longer say '[a, b, c]' without quotes. well, hey, a doc-string is text and an interlude of python code (that's what a [list, of, items] is, right ?) should be quoted (well, marked up as being a chunk of python code). [Then again, I tend to put big parenthetical texts into square brackets rather than curved ones, but I'm guessing Tony's proposed usage as link-generators will only apply when there's only one word in the interval ?] So do we really want to be able to say it without (some sort of) quoting ? Eddy. -- Language is a set of conventions that evolve by anarchy. True lovers of language respect both its conventions and the anarchy from which those conventions emerge. -- Tom Tadfor Little From Edward Welbourne Thu Mar 15 19:39:16 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 19:39:16 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103141507.f2EF70p27334@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103141507.f2EF70p27334@gradient.cis.upenn.edu> Message-ID: Tony and Edward said: >> Then assign them. __docs__ is a perfectly valid "slot" to assign to (and >> is the precursor to the whole idea of function values - see the relevant >> PEP). > > I didn't think this was possible, because the following fails:: > > B.f.__doc__ = A.f.__doc__ > > But really you have to do this: > > B.f.im_func.__doc__ = A.f.__doc__ erm ... Python 1.5.2 (#5, Oct 4 1999, 13:36:16) [GCC 2.7.2.3] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def hello(there): return there ... 
>>> hello.__doc__ = 'hello there' >>> hello.__doc__ 'hello there' if it fails in 2.*, eek ! (Sorry if this has already been said in one of the other thirty-something e-mails I've yet to reach in my in-box, but I'll have forgotten it by the time I get through them.) Eddy. -- "There arises from a bad and unapt formation of words a wonderful obstruction to the mind." - Francis Bacon From Edward Welbourne Thu Mar 15 20:33:00 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 20:33:00 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103130149.f2D1n8p01283@gradient.cis.upenn.edu> Message-ID: Edward said: > Among other (very good) reasons, this would mean that only the writer(s) > of a module can write tutorials, discussions of usage, etc. for it.. erm ... are you suggesting this is how it should be or ... well, anyway, I definitely think someone other than the module author *should* be able to supply the tutorial, and preferably *without* having to edit the module's *source*. I am not as good at *explaining* things as I am at *designing* and *implementing* them; and when I'm *maintaining* some code I really don't want the tutorial getting in my way. Granted having the tutorial in a separate file from the source does imply (so I'm now at odds with my earlier message - Tony's messages since have been reminding me of things I've forgotten since last time doc-sig visited these topics) a danger of getting out of sync ... but having a huge tutorial doc in the file just means I^H the maintainer gets into the habit of paging through large swathes of documentation without looking at them, hence maintains the tutorial only as much as if it *had* been in a different file, while risking missing things which *look like* tutorial (etc.) docs but are actually pertinent to the implementation. 
The source code should contain documentation addressing the following needs: * the maintainer needs to know what the code does and why * someone manipulating an object (or function) supplied by the code needs to be able to interrogate it (e.g. at the interpreter's prompt, but equally possibly in code) to find out what it is and can do - and how to get it to do what it can do - without needing to know where it comes from. [I'll call this `someone' the interrogator for distinction from the maintainer.] I am tempted to be wilfully contentious and add: and nothing else; but, in any case, these two are the priorities for doc strings (particularly the latter - hints to the maintainer can go in comments). All other docs belong some place else, to which the module docstring should refer via the xref mechanisms ST* provide. Neither of these forms of documentation needs anything like the sophistication of markup that *is* genuinely needed by tutorials, reference manuals, etc. and maintaining in-code docs *will not happen* if it has more sophistication to its markup than the bare minimum that suffices to deliver the above goals. So the docs in other files must be in a richer markup language than the docstrings can afford to call for; yet another reason to put them in a separate file. Granted I'd sooner the `other doc' language was a richer ST, rather than a totally different language (much as I like TeX) from the one used in docstrings. But let the module *source* be as small as is compatible with the needs of maintainer and interrogator. Sometimes a very powerful tool, requiring minimal explanation for the maintainer and not much for the interrogator, needs screeds and screeds of tutorial, reference, etc. documentation - typically to lead the user into only using it in the ways that are actually safe (powerful tools being somewhat prone to needing such care). > Possibly in a different file.. I find Tibb's arguments pretty convincing.. Tibs is like that. 
Get used to it and *use* him ;^> >> - documenting Packages > There definitely need to be provisions for that. erm ... we already have __init__.py, so a __doc__.py might be cool ? Possibly alongside __doc__.stng or some such containing the ref docs ? >> - documenting extensions in other languages > Much easier if we can import modules. But I guess safety's important. > Oh well. erm ... if someone's idea of having .pyd files goes through, mayhap extension.so could come with extension.pyd providing the matching docs, as if it had been compiled out of a notional extension.py Importing modules is entitled to have side-effects such as changing the way other modules work; and these changes are apt to be designed on the premise that the module is being imported for the sake of *using* it, not introspection. So at least *some* of the doc-tools must be prepared to do their own parsing; albeit the interrogator's perspective takes for granted that the module *is* imported. Eddy. -- I believe in getting into hot water; it keeps you clean. -- G. K. Chesterton. From Edward Welbourne Thu Mar 15 22:51:40 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 22:51:40 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> Message-ID: Edward said: > I agree with the idea, but I strongly think that things like author > should be description list items. ... > Unless you can come up with a compelling reason for using:: > > Author: Guido > > instead of:: > > Author -- Guido > > I think we should go with the latter. Hmm. There are more things in heaven and earth than are dreamed of in your philosophy, Horatio. Dunno if this compels but here I go.
Consider:: Author -- Guido Version -- 3.14 A random paragraph at the same indentation level, so it *isn't* a sub-paragraph of the Version list item; but it equally isn't a list item in the implied descriptive list This is gibberish. However, it appears to be called for in Edward's scheme of things, in the places where Tibs calls for the same with the ` --' replaced by a `:'. The Tibs-form would be:: Author: Guido Version: 3.14 A random paragraph as before. which works out less like gibberish when my brain's parser comes to try to work out what it means - substantially because I'm reading a *python* program, in which : has the kind of meaning that this usage is asking it to have. I would be more in sympathy with Edward's account of the matter if it proposed:: Document information Author -- Guido Version -- 3.14 A random paragraph which isn't a sub-paragraph of the document information. While I'm at it: I loathe and despise the apparent demand (seemingly from both schools - Tibs: does your form require the space between the Author and Version lines ?) for blank lines in various places which feel (to me) just plain wrong; I want to write (descriptive lists and, in particular, ...) the Edward-form of the last as Document information Author -- Guido Version -- 3.14 A random paragraph which isn't a sub-paragraph of the document information. and believe I shall not be alone (among python programmers) in tending to neglect those blank lines (in *all* descriptive lists) and being irritated by any tool which insists on me adding them. Gratuitous whitespace `halves' the amount of information I can fit in front of my eyeballs at a given moment and that *matters* to the *maintainability* of my code - an issue which I take very seriously.
Given the choice between pandering to the tastes of a tool for extracting the documents and being kind to the poor sod who has to maintain the code I am typically going to side with the latter - substantially because I'm likely to be the maintainer, and I'm no more likely to remember what I was thinking when I wrote the code than is anyone else, even though they didn't write it and I did - I don't use my wetware for *memory*, that's what silicon and ferrite are for. Eddy. From Edward Welbourne Thu Mar 15 23:41:45 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 23:41:45 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Edward Loper and Tibs were discussing having B.f borrow A.f's __doc__ when B inherits from A, Tony said assign ... >> But really you have to do this: >> >> B.f.im_func.__doc__ = A.f.__doc__ > Well, I didn't realise that, but (a) that counts as "works" (for an > odd value of "works", admittedly) and (b) it may change with function > attributes - don't know. ah. Just tested what I should have earlier - Edward's right: >>> def hello(there): return there ... >>> hello.__doc__ = 'hello there' >>> hello.__doc__ 'hello there' >>> class A: ... def hello(there): return there ... >>> A.hello.__doc__ = 'hello there' Traceback (innermost last): File "<stdin>", line 1, in ? TypeError: attribute-less object (assign or del) >>> A.hello.im_func.__doc__ = 'hello there' >>> A.hello.__doc__ 'hello there' It works for functions but not if they're hanging off the namespace of a class ? The attribute shows up on the class method after I set it on the class method's im_func ? This is wrong, obscene and ugly ! Naughty Guido - explain ! But, like I say elsewhere, we shouldn't *need* to be assigning, and: this smells like we should be using acquisition. Eddy.
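[Editorial sketch for readers following along in a later Python: the restriction Eddy trips over above was a quirk of the old bound/unbound method objects. In Python 3 a def looked up on its class is a plain function, so the docstring-borrowing idiom works by direct assignment, no im_func detour needed. Class names here are invented for illustration.]

```python
# Docstring borrowing in modern Python 3: functions retrieved through
# their class are plain function objects, so __doc__ is assignable.

class A:
    def f(self):
        "this is f(x)"
        return 1

class B(A):
    def f(self):          # overrides A.f, loses its docstring
        return 2

# B.f starts with no docstring of its own...
assert B.f.__doc__ is None

# ...so borrow A.f's explicitly:
B.f.__doc__ = A.f.__doc__
assert B.f.__doc__ == "this is f(x)"

# Instances see the borrowed docstring too:
assert B().f.__doc__ == "this is f(x)"
```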
From Edward Welbourne Thu Mar 15 22:58:57 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 15 Mar 2001 22:58:57 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103131717.f2DHHIp27124@gradient.cis.upenn.edu> Message-ID: Tony then Edward said: >> docstrings inherit like any other value, surely? > Um.. yes, but that's not always what you want. Basically, if I define:: > class A: > def f(x): > "this if f(x)" > return 1 > > class B(A): > def f(x): > return 2 > > Then I might want B.f to inherit its docs from A.f. Albeit (at least at 1.5.2, as Tony has pointed out) I can set B.f to A.f by assignment, I agree with Edward, sort of, that this is what `should' happen. But ... This sounds more like *acquisition* than inheritance, so I really want to drag Jim Fulton into its discussion. Sadly I haven't had time to play with Zope and pursue acquisition algebra yet. (Bloody capitalism - keeps me too busy to have fun ;^) Hi Jim, Doc-Sig calling ... Eddy. -- If at first you don't succeed, try doing it the way you were told. From edloper@gradient.cis.upenn.edu Fri Mar 16 00:29:01 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 19:29:01 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 15 Mar 2001 19:39:16 GMT." Message-ID: <200103160029.f2G0T1p02926@gradient.cis.upenn.edu> > > I didn't think this was possible, because the following fails:: > > > > B.f.__doc__ = A.f.__doc__ > > > > But really you have to do this: > > > > B.f.im_func.__doc__ = A.f.__doc__ > > erm ... ... > 'hello there' > > if it fails in 2.*, eek ! The problem is that A.f is a method, not a function:: >>> type(A.f) <type 'instance method'> >>> type(A.f.im_func) <type 'function'> And when you read A.f.__doc__, some "magic" returns A.f.im_func.__doc__. But there isn't really an A.f.__doc__.
But as long as you change A.f.im_func.__doc__, the changes will be visible from A.f.__doc__:: >>> A.f.im_func.__doc__ = "new doc" >>> A.f.__doc__ 'new doc' So.. it may make sense to somehow change the magic that associates a method with its function's documentation.. But it's not a serious problem, because you *can* set the docs of a module. I don't really know what "acquisition" is, but one problem with making this an automatic process is that sometimes it's *not* what you want. I guess the question is whether it's what you want more often or not what you want more often. If it's usually what you want, you can disable it with:: class B(A): def f(x): "" # don't inherit docs return x+1 If it's usually *not* what you want, or if we want to keep things simpler, the following seems to work (I don't know why you don't need to use .im_func here):: class B(A): def f(x): return x+1 B.f.__doc__ = A.f.__doc__ I'd be happy with either. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 00:35:14 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 19:35:14 EST Subject: [Doc-SIG] reserved characters In-Reply-To: Your message of "Thu, 15 Mar 2001 19:48:36 GMT." Message-ID: <200103160035.f2G0ZFp03453@gradient.cis.upenn.edu> > > x = y*z > > in their comments, and then later adding:: > > I *like* multiplication > > well, the first one `should' be inside some sort of mark-up which will > tell your parser that it's a chunk of python code; which suppresses the > magic meaning of * so this shouldn't present a problem. I was mainly saying that we should say something like "'*' can only be used as a delimiter for emph/strong, as a bullet, or within a 'literal' (or #inline#) region (or a blockquote, I guess). Also, all delimiters must be matched. So the following string is not a legal ST string:: x*y > well, hey, a doc-string is text and an interlude of python code (that's what a [list, of, items] is, right ?)
should be quoted (well, marked up > as being a chunk of python code). Yes. But if we let them "get away" with not quoting it in special circumstances (like the string "x*y"), then people will end up getting confused.. Note also that '[a, b, c]' is not always python code. It's also mathematical notation.. Same with 'x*y'. But I don't see a problem with requiring that mathematical notation be put in quotes (although that means people can't use symbols like #x'# in their mathematical notation.. but I can live with that) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 00:47:06 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 19:47:06 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 15 Mar 2001 20:33:00 GMT." Message-ID: <200103160047.f2G0l6p04520@gradient.cis.upenn.edu> > Edward said: > > Among other (very good) reasons, this would mean that only the writer(s) > > of a module can write tutorials, discussions of usage, etc. for it.. > > erm ... are you suggesting this is how it should be or ... Sorry, that was unclear. My stance is: Inline documentation should include clear, concise definitions of what guarantees are provided by the Python object they document. I will call that "API documentation." Inline documentation should *not* include tutorials, howtos, extended explanations, unreasonable amounts of background information, etc. That way, the module author writes the "definitive definition" of the module's behavior. But everyone else is free to go write tutorials, etc. > The source code should contain documentation addressing the following > needs: > > * the maintainer needs to know what the code does and why s/the maintainer/anyone reading the source/ Note that what the code *actually* does and what it *promises* to do are very different things (but hopefully the former is a superset of the latter).
For example, code may *promise* to return a list of names, but the order in which it *actually* returns it is not specified.. So the API docs let you distinguish what is part of the design from what is an implementation choice.. > * someone manipulating an object (or function) supplied by the code > needs to be able to interrogate it (e.g. at the interpreter's > prompt, but equally possibly in code) to find out what it is and can > do - and how to get it to do what it can do - without needing to > know where it comes from. [I'll call this `someone' the > interrogator for distinction from the maintainer.] I think this should generally be the same as what I called the API docs.. > Neither of these forms of documentation needs anything like the > sophistication of markup that *is* genuinely needed by tutorials, > reference manuals, etc. I'm not sure reference manuals always need more markup. For example, the reference manual for the entire Java library basically uses almost no markup, and is very readable/useful. But I can see how reference manuals can benefit from extra markup (esp. if you want to write real math equations). > Granted I'd sooner the `other doc' language was a richer ST, rather than > a totally different language (much as I like TeX) from the one used in > docstrings. I think this will have to be a discussion and/or project for another day, though. :) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 00:59:40 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 19:59:40 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 15 Mar 2001 22:51:40 GMT." 
Message-ID: <200103160059.f2G0xep05577@gradient.cis.upenn.edu> > Consider:: > > Author -- Guido > > Version -- 3.14 > > A random paragraph at the same indentation level, so it *isn't* a > sub-paragraph of the Version list item; but it equally isn't a list > item in the implied descriptive list I would interpret that as a description list (Author: Guido, Version: 3.14, ...). But I can see how it might be confusing. > which works out less like gibberish when my brain's parser comes to try > to work out what it means Which is what is really important; if others agree, then I'll be convinced. > substantially because I'm reading a *python* > program, in which : has the kind of meaning that this usage is asking it > to have. But we have to be careful here, since the analogy doesn't completely carry over. Notably, this isn't what most Python programmers would think it is:: Author: Person1 Person2 Person3 The 3rd person is not part of the author list; it's some sort of subparagraph of it, which doesn't make much sense to me.. > While I'm at it: I loathe and despise the apparent demand (seemingly > from both schools - Tibs: does your form require the space between the > Author and Version lines ?) for blank lines in various places which feel > (to me) just plain wrong; I want to write (descriptive lists and, in > particular, ...) the Edward-form of the last as Neither STpy (Tibs' version) nor whatever version of STminus I end up proposing for docstrings will require blank lines before list items. STNG and "vanilla" variants of STminus will be the only ones that require blank lines before list items. Note that blank lines *are* currently needed in all other places, though.. so, for example, if you want to use headings, you must say::

    Heading

    Text...

And not::

    Heading
    Text...

> Given the choice between > pandering to the tastes of a tool for extracting the documents and > being kind to the poor sod who has to maintain the code > I am typically going to side with the latter I definitely agree.
But of course, there's always something to be said for simplicity too.. If someone writing documentation has to remember too many rules, they may get confused.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 01:01:27 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 20:01:27 EST Subject: [Doc-SIG] quoting In-Reply-To: Your message of "Thu, 15 Mar 2001 23:03:26 GMT." Message-ID: <200103160101.f2G11Rp05742@gradient.cis.upenn.edu> > > Could someone point me to an explanation of why we *don't* want to use > > backslashes for backslashing characters? :) e.g., \* for a literal > > because we're working in a context within which \ is already being > processed, so we'll end up having to say \\ when we mean \ and ... it'd > just get ugly. You can always use r"""...""" It just seems like there should be *some* mechanism for backquoting, since there are some things that you can *not* express without it.. Would people still object if we made some mechanism available, but *strongly* discouraged people from using it when they could avoid it (since it makes the plaintext hard to read)? -Edward From Edward Welbourne Fri Mar 16 00:14:03 2001 From: Edward Welbourne (Edward Welbourne) Date: Fri, 16 Mar 2001 00:14:03 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103152137.f2FLblp17063@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103152137.f2FLblp17063@gradient.cis.upenn.edu> Message-ID: > * Apostrophes can appear in the middle of a word or at the end > of a word, like "isn't" and "dogs'". Is it illegal to have > multiple apostrophes in the same word? There are no English > words that use multiple apostrophes, but I'm not sure about oh no.
Excuse me while I switch into Imagine someone's written a document, of which a prominent chunk is a sequence of declarations all sharing a pivotal use of isn't (The King isn't any better at running the country than we are; his chancellor isn't any more financially adept than our merchants; ...). Someone else ends up writing about this document. They're mainly discussing the declarations. So they end up using isn'ts to refer to the uses of isn't and obviously their possessive is isn'ts' Now, when you've persuaded yourself I'm crazy, go check a database of Anglic usage and discover how much more perverted the actual counter-examples are. And please don't inflict them on me, coming up with that one hurt. But I think we can safely say that authors of docstrings will be prepared to retract anything that perverse, once the tools complain. You should apply the same reasoning to some of your other worries. (well bugger me - my #inline(code)# proposal got implemented !) > i.e., a series of letters followed by a dot, a series of > numbers followed by a dot, or a number followed by space. ... a few counter-cases ... and what about 2a. subsidiary cases not to mention 3: some of us like colons but, to be quite frank, ([0-9]+\.)+ sounds fine to me. And don't bother refining that to ([1-9][0-9]*\.)+ 'cos some of us *do* count from zero, OK ? > * What do you do with things like:: > > This *is "too* confusing":http://some.url Find author, apply pain (to taste). Give them the opportunity to retract. If they refuse, apply lethal doses of pain. Then they won't repeat the offence. No problem. Eddy. -- Those wishing to be literal-minded about applying pain to taste may feel free to deploy hot chilli sauce. In the eyes. From edloper@gradient.cis.upenn.edu Fri Mar 16 03:12:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 22:12:35 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 16 Mar 2001 00:14:03 GMT." 
Message-ID: <200103160312.f2G3Cap18781@gradient.cis.upenn.edu> > > * Apostrophes can appear in the middle of a word or at the end > > of a word, like "isn't" and "dogs'". Is it illegal to have > > multiple apostrophes in the same word? There are no English > > words that use multiple apostrophes, but I'm not sure about > > oh no. Excuse me while I switch into (perverted counterexamples...) > But I think we can safely say that authors of docstrings will be > prepared to retract anything that perverse, once the tools complain. > You should apply the same reasoning to some of your other worries. I find myself going back and forth, because it's really not hard (from a tools perspective) to allow words with multiple apostrophes.. The main disadvantage I can see is that people might think that in:: the'big'dog "big" will be rendered as literal... > (well bugger me - my #inline(code)# proposal got implemented !) That was your idea? One day, we should trace back all of these ideas to their originators, and maybe give them credit or something. :) > ... a few counter-cases ... > and what about > > 2a. subsidiary cases > > not to mention > > 3: some of us like colons > > but, to be quite frank, ([0-9]+\.)+ sounds fine to me. It's hard to come up with a rule that's both simple and safe, but covers cases like '2a.' and '3:'. So, unless Tibs or others strongly object, I think we should just stick with '([0-9]+\.)+'. :) > > * What do you do with things like:: > > > > This *is "too* confusing":http://some.url > > Find author, apply pain (to taste). > Give them the opportunity to retract. > If they refuse, apply lethal doses of pain. > Then they won't repeat the offence. > No problem. Perhaps I should rephrase that. What should a *parser* do? I guess "die" is a good answer, though it sounds like you might prefer something along the lines of "erase their hard drive." 
:) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 04:32:10 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 15 Mar 2001 23:32:10 EST Subject: [Doc-SIG] What counts as a url? Message-ID: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu> So I'm working on adding HREFs to STminus. They look like this:: "anchor name":URL Where URL is either a relative URL or an absolute URL.. So I went and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt . It suggests (if I'm reading it correctly) that we could define a URL as:: ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+ Should we use that regexp for URLs? Or perhaps we should go for simplicity, and say that the regexp ends at whitespace:: [^\s]+ In either case, we'll have to be careful to say:: See "this":http://url . instead of:: See "this":http://url. (the '.' gets included in the second url). Is that a problem? If so, what can we do about it? (Keep in mind that it *is* acceptable to have a URL that ends in a '.').. Of course, I don't think people will be including HREFs in their documentation much, anyway.. So the main issue for most people will just be that they can't use '":' in certain environments.. Ideas/thoughts? -Edward From tony@lsl.co.uk Fri Mar 16 10:25:34 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 16 Mar 2001 10:25:34 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Message-ID: <00a901c0ae03$6f89ca50$f05aa8c0@lslp7o.int.lsl.co.uk> Hmm - Eddy's back in the fray (with the combination of Eddy and Edward - heh, that actually works!) I think things are sure to be, erm, interesting (challenging? certainly needful of careful thought before pontificating - oh, well, it wasn't going to last - I'm *good* at pontificating (no, I didn't mean that it's *well received*, I just meant it's a mode I'm good at dropping into)).
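[Editorial sketch: Edward's URL rule from the "What counts as a url?" message above can be tried directly as a Python regular expression. This is an illustration of the proposed RFC-2396-style character class, not any tool's actual implementation; the trim() helper is a hypothetical mitigation for the trailing-dot problem he raises.]

```python
import re

# The proposed URL-character rule: RFC 2396's character repertoire
# plus %-escapes. The '-' is escaped so it is not read as a range.
URL = re.compile(r"(?:[A-Za-z0-9\-_.!~*'();/?:@&=+$,#]|%[0-9A-Fa-f]{2})+")

m = URL.match("http://some.url/page#frag")
assert m.group(0) == "http://some.url/page#frag"

# The problem described in the message: a sentence-final period
# is swallowed into the URL by this rule.
m = URL.match("http://url.")
assert m.group(0) == "http://url."

# One possible (hypothetical) mitigation: trim trailing sentence
# punctuation after matching, at the cost of URLs that genuinely
# end in '.'.
def trim(url):
    return url.rstrip(".,;:")

assert trim("http://url.") == "http://url"
```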
Eddy (i.e., Edward Welbourne, fuller-named) said: > yes, but doc-sig regulars have had so much practice at these > argument (at most a year between iterations) you should be > able to moderate the general list-population's discussion > quite tightly ;^> The problem is (licks finger and holds it up in the air - yes) threefold: 1. Rehashing arguments you've had before several times is boring 2. BUT it IS still worth rehashing them, because sometimes one changes one's mind (been there, done that) 3. unfortunately, it *does* take up a lot of time and typing (and it isn't *really* fair to just say "we discussed that, it's in the archive, tough" - although it's *tempting*!) > On assigning docstrings: > erm ... > > Python 1.5.2 (#5, Oct 4 1999, 13:36:16) [GCC 2.7.2.3] on linux2 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> def hello(there): return there > ... > >>> hello.__doc__ = 'hello there' > >>> hello.__doc__ > 'hello there' > > if it fails in 2.*, eek ! Hmm - trying (also in 1.5.2): >>> class fred: ... def f1(): ... """f1""" ... >>> class jim: ... def j1(): ... pass ... j1.__doc__ = fred.f1.__doc__ ... >>> jim.j1.__doc__ 'f1' Well, that's what I was thinking of - I can't imagine it's broken in 2.n On the other hand, I can't help feeling that what I'd *really* do is give #j1()# a docstring that just said:: """This should behave identically to #fred.f1()#, which see.""" ('cos that's *documentation* of what I mean). Still, whilst I might not want to do it, Python is clearly meant to *allow* what Edward (L) wants to do, so I shan't try to make it harder... acquisition - hmm > [Then again, I tend to put big > parenthetical texts into square brackets rather than curved ones, but > I'm guessing Tony's proposed usage as link-generators will only apply > when there's only one word in the interval ?] Not even one word - it will have to be an identifier (in the XML sense, a name - which is, I think, much the same as a Python identifier). 
(btw, Edward (L), I *think* it should be possible to make dodgy and erroneous constructions emerge in the DOM tree marked up as such - that is, for instance, "badpara" instead of "para" or "dlist" or whatever - or maybe as an attribute - so that a renderer can highlight things appropriately - I've some notes I want to compare with the source code this weekend to see how feasible it is.) I would certainly expect to do #[1,2,3]# in STpy docstrings, rather than [1,2,3] (in STClassic, I would have expected to do '[1,2,3]', so that's not much different). Eddy then got into "agreement with me" mode, which is always a bit worrying - one waits for the other boot to drop (or whatever the saying is) > > Possibly in a different file.. I find Tibb's arguments > > pretty convincing.. > Tibs is like that. Get used to it and *use* him ;^> Hmm. I thought I was the one who always got off on the wrong foot, had a long argument defending an indefensible position, and then finally collapsed with "oh, I see, yes you're clearly right". Which is another sort of usefulness, in its way. > >> - documenting Packages > > There definitely need to be provisions for that. > erm ... we already have __init__.py, so a __doc__.py might be cool ? > Possibly alongside __doc__.stng or some such containing the ref docs ? I would say that the package docstring is the docstring in __init__.py. I would probably avoid having a __doc__. file, 'cos I wouldn't be able to guess exactly what it was for. And I've been using ".stx" for an extension, in compatibility with STClassic and STNG - but maybe we *should* be using different extensions for different files. I hereby propose: '.stx' -- the file contains text compatible with either STClassic or, more usefully, STminus '.stng' -- the file contains STNG text. It may or may not be parsable by ST parsers '.stpy' -- the same position for STpy text Is this worth "formalising" by also posting it onto the ZWiki? Or is it just overcomplex?
Is it a bad thing we've lost any reference to "tx" in the string? Eddy then wandered off into the paragraph labels debate: > While I'm at it: I loathe and despise the apparent demand > (seemingly from both schools - Tibs: does your form require > the space between the Author and Version lines ?) for blank > lines in various places which feel (to me) just plain wrong; > I want to write (descriptive lists and, in particular, ...) > the Edward-form of the last as > > Document information > Author -- Guido > Version -- 3.14 > > A random paragraph which isn't a sub-paragraph of the > document information. > > and believe I shall not be alone (among python programmers) in > tending to neglect those blank lines (in *all* descriptive lists) > and being irritated by any tool which insists on me adding them. > Gratuitous whitespace `halves' the amount of information I can > fit in front of my eyeballs at a given moment and that *matters* > to the *maintainability* of my code - an issue which I take very > seriously. Hmm - I'm well aware of how far Eddy will take the elision of vertical whitespace to keep stuff compact. Strangely enough, the example Eddy gives (in Edward form) *is* legitimate STpy, since the list items start new paragraphs. If it were cast in my form: Document information Author: Guido Version: 3.14 then it doesn't work, since labelled paragraphs don't start a new paragraph. For the moment, tough. However, I do tend to share the view that this is unnatural - I *think* that most people will expect the above to work, because that "word followed by colon" thingy is something we're used to. Unfortunately, to integrate new paragraphs starting on these things into the code in a neat manner is more than ten minutes work, and there are more important things to do for the alpha release.
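[Editorial sketch: the "word followed by colon" expectation Tibs describes can be made concrete in a few lines. This is a hypothetical illustration of the behaviour most people would expect, not the docutils code; the LABEL pattern and split_labels name are invented.]

```python
import re

# Each 'Word: value' line starts a new labelled item, even with
# no blank line between consecutive labels.
LABEL = re.compile(r"^([A-Za-z]\w*):\s+(.*)$")

def split_labels(text):
    items = []
    for line in text.splitlines():
        m = LABEL.match(line)
        if m:
            items.append((m.group(1), m.group(2)))
    return items

doc = """Author: Guido
Version: 3.14"""
assert split_labels(doc) == [("Author", "Guido"), ("Version", "3.14")]
```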
I think this is the sort of issue that should be raised thereafter, in
discussion of the PEP, and I would certainly support changing the code
if there were agreement on this issue (heh, I'm aiming at 2.2 now, since
2.1 is obviously hopelessly soon, so the pressure is off a bit).

> Given the choice between
>     pandering to the tastes of a tool for extracting
>     the documents
> and
>     being kind to the poor sod who has to maintain the code
> I am typically going to side with the latter

Oh yes, me too (it's just a matter of lack of time - firstly for coding,
and secondly for convincing myself that *yet another* special case isn't
one too many for the STpy mindspace - I kid myself I'm getting a feel
for ST-zen, and want to *know* things fit in well).

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 10:50:39 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 10:50:39 -0000
Subject: [Doc-SIG] formalizing StructuredText
In-Reply-To: <200103152137.f2FLblp17063@gradient.cis.upenn.edu>
Message-ID: <00aa01c0ae06$f07414b0$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> * Are list items required to have contents? I.e., can a list
>   item be just a bullet? This only makes sense to me if you
>   used it in an environment like::
>
>       1.
>
>       text...

I'm not sure. The obvious reason for allowing empty list items is to
allow people to start a list and fill things in later - possibly even
easier to argue with descriptive lists::

    fish -- eat them on Friday
    dogs -- don't eat them in western countries
    snakes --
    horses -- OK in France

To be honest, I'm not even sure what the current docutils code does. Not
that that matters too much, since it would only need an RE tweaking.
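The kind of RE tweak in question might look like this. A hypothetical
sketch only - this is not the RE docutils actually uses, and the names
are made up:

```python
import re

# Hypothetical descriptive-list RE that tolerates an empty description
# after the " -- " separator; not the RE actually used by docutils.
DLIST_ITEM = re.compile(r'^(?P<key>\S.*?)\s+--\s*(?P<desc>.*)$')

for line in ['fish -- eat them on Friday', 'snakes --']:
    m = DLIST_ITEM.match(line)
    print(m.group('key'), '->', repr(m.group('desc')))
```

The only change needed for empty items is that the text after the
separator may match the empty string.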
Given I do support an empty '::' paragraph (although for, erm, odd
reasons) and I do tend to write incomplete lists when working on
documentation, I think I'm convincing myself. (Mind you, I don't like
your specific examples, 'cos they're laid out in a way I don't like -
but your examples become OK if my examples get allowed, so...)

> * Apostrophes can appear in the middle of a word or at the end
>   of a word, like "isn't" and "dogs'". Is it illegal to have
>   multiple apostrophes in the same word?

As in "*should* it be illegal"? Don't know - I would tend to think so
pending proof otherwise, unless my current code doesn't care (!)

> There are no English
> words that use multiple apostrophes, but I'm not sure about
> other languages (although there are probably some languages
> that have words with apostrophes at the beginning of a word,
> ("'til"?) and StructuredText clearly won't deal with those..)

Oh dear - but English has "'phone" and various Yorkshire-style things
which start with an apostrophe - sounds like another thing I need to
check out.

> * When parsing various structures, like paragraphs and list
>   items and bold items, what whitespace is kept? E.g., if I
>   were to export to XML, would the trailing whitespace on
>   paragraphs be included? Or the whitespace between a
>   description list key and the hyphen?

Trailing whitespace is removed right at the start. "Structural"
whitespace like that between a descriptive list "key" and the hyphens is
lost. Whitespace comprising newline and indentation is conflated to a
single space (except in literal paragraphs). How whitespace in literal
strings is rendered (i.e., as &nbsp; equivalents, or not) is *probably*
left as an issue for the implementor of the renderer (I haven't decided
yet, for STpy).

> * Can #inline# expressions contain newlines? I assume not
>   ('literal' expressions can't.)
Oh yes they can, in docutils, and that's because I want them to, which
means that in STpy they can as well (although it's obviously not
documented yet). This is the sort of issue that only comes up when
implementing (and I count your producing an EBNF as implementing, I
guess, in this case).

Reasoning - implementation first. When reassembling lines back into
paragraph text, it is easy to either reinsert line breaks (and maybe
indentation again) or just a single space. If you want to make RE
handling easier, a single space wins big time. But that means you can't
stop literal strings spanning newlines. Oh.

Philosophical second. OK - I hadn't thought of that (goes I). But I
*had* been being irritated by trying to use pseudo-STpy in my emails,
'cos (heh, another "'" at the start of a word!) I'm using Outlook in
which it is very hard to tell where lines will be broken, which means
use of '..' is hard. And since an email variant of STpy is both easy to
imagine and should be easy to do, this is a pity. But if the behaviour
of *all* quoted things over linebreaks is well defined, and the same
(and the implementation above is that natural usage) then the problem
goes away.

Incidentally, that is also why I'm not sure yet about whether spaces in
string literals should be "hard" or "soft". I'm still thinking on it (I
tend towards "hard", but worry about 'very long string literals which
will not fit on a single line when being rendered and thus look really
stupid going off the right hand margin').

> * What are valid expressions for starting an ordered list item?
>   Currently STNG uses "([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)"
>   i.e., a series of letters followed by a dot, a series of
>   numbers followed by a dot, or a number followed by space.
>   This seems wrong to me, because it implies that the following
>   are ordered list items::
>
>       Hi. This is a list item.
>
>       12 is a fun number.
>
>   And it does not allow for expressions like:
>
>       1.2. This is a list item.
I thought it *did* allow an optional dot? Oh well, memory again. The
requirement for a dot in STpy was specifically to stop that problem.

> Also, note that since in STpy variants (which will include
> my proposed markup for formatted docstrings), list items can
> begin without an intervening space.. So we would get::
>
>     The first line is a paragraph but the second line is a list
>     item. (Since it starts with letters followed by a dot)

Erm - space -> blank line. That one shouldn't fly in STpy, because it
would have to be::

    i.t.e.m. (Since

because it's meant to be "one letter, or one or more digits"

> Even if we restrict ourselves to Roman numerals, we have
> problems::
>
>     Hopefully someone who can figure this out who is
>     smarter than
>     I. But I don't see a way to use roman numerals safely..

Hmm - yersss. My tack on that one is, unfortunately, that it is a case
of "so don't do that" - the basic problem with ST is that there are
*some* things one can't do, because it is striving for naturalness
otherwise. But on the other hand:

> So maybe we could just use "([0-9]+\.)+"?

Personally, I wouldn't much mind if it were only letters and "arabic"
("indian"?) digits. The reason for having all three forms is that a
renderer MIGHT use the form the user used to decide what form the
rendering should use (and those three forms are common to all list
formatters in common use). Given that's something people might care
about, it makes sense (of course, I believe ST implementations have
tended NOT to make such use of the forms, but still).

> * What restrictions are there on hrefs ("name":http://some.url)
>   According to STNG, they can use relative URLs ("name":whatever).
>   These end up being pretty tricky to formalize..
>
> * Can href names span multiple lines?
> * Can href names contain coloring? (I'd like to say no)
> * Should the string '":' only be allowed for hrefs?
>   Or maybe '":(?!\s)', so you can say "this": that?
> * What do you do with things like::
>
>       This *is "too* confusing":http://some.url
>
>   (Keeping in mind that things like this should be ok)::
>
>       Normally *quotes " don't have* any special meaning,"
>       so they don't have to nest properly..

Hah - URLs (URIs?) are impossible to do right in ST (of whatever form).
There's a reference in TextRE.py to a page that describes the problems.

The rules I'm working towards are probably going to be something like:

1. If it looks vaguely like a URL, expect it to be mistaken
   for one (and "vaguely" will, of course, be defined)
2. URLs will not be allowed to span multiple lines, and if
   they contain spaces (and maybe some other characters)
   then those will need to be encoded (in the "normal"
   manner)
3. Colouring does not occur in URLs.

Confusing examples I'll leave until after the alpha release, I'm afraid
(but the general rule is probably "if it's confusing, don't be surprised
if it is, indeed, confusing").

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 11:04:35 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 11:04:35 -0000
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: <200103160059.f2G0xep05577@gradient.cis.upenn.edu>
Message-ID: <00ab01c0ae08$e30941e0$f05aa8c0@lslp7o.int.lsl.co.uk>

Me, pulling things out at random...

Edward Loper wrote:
> I would interpret that as:
>
>     <Author>Guido</Author>
>     <Version>3.14</Version>
> ...

Post-alpha docutils, we are definitely going to have to have a
discussion about what we want to use as terms in the DTD - which is
good, since I'm not convinced that the tags I've come up with are the
best possible choice.

> But we have to be careful here, since the analogy doesn't completely
> carry over.
> Notably, this isn't what most Python programmers would
> think it is::
>
>     Author:
>       Person1
>       Person2
>
>       Person3
>
> The 3rd person is not part of the author list; it's some sort of
> subparagraph of it, which doesn't make much sense to me..

Oh yes they are (chorus of "behind you" (obPanto reference)).

Well, actually, it's more complex than that. The example is *either*
similar to::

    Author:
    Person1
    Person2

(but in which case it also isn't a labelled paragraph, because "Person1"
didn't start a new sub-paragraph, and non-subparagraph labelled
paragraphs can't have more than one line, if you see what I mean), *or*
it is similar to::

    Author:
      Person1
      Person2
      Person3

in which case Person3 *is* an author.

It will always be necessary to do::

    Heading
        Text...

to get a legacy-style ST heading, it would be too confusing otherwise.

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!"
BARNYARD DANCE! by Sandra Boynton
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 11:09:23 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 11:09:23 -0000
Subject: [Doc-SIG] quoting
In-Reply-To: <200103160101.f2G11Rp05742@gradient.cis.upenn.edu>
Message-ID: <00ac01c0ae09$8e546c50$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> It just seems like there should be *some* mechanism for backquoting,
> since there are some things that you can *not* express without it..
> Would people still object if we made some mechanism available, but
> *strongly* discouraged people from using it when they could avoid
> it (since it makes the plaintext hard to read)?

I also feel we *probably* want some means of quoting.
*But* before introducing one, I want an exhaustive list of cases other
than the need to quote a single quote within literal texts (and this
would include, if people want it, a *really* careful argument for why
one needs to be able to quote '#' in '#..#', although I feel this is
probably needed too).

*After* we have an exhaustive list of all the places we *need* text
escaping, *then* we can try to define an STpy-like manner of doing it.
Note that this is known to be swampy ground, else STNG would have
something in place already. The difficulty of coming up with something
"natural to read" (and I'm still not convinced that '\' fits the bill!)
makes this an item I want to defer, probably until after release 1.0 of
STpy and STminus.

(Yes, I know it *feels* like something one must have, and not having it
makes STpy and STminus difficult to self-document, but the ST-lesson is
that, in general, difficult cases don't actually come up in real life.)

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!"
BARNYARD DANCE! by Sandra Boynton
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 11:13:44 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 11:13:44 -0000
Subject: [Doc-SIG] formalizing StructuredText
In-Reply-To: <200103160312.f2G3Cap18781@gradient.cis.upenn.edu>
Message-ID: <00ad01c0ae0a$2a4b7c70$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper:
> > (well bugger me - my #inline(code)# proposal got implemented !)
>
> That was your idea? One day, we should trace back all of these ideas
> to their originators, and maybe give them credit or something. :)

It was Eddy's idea.
I believe I still have enough of the archives of the doc-sig to be able
to reconstruct most of the attribution for "new" ideas, so yes, it might
be a nice thing to note who came up with what (although people like
David Ascher also deserve note for sheer effort as well (btw, I think he
came up with ':' tags)).

> > 2a. subsidiary cases is not allowed under STClassic, so one
> > wouldn't expect to add it.
> > 3: some of us like colons

Yes, well.

> It's hard to come up with a rule that's both simple and safe, but
> covers cases like '2a.' and '3:'. So, unless Tibs or others
> strongly object, I think we should just stick with '([0-9]+\.)+'. :)

I'll think about this harder after alpha release, but it sounds like a
non-silly idea for STminus, at least.

> > > * What do you do with things like::
> > >
> > >     This *is "too* confusing":http://some.url
> >
> > Find author, apply pain (to taste).
> > Give them the opportunity to retract.
> > If they refuse, apply lethal doses of pain.
> > Then they won't repeat the offence.
> > No problem.
>
> Perhaps I should rephrase that. What should a *parser* do?
> I guess "die" is a good answer, though it sounds like you might
> prefer something along the lines of "erase their hard drive." :)

A parser will have no problems with that text. It will parse it and give
an answer. Whether it is what the user would expect is another matter
(if it's docutils, it will probably be what *I* expect!).

btw, I assume that is going into your test suite as an awkward case?

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Fri Mar 16 11:23:19 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 11:23:19 -0000
Subject: [Doc-SIG] What counts as a url?
In-Reply-To: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu>
Message-ID: <00ae01c0ae0b$809ba090$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> So I'm working on adding HREFs to STminus. They look like this::
>
>     "anchor name":URL

OK.

> Where URL is either a relative URL or an absolute URL.. So I went
> and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt .
> It suggests (if I'm reading it correctly) that we could define
> a URL as::
>
>     ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+

Ah - do we want URLs or URIs? I can never remember the difference.

I am loath to stop people from using the full generality of "pointers to
the web", and this means delving into nasty stuff. See
http://www.foad.org/~abigail/Perl/url2.html for some interesting
details. I think we need to avoid that.

> Should we use that regexp for URLs? Or perhaps we should go for
> simplicity, and say that the regexp ends at whitespace::
>
>     [^\s]+
>
> In either case, we'll have to be careful to say::
>
>     See "this":http://url .
>
> instead of::
>
>     See "this":http://url.

Hmm, that breaks with ST tradition, and indeed my code treats that final
"." as not being part of the URI. Hmm.

> Is that a problem? If so, what can we do about it?
> (Keep in mind that it *is* acceptable to have a URL that ends in a '.')..

I'll think on it, for my part (and read some specs).

> Of course, I don't think people will be including HREFs in their
> documentation much, anyway.. So the main issue for most people
> will just be that they can't use '":' in certain environments..

Erm, I wouldn't bet on that. And we *are* trying to retain
compatibility/usefulness as a tool for working on text files as well,
remember, where this sort of thing is more likely.

Tibs (slightly worriedly)

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine!
(Unless Laser-Scan ask nicely to borrow them.)

From edloper@gradient.cis.upenn.edu Fri Mar 16 15:29:05 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Fri, 16 Mar 2001 10:29:05 EST
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: Your message of "Fri, 16 Mar 2001 10:25:34 GMT." <00a901c0ae03$6f89ca50$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <200103161529.f2GFT5p10509@gradient.cis.upenn.edu>

> Hmm - trying (also in 1.5.2):
>
>     >>> class fred:
>     ...     def f1():
>     ...         """f1"""
>     ...
>     >>> class jim:
>     ...     def j1():
>     ...         pass
>     ...     j1.__doc__ = fred.f1.__doc__
>     ...
>     >>> jim.j1.__doc__
>     'f1'
>
> Well, that's what I was thinking of - I can't imagine it's broken in 2.n

Hm.. That works. It seems very strange and magical to me that *within*
jim, you can say #j1.__doc__ = foo#, but *outside* of jim, you have to
say #j1.im_func.__doc__ = foo#..

> Not even one word - it will have to be an identifier (in the XML sense,
> a name - which is, I think, much the same as a
> Python identifier).

XML defines::

    Name     = (Letter | '_' | ':') (NameChar)*
    NameChar = Letter | Digit | '.' | '-' | '_' | ':' |
               CombiningChar | Extender

(CombiningChar and Extender are for international support, I think)

So the regexp would be something like::

    [a-zA-Z_:][a-zA-Z0-9.-_:]*

Which is not quite the same as a python identifier.. But if that's what
we want, then I'll go ahead and add it to STminus (although I still
don't understand its semantic value -- how do you identify things as
targets of these links?)

> (btw, Edward (L), I *think* it should be possible to make dodgy and
> erroneous constructions emerge in the DOM tree marked up as such - that
> is, for instance, "badpara" instead of "para" or "dlist" or whatever -
> or maybe as an attribute - so that a renderer can highlight things
> appropriately - I've some notes I want to compare with the source code
> this weekend to see how feasible it is.)

Sounds good.
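A side note on the ASCII regexp above: inside a character class, ".-_"
is actually a *range* (0x2E to 0x5F), not three literal characters, so
the hyphen needs to move to the end of the class. A hedged sketch of the
corrected version:

```python
import re

# The ASCII approximation of the XML Name production has a subtle bug:
# inside a character class, ".-_" is a range (0x2E-0x5F), not three
# literal characters. Putting the hyphen last gives the intended class.
XML_NAME = re.compile(r'[a-zA-Z_:][a-zA-Z0-9._:-]*$')

print(bool(XML_NAME.match('para')))      # True
print(bool(XML_NAME.match('bad-para')))  # True
print(bool(XML_NAME.match('1para')))     # False - may not start with a digit
```

With the hyphen in the middle, the buggy class would also accept
characters like "?" and "@", which happen to fall inside the accidental
range.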
Currently, the STminus implementation will tell you what character it
failed at, but not much more than that. (Of course, I don't think anyone
will actually *use* the current STminus implementation, so...)

> > >> - documenting Packages
> > > There definitely need to be provisions for that.
> > erm ... we already have __init__.py, so a __doc__.py might be cool ?
> > Possibly alongside __doc__.stng or some such containing the ref docs ?
>
> I would say that the package docstring is the docstring in __init__.py.

Hm.. I think the package (API) docstring is different from the reference
docs for the package. So maybe the reference docs could go in
__doc__.stpy, maybe they could go elsewhere.. I think we should leave
that to a future project (the one that figures out where all the docs
that are not inline API docs go).

> I hereby propose:
>
>     '.stx'  -- the file contains text compatible with either
>                STClassic or, more usefully, STminus
>     '.stng' -- the file contains STNG text. It may or may not
>                be parsable by ST parsers
>     '.stpy' -- the same position for STpy text
>
> Is this worth "formalising" by also posting it onto the ZWiki? Or is it
> just overcomplex? Is it a bad thing we've lost any reference to "tx" in
> the string?

Sounds good to me. Why do we care about losing "tx"? The only
association I have with "tx" is "transfer"..

> Hmm - I'm well aware of how far Eddy will take the elision of vertical
> whitespace to keep stuff compact.
>
> Strangely enough, the example Eddy gives (in Edward form) *is*
> legitimate STpy, since the list items start new paragraphs. If it were
> cast in my form:
>
>     Document information
>       Author: Guido
>       Version: 3.14
>
> then it doesn't work, since labelled paragraphs don't start a new
> paragraph.

Hmm.. here's an idea that was (I think) inspired by something Eddy said
earlier..

1. list items (and maybe labels) can start without preceding
   whitespace.
2. it's an error to have a set of list items "at the same level" as
   paragraphs -- the entire list must be indented.

The result being that the following is illegal:

    I do not intend to start a new list item but I like the number
    12. It's a good number.

That might keep people from accidentally starting list items.. Instead,
if they wanted the list item, they'd have to say:

    I do intend to start a new list item and I like the number
    12.
      12. This is a list item.

Or something like that..

> For the moment, tough. However, I do tend to share the view that this is
> unnatural - I *think* that most people will expect the above to work,
> because that "word followed by colon" thingy is something we're used to.

Hmm.. If you can decide what you're eventually going to do, I'd
appreciate it, so I can code that into STminus.. STminus is going to be
a formal description, so will be much less easy to change than STpy
(well, not technically difficult to change, but if it becomes an
accepted standard and programs are written assuming that it is
"correct"...) But if it's going to stay a subset of STpy, it needs to
know where STpy is going..

-Edward

From edloper@gradient.cis.upenn.edu Fri Mar 16 16:00:30 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Fri, 16 Mar 2001 11:00:30 EST
Subject: [Doc-SIG] formalizing StructuredText
Message-ID: <200103161600.f2GG0Vp13146@gradient.cis.upenn.edu>

> I'm not sure. The obvious reason for allowing empty list items is to
> allow people to start a list and fill things in later - possibly even
> easier to argue with descriptive lists::
>
>     fish -- eat them on Friday
>     dogs -- don't eat them in western countries
>     snakes --
>     horses -- OK in France
>
> To be honest, I'm not even sure what the current docutils code does. Not
> that that matters too much, since it would only need an RE tweaking.

Ok. For now, STminus will say that empty list items are valid.
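The proposed indentation rule can be sketched in a few lines. Everything
here is hypothetical - no implementation exists, and the function name
and regexp are made up for illustration:

```python
import re

# Rough sketch of the proposed rule: a numbered line only counts as a
# list item if it is indented relative to the preceding paragraph line.
# Hypothetical: this is implemented nowhere, and the names are made up.
def accidental_item(prev_line, line):
    looks_like_item = re.match(r'\s*[0-9]+\.\s', line) is not None
    indent = len(line) - len(line.lstrip())
    prev_indent = len(prev_line) - len(prev_line.lstrip())
    return looks_like_item and indent <= prev_indent

print(accidental_item('I like the number', '12. It is a good number.'))    # True
print(accidental_item('I like the number', '  12. This is a list item.'))  # False
```

Under this rule a parser could flag the first case as an error rather
than silently starting a list.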
> Given I do support an empty '::' paragraph

Although that gets treated as a special case.. The empty paragraph
doesn't get preserved..

> > * Apostrophes can appear in the middle of a word or at the end
> >   of a word, like "isn't" and "dogs'". Is it illegal to have
> >   multiple apostrophes in the same word?
>
> As in "*should* it be illegal"? Don't know - I would tend to think so
> pending proof otherwise, unless my current code doesn't care (!)

Yeah, all of my questions were "should" questions.

> > There are no English
> > words that use multiple apostrophes, but I'm not sure about
> > other languages (although there are probably some languages
> > that have words with apostrophes at the beginning of a word,
> > ("'til"?) and StructuredText clearly won't deal with those..)
>
> Oh dear - but English has "'phone" and various Yorkshire-style things
> which start with an apostrophe - sounds like another thing I need to
> check out.

I think we're just going to need to give up on word-initial apostrophes.
I don't see any way around it. How else can you distinguish the initial
apostrophe in 'phone from the apostrophe in the literal 'phone'? Or even
the word: 'phones' (possessive of 'phones)

> Trailing whitespace is removed right at the start. "Structural"
> whitespace like that between a descriptive list "key" and the hyphens is
> lost. Whitespace comprising newline and indentation is conflated to a
> single space (except in literal paragraphs). How whitespace in literal
> strings is rendered (i.e., as &nbsp; equivalents, or not) is *probably*
> left as an issue for the implementor of the renderer (I haven't decided
> yet, for STpy).

Whitespace in literals and inlines should be preserved.

> > * Can #inline# expressions contain newlines? I assume not
> >   ('literal' expressions can't.)
>
> Oh yes they can, in docutils, and that's because I want them to, which
> means that in STpy they can as well (although it's obviously not
> documented yet).
> This is the sort of issue that only comes up when
> implementing (and I count your producing an EBNF as implementing, I
> guess, in this case).

Hm. ick. I don't like that.

> Reasoning - implementation first. When reassembling lines back into
> paragraph text, it is easy to either reinsert line breaks (and maybe
> indentation again) or just a single space. If you want to make RE
> handling easier, a single space wins big time. But that means you can't
> stop literal strings spanning newlines. Oh.

I can see ways around that. But for now, I'll just say that I think that
what "makes more sense" in this case should trump what "is easy to
implement"... (also, if you want to be more compatible with STNG, they
don't allow newlines in literals.. :) )

> Philosophical second. OK - I hadn't thought of that (goes I). But I *had*
> been being irritated by trying to use pseudo-STpy in my emails, 'cos
> (heh, another "'" at the start of a word!) I'm using Outlook
> in which it is very hard to tell where lines will be broken, which means
> use of '..' is hard. And since an email variant of STpy is both easy to
> imagine and should be easy to do, this is a pity. But if the behaviour
> of *all* quoted things over linebreaks is well defined, and the same
> (and the implementation above is that natural usage) then the problem
> goes away.

There will always be problems using ST if you can't control your own
line breaks. Otherwise, you'll get list items where you don't want them,
etc..

I like making literals single-line because:

* it tends to keep them short, which is a good thing
* we can handle whitespace in a more sensible way -- spaces are
  preserved, and are hard. That way if someone wants to talk about
  the python string #'[ ]'#, they can.
* apostrophes already seem dangerous to me, what with words like 'cos
  that can accidentally start literals and words like cats' that can
  accidentally end them.
If literals don't span multiple lines, then a parser has a much better
chance of noticing that something's wrong.

> Incidentally, that is also why I'm not sure yet about whether spaces in
> string literals should be "hard" or "soft". I'm still thinking on it (I
> tend towards "hard", but worry about 'very long string literals which
> will not fit on a single line when being rendered and thus look really
> stupid going off the right hand margin').

I don't see why someone would ever really need a very long literal.. And
if they don't mind it being broken up, they can split it up themselves..

> > * What are valid expressions for starting an ordered list item?
[...]
> > And it does not allow for expressions like:
> >
> >     1.2. This is a list item.
>
> I thought it *did* allow an optional dot? Oh well, memory again. The
> requirement for a dot in STpy was specifically to stop that problem.

Used to. Doesn't now. Who knows if/when/how it'll change. :)

> That one shouldn't fly in STpy, because it
> would have to be::
>
>     i.t.e.m. (Since
>
> because it's meant to be "one letter, or one or more digits"

Hm. So no roman numerals in STpy? ok.

> > So maybe we could just use "([0-9]+\.)+"?
>
> Personally, I wouldn't much mind if it were only letters and "arabic"
> ("indian"?) digits. The reason for having all three forms is that a
> renderer MIGHT use the form the user used to decide what form the
> rendering should use (and those three forms are common to all list
> formatters in common use).

I don't think that people documenting modules will ever really care.
Also, it seems like this might get rendered differently by different
formatters, anyway. I've been using LaTeX for a long time, without ever
feeling the need to tweak which ordered bullets it decides to use.. I
have a feeling that the same is true of most people..
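The difference between the two candidate REs is easy to demonstrate.
The REs are the ones quoted in the thread; the variable names are mine:

```python
import re

# STNG's permissive ordered-list RE versus the stricter proposal,
# both as quoted in the discussion. Variable names are made up.
STNG_OL = re.compile(r'([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)')
STRICT_OL = re.compile(r'([0-9]+\.)+')

for line in ['1. item', '1.2. nested', 'Hi. not a list', '12 is a fun number']:
    print(repr(line), bool(STNG_OL.match(line)), bool(STRICT_OL.match(line)))
```

STNG's RE accepts "Hi." and "12 " (the false positives complained about
above), while the stricter form rejects both and still accepts "1.2.".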
> Given that's something people might care
> about, it makes sense (of course, I believe ST implementations
> have tended NOT to make such use of the forms, but still).

I doubt any implementation ever will, either.. :)

> > * What restrictions are there on hrefs ("name":http://some.url)
> >   According to STNG, they can use relative URLs ("name":whatever).
> >   These end up being pretty tricky to formalize..
> >
> > * Can href names span multiple lines?
> > * Can href names contain coloring? (I'd like to say no)
> > * Should the string '":' only be allowed for hrefs?
> >   Or maybe '":(?!\s)', so you can say "this": that?
> > * What do you do with things like::
> >
> >     This *is "too* confusing":http://some.url
> >
> >   (Keeping in mind that things like this should be ok)::
> >
> >     Normally *quotes " don't have* any special meaning,"
> >     so they don't have to nest properly..
>
> Hah - URLs (URIs?) are impossible to do right in ST (of whatever form).
> There's a reference in TextRE.py to a page that describes the problems.

Hrm.. My questions were actually more concerned with the name part than
with the url part. I assume that you're using the same basic markup here
that STNG does (from reading STpy.html, it seems like you are). So which
of the following are legal? ::

    "Here the *name* 'contains' markup":url

    "This name spans
     multiple lines":url

    "the following is not a url":

    Do *quotes "have to* nest" properly with coloring?

> 1. If it looks vaguely like a URL, expect it to be mistaken
>    for one (and "vaguely" will, of course, be defined)

I assume you mean this for when they just include an absolute url in the
text, like http://foo.bar .

> 2. URLs will not be allowed to span multiple lines, and if
>    they contain spaces (and maybe some other characters)
>    then those will need to be encoded (in the "normal"
>    manner)

Agreed..

> 3. Colouring does not occur in URLs.

Agreed.. Although how you decide where the url ends isn't obvious..
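One purely illustrative answer to "where does the url end": take the
maximal run of non-whitespace, then strip trailing sentence punctuation.
This is a hypothetical heuristic (function name included), not what
docutils or STNG actually does:

```python
import re

# Hypothetical heuristic for deciding where a URL ends: take the
# maximal run of non-whitespace, then strip trailing punctuation that
# is more likely sentence-level than part of the URL. Not what any
# ST implementation actually does; the name url_at is made up.
def url_at(text):
    m = re.match(r'\S+', text)
    return m.group().rstrip('.,;:)') if m else None

print(url_at('http://some.url. More text'))  # http://some.url
print(url_at('http://foo.bar/baz,'))         # http://foo.bar/baz
```

As noted above, this loses URLs that genuinely end in a "." - which is
exactly the trade-off under discussion.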
-Edward

From edloper@gradient.cis.upenn.edu Fri Mar 16 16:02:37 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Fri, 16 Mar 2001 11:02:37 EST
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: Your message of "Fri, 16 Mar 2001 11:04:35 GMT." <00ab01c0ae08$e30941e0$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID: <200103161602.f2GG2bp13365@gradient.cis.upenn.edu>

> Post-alpha docutils, we are definitely going to have to have a
> discussion about what we want to use as terms in the DTD - which is
> good, since I'm not convinced that the tags I've come up with are the
> best possible choice.

I was just using fairly random tags, for illustrative purposes, but I do
think it's a good idea to discuss that.

-Edward

From tony@lsl.co.uk Fri Mar 16 16:09:37 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Fri, 16 Mar 2001 16:09:37 -0000
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: <200103161529.f2GFT5p10509@gradient.cis.upenn.edu>
Message-ID: <00bb01c0ae33$7ff2fd50$f05aa8c0@lslp7o.int.lsl.co.uk>

(hmm - I keep hitting reply-to-all, and obviously you do the equivalent
- partly 'cos it means a copy *does* definitely go to the doc-sig. But
do we really want copies directly to ourselves as well? (I'm unfussed
either way))

Edward D. Loper wrote:
> Hm.. That works. It seems very strange and magical to me that
> *within* jim, you can say #j1.__doc__ = foo#, but *outside* of
> jim, you have to say #j1.im_func.__doc__ = foo#..

Hmm. Odd. Maybe something to ask over on the Python list per se (if no
one else, Alex Martelli might explain something at length about it)

> XML defines::
>
>     Name     = (Letter | '_' | ':') (NameChar)*
>     NameChar = Letter | Digit | '.' | '-' | '_' | ':' |
>                CombiningChar | Extender
>
> (CombiningChar and Extender are for international support, I think)
>
> So the regexp would be something like::
>
>     [a-zA-Z_:][a-zA-Z0-9.-_:]*

Hmm.
I might prefer to say "a Python identifier", then, as I don't need the "namespace" bit (which is what the colon is for). > (although > I still don't understand its semantic value -- how do you identify > things as targets of these links? If one looks at a modified version of a PEP, for instance, then one would have something like:: This, of course, is discussed at length in Fredricksen [F1], but you can find that yourselves - of course, Jimson [J2] refutes it emphatically ..[F1] Fredricksen and Cohorts, The magazine magazine, Some Publishing House, 1978, vol3 issue 97 ..[J2] Jimson, archived at http://www.jimson.notspam/j2.pdf This *may* have been defined in the last-but-one round of docstring syntax, rather than the last one (memory fails me). The "target" of the references is (etc. - you get the idea). I would imagine that in alpha1 they don't trigger a new paragraph, but later on they probably should (same arguments as ':' labels). I'll need to dig up my paper copies of the relevant discussion, 'cos on the face of it it would be nice to be able to do [1] as well (like the PEPs do), but that can't be so if we're enforcing identifier-syntax for the "footnote" ref. There was a justification for it, but that's in the same place as the rest. Short term, it lets me "emulate" a PEP, which is a Good Example to be able to emulate. I know there was a better reason as well... > > I would say that the package docstring is the docstring in > > __init__.py. > > Hm.. I think the package (API) docstring is different from the > reference docs for the package. So maybe the reference docs could > go in __doc__.stpy, maybe they could go elsewhere.. I think we > should leave that to a future project (the one that figures out > where all the docs that are not inline API docs go). I agree on the leaving it. (and why wouldn't the reference docs for a package be in package/reference.stpy, then?) > Sounds good to me. Why do we care about losing "tx"? 
The only > association I have with "tx" is "transfer".. I would *guess* it started out as "stx" or "stxt" and the latter never caught on. > Hmm.. here's an idea that was (I think) inspired by something > Eddy said earlier.. > 1. list items (and maybe labels) can start without preceding > whitespace. > 2. it's an error to have a set of list items "at the same level" > as paragraphs -- the entire list must be indented. > > The result being that the following is illegal: > > I do not intend to start a new list item but I like the number > 12. It's a good number. > > That might keep people from accidentally starting list items.. > Instead, if they wanted the list item, they'd have to say: > > I do intend to start a new list item and I like the number > 12. > 12. This is a list item. > > Or something like that.. Hmm. Messy to think about in the implementation (and in this sort of thing, I'm beginning to trust how easy it is to do in Python, as a guideline to how easy it is to think about). But worth bearing in mind. There is an outstanding conceptual problem with indentation and lists, anyway. If one has:: Some non-list text, followed by 1. A list then it is clear how the list is indented - i.e., not specially. On the other hand, if one has:: Some non-list text, followed by 1. A list then STpy leaves it fuzzy what happens, and currently docutils will put a block around the list, causing it to be indented "extra". On the other hand, sometimes people want that effect (the difficulties of mixing presentation and markup, and not being specifically a typesetting language - ho hum). I think this whole sort of thing will need addressing either at alpha/beta time, or possibly post-1.0. > Hmm.. If you can decide what you're eventually going to do, I'd > appreciate it, so I can code that into STminus.. 
STminus is going > to be a formal description, so will be much less easy to change > than STpy (well, not technically difficult to change, but if it > becomes an accepted standard and programs are written assuming that > it is "correct"...) But if it's going to stay a subset of STpy, it > needs to know where STpy is going.. I *should* know what I *want* to have happen by the alpha release (even if the "example application" doesn't yet do it). And I am *fairly* sure that what I will *want* to have happen is what Eddy wants as well. But we'll see. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Fri Mar 16 16:12:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 11:12:38 EST Subject: [Doc-SIG] quoting In-Reply-To: Your message of "Fri, 16 Mar 2001 11:09:23 GMT." <00ac01c0ae09$8e546c50$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161612.f2GGCcp13936@gradient.cis.upenn.edu> > I also feel we *probably* want some means of quoting. > > *But* before introducing one, I want an exhaustive example of other > cases than the need to quote a single quote within literal texts (and > this would include, if people want it, a *really* careful argument for > why one needs to be able to quote '#' in '#..#', although I feel this is > probably needed too). I think there are two issues here: 1. #..# and '..' have *semantically* different meaning. Although we *can* do #'# or '#', is it always appropriate? 2. Any string containing both "'" and '#' can't be written in current ST. Those are the only strings that can't be written in current ST (as either literals or inlines). (They can actually be written, but only in a literal block). I think that the first issue is an important one. 
It bothers me to have to say #'til# when I'm talking about an English word, because the markup suggests that I'm talking about Python code. > *After* we have an exhaustive list of all the places we *need* text > escaping, *then* we can try to define an STpy-like manner of doing it. I think that (2) is an exhaustive list of the places we *need* text escaping, if you don't mind making people use apostrophes to quote things like '*' and '"a":http://foo'. > The difficulty of coming up with something "natural to read" (and I'm > still not convinced that '\' fits the tag!) makes this an item I want to > defer, probably until after release 1.0 of STpy and STminus. I personally think that '\' is the most natural backslashing character, especially if the audience is programmers. But maybe I'm alone in thinking that... > (Yes, I know it *feels* like something one must have, and not having it > makes STpy and STminus difficult to self-document, but the ST-lesson is > that, in general, difficult cases don't actually come up in real life.) This is the ST-lesson from using ST to make web pages, not to document source code.. I fear we may find that documenting source code requires a number of things you could avoid in writing web pages.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 16:27:56 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 11:27:56 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 16 Mar 2001 11:13:44 GMT." <00ad01c0ae0a$2a4b7c70$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161627.f2GGRup14941@gradient.cis.upenn.edu> > > It's hard to come up with a rule that's both simple and safe, but > > covers cases like '2a.' and '3:'. So, unless Tibs or others > > strongly object, I think we should just stick with '([0-9]+\.)+'. :) > > I'll think about this harder after alpha release, but it sounds like a > non-silly idea for STminus, at least. 
I think that there are two ideas of what STminus does that are floating around: 1. it's a "simple," "clean" version of ST 2. it's an intersective subset of STNG and STpy Unfortunately, (1) and (2) are really mutually exclusive.. If STminus is to *actually* be an intersective subset of STNG and STpy, then it needs to take all the quirks of each into account, so it can make strings where STNG and STpy disagree undefined.. > > > > * What do you do with things like:: > > > > > > > > This *is "too* confusing":http://some.url > > > > > > Find author, apply pain (to taste). > > > Give them the opportunity to retract. > > > If they refuse, apply lethal doses of pain. > > > Then they won't repeat the offence. > > > No problem. > > > > Perhaps I should rephrase that. What should a *parser* do? > > I guess "die" is a good answer, though it sounds like you might > > prefer something along the lines of "erase their hard drive." :) > > A parser will have no problems with that text. It will parse it and give > an answer. Whether it is what the user would expect (if it's docutils, > it will probably be what *I* expect!). Hm. let's try once more. What answer *should* a parser give? ('error' is a possible answer) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 16:32:15 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 11:32:15 EST Subject: [Doc-SIG] What counts as a url? In-Reply-To: Your message of "Fri, 16 Mar 2001 11:23:19 GMT." <00ae01c0ae0b$809ba090$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161632.f2GGWGp15248@gradient.cis.upenn.edu> > > Of course, I don't think people will be including HREFs in their > > documentation much, anyway.. So the main issue for most people > > will just be that they can't use '":' in certain environments.. > > Erm, I wouldn't bet on that. And we *are* trying to retain > compatibility/usefulness as a tool for working on text files as well, > remember, where this sort of thing is more likely. Yeah. 
But I've been writing a PEP recently, so mainly my thoughts are of "what are the obstacles to getting the Python community to accept xyz". :) I agree that it's an important issue for a larger scope, though.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 16 16:44:09 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 11:44:09 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Fri, 16 Mar 2001 16:09:37 GMT." <00bb01c0ae33$7ff2fd50$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161644.f2GGi9p15914@gradient.cis.upenn.edu> > (hmm - I keep hitting reply-to-all, and obviously you do the > equivalent - partly 'cos it means a copy *does* definitely go to the > doc-sig. But do we really want copies directly to ourselves as well? > (I'm unfussed either way)) I'm actually on the digest mailing list, so if you want me to get it faster, cc it to me. If you don't care, you can just send it to the mailing list. I've started sending my replies to you to the mailing list. > > XML defines:: > > Name = (Letter | '_' | ':') (NameChar)* > > NameChar = Letter | Digit | '.' | '-' | '_' | ':' | > > CombiningChar | Extender > > (CombiningChar and Extender are for international support, I think) > > > > So the regexp would be something like:: > > > > [a-zA-Z_:][a-zA-Z0-9.-_:]* > > Hmm. I might prefer to say "a Python identifier", then, as I don't need > the "namespace" bit (which is what the colon is for). If we also like things like [2], why don't we just say that it's '\w+'? > If one looks at a modified version of a PEP, for instance, then one > would have something like:: > > This, of course, is discussed at length in Fredricksen [F1], > but you can find that yourselves - of course, Jimson [J2] > refutes it emphatically > > ..[F1] Fredricksen and Cohorts, The magazine magazine, > Some Publishing House, 1978, vol3 issue 97 > > ..[J2] Jimson, archived at http://www.jimson.notspam/j2.pdf Ahh. 
Ok, now I know what we're talking about. In STpy.html, you say that the syntax is "name":[link]... And the "footnote" syntax is:: ..[link] http://foo/bar If we're implementing this, is there a chance that we could just ditch the in-line hrefs altogether, in favor of out-of-line ones? It would mean that we wouldn't have to deal with such issues as "what counts as a url", etc. Of course, some of my previous questions still apply.. E.g., can you say: "this name *contains* 'markup'":[link]? Oh, one other question: how does it make sense to ever use relative urls inside documentation strings?? I can see how it would be useful in other contexts.. but not in doc strings. > I agree on the leaving it. (and why wouldn't the reference docs for a > package be in package/reference.stpy, then?) reference.stpy sounds like a fine name to me. > > Sounds good to me. Why do we care about losing "tx"? The only > > association I have with "tx" is "transfer".. > > I would *guess* it started out as "stx" or "stxt" and the latter never > caught on. With "stx" it's not obvious anyway, so I wouldn't worry about it. > There is an outstanding conceptual problem with indentation and lists, > anyway. If one has:: > > Some non-list text, followed by > 1. A list > > then it is clear how the list is indented - i.e., not specially. On the > other hand, if one has:: > > Some non-list text, followed by > 1. A list > > then STpy leaves it fuzzy what happens, and currently docutils will put > a block around the list, causing it to be indented "extra". Hm. I disagree. I don't think that the user should have that much control over formatting, anyway. I tend to think that ST should be a markup language, not a typesetting language.. > On the other > hand, sometimes people want that effect (the difficulties of mixing > presentation and markup, and not being specifically a typesetting > language - ho hum). Too bad for them. 
:) If they really need *that much* control over their typesetting, they shouldn't be using ST. -Edward From tony@lsl.co.uk Fri Mar 16 16:55:07 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 16 Mar 2001 16:55:07 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103161627.f2GGRup14941@gradient.cis.upenn.edu> Message-ID: <00bd01c0ae39$db1c7ca0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I think that there are two ideas of what STminus does that > are floating around: > 1. it's a "simple," "clean" version of ST > 2. it's an intersective subset of STNG and STpy > > Unfortunately, (1) and (2) are really mutually exclusive.. If STminus > is to *actually* be an intersective subset of STNG and STpy, then > it needs to take all the quirks of each into account, so it can > make strings where STNG and STpy disagree undefined.. Hmm - I think that's got to be for you to decide. It *may* be that both are sufficiently useful to want to have (I'd go for that), in which case you get to (a) choose two names and (b) choose which you do first. You *could* call the "minimal compatibility" one STminusminus, of course... > > > > > * What do you do with things like:: > > > > > > > > > > This *is "too* confusing":http://some.url > > > > > > Perhaps I should rephrase that. What should a *parser* do? > > > I guess "die" is a good answer, though it sounds like you might > > > prefer something along the lines of "erase their hard drive." :) > > Hm. let's try once more. What answer *should* a parser > give? ('error' is a possible answer) Damn, he's being persistent. Don't you just hate that. Thinks... No, seriously, the only way (with RE technology) that you are going to detect an error there is by adding that pattern to your "long list of error patterns" RE. 
To an RE-using system that looks for things going '*..*' and things going '"..":', there will simply be no ambiguity - the one that finds it first will win, leaving odd bits of "definitely not markup, guv" text lying strewn around it. Specifically, the above would *either* be:: plain: 'This ' emph: 'is "too' plain: ' confusing":http://some.url' *or* it would be:: plain: 'This *is ' urltext: 'too* confusing' [url: http://some.url] In neither case is there any ambiguity - it just depends on the order in which things are done. It's because it's done with REs, you see - there isn't any *real* understanding of document structure going on. All the poor user can do is stare at the result, and wonder why it isn't what they expected. Sounds like some compilers I've known. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 16 16:57:42 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 16 Mar 2001 16:57:42 -0000 Subject: [Doc-SIG] quoting In-Reply-To: <200103161612.f2GGCcp13936@gradient.cis.upenn.edu> Message-ID: <00be01c0ae3a$375557d0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: <<>> I think that probably *is* the case, exactly stated, with implications and reasons why STpy is not the same as STNG. Thanks. OK. I personally am not too happy with '\' for the escape, but I *would* like a solution to this. I hereby propose (unilaterally, us together) that we actually adopt the following proposal: 1. The only special quoting circumstances are putting a single quote within '..' and a hash within #..#. 2. The only place that STpy supports use of an escape character is in those places. 3. '\'' shall mean the same as a literal string containing single quote. 4. #\## shall mean the same as a Python literal string containing a single hash. 5. 
Within either of those literal string contexts, that is the *only time* that backslash is special. Thus it is sufficient to write #\#\\## to get a Python literal string containing a hash, a backslash and a hash. 6. If this is too confusing, don't do it. If you do do it, use a net. Or a bouncy castle. And in any case, don't blame us. and see if it works implementation-wise (but I don't promise to have it *working* in the alpha release). (Of course, the *other* escape tradition is simple-doubling - which would give us #### to get a single hash. Which is *very* confusing, and I don't want to go near it with an RE.) Hmm - must go. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Fri Mar 16 17:00:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 12:00:38 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 16 Mar 2001 16:55:07 GMT." <00bd01c0ae39$db1c7ca0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161700.f2GH0cp17194@gradient.cis.upenn.edu> (concerning:) > > > > > This *is "too* confusing":http://some.url > No, seriously, the only way (with RE technology) that you are going to > detect an error there is by adding that pattern to your "long list of > error patterns" RE. To an RE-using system that looks for things going > '*..*' and things going '"..":', there will simply be no > ambiguity - the one that finds it first will win, leaving odd bits of > "definitely not markup, guv" text lying strewn around it. Specifically, > the above would *either* be:: Well, it depends on how you're detecting errors... > plain: 'This ' > emph: 'is "too' > plain: ' confusing":http://some.url' Here, you could say that the string '":' without a matching '"' is illegal, and raise an error.. 
> plain: 'This *is ' > urltext: 'too* confusing' > [url: http://some.url] Here you could say that non-matching '*'s are illegal, and raise an error.. > In neither case is there any ambiguity - it just depends on the order in > which things are done. It's because it's done with REs, you see - there > isn't any *real* understanding of document structure going on. But from the point of view of someone formalizing the language, saying "there's an ambiguity" is no good. I have to either explicitly say "it's illegal" (=undefined) or "xyz is the correct answer." -Edward p.s., I'm not sure it's safe for us both to be writing email at the same time. We might overload other peoples' mailboxes. :) From edloper@gradient.cis.upenn.edu Fri Mar 16 17:07:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 12:07:13 EST Subject: [Doc-SIG] quoting In-Reply-To: Your message of "Fri, 16 Mar 2001 16:57:42 GMT." <00be01c0ae3a$375557d0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> > 1. The only special quoting circumstances are putting a single quote > within '..' and a hash within #..#. > 2. The only place that STpy supports use of an escape character is in > those places. > 3. '\'' shall mean the same as a literal string containing single quote. > 4. #\## shall mean the same as a Python literal string containing a > single hash. > 5. Within either of those literal string contexts, that is the *only > time* that backslash is special. Thus it is sufficient to write #\#\\## > to get a Python literal string containing a hash, a backslash and a > hash. > 6. If this is too confusing, don't do it. If you do do it, use a net. Or > a bouncy castle. And in any case, don't blame us. I think we should add that '\\' is a single backslash and #\\# is too. Otherwise, there's no way to end a literal with a backslash.. 
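[Editorial aside: the backslash proposal discussed in this exchange is small enough to prototype. The following unescape sketch is an illustration of the proposed rules plus the '\\' addition — not docutils code: inside a '...' or #...# literal, backslash is special only before the closing delimiter or another backslash, and is passed through literally everywhere else.]

```python
# Minimal sketch (illustrative only, not docutils code) of the proposed
# escaping rules for literal text: inside #...# or '...', a backslash
# escapes only the closing delimiter or another backslash; any other
# backslash is left alone.
def unescape(text, delim):
    out, i = [], 0
    while i < len(text):
        if text[i] == '\\' and i + 1 < len(text) and text[i + 1] in (delim, '\\'):
            out.append(text[i + 1])  # drop the backslash, keep the escaped char
            i += 2
        else:
            out.append(text[i])      # everything else passes through verbatim
            i += 1
    return ''.join(out)

# Point 5's example: the content of #\#\\## unescapes to hash, backslash, hash.
print(unescape(r'\#\\#', '#'))
```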
> (Of course, the *other* escape tradition is simple-doubling - which > would give us #### to get a single hash. Which is *very* confusing, and > I don't want to go near it with an RE.) I agree that we shouldn't use that. -Edward From mal@lemburg.com Fri Mar 16 17:08:37 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 16 Mar 2001 18:08:37 +0100 Subject: [Doc-SIG] What counts as a url? References: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu> Message-ID: <3AB24895.8DB07998@lemburg.com> "Edward D. Loper" wrote: > > So I'm working on adding HREFs to STminus. They look like this:: > > "anchor name":URL > > Where URL is either a relative URL or an absolute URL.. So I went > and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt . > It suggests (if I'm reading it correctly) that we could define > a URL as:: > > ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+ > > Should we use that regexp for URLs? Or perhaps we should go for > simplicity, and say that the regexp ends at whitespace:: > > [^\s]+ > > In either case, we'll have to be careful to say:: > > See "this":http://url . > > instead of:: > > See "this":http://url. > > (the '.' gets included in the second url). Is that a problem? If > so, what can we do about it? (Keep in mind that it *is* acceptable > to have a URL that ends in a '.').. > > Of course, I don't think people will be including HREFs in their > documentation much, anyway.. So the main issue for most people > will just be that they can't use '":' in certain environments.. > > Ideas/thoughts? FYI, I use this RE in my apps: r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' I don't think it makes sense to include schemes which are not supported by your everyday browser, so only the most common ones are included. 
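[Editorial aside: the pattern above can be tried directly. A small check (mine, with CPython's re module — not part of the original exchange) suggests the trailing '\b' already gives the behaviour asked about earlier: a sentence-ending '.' before whitespace or end of string falls outside the match.]

```python
import re

# The URL pattern quoted verbatim from the message above.
URL = re.compile(r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)')

# The trailing \b forces the match to end on a word character (with an
# optional final '/'), so a sentence-ending full stop is excluded.
text = 'See http://www.python.org/ and then http://foo.bar.'
print(URL.findall(text))
# ['http://www.python.org/', 'http://foo.bar'] -- the final '.' is dropped
```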
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From edloper@gradient.cis.upenn.edu Fri Mar 16 17:58:51 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 16 Mar 2001 12:58:51 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 16 Mar 2001 16:35:10 GMT." <00bc01c0ae37$1142d200$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103161758.f2GHwpp20696@gradient.cis.upenn.edu> Tibs noted: > We need to be careful about that word "whitespace" (which I note you > sometimes still use to mean "blank lines" as well. Yeah, I've been playing a bit fast and loose with terminology in my emails.. :) Speaking of terminology, I want to make sure that we're using somewhat consistent terminology. In particular, I think my use of the following terms may not coincide with what you call things. What are your terms for the following? * inline = region marked with #hashes#. * paragraph = a text paragraph; not a list item or a heading or a label * basic block = paragraph or list item or heading or label (or table?) * blank line = (S* NL) | (S* EOS) * literal block = region following a '::'. * invalid string = string that is not given a meaning by an ST variant. (in the terms used by the STminus proposal, strings that are not assigned a structure by a language). Tibs continued: > When I am talking, I have some assumptions (which, of course, may not be > evident): > > 1. by the time discourse occurs, all tabs have gone away Agreed. We should probably also discard/transform any whitespace that isn't space or newline (e.g., form feed, carriage return). > 2. blank lines are blank lines - white space in them is ignored, > thrown away (lost for good) Is this true in literal blocks? Also, I'm guessing you collapse multiple consecutive blank lines into one. > 3. 
trailing whitespace is thrown away Trailing whitespace for the string as a whole? For each basic block? For each line? Is this true in literal blocks? > 4. literal paragraphs retain leading whitespace following "the > rules" (which say they are actually indented relative to the > preceding non-literal paragraph - this makes much more sense > in ST than "with respect to the left margin"). Agreed. Although how do you put something at zero indentation? Maybe indent from 1 space over from the preceding paragraph? > So, at the end of that, the term "whitespace" should be replaced by the > term "spaces". Newlines (sometimes I call them "line breaks", which may > be a better term) are a different thing. So we won't use the term whitespace. Instead, we'll use the terms space, newline, and blank line. > Clearly for a string literal that does not contain a newline, spaces are > to be transcribed to spaces (probably - flag a rendering issue as to > whether they're *hard* spaces (the correct number) or *unbreakable* > spaces (the correct number AND no newlines)). I vote for unbreakable, but it may be possible to persuade me. > Equally clearly, if one does not allow newlines in string literals, > that's the end of the matter. We've done our job. Which is what I vote for. :) > Unfortunately for simplicity, I saw that I could choose to lose newlines > *if I so wished*, and after a bit of thinking I decided I did so wish, > for the reasons I gave. In *that* case, one has to consider what the > sequence:: > > > > means within a literal string, and clearly the only consistent thing *for* it to mean is a single space. Well, it could also mean a single newline (or
'<br>' in html). But we shouldn't even go there. :) > > Hm. ick. I don't like that. > > Yes, well, that's the problem, and I need to think how much I *do* like > it, and then argue it out. Here's my current take on linebreaks in literals. Feel free to add/comment: Advantages of allowing linebreaks in literals: * you can have longer literals * you can press alt-q in emacs to have it re-word-wrap your paragraph, and not think about it (as much; you still have to worry about list items, labels (in the future), and maybe other things). * implementation reasons * the meaning of spaces and newlines in literals is not obvious to the un-initiated (no matter which meaning we choose). Advantages of not allowing linebreaks in literals: * you force people to use shorter literals * spaces in literals are meaningful in an obvious way * you're more likely to catch errors, because you're keeping things local. * it's conceptually simpler (i.e., easier to explain). Of course, if we say that linebreaks are not allowed in literals, docutils can still go on allowing them there, while just saying that it's "making a best guess" where a parser I wrote would probably flag a warning/error. > > I don't see why someone would ever really need a very long literal.. > > And if they don't mind it being broken up, they can split it up > > themselves.. > > Hmm. I have done in the past (but as ever, can't remember detailed > examples). It seems to me that either: 1. it's a literal that you don't mind having broken up, so you can break it up yourself (although then I question if it's really a single literal?) 2. it's a literal that you think shouldn't be broken up, so you shouldn't break it up in the plaintext -- put it on one line, and readers will have to understand that it's more than 75 characters because it shouldn't be broken up. > > Used to. Doesn't now. Who knows if/when/how it'll change. :) > > Oh dear. I find myself saying that a lot when I play with STNG. 
:) Hopefully getting a formal definition will start to change that.. > > Hm. So no roman numerals in STpy? ok. > Aagh - no, you're right. My mistake. > (although, at a different tack, I don't think "e" is a roman numeral?) No, it's not, but you get the point. :) I used that example because STNG currently allows *any* sequence of letters followed by a dot. > I'm not yet convinced about individual alphabetics - I *do* tend to use > that style myself quite a lot. I think that simplicity should be an important design goal for ST. But I might let single letters followed by a dot slide.. Esp. if parsers could give warnings when ordered lists were not ordered in a sensible way, as in:: This is not intended to be an ordered list, says I. But it starts with "I" instead of "A", so it will flag an error. > > "Here the *name* 'contains' markup":url > > Aagh - it's an order thing, 'cos at the moment URL recognition is done > by colourising. Given I don't want to worry about "internal markup" yet, > that *may* mean URLs must be done immediately after literals, and before > other markup. Hm.. I'm confused. So you would get::
Here the *name* 'contains' markup ? Or:: "Here the name contains markup":url ? Or something else? But at any rate, my question was more one of what ST "should" do, not what it does do.. One other case to consider is:: *"I would prefer this":url* to "*this*":url > > "This name spans multiple > > lines":url > > Remember my code doesn't see newlines any more inside paragraphs, so > that's no problem... But if we decide that literals/code don't span newline...? :) Still seems to me that names should be able to span newlines, though. > > "the following is not a url": > > If it happens to parse as a URL, it is, if it happens not to parse as a > URL, it isn't - either way, it's the writer's fault for doing daft > things. Yes, but do we get an error because we used '":' in a silly context (if we're asking the parser to tell us about errors)? > > Do *quotes "have to* nest" properly with coloring? > > No, and I don't expect ever to try to make the code worry about it - > that would get grabbed as one or the other, under *any* scheme I'm ever > likely to write. But from the point of view of formalizing things, I have two choices here: 1. say that it contains a bold region, and the quotes are just rendered as quotes 2. say that it's undefined (i.e., an invalid string). > And regardless of whether one *should* be able to have a dot at the end > of a URL, I think in practice we may need to forbid it so we can have a > fullstop there instead (as I said, I think STClassic may do that, and > STNG certainly *did*). Ok. But that will need special mention somewhere. So we don't include the final dot if it's followed by a space, end of line, or end of string, right? But what about ".."? This seems like it will be very messy to formalize.. :( > > > 2. URLs will not be allowed to span multiple lines, > > > [...] > > Agreed.. 
> > (although, of course, as I said above, I currently have this blindness > about linebreaks - but you may argue me out of that yet Of course, since URLs shouldn't have spaces in them anyway, this isn't a problem. -Edward From guido@digicool.com Sat Mar 17 02:58:48 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 16 Mar 2001 21:58:48 -0500 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 15 Mar 2001 23:41:45 GMT." References: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103170258.VAA14151@cj20424-a.reston1.va.home.com> > >>> def hello(there): return there > ... > >>> hello.__doc__ = 'hello there' > >>> hello.__doc__ > 'hello there' > >>> class A: > ... def hello(there): return there > ... > >>> A.hello.__doc__ = 'hello there' > Traceback (innermost last): > File "<stdin>", line 1, in ? > TypeError: attribute-less object (assign or del) > >>> A.hello.im_func.__doc__ = 'hello there' > >>> A.hello.__doc__ > 'hello there' > > It works for functions but not if they're hanging off the namespace of a > class ? The attribute shows up on the class method after I set it on > the class method's im_func ? This is wrong, obscene and ugly ! This is intentional. We actually changed this in 2.1b1; in the alpha version it *was* assignable. It turns out there are some potential future uses where assigning to attributes of class (or instance) methods should only affect the method in that class, not in any base classes; in particular, it was pointed out that this can be confusing: class C: def f(self): pass f.foo = 1 class D(C): pass D.f.foo = 2 # Would also change value of C.f.foo! Similarly, class C: def f(self): pass f.foo = 1 x = C() x.f.foo = 2 # Would also change value of C.f.foo! The ExtensionClass module in Zope actually implements class-like objects that behave in such a way that at least the first example (D.f.foo = 2) changes the f.foo value for class D but not for class C. So this is not just theoretical! 
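[Editorial aside: Guido's first example is easy to replay. Under a modern Python 3 — where function attributes are assignable and the unbound-method wrapper is gone, so this sketch does not reproduce the 2.1b1 TypeError — the aliasing he warns about is exactly what happens: D.f is looked up on C and is the very same function object.]

```python
class C:
    def f(self):
        pass
    f.foo = 1  # function attributes may be set inside the class body

class D(C):
    pass

# D has no 'f' of its own; attribute lookup finds C's function object,
# so this assignment mutates the base class's function too.
D.f.foo = 2
print(C.f.foo)     # now 2 -- the base class was changed as well
print(C.f is D.f)  # True: one shared function object
```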
--Guido van Rossum (home page: http://www.python.org/~guido/) From Edward Welbourne Sat Mar 17 13:50:26 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 13:50:26 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103160029.f2G0T1p02926@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103160029.f2G0T1p02926@gradient.cis.upenn.edu> Message-ID: > If it's usually *not* what you want, or if we want to keep things > simpler, the following seems to work (I don't know why you don't > need to use .im_func here):: > class B(A): > def f(x): > return x+1 > B.f.__doc__ = A.f.__doc__ Close, but not quite - B isn't in scope yet when you say that: >>> class B: ... def f(s): return ... B.f.__doc__ = 'hello there' ... Traceback (innermost last): File "", line 1, in ? File "", line 3, in B NameError: B However, >>> class B: ... def f(s): return ... f.__doc__ = 'hello there' ... works fine; and the reason you don't need to use .im_func here is that the code of B's body gets executed in an empty namespace; the resulting contents of the namespace get massaged and packaged to become B's namespace, but only *afterwards* (read types-sig in late 1999 to find out why I don't like that: but it's how it is). So, when f.__doc__ got assigned, f was still a function: >>> class B: ... def f(s): return ... print type(f) ... (subject to usual caveats about my installed python still being 1.5.2) > I'd be happy with either. I'd solidly prefer to require an empty doc-string in a method which doesn't want to inherit the doc-string of its parent. In principle, at least when the method is part of the abstract data type spec of the base class, the derived class shouldn't really be messing with `what does this guarantee' statements which (as I understand it) you want doc strings to contain, albeit it may be refining it slightly. 
Plus, having to put the empty string in as doc-string is trivial, doesn't involve typing any magic __tokens__ and happens inside the body of B.f, rather than after it; and I *really* don't like having to *assign* B.f's __doc__ in order to get it to show something it should be inheriting/acquiring from A.f, at least under the rules of abstract data types. Notably, consider the case where a base class defines the API to be supplied by a family of allied classes; the base class exists to document what they all do, each derived class exists to implement one of the flavours of the relevant kind of object. However, shoe-horning that into an implementation may be an issue. Eddy. From Edward Welbourne Sat Mar 17 14:23:10 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 14:23:10 +0000 (GMT) Subject: [Doc-SIG] ping Ping In-Reply-To: <200103160047.f2G0l6p04520@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103160047.f2G0l6p04520@gradient.cis.upenn.edu> Message-ID: > Sorry, that was unclear. My stance is: ah. So: Edward, Tibs and I, who have done this week's talking, all agree on a position which puts `API docs' into the source file and puts tutorials, reference docs, etc. elsewhere. But this discussion got kicked off by Ping (and the Spanish Inquisition, according to Tibs; and Eddy the turn-coat) arguing for the contrary. Hello, Ping, where are you ? We need your response to our gabblings. If only 'cos you're about as good at changing my mind as Tibs is ... Eddy. From Edward Welbourne Sat Mar 17 13:57:45 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 13:57:45 +0000 (GMT) Subject: [Doc-SIG] reserved characters In-Reply-To: <200103160035.f2G0ZFp03453@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103160035.f2G0ZFp03453@gradient.cis.upenn.edu> Message-ID: > Note also that '[a, b, c]' is not always python code. It's also > mathematical notation.. Same with 'x*y'.
as far as I'm concerned, mathematics and programs are the same stuff: crucially, neither is `plain text', so they aren't first-class citizens of the doc-string - they can both expect to be shoved inside #ghettos# of one sort or another. Ergo, no issue here. Eddy. -- Albeit my plaintext denotations can mostly live peacefully in doc-strings: http://www.chaos.org.uk/~eddy/math/ From Edward Welbourne Sat Mar 17 14:29:57 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 14:29:57 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103160059.f2G0xep05577@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103160059.f2G0xep05577@gradient.cis.upenn.edu> Message-ID: >> While I'm at it: I loathe and despise the apparent demand (seemingly > Neither STpy (Tibs' version) nor whatever version of STminus I end up > proposing for docstrings will require blank lines before list items. phew. OK, I misunderstood. What a relief ;^> And a blank line between a heading and what it heads is fine by me, it was just the [DUO]L lists I was nervous about. > ... there's always something to be said for simplicity too.. If > someone writing documentation has to remember too many rules, they may > get confused.. which just says that *sometimes* pandering to the tool's tastes and being kind to the maintainer agree (and for `maintain' you can do an appropriate s/// as before if you wish). Eddy. From Edward Welbourne Sat Mar 17 19:41:03 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 19:41:03 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> Message-ID: hmm. I don't like backslash. I know you can use r'''...''' but I still don't like it. 
Sort of because I'm not guaranteed that I can see the start of a long doc string when I'm editing it, but partly because python's reading of \ in strings is already (... how shall I say this politely ...) sophisticated enough. However, I can add to your list of places where it's `needed': \n is needed in embedded literals because ... we aren't allowing them to span multiple lines, right ? But we have to allow doc-string authors to discuss strings with newlines in them (just like we have to allow them to discuss strings with ' and # in them - and, of course, with \ in them).

Someone who wants to include a code fragment including a comment can perfectly easily put it into one of the block-style structures for enclosing non-plaintext, as I argued when proposing #...# for this role; I'm inclined to apply the same ruling to code fragments like:: script.write('echo HTTP/1.1 200 OK\n# no headers\necho') which describe python code which uses a # other than as a (python) comment character. If you decide you want to let me discuss, inline, a shorter cousin of this: I'll point at my uses of \n in it and ask whether you really want me to reactivate perverse counterexample mode. I realise it'll be mildly irritating (and add to vertical space use) to have to go into a block to say a code fragment any time it's got a # in it; or a verbatim text any time it's got a ' in it (indeed, the latter is the more serious issue); at least when it's such a short fragment one wants it to be inline. But inline fragments are a luxury. Adding an escape character requires us to make provision for escaping the escape (else, as Edward pointed out, we can't *end* a fragment with the escape character). At which point the ability of folk to work out how many backslashes they're looking at depends not only on counting the backslashes they can see, and on working out whether the string is r'...' or not, but also on whether they're inside an inline fragment right now. This *will* confuse pythoneers. Confusion is worse than mild irritation. Ergo, don't do it. Not even to save vertical space ;^| (and, as some earlier cross-talk between myself and Tibs might clue you to notice, I consider vertical space a big sacrifice). Oblige doc-strings which want to talk about a fragment, using the delimiter ST* uses for the relevant kind of inline fragment, to do the fragment as a block, not an inline. This is an easy rule with no fancy repercussions. It doesn't collide with the already sophisticated reading of \ in strings: and it's easy to describe and understand. 
Ergo it is the Right Thing To Do. Please ? Eddy. From Edward Welbourne Sat Mar 17 18:06:55 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 18:06:55 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <00bb01c0ae33$7ff2fd50$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <00bb01c0ae33$7ff2fd50$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Edward, then Tony: >> XML defines:: ... >> So the regexp would be something like:: >> >> [a-zA-Z_:][a-zA-Z0-9.-_:]* > Hmm. I might prefer to say "a Python identifier", then, as I don't need > the "namespace" bit (which is what the colon is for). I don't like the XML Name; having embedded : won't sit well with pythonic reading; but python identifiers are too restrictive. Have we considered the classic spec for labels to appear left of a colon, namely RFC 822 (e-mail headers) and its kin ? I think that basically comes down to r'\w+(-\w+)*' as regex, generally specified (certainly in HTTP's variant on the theme) to be read case-insensitively and conventionally rendering each word Capitalised (e.g. Rfc-822-Compliant is normalised, though RFC-822-compliant is read as the same identifier). We might want to allow _ as well as \w (indeed, we might want to define \w to include _ given that python effectively does so). Eddy. From Edward Welbourne Sat Mar 17 17:14:09 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 17:14:09 +0000 (GMT) Subject: [Doc-SIG] What counts as a url? In-Reply-To: <3AB24895.8DB07998@lemburg.com> (mal@lemburg.com) References: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu> <3AB24895.8DB07998@lemburg.com> Message-ID: See the standard module urlparse.
What we should really do is add, to the standard module urlparse, a function which takes a string and returns the length of the initial segment which is a URL, -ve value on failure (albeit the value is guaranteed to be > 3 if it's not -ve; but -ve, or maybe 0, is the right answer for compatibility with existing string/regex match/find routines) so that the caller can then snip off this chunk and pass it to urlparse.urlparse() ... erm hello, the doc page (at 1.5.2) mentions a `parameters' chunk following a semicolon. Never heard of it myself: This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. ick - it doesn't cut netloc into name:port, let alone supply scheme's default for port when omitted. It also doesn't specify query-parsing (OK, so that ends up making contentious design decisions, so fair enough) or url-decoding (OK, so one can't do that to any unless also to the query fragments, which implies parsing the query first). Ho hum. >> ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+ > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' erm ... I'm fairly sure you're allowed at most one # and at most one ? in an URL: any others *must* be url-encoded as %[0-9A-Fa-f]{2} tokens. I'm fairly sure you aren't allowed an & before the ? and that the # has to appear after the ? and all & Marc's regex doesn't mention = and ? explicitly, but they're definitely allowed in URLs. Are () really allowed in URLs ? How about {} and [] ? I'm fairly sure : and , are allowed in paths. But I'd expect :,{}()[]*! all to be url-encoded, anyway, so they shouldn't appear in the regexen; they're covered by % and \w. There is an RFC for URIs, I mailed it to Edward recently; I guess that'd be >> and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt . so go read the appendices (pedantically).
I know the relevant RFC has a helpful Appendix A giving BNF and Appendix B advising how to parse, complete with a regex for parsing (which presumes you *check* separately, based on the BNF). I really don't like that space between the URL and the full-stop (sorry, `period', to translate into North American Anglic); but, no, I can't see how to avoid it. Other than to treat the end of a URL as `this may have been the end of a sentence', even if it isn't followed by a . so authors of doc-strings know they can treat the URL as sentence-end (unconvinced). oh - Marc: > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' did you really mean `from # to _ inclusive' ^^^ or did you mean to say `#, - or _' ? Hmm, I think you mean the latter: put - last in the [...] But the latter reading claims you've missed out / in the path, and the former claims most entries in your [...] are duplicates of ones in the #-_ range. I'm confused. If the - is last, and we mention = explicitly, we can phrase the character class as [...=;-] with its hair standing on end, which seems entirely appropriate. Why do regexes always feel like the right answer to the wrong question - albethey useful - ? Edward: will working in EBNF spare us these messes ? (I'm assuming that's Extended Backus-Naur Form, subject to spelling.) Tibs: would mxTextTools let us say this stuff less uglily ? I'm inclined to advise running with the way the RFC's appendices approach the problem, though: first, parse according to Appendix B's regex, then (it explains better than I can here) take the fragments into which it's cut your putative URL text and check each fragment for validity according to the appropriate rules in appendix A, which depend on the scheme; if any fail their check, decide that this wasn't a URL anyway.
Albeit this means fully parsing the URL, so maybe the right function to add to urlparse is one which reads, from a string, the longest initial chunk which is a URL, returning a tuple whose first item is the length, remainder are urlparse.urlparse()'s answers (at least when the length is positive). > I don't think it makes sense to include schemes which are not > supported by your everyday browser, so only the most common ones > are included. I think it does make sense to include them, for two reasons: i) we should *recognise that the text is a URL* even when we know not what to do with it, if only so we can warn the user - the principle of least surprise says that if you *have to* surprise the user (whose browser does know about a scheme you're ignoring) you should at least have the decency to warn. ii) forward compatibility - someone may add a scheme that really does deserve to be in there, and the tool should need minimal revision to cope. and I thought most browsers *did* cope with gopher, which you omit ... Yes, I admit it, I'm an old fuddy-duddy. But the right answer is to use the urlparse module, not to ad-hock together your own; if you don't like how urlparse does things, fix it. (Note: I'm as guilty as anyone on this - I'd written a much longer version of this e-mail, complete with my own od-hack regex, before even thinking to look for a module, at which point I instantly *knew* the module was bound to exist - and not be to my liking.) Eddy. 
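An aside for the modern reader: the six-way split Eddy quotes is easy to check. In today's Python the old urlparse module lives at urllib.parse (that relocation is assumed below); the result still exposes the `parameters' chunk, and his name:port complaint has since been addressed by the .hostname and .port accessors on the result object:

```python
# The general structure scheme://netloc/path;parameters?query#fragment,
# as split by urlparse (the module now lives at urllib.parse).
from urllib.parse import urlparse

parts = urlparse('http://example.com:8080/path;type=a?x=1&y=2#frag')
assert parts.scheme == 'http'
assert parts.netloc == 'example.com:8080'
assert parts.path == '/path'
assert parts.params == 'type=a'     # the `parameters' chunk after the semicolon
assert parts.query == 'x=1&y=2'
assert parts.fragment == 'frag'
# netloc is still not pre-split, but the result object exposes the pieces:
assert parts.hostname == 'example.com'
assert parts.port == 8080
```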
From Edward Welbourne Sat Mar 17 18:16:57 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 18:16:57 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103161644.f2GGi9p15914@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103161644.f2GGi9p15914@gradient.cis.upenn.edu> Message-ID: Tony, then Edward: >> On the other hand, sometimes people want that effect (the >> difficulties of mixing presentation and markup, and not being >> specifically a typesetting language - ho hum). > Too bad for them. :) If they really need *that much* control over > their typesetting, they shouldn't be using ST. If they're thinking that hard about layout, they aren't concentrating hard enough on writing API documentation and the result isn't going to be maintainable ('cos when I next fix a bug in their code, and update the docs, I'm going to have zero patience with their fancy layout, so I'm going to normalise it to something straightforward). If we pander to the folk who care more about appearance than information content, we're doomed. Look what happened to poor old HTML ... Eddy. -- Keep It Simple, Styoooooooooooopid. Keep It Straightforward, Simpleton. From ping@lfw.org Sat Mar 17 20:30:05 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 17 Mar 2001 12:30:05 -0800 (PST) Subject: [Doc-SIG] Re: ping Ping In-Reply-To: Message-ID: On Sat, 17 Mar 2001, Edward Welbourne wrote: > Hello, Ping, where are you ? > We need your response to our gabblings. Hi, Eddy. I'm sorry i haven't had time to really participate in this discussion this past week. I've been watching all the e-mail go by but haven't formulated my responses yet. I was originally considering writing my own PEP if i could find the time. Thanks for the invitation. My general feeling about all of the syntax ideas that have been going back and forth is that i'm a little afraid of their complexity.
When i have a moment i'll try to get a handle on what rules are currently on the table and see how many there are, but i'll definitely want to keep them minimal. > ah. So: Edward, Tibs and I, who have done this week's talking, all > agree on a position which puts `API docs' into the source file and puts > tutorials, reference docs, etc. elsewhere. I will maintain the opposing position for now, as devil's advocate. Here are some points to consider, just off the top of my head: - Just to be clear, the suggestion on the table is only to move the library ref manual into the modules, not the language reference or anything like that. - If documentation lives in the modules, we can guarantee that any user of the module has the information they need to understand and use it properly. - Allowing extended documentation in a module does not preclude other people from writing other documents, tutorials, books, etc. on a particular topic. - If documentation for a module doesn't live in the module itself, how will a user find it? One source of motivation for this suggestion was running "perldoc CGI" -- having a copy of CGI.pm guarantees that you have an instantly available and fairly comprehensive description of all the things you can do with CGI.pm. - Keeping modules and associated docs in the same file helps to ensure that the two are in sync when you distribute or edit the file. (It's not possible to have different versions of the code and the docs at the same time; it's less likely that someone will check in changes to one without updating the other, etc.) 
-- ?!ng From Edward Welbourne Sat Mar 17 20:47:14 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 17 Mar 2001 20:47:14 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103170258.VAA14151@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Fri, 16 Mar 2001 21:58:48 -0500) References: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> <200103170258.VAA14151@cj20424-a.reston1.va.home.com> Message-ID: Thank you Guido. That makes sense. Not naughty after all ;^) > The ExtensionClass module in Zope actually implements class-like > objects that behave in such a way that at least the first example > (D.f.foo = 2) changes the f.foo value for class D but not for class C. > So this is not just theoretical! see, I knew it'd be the sort of think Jim Fulton plays with. (OK, `think' was a typo: but it deserves to survive ;^) Which encourages hope. First off, I'm going to disparage your second example: > class C: > def f(self): pass > f.foo = 1 > > x = C() > x.f.foo = 2 # Would also change value of C.f.foo! This is strictly another matter: x.f isn't really the same thing as C.f, it `should' be currie(C.f, x), if you see what I mean, hence a different object from C.f, so setting its .foo shouldn't affect C.f, even with the old semantics (even assuming one's allowed to setattr on a bound method, which sounds *very* dodgy to me). So I would be rude enough to call this example a bug in pre-2.1b1 python and ask that it be left out of the discussion ;^> But the `D.f is C.f' problem is real. So, on the one hand, when a method is inherited, > class C: > def f(self): pass > f.foo = 1 > > class D(C): > pass > D.f.foo = 2 # Would change value of C.f.foo! but, at the same time, if the derived class *does* over-ride the method, >>> class A: ... def f(self): """doc string of A.f""" ... >>> class B(A): ... def f(self): return ... 
>>> B.f.__doc__ # B.f wishes it would `inherit' from A.f, but doesn't >>> Now any attempt at getting the latter desideratum, without the former naughtiness, is going to be sophisticated: but can it be done ? Clearly there *is* a sophisticated munging phase when B's namespace, having been built by execution of B's suite, gets transformed and packaged so that B.f is no longer simply a function; is it possible, at that juncture, to arrange that it will borrow __doc__ off A.f ? (Only if B.f lacks __doc__, naturally.) This would involve some added magic in the type of unbound methods but that's a pretty magical type *anyway* and it *looks* like it should be feasible by applying games similar to (though hopefully less complex than) those used by ExtensionClass. A derived class' re-implementation `should' behave like that of the base, or it abrogates its ADT, so having the same doc as the method being over-ridden should be `usual'. The exceptions incur a tiny cost - they have to supply a doc string, which can be empty if they want, but really they *should* be explaining why they abrogate the ADT anyway, given that the base class's other methods might exercise the replacement - and, without this borrowing, the usual case implies gratuitous duplication - either of the doc string or of the assignment from base. The latter is really ugly - I should not have to type __doc__ in any ordinary piece of code; only in introspectors. Note (for anyone who missed it) that Tony discovered one needn't go via im_func, as long as one assigns *before* B's namespace gets munged: >>> class E(A): ... def f(self): pass ... f.__doc__ = A.f.__doc__ ... >>> E.f.__doc__ 'doc string of A.f' This is because f is still an ordinary function object (in particular, it isn't yet E.f; E doesn't yet exist) when the assignment happened. There is, of course, an obvious problem: multiple inheritance, when more than one base supplies the over-ridden method.
The solution is, of course, to borrow off the method on the earliest of the bases to provide it and leave the implementor to over-ride that by assigning if they must. This will be rare enough not to be an issue; and it will simply work, because either * you assign as above, in which case E.f had a __doc__ before munging, so borrowing off E's bases' .f didn't get invoked; or * you assign after the class body, via im_func, in which case you over-ride what the munging has done and it still works. How's the time machine doing ? Do methods yet `inherit' __doc__ when not over-ridden ? Eddy. From Edward Welbourne Sun Mar 18 11:18:36 2001 From: Edward Welbourne (Edward Welbourne) Date: Sun, 18 Mar 2001 11:18:36 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: (message from Edward Welbourne on Sat, 17 Mar 2001 19:41:03 +0000 (GMT)) References: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> Message-ID: Oh, and the argument I missed because it was too obvious and I'm too exhausted: verbatim means ... verbatim. If my doc string says client code should ... call #obj.out('\#')# ... to achieve some effect, I should expect some client code to contain the fragment:: obj.out('\#') even if some authors do realise that the \ should be elided, while others read the docstring via a rendering tool which elided it for them; and this will more-or-less certainly be a bug: the string given really does have a backslash in it and really isn't just '#'. It would have been better if your tools had forced me to put the fragment in a block, where I *could* have said:: obj.out('#') so would have done so and wouldn't have confused authors of client code. Likewise - if anything more so - for 'verbatim'. If you provide a backslash escape mechanism for verbatim and code fragments, nearly all uses of it will cause bugs (so, in fact, they will *be* bugs in the docs). Eddy. -- Stinginess with privileges is kindness in disguise. -- Guide to VAX/VMS Security, Sep. 1984. 
s/privilege/feature/ -- Eddy, 2001/March/18. From guido@digicool.com Sun Mar 18 17:12:15 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 18 Mar 2001 12:12:15 -0500 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Sat, 17 Mar 2001 20:47:14 GMT." References: <008d01c0ac9e$ebb14f10$f05aa8c0@lslp7o.int.lsl.co.uk> <200103170258.VAA14151@cj20424-a.reston1.va.home.com> Message-ID: <200103181712.MAA23125@cj20424-a.reston1.va.home.com> > Thank you Guido. > That makes sense. > Not naughty after all ;^) > > The ExtensionClass module in Zope actually implements class-like > > objects that behave in such a way that at least the first example > > (D.f.foo = 2) changes the f.foo value for class D but not for class C. > > So this is not just theoretical! > see, I knew it'd be the sort of think Jim Fulton plays with. > (OK, `think' was a typo: but it deserves to survive ;^) > Which encourages hope. > > First off, I'm going to disparage your second example: > > class C: > > def f(self): pass > > f.foo = 1 > > > > x = C() > > x.f.foo = 2 # Would also change value of C.f.foo! > > This is strictly another matter: x.f isn't really the same thing as C.f, > it `should' be currie(C.f, x), if you see what I mean, hence a different > object from C.f, so setting its .foo shouldn't affect C.f, even with the > old semantics (even assuming one's allowed to setattr on a bound method, > which sounds *very* dodgy to me). So I would be rude enough to call > this example a bug in pre-2.1b1 python and ask that it be left out of > the discussion ;^> Sure! > But the `D.f is C.f' problem is real. > So, on the one hand, when a method is inherited, > > > class C: > > def f(self): pass > > f.foo = 1 > > > > class D(C): > > pass > > D.f.foo = 2 # Would change value of C.f.foo! > > but, at the same time, if the derived class *does* over-ride the method, > > >>> class A: > ... def f(self): """doc string of A.f""" > ... > >>> class B(A): > ... def f(self): return > ... 
> >>> B.f.__doc__ # B.f wishes it would `inherit' from A.f, but doesn't > >>> > > Now any attempt at getting the latter desideratum, without the former > naughtiness, is going to be sophisticated: but can it be done ? Clearly > there *is* a sophisticated munging phase when B's namespace, having been > built by execution of B's suite, gets transformed and packaged so that > B.f is no longer simply a function; is it possible, at that juncture, to > arrange that it will borrow __doc__ off A.f ? > (Only if B.f lacks __doc__, naturally.) Hm, this is entering a whole realm of stuff where Python isn't very helpful. Folks who know Eiffel have suggested inheriting pre- and post-conditions. Others, coming from C++, have suggested automatic calling of base class constructors in derived constructors. Now you suggest inheriting docstrings. Maybe there's something there, but it's definitely Python 3000 material... > This would involve some added magic in the type of unbound methods but > that's a pretty magical type *anyway* and it *looks* like it should be > feasible by applying games similar to (though hopefully less complex > than) those used by ExtensionClass. > > A derived class' re-implementation `should' behave like that of the > base, or it abrogates its ADT, Yes, but Python doesn't really try to enforce that (or even help you). > so having the same doc as the method > being over-ridde should be `usual'. Unclear. It depends a lot on what's in the docstring. I have written lots of docstrings that would be really misleading if they were inherited! > The exceptions incur a tiny cost - > they have to supply a doc string, which can be empty if they want, but > really they *should* be explaining why they abrogate the ADT anyway, > given that the base class's other methods might exercise the replacement > - and, without this borrowing, the usual case implies gratuitous > duplication - either of the doc string or of the assignment from base. 
> The latter is really ugly - I should not have to type __doc__ in any > ordinary piece of code; only in introspectors. > > Note (for anyone who missed it) that Tony discovered one needn't go via > im_func, as long as one assigns *before* B's namespace gets munged: > > >>> class E(A): > ... def f(self): pass > ... f.__doc__ = A.f.__doc__ > ... > >>> E.f.__doc__ > 'doc string of A.f' > > This is because f is still an ordinary function object (in particular, > it isn't yet E.f; E doesn't yet exist) when the assignment happened. f is still an ordinary function even after the class definition is complete -- but if you access it in the conventional way (as E.f) it is munged on the way out. E.__dict__['f'] also gives the function. (Not that I encourage using this!) > There is, of course, an obvious problem: multiple inheritance, when more > than one base supplies the over-ridden method. The solution is, of > course, to borrow off the method on the earliest of the bases to provide > it and leave the implementor to over-ride that by assigning if they > must. This will be rare enough not to be an issue; and it will simply > work, because either > > * you assign as above, in which case E.f had a __doc__ before munging, > so borrowing off E's bases' .f didn't get invoked; or > > * you assign after the class body, via im_func, in which case you > over-ride what the munging has done and it still works. > > How's the time machine doing ? > Do methods yet `inherit' __doc__ when not over-ridden ? > > Eddy. For this one, I prefer to use the time machine in the opposite direction. Let's move this set of ideas to a new design for Py3K. (PS, I regret that this is off-topic for doc-sig -- it should really be moved to python-dev.) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Sun Mar 18 22:21:48 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 18 Mar 2001 23:21:48 +0100 Subject: [Doc-SIG] What counts as a url?
References: <200103160432.f2G4WAp25746@gradient.cis.upenn.edu> <3AB24895.8DB07998@lemburg.com> Message-ID: <3AB534FC.98C14B7C@lemburg.com> Edward Welbourne wrote: > > >> ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+ > > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' > erm ... > > I'm fairly sure you're allowed at most one # and at most one ? in an > URL: any others *must* be url-encoded as %[0-9A-Fa-f]{2} tokens. I'm > fairly sure you aren't allowed an & before the ? and that the # has to > appear after the ? and all & > > Marc's regex doesn't mention = and ? explicitly, but they're definitely > allowed in URLs. Are () really allowed in URLs ? Yes. > How about {} and [] ? No. See the RFC Appendix A for details. > I'm fairly sure : and , are allowed in paths. But I'd expect :,{}()[]*! > all to be url-encoded, anyway, so they shouldn't appear in the regexen; > they're covered by % and \w. > > There is an RFC for URIs, I mailed it to Edward recently; > I guess that'd be > >> and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt . > so go read the appendices (pedantically). FYI, here's a working reference: http://sunsite.dk/RFC/rfc/rfc2396.html > I know the relevant RFC has a helpful Appendix A giving BNF and Appendix > B advising how to parse, complete with a regex for parsing (which > presumes you *check* separately, based on the BNF). > > I really don't like that space between the URL and the full-stop (sorry, > `period', to translate into North American Anglic); but, no, I can't see > how to avoid it. Other than to treat the end of a URL as `this may have > been the end of a sentence', even if it isn't followed by a . so authors > of doc-strings know they can treat the URL as sentence-end (unconvinced). Note that the RE I mentioned was not supposed to parse all URLs allowed by the different standards out there.
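An aside on the `#-_' question in this exchange: inside a character class, a mid-position hyphen denotes a range, so `[#-_]' admits every ASCII character from `#' (0x23) through `_' (0x5F) - `@', `?', `=' and friends included - while a hyphen written last is literal. A quick demonstration:

```python
import re

# Hyphen mid-class: this is the range from '#' (0x23) to '_' (0x5F).
as_range = re.compile(r'[#-_]')
# Hyphen last: just the three literal characters '#', '_' and '-'.
as_literal = re.compile(r'[#_-]')

assert as_range.match('@')            # '@' (0x40) falls inside the range
assert as_range.match('?')            # so does '?' (0x3F)
assert as_literal.match('@') is None  # not one of the three literals
assert as_literal.match('-')          # '-' now matches literally
```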
The bug you found wasn't intended either, BTW ;-) The RE is basically a very simple approximation of what is allowed and finds most instances of URLs in plain text.

> oh - Mark:
> > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)'
> did you really mean `from # to _ inclusive' ^^^ or did you mean to say
> `#, - or _' ? Hmm, I think you mean the latter: put - last in the [...]
> But the latter reading claims you've missed out / in the path, and the
> former claims most entries in your [...] are duplicates of ones in the
> #-_ range. I'm confused.

It's a bug, just like the omission of "/=?" which was covered up by re.compile() using the whole range #-_ of characters...

> If the - is last, and we mention = explicitly, we can phrase the
> character class as [...=;-] with its hair standing on end, which seems
> entirely appropriate.

Good idea ;) Oh, and please also add the slash and all the other characters in #-_ which could be useful in URLs.

> . Why do regexes always feel like the right answer to the wrong
> question - albethey useful - ?
> Edward: will working in EBNF spare us these messes ?
> (I'm assuming that's Extended Baccus-Nauer Form, subject to spelling.)

Appendix A of the RFC has a "Collected" BNF form -- doesn't look any simpler than the RE, though, only less frightening.

> Tibs: would mxTextTools let us say this stuff less uglily ?

Not less ugly, but certainly with more certainty as to what passes and what does not...

> I'm inclined to advise running with the way the RFC's appendices
> approach the problem, though: first, parse according to Appendix B's
> regex, then (it explains better than I can here) take the fragments into
> which it's cut your putative URL text and check each fragment for
> validity according to the appropriate rules in appendix A, which depend
> on the scheme; if any fail their check, decide that this wasn't a URL
> anyway. Albeit this means fully parsing the URL, so maybe the right
> function to add to urlparse is one which reads, from a string, the
> longest initial chunk which is a URL, returning a tuple whose first item
> is the length, remainder are urlparse.urlparse()'s answers (at least
> when the length is positive).

Seems overly complicated to me, but if you really care for standards-conforming URI recognition then I'd suggest going ahead and writing a patch for urllib which defines a function for finding URLs in text, e.g. findurl(text, start, end) -> (urlstart, urlend) or None.

> > I don't think it makes sense to include schemes which are not
> > supported by your everyday browser, so only the most common ones
> > are included.
> I think it does make sense to include them, for two reasons:
>
> i) we should *recognise that the text is a URL* even when we know not
>    what to do with it, if only so we can warn the user - the principle
>    of least surprise says that if you *have to* surprise the user
>    (whose browser does know about a scheme you're ignoring) you should
>    at least have the decency to warn.
>
> ii) forward compatibility - someone may add a scheme that really does
>     deserve to be in there, and the tool should need minimal revision
>     to cope.
>
> and I thought most browsers *did* cope with gopher, which you omit ...
> Yes, I admit it, I'm an old fuddy-duddy.
>
> But the right answer is to use the urlparse module, not to ad-hock
> together your own; if you don't like how urlparse does things, fix it.
> (Note: I'm as guilty as anyone on this - I'd written a much longer
> version of this e-mail, complete with my own od-hack regex, before even
> thinking to look for a module, at which point I instantly *knew* the
> module was bound to exist - and not be to my liking.)
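The proposed findurl() helper might be sketched like this (hypothetical code; urllib.parse is the modern descendant of the urlparse module, and the candidate pattern here is deliberately loose):

```python
import re
from urllib.parse import urlsplit  # modern home of the urlparse machinery

# Hypothetical sketch of findurl(text, start, end) -> (urlstart, urlend)
# or None: find a loose candidate, then let urlsplit() confirm it at
# least decomposes into a scheme and a network location.
_candidate = re.compile(r"\b(?:http|https|ftp)://\S+")

def findurl(text, start=0, end=None):
    if end is None:
        end = len(text)
    m = _candidate.search(text, start, end)
    if m is None:
        return None
    parts = urlsplit(m.group(0))
    if not (parts.scheme and parts.netloc):
        return None
    return m.start(), m.end()

s = "see http://www.lemburg.com/python/ there"
span = findurl(s)  # span brackets the URL within s
```

The split into a cheap candidate scan plus an urlparse-style check keeps the RE simple while pushing the standards questions onto the library.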
--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Pages: http://www.lemburg.com/python/

From edloper@gradient.cis.upenn.edu Mon Mar 19 01:41:47 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Sun, 18 Mar 2001 20:41:47 EST
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To: Your message of "Sat, 17 Mar 2001 13:50:26 GMT."
Message-ID: <200103190141.f2J1flp01199@gradient.cis.upenn.edu>

Eddy, re. the issue of inheriting docstrings for methods from base classes, said:

>> I'd solidly prefer to require an empty doc-string in a method which
>> doesn't want to inherit the doc-string of its parent.

I tend to agree; but it seems from Guido's mail like this isn't anything that's likely to happen soon. So I propose the following:

1. For now, do *not* recommend that people use::

       f.__doc__ = parent.f.__doc__

2. For now, recommend that *tools* inherit documentation for a method if f.__doc__ == None, and don't inherit if f.__doc__ == '' or any other string.

3. We can discuss writing a PEP, separate from all the docstring issues we're currently discussing, to handle inheritance of docstrings.

4. If/when such a PEP goes through, the tools could optionally be simplified, but nothing will break (because now the doc strings of the methods in question won't be None anymore).

Sound reasonable?

-Edward

From edloper@gradient.cis.upenn.edu Mon Mar 19 03:19:55 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Sun, 18 Mar 2001 22:19:55 EST
Subject: [Doc-SIG] backslashing
In-Reply-To: Your message of "Sat, 17 Mar 2001 19:41:03 GMT."
Message-ID: <200103190319.f2J3Jtp07955@gradient.cis.upenn.edu>

Hm.. I'm starting to get convinced that escaping with backslashes might not be optimal..

So, as a preface to what follows, I think of '...' and #...# as used for in-line literals. I.e., you can include them in sentences.
Literal blocks are used for blocks that are separated off from the rest of the code. Thus, there's an important *semantic* difference between '...' and literal blocks. Given that, I don't think it's necessarily reasonable to force people to put what really *should* be an in-line literal into a literal block.

>> However, I can add to your list of places where it's `needed': \n is
>> needed in embedded literals because ... we aren't allowing them to span
>> multiple lines, right ?

You might mean two things, here.

1. Including the 2-character string '\n' in a literal, intending it to be rendered as a backslash followed by an n.

2. Including an actual newline in a literal, intending it to be rendered as a line break.

If we assume that '...' is for in-line literals, (1) makes sense, but (2) really doesn't; if you want line breaks in your literal, you should be using a literal block. If someone wants to discuss a string with a newline in it, they should probably use r"#'\n'#" (which will be rendered as:: '\n' in monospaced font). Or, if we're backslashing things, they should use r"#'\\n'#".

>> Someone who wants to include a code fragment including a comment can
>> perfectly easily put it into one of the block-style structures for
>> enclosing non-plaintext, as I argued when proposing #...# for this role;
>> I'm inclined to apply the same ruling to code fragments like::
>>
>>     script.write('echo HTTP/1.1 200 OK\n# no headers\necho')
>>
>> which describe python code which uses a # other than as a (python)
>> comment character. If you decide you want to let me discuss, inline, a
>> shorter cousin of this: I'll point at my uses of \n in it and ask
>> whether you really want me to reactivate perverse counterexample mode.

I assume that "script.write..." is in an r"..." string, otherwise it would be indistinguishable from::

    script.write('echo HTTP/1.1 200 OK # no headers echo')

Which would be rendered as such.. In this case, I'd agree, and say to use a literal block.
But that's because you wouldn't normally include the string you gave in a sentence.. If we're talking about the string "x'", having to use literal blocks may be unreasonable. Consider the fictional example::

    If the user types::

        x'

    then the system should print the value of::

        x'(a)

    and return the value of::

        x'(b)

This really *should* be rendered as a single sentence, but by forcing the doc writer to put everything in literal blocks, we force it to be rendered with each of those symbols in a separate display area...

>> Adding an escape character requires us to make provision for escaping
>> the escape (else, as Edward pointed out, we can't *end* a fragment with
>> the escape character). At which point the ability of folk to work out
>> how many backslashes they're looking at depends not only on counting the
>> backslashes they can see, and on working out whether the string is
>> r'...' or not, but also on whether they're inside an inline fragment
>> right now. This *will* confuse pythoneers.

I do agree. But I'm not sure what the best thing to do is. It's a little bit of a problem, *anyway*, because even if we ignore '\', doc writers have to think harder than they should if they want to use backslashes in their docs. :)

>> Not even to save vertical space ;^|

If it were just an issue of saving vertical space, I'd agree. But I want to make sure that everything reasonable *can* be documented s.t. it will look reasonable when formatted.. I'm less worried about saying "the 0.5% of people using forms like XYZ will have to go to extra trouble." But I may end up agreeing anyway, that the confusion is too much, and that those 0.5% will just have to deal with using literal blocks, and possibly with having ugly formatted docs. :)

We could also discuss ways of indicating that one-line literal blocks are "really" inlines (::: or some such), but I'm currently loath to make ST even more complex.
:) >> Oblige doc-strings which want to talk about a fragment, using the >> delimiter ST* uses for the relevant kind of inline fragment, to do the >> fragment as a block, not an inline. So are you saying we'd have 2 different kinds of literal blocks? We hadn't really discussed that before.. I think that just having literals, inlines, and literal blocks is probably enough, but if you want to make a case, go ahead. :) -Edward From tony@lsl.co.uk Mon Mar 19 09:51:25 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 19 Mar 2001 09:51:25 -0000 Subject: [Doc-SIG] What counts as a url? In-Reply-To: <3AB24895.8DB07998@lemburg.com> Message-ID: <00d001c0b05a$29bc85e0$f05aa8c0@lslp7o.int.lsl.co.uk> M.-A. Lemburg wrote: > FYI, I use this RE in my apps: > > r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)' > > I don't think it makes sense to include schemes which are not > supported by your everyday browser, so only the most common ones > are included. Except that I'm paranoid (well, no, really just a worried pedant) and don't like trying to embed a complete list of resource/schemes in the RE - for instance, I've known people who would get upset by the absence of both "news" and "gopher" in the above. And if I were writing a Python library to *handle* a new scheme (for instance, perhaps, for Mozilla?) then I might be upset if I couldn't see it in my docstrings. Tibs (on the other hand, this *is* worth refining over time, and we need not get it *perfect* at the start). -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
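One way to avoid baking a complete scheme list into the RE is to keep the list as data, so "news", "gopher", or a brand-new scheme can be added without rewriting the pattern; a minimal, hypothetical sketch:

```python
import re

# Hypothetical sketch: the recognised schemes live in a list, and the
# RE is built from it, so extending the set is a one-line change.
SCHEMES = ["http", "https", "ftp", "news", "gopher"]

def make_url_re(schemes=SCHEMES):
    # longest-first so e.g. "https" is tried before its prefix "http"
    alt = "|".join(re.escape(s) for s in sorted(schemes, key=len, reverse=True))
    return re.compile(r"\b(?:%s)://[\w@&#.!~*'();/?:=+$,%%-]+" % alt)

url_re = make_url_re()
m = url_re.search("try gopher://gopher.quux.org/ perhaps")
```

A tool could then expose the scheme list as configuration, which answers the "I couldn't see my new scheme in my docstrings" complaint without touching the RE itself.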
From tony@lsl.co.uk Mon Mar 19 09:53:35 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Mon, 19 Mar 2001 09:53:35 -0000
Subject: [Doc-SIG] formalizing StructuredText
In-Reply-To: <200103161700.f2GH0cp17194@gradient.cis.upenn.edu>
Message-ID: <00d101c0b05a$76bb3760$f05aa8c0@lslp7o.int.lsl.co.uk>

After a weekend where *some* work got done, significant points:

1. Newlines are preserved again in non-literal paragraphs (Edward Loper convinced me that the benefits outweighed the problems).

2. Newlines are not allowed within literal and Python literal strings.

3. Local references (which look like '[this]' or '[1]') are now supported. The "anchor" for a local reference must be at the start of a paragraph (in future releases I would expect it to *start* a new paragraph if at the start of a line), and looks like::

       ..[this]

4. List items and local references may be "empty" paragraphs, but there may still be some unresolved issues with respect to newlines - I'm not sure that::

       1.
       Some text

   is allowed (it probably should be, if the form with a blank line between those two lines *is* allowed).

5. The RE used for detecting URLs has become more sophisticated. There are some associated rules - first, "odd" characters (which will be listed in the documentation) must be escaped, either as '&entity;' or as '%xx', and secondly, only a select group of characters may form the *last* character of a URL - essentially, [0-9A-Za-z/], or something like that - this means that "normal punctuation" cannot form the end of a URL (I don't regard these as very common!), and thus 'http://www.fred.jim/.' unambiguously ends a sentence with that full stop, it is not part of the URL. This is a Good Thing.

The following are probably mostly in response to Edward Loper:

I said that with REs you didn't detect errors
> > > plain: 'This ' > > emph: 'is "too' > > plain: ' confusing":http://some.url' > > Here, you could say that the string '":' without a matching '"' > is illegal, and raise an error.. That approach is what I meant when I talked about "a long RE for detecting common errors", and it is a sensible approach *if one is validating* - but the results should be warnings, 'cos one of the points of ST, originally, is that users should be able to "push the corners" a bit. > But from the point of view of someone formalizing the language, saying > "there's an ambiguity" is no good. I have to either explicitly say > "it's illegal" (=undefined) or "xyz is the correct answer." Oh, I agree, and it's a good thing to do. But you *do* have a third option, which is the "this behaviour produces undefined results", which is not *quite* the same as "illegal". > p.s., I'm not sure it's safe for us both to be writing email at the > same time. We might overload other peoples' mailboxes. :) Hmm. Of course, it's an attempt at a compromise between a private conversation, and a public dialogue that other people can chip into. Not a very *good* compromise, necessarily... (and damn, folding messages together clearly isn't going to work without spending some serious time on it, so it's back to the cacophony, I'm afraid). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 19 09:54:36 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 19 Mar 2001 09:54:36 -0000 Subject: [Doc-SIG] quoting In-Reply-To: <200103161707.f2GH7Dp17705@gradient.cis.upenn.edu> Message-ID: <00d201c0b05a$9b90fed0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I think we should add that '\\' is a single backslash and #\\# is too. > Otherwise, there's no way to end a literal with a backslash.. 
Yes, I guess so. Can you hold that thought to hit me over the head with when I forget to document it in the STpy documentation, please?

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"Bounce with the bunny. Strut with the duck.
Spin with the chickens now - CLUCK CLUCK CLUCK!"
BARNYARD DANCE! by Sandra Boynton
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Mon Mar 19 10:27:57 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Mon, 19 Mar 2001 10:27:57 -0000
Subject: [Doc-SIG] formalizing StructuredText
In-Reply-To: <200103161758.f2GHwpp20696@gradient.cis.upenn.edu>
Message-ID: <00d501c0b05f$443acda0$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> Yeah, I've been playing a bit fast and loose with terminology in my
> emails.. :) Speaking of terminology, I want to make sure that we're
> using somewhat consistent terminology. In particular, I think my
> use of the following terms may not coincide with what you call
> things. What are your terms for the following?
>
> * inline = region marked with #hashes#.

Python literal string. And something with 'quotes' is a literal string.

> * paragraph = a text paragraph; not a list item or a heading or
>   a label

Paragraph (not distinguished normally from the other sorts, which *also* have special names). If I had to distinguish this, I'd probably call it a "paragraph with a blank line before it" (remember, that *might* include the other sorts of thing, too).

> * basic block = paragraph or list item or heading or label (or
>   table?)

Paragraph (see above)

> * blank line = (S* NL) | (S* EOS)

blank line

> * literal block = region following a '::'.

literal paragraph (which is a *bit* misleading, as it can include blank lines) and a single (non-literal) paragraph starting with '>>>' is a Python paragraph.

> * invalid string = string that is not given a meaning by an ST
>   variant. (in the terms used by the STminus proposal, strings
>   that are not assigned a structure by a language).

I don't have a term for that, because docutils doesn't work like that. I *have* started to generate paragraphs that have a "badpara" tag, so it would be a "badpara" element (I'm following the old-fashioned ST approach of trying to markup what the user said and assuming they meant it, whilst you're trying to do the formal approach - this does leave a gap in talking).

> Tibs continued:
> > When I am talking, I have some assumptions (which, of course, may
> > not be evident):
> >
> > 1. by the time discourse occurs, all tabs have gone away
>
> Agreed. We should probably also discard/transform any whitespace
> that isn't space or newline (e.g., form feed, carriage return).

Agreed, but something I've ignored for now (unless my code does it without my looking - doubtful).

> > 2. blank lines are blank lines - white space in them is ignored
> > thrown away (lost for good)
>
> Is this true in literal blocks?

Yes - by the "trailing whitespace is removed" rule.

> Also, I'm guessing you collapse multiple consecutive blank lines
> into one.

Yes, but they get un-collapsed again within literal paragraphs (that's quite important, and a major deficiency in STNG, if it's still not done).

(this does not, of course, happen for *Python* literal paragraphs, as they are defined to end at the first blank line - indeed, that (or end of string) is *all* that ends them.)

> > 3. trailing whitespace is thrown away
>
> Trailing whitespace for the string as a whole? For each basic
> block? For each line? Is this true in literal blocks?

For each line. True in all places (you can't, in general, see them, so there we go). For literal blocks, newlines are preserved, but I can't see any obvious point in preserving trailing spaces.

> > 4. literal paragraphs retain leading whitespace following "the
> > rules" (which say they are actually indented relative to the
> > preceding non-literal paragraph - this makes much more sense
> > in ST than "with respect to the left margin").
>
> Agreed. Although how do you put something at zero indentation?
> Maybe indent from 1 space over from the preceding paragraph?

You don't. I've never wanted to (my problems with HTML normally come from trying to do the opposite).

> So we won't use the term whitespace. Instead, we'll use the terms
> space, newline, and blank line.

Good by me - it also requires one to say "space or newline" when that is what one means.

> > Clearly for a string literal that does not contain a newline, spaces are
> > to be transcribed to spaces (probably - flag a rendering issue as to
> > whether they're *hard* spaces (the correct number) or *unbreakable*
> > spaces (the correct number AND no newlines)).
>
> I vote for unbreakable, but it may be possible to persuade me.

Given I've now forbidden newlines in (both types of) string literal again, I'd also go for unbreakable (my HTML output doesn't implement that, but who cares, it's only a testbed, and could be fixed later on).

> > Equally clearly, if one does not allow newlines in string literals,
> > that's the end of the matter. We've done our job.
>
> Which is what I vote for. :)

And I now agree. My position wasn't strong enough to stand against nay-saying, I felt.

> > > "Here the *name* 'contains' markup":url
>
> Hm.. I'm confused. So you would get::
>
>     Here the *name* 'contains' markup
>
> ? Or::
>
>     "Here the name contains markup":url

Well, personally I'd never emit .. - but I had missed the preceding '::', so was answering on a different assumption, I think...

At the moment, markup is not nested. At the moment, literals are scanned for first. So at the moment, a URL text containing a literal string will not be identified as such.
In the future, I aim for markup to nest, and then your example would be legal, and do the "right thing". > One other case to consider is:: > > *"I would prefer this":url* to "*this*":url With no markup nesting, and with the current ordering, the first one is emphasised, so not a URL usage, and the second one is a URL, and so not emphasised. Of course, if we're aiming at 2.2, then it is quite possible nested markup might be available by then - I'm just not prepared to "waste" time on it now... > > > "This name spans multiple > > > lines":url > > Revised answer - that's definitely allowed, as newlines are explicitly allowed in the quoted part of a URL definition. Why? Because it's not harmful, it's a bit surprising if they're not (since they're allowed in *other* ".." situations), and I prefer it that way (erm...). > Still seems to me that names should be able to span newlines, though. So I think we're agreeing. > > > "the following is not a url": That's right. In this instance. > Yes, but do we get an error because we used '":' in a silly context > (if we're asking the parser to tell us about errors)? I can't see, in docutils (STminus is another kettle of fish) that error detection (apart from paragraph indentation and paragraph label detection) is other than a bunch of heuristics, almost certainly one or more REs, that point out *possible* problems to a user wanting validation. So it becomes a matter of identifying the set of REs we want to warn about. > > > Do *quotes "have to* nest" properly with coloring? > > But from the point of view of formalizing things, I have two > choices here: > 1. say that it contains a bold region, and the quotes are just > rendered as quotes > 2. say that it's undefined (i.e., an invalid string). Undefined isn't invalid - it's undefined. At least to me, even in a formal context, that's true (i.e., not "I don't know" but "I shan't decide"). 
On the other hand, once I'm sure I've got the order of markup/colourising correct, I'll be happy to regard it as so, and then you could "freeze" it. But is that a good approach? Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 19 10:35:48 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 19 Mar 2001 10:35:48 -0000 Subject: [Doc-SIG] What counts as a url? In-Reply-To: Message-ID: <00d701c0b060$5cd11850$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Welbourne ("Eddy" for reasons of clarity) wrote: > . Why do regexes always feel like the right answer to the wrong > question - albethey useful - ? Because they are, of course (well, actually, you have it exactly backwards) > Tibs: would mxTextTools let us say this stuff less uglily ? Well, in my opinion, yes, but that's because it's actually a proper parser, so one takes a different approach. Not that I'm volunteering to write it, mind you. > But the right answer is to use the urlparse module, not to ad-hock > together your own; if you don't like how urlparse does things, fix it. > (Note: I'm as guilty as anyone on this - I'd written a much longer > version of this e-mail, complete with my own od-hack regex, > before even thinking to look for a module, at which point I instantly > *knew* the module was bound to exist - and not be to my liking.) There are two problems here: 1. Find the candidate (possible) URL 2. Validate it as such The first is the one we're addressing proximately, and for once I would argue that it is better to find *too many* matches, rather than too few. The second is what Eddy appears to be talking about, with urlparse, etc. It is optional (i.e., one would only do it if validation is selected). 
It *may* be hard to "unstitch" the markup that has already occurred by the time validation is done, so it is likely to get left until later. Given a big problem, leave it until later...

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive continuity of ducks."
- Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Mon Mar 19 10:38:10 2001
From: tony@lsl.co.uk (Tony J Ibbs (Tibs))
Date: Mon, 19 Mar 2001 10:38:10 -0000
Subject: [Doc-SIG] suggestions for a PEP
In-Reply-To:
Message-ID: <00d801c0b060$b19561c0$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward Welbourne wrote:
> Tony, then Edward:
> >> On the other hand, sometimes people want that effect (the
> >> difficulties of mixing presentation and markup, and not being
> >> specifically a typesetting language - ho hum).
>
> > Too bad for them. :) If they really need *that much* control over
> > their typesetting, they shouldn't be using ST.
>
> If they're thinking that hard about layout, they aren't concentrating
> hard enough on writing API documentation and the result isn't going to
> be maintainable ('cos when I next fix a bug in their code, and update
> the docs, I'm going to have zero patience with their fancy layout, so
> I'm going to normalise it to something straightforward).

I agree. I *think* we were talking about indented paragraphs and lists - it may well get simplified.

(heh, we all agree - let's take over the world)

Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive continuity of ducks."
- Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
From tony@lsl.co.uk Mon Mar 19 10:44:22 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 19 Mar 2001 10:44:22 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: Message-ID: <00d901c0b061$8f556af0$f05aa8c0@lslp7o.int.lsl.co.uk> Discussion by Eddy about problems with escaping quotes using, well, anything, omitted... OK - for the moment, the alpha documentation for STpy.py will hold off on the issue of quoting quotes, and leave it as an unresolved issue for the future (maybe, if I remember, pointing out that literal paragraphs get around the problem, sort of). We can then have a more detailed argument/discussion after that... Tibs (who doesn't have enough time to think hard on this) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From Edward Welbourne Mon Mar 19 20:30:31 2001 From: Edward Welbourne (Edward Welbourne) Date: Mon, 19 Mar 2001 20:30:31 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103190141.f2J1flp01199@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103190141.f2J1flp01199@gradient.cis.upenn.edu> Message-ID: Edward said: > I tend to agree; but it seems from Guido's mail like this isn't > anything that's likely to happen soon. So I propose the following: a fix in the tools. Perfectly sensible. > 1. For now, do *not* recommend that people use:: > > f.__doc__ = parent.f.__doc__ s/For now, d/D/ Mayhap tell folk this is something they *can do* if they want to duplicate a docstring, but folk should never need to do it this way ... > 2. For now, recommend that *tools* inherit documentation for a > method if f.__doc__ == None, and don't inherit if > f.__doc__ = '' or any other string. I know I'm about to vary my tune but ... someone else has been talking persuasively out-of-band. 
Rather than borrowing the doc directly off the parent ...

If f.__doc__ is None, it would make sense to provide a `default docstring' comprising a `See also' section (with `also' omitted for obvious reasons) referencing the corresponding methods in those of our bases that provide the given method. Optionally, if these methods are themselves using the default, shortcut to the ancestors which *do* provide something useful, though this would be extra work the user could save you by following a few links.

However, if you do it this way, the correct rules for when Z.f's default doc refers to A.f's doc are:

* A defines f and A.f.__doc__ is not None

* there is a chain [A,...,Z] of classes in which each chain[i] is in chain[1+i].__bases__ (sorry about the fiddliness here, but Z may inherit from A via several chains)

* no class in the chain (other than A) defines .f and provides a non-None .f.__doc__

But I'd argue for the simpler approach: just link to all bases which provide the given method and leave *their* default pages to provide chains of links to chase. This would have the advantage that you'd record the inheritance tree via which the given method could have been provided if it hadn't been over-ridden.

Hmm. Indeed, I'd argue that *every* method's (auto-generated) documentation has a proper section which refers to *all* methods, on bases, with the same name, regardless of whether those methods have docstrings. One is apt to need to know.

It is, however, a bit fiddly to determine which bases (here I mean the classes in the tree obtained by chasing __bases__ repeatedly) are providing the given method without it being overridden on the way, i.e. given Z, f, those A for which:

* A defines f

* there is a chain [A,...,Z] of classes in which each chain[i] is in chain[1+i].__bases__, as before

* no class in the chain (other than A) defines .f

The test for `C defines .f' is: 'f' in dir(C), by the way.
For each base B of a class C to be checked:

* if #B.f# raises AttributeError, we can ignore B
* else if #'f' in dir(B)# [and B.f.__doc__ is not None], B.f is interesting
* else we need to check B

where the [and ... None] clause is only involved if you want to do the short-cut. Hmm. Maybe not that fiddly. Code follows (for both alternatives, nearly identical). It isn't noticeably harder to do the job right going straight to the ones with docs than going to the ones with the method but no doc; however, I'm inclined to argue for going via all defined intermediaries, even without docs.

Eddy.

def baseswithmeth(meth, klaz):
    todo, result, seen = [ klaz ], [], []
    while todo:
        c, todo = todo[0], todo[1:]
        seen.append(c)
        try: getattr(c, meth).__doc__
        # the .__doc__ filters out classes with a non-method called meth
        except AttributeError: pass
        else:
            if meth in dir(c): result.append(c)
            else:
                for base in c.__bases__:
                    if base not in todo and base not in seen:
                        todo.append(base)
    return result

def baseswithmethdoc(meth, klaz):
    todo, result, seen = [ klaz ], [], []
    while todo:
        c, todo = todo[0], todo[1:]
        seen.append(c)
        try: doc = getattr(c, meth).__doc__
        except AttributeError: pass
        else:
            if doc is not None and meth in dir(c): result.append(c)
            else:
                for base in c.__bases__:
                    if base not in todo and base not in seen:
                        todo.append(base)
    return result

# Now do you see what Tibs meant about vertical space ?
# He maintains code I left behind when I changed jobs ...

From edloper@gradient.cis.upenn.edu Tue Mar 20 20:40:04 2001
From: edloper@gradient.cis.upenn.edu (Edward D. Loper)
Date: Tue, 20 Mar 2001 15:40:04 EST
Subject: [Doc-SIG] URLs
Message-ID: <200103202040.f2KKe4p15488@gradient.cis.upenn.edu>

Since [] are not allowed in URLs, and we already have expressions of the form "name":[ref], how does the following sound:

Use "name":[ref] for in-line hrefs. If ref is a single token, and there is a directive of the form:

    ..[ref] url

Then use url as the URL; otherwise, use ref as the URL.
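A minimal, hypothetical sketch of that lookup rule (the function and the directive-matching pattern here are invented for illustration):

```python
import re

# Hypothetical sketch of the proposed "name":[ref] resolution: if a
# "..[ref] url" directive defines ref, its url wins; otherwise ref
# itself is taken to be the URL.
_directive = re.compile(r"^\.\.\[(\S+)\] +(\S+) *$", re.MULTILINE)

def resolve(ref, doc):
    """Map a [ref] token to its target URL, given the document text."""
    table = dict(_directive.findall(doc))
    return table.get(ref, ref)

doc = '''See "the docs":[docs] and "direct":[http://example.org/x].
..[docs] http://python.sourceforge.net/devel-docs/
'''
```

Because ']' cannot occur inside a URL, finding the end of the reference never needs the punctuation heuristics that plain "name":url requires.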
Of course, we'd want to talk to the STNG people about this, but it seems to solve a number of problems:

1. detecting the end of the URL is trivial
2. "name":url. is no longer ambiguous, because we would either say "name":[url]. or "name":[url.]
3. It seems easier to read
4. '":[' is much less likely to occur unintentionally in text than '":' is. So we don't have to worry about people saying things like "foo":bar by accident, instead of intending an href.

If we can agree upon it among ourselves, then I think we should start trying to convince the STNG people.. -Edward From edloper@gradient.cis.upenn.edu Tue Mar 20 20:44:18 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 20 Mar 2001 15:44:18 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Sat, 17 Mar 2001 18:06:55 GMT." Message-ID: <200103202044.f2KKiIp15751@gradient.cis.upenn.edu>

>> Have we considered the classic spec for labels to appear left of a colon, namely RFC 822 (e-mail headers) and its kin ? I think that basically comes down to r'\w+(-\w+)*' as regex, generally specified
>> [...]

Fine with me.

>> We might want to allow _ as well as \w (indeed, we might want to define \w to include _ given that python effectively does so).

I assumed we were treating '\w' as it's defined in the re module, in which case it already does include '_':

    >>> re.match('\w', '_')

Basically re defines '\w' = '[0-9a-zA-Z_]'

-Edward From Edward Welbourne Tue Mar 20 23:23:33 2001 From: Edward Welbourne (Edward Welbourne) Date: Tue, 20 Mar 2001 23:23:33 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <200103190319.f2J3Jtp07955@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103190319.f2J3Jtp07955@gradient.cis.upenn.edu> Message-ID:

> So are you saying we'd have 2 different kinds of literal blocks? We hadn't really discussed that before..
> I think that just having literals, inlines, and literal blocks is probably enough, but if you want to make a case, go ahead. :)

OK, two orthogonal questions about a verbatim fragment:

* inline or block
* python code or `alien text'

giving us four kinds of `verbatim' fragment in doc-strings.

As I've been understanding you, '...' is an inline alien and #...# is an inline python expression. I've been presuming that there are also mechanisms for including (and distinguishing between) blocks whose contents are python or alien in like manner; however, I grant that I've only seen the '::' marker (unless >>> on each line is the markup for python, which I won't like, given that it's on each line) used in such a role, and don't know whether you meant the block it introduces to be read as alien or python. If you don't provide for this distinction, I'm worried.

The python/alien distinction is important, because a python fragment is worth the renderer scanning for identifiers it knows about, so may wish to render as xrefs to pertinent documentation (however, this is the *only* processing the renderer should be doing to it); an alien fragment *should not* be so scanned; it is `truly verbatim' and any similarity between sub-texts of it and anything the doc-system knows about *should* be presumed to be fortuitous - otherwise, we have to put in mechanisms for enabling the author of the doc-string to, somehow, indicate `no, really, this *is not* a use of the python identifier which happens to be spelt the same as it' in a verbatim text, which must abrogate its verbatimness.

While I recognise that the inline/block distinction would ideally pander to one's natural desire to have the text flow nicely, I consider this a layout issue, not a markup one.
[Further: I want to type your fictional example as::

    If the user types::
        x'
    then the system should print the value of::
        x'(a)
    and return the value of::
        x'(b)

without the blank lines you inserted in it; the '::' on the end of each text line, and the return to its indentation at the next, should suffice to enable parsers to know what I meant. Note that I am treating the given fragments as alien verbatim; the user is clearly typing at a prompt which is not a python prompt; and x'(a), x'(b) are presumably reading x' as `the derivative' of some entity named by x.]

I regard layout-control as a luxury, subordinate to keeping the markup language simple. I want to be sure that if something appears in a verbatim fragment, at least when I'm inside an r"""...""" string, one can cut-and-paste the fragment into whatever alien context it belongs in and have it be exactly the right thing; and the formatted output of the given raw string should display the relevant fragment verbatim. This seems more important than being able to inline the tiny proportion (namely the cases using the inline-delimiter) of the uses one has for fragments.

> I think of '...' and #...# as used for in-line literals. I.e., you can include them in sentences. Literal blocks are used for blocks that are separated off from the rest of the code.

(final word, `code': I presume you meant `text'). To me, this is a layout distinction, not a semantic one. Consequently,

> I don't think it's ... reasonable to force people to put what really *should* be an in-line literal into a literal block.

I don't see how `should' can ever be real here. As I understand it, at worst one has to put up with `oh dear, this is going to be ugly, ho hum' rather than `this is going to mean something different'. Obviously, block means something different *to the doc tools* than inline means; but the only meaning I care about is the information content the reader gets hold of at the end.
> even if we ignore '\', doc writers have to think harder than they should if they want to use backslashes in their docs. :)

As long as they use r'...', and don't want to end their string in an odd number of backslashes, I see no problem: please give an example. Raw strings are either invalid or read exactly the way they appear. No thought is required.

    >>> r'\'
      File "<stdin>", line 1
        r'\'
           ^
    SyntaxError: invalid token

so one cannot end a raw string in a single backslash; but

    >>> r'\\'
    '\\\\'
    >>> r'\''
    "\\'"
    >>> r""" \" """
    ' \\" '
    >>> r""" \' """
    " \\' "
    >>> r'''\''''
    "\\'"
    >>> r'\n'
    '\\n'

anything it doesn't reject, it preserves faithfully.

>> Not even to save vertical space ;^|
> If it were just an issue of saving vertical space, I'd agree.

sorry, I didn't make myself clear. Having to add vertical space in these cases is going to annoy *me*, despite which I would rather endure this annoyance than have the entire inline mechanism be (IMO) broken. In the relevant cases, the resulting document, once rendered, will split things into several paragraphs separated by displays, where I would, indeed, prefer to have said the same thing in a single paragraph; this *will* annoy me and will clearly annoy other folk more than the extra vertical space in the source file, but please understand that for me it's the other way round and *despite that* I'm arguing for you to oblige me to break up the text into blocks. Even if you *do* insist on me putting back the blank lines I took out of your fictitious example. (Though I'll be objecting to that independently.)

> We could also discuss ways of indicating that one-line literal blocks are "really" inlines (::: or some such), but I'm currently loath to make ST even more complex. :)

the complexity argument is *exactly* the one I'm focussed on.
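Eddy's raw-string observations can be checked mechanically; a small sketch:

```python
# Raw strings preserve backslashes exactly; the only thing the parser
# rejects is a raw string ending in an odd number of backslashes.
assert len(r'\\') == 2              # two characters, both backslashes
assert r'\n' == chr(92) + 'n'       # a backslash and an 'n', not a newline
assert r""" \" """ == ' \\" '       # the \" pair survives verbatim

# r'\' is a syntax error, so one cannot end a raw string in a single
# backslash; compile() shows the rejection without crashing the script.
try:
    compile(r"r'\'", '<test>', 'eval')
except SyntaxError:
    print("r'\\' is rejected, as shown above")
```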
Out-of-source documentation should provide the means for folk to say things the way they want and have it displayed the way they want: for in-code ST docs, simplicity of markup language is *more important* than making it look nice. It suffices that we ensure that the author can express the information the reader needs. A few rare cases where this will merely be ugly are not worth extra complexity.

> I assume that "script.write..." is in an r"..." string,

that was, indeed, my intent; given which,

> You might mean two things, here.

only meaning 1 is, to my mind, a credible candidate. Meaning 2 doesn't read my text verbatim.

> you wouldn't normally include the string you gave in a sentence.

OK, so I was trying to be realistic, which made my text long. How about (from the docs of an imaginary ftp.py)::

    the send method executes::
        sock.control.write("#")
    after every #chunk# bytes of data have been written to #sock.out#

in which I would normally have wanted to inline the code fragment, but the presence of a # in it conflicts with the #...# delimiter. One could make up shorter examples; but, at least in the present case, one can side-step the problem::

    the send method writes one '#' character to #sock.control# for every #chunk# bytes it writes to #sock.out#.

albeit I may be marking more things with #...# than I need to (am I ?). Indeed, in the small proportion of situations where I can realistically believe in needing to escape the delimiter of an inline fragment, I am inclined to suggest the author think a bit about whether there isn't some other way of phrasing the text so as to avoid the problem.
[It's a bit like how `politically correct' folk used to spend time and effort trying to get us all to settle on a gender-neutral non-neuter pronoun, but grown-up folk have simply learned to side-step the problem - partly by reviving the pronoun `one', partly by avoiding constructions which oblige us to use pronouns in the places where Anglic's provision of them is unfaithful to the author's intent.] If no such rephrasing is possible, the worst we impose on them is that they have to break out into a block structure; which won't *look* right, but will none the less express the information they intended to express. Now, with alien verbatim (as opposed to python verbatim), I realise there is a problem; alien text can have absolutely anything in it, so alien fragments using the delimiter can't be relied on to be rare. Yet, if we're to serve it up verbatim, we should serve it up verbatim. If we can't do it *verbatim*, at least inside `raw' strings, then we should scrap the verbatim inline mechanism. The proposed `fix' breaks its verbatimness, i.e. fixes one thing while breaking another. That's not an acceptable fix. Eddy. From Edward Welbourne Wed Mar 21 01:10:23 2001 From: Edward Welbourne (Edward Welbourne) Date: Wed, 21 Mar 2001 01:10:23 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <00d101c0b05a$76bb3760$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <00d101c0b05a$76bb3760$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > 5. The RE used for detecting URLs has become more sophisticated. There > are some associated rules - first, "odd" characters (which will be > listed in the documentation) must be escaped, either as '&entity;' or as > '%xx', and secondly, only a select group of characters may form the I do not believe that &entity; has any place in an URL. It's a purely SGML/HTML beast, nothing to do with HTTP. The rest of 1--5 looks good. Eddy. 
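Eddy's point that '%xx' is the URL-native escape, while '&entity;' belongs only to SGML/HTML, is easy to demonstrate; a sketch using the modern standard library:

```python
from urllib.parse import quote, unquote

# '%xx' escaping is native to URLs; '&amp;'-style entities are an SGML/HTML
# notion with no meaning to HTTP itself.
raw = 'odd chars & spaces'
escaped = quote(raw)             # '/' is left unescaped by default
print(escaped)                   # odd%20chars%20%26%20spaces
print(unquote(escaped) == raw)   # True: the escaping round-trips
```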
From Edward Welbourne Wed Mar 21 01:02:36 2001 From: Edward Welbourne (Edward Welbourne) Date: Wed, 21 Mar 2001 01:02:36 +0000 (GMT) Subject: [Doc-SIG] quoting In-Reply-To: <00d201c0b05a$9b90fed0$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <00d201c0b05a$9b90fed0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: >> I think we should add that '\\' is a single backslash and #\\# is too. >> Otherwise, there's no way to end a literal with a backslash.. > Yes, I guess so. Can you hold that thought to hit me over the head with > when I forget to document it in the STpy documentation, please? r""" ... '\\' ... """ contains a verbatim literal with two backslashes in it. There is no way to end a raw string with a single backslash either; but it turns out not to be such a huge problem, after all; and, if '\\' is to be only one backslash, one gets to revisit the entire nightmare of ... users wanting to know why r""" ... '\n' ... """ contains a two-character literal fragment, \n, while r""" ... '\\\\\\' ... """ contains a three-character literal fragment but r""" ... '\\\n\\' ... """ contains a four-character literal fragment and ... and I'm too tired to go into this, but it's a nightmare waiting to bite. Eddy. From mal@lemburg.com Wed Mar 21 10:19:20 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 21 Mar 2001 11:19:20 +0100 Subject: [Doc-SIG] URLs References: <200103202040.f2KKe4p15488@gradient.cis.upenn.edu> Message-ID: <3AB88028.605B88A6@lemburg.com> "Edward D. Loper" wrote: > > Since [] are not allowed in URLs, and we already have expressions of > the form "name":[ref], how does the following sound: > > Use "name":[ref] for in-line hrefs. If ref is a single token, and > there is a directive of the form: > > ..[ref] url > > Then use url as the URL; otherwise, use ref as the URL. > > Of course, we'd want to talk to the STNG people about this, but it > seems to solve a number of problems: > 1. detecting the end of the URL is trivial > 2. "name":url. 
is no longer ambiguous, because we would either say > "name":[url]. or "name":[url.] > 3. It seems easier to read > 4. '":[' is much less likely to occur unintentionally in text than > '":' is. So we don't have to worry about people saying things > like "foo":bar by accident, instead of intending an href. > > If we can agree upon it among ourselves, then I think we should start > trying to convince the STNG people.. Sounds like a good idea, but don't you use angular brackets ? These are recommended by the URI RFC, in wide use everywhere and have similar properties... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From tony@lsl.co.uk Wed Mar 21 10:32:36 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 21 Mar 2001 10:32:36 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103202044.f2KKiIp15751@gradient.cis.upenn.edu> Message-ID: <001c01c0b1f2$3ee60d80$f05aa8c0@lslp7o.int.lsl.co.uk> > >> Have we > >> considered the classic spec for labels to appear left of a > colon, namely > >> RFC 822 (e-mail headers) and its kin ? I think that > basically comes > >> down to r'\w+(-\w+)*' as regex, generally specified > >> [...] > > Fine with me. I'm assuming we're talking about paragraph labels. I think we should just go with the English definition of a word, which means [-A-Za-z], and leave it at that. It is *meant* to look like a word. Just because there is a colon there doesn't mean it is related to other fields that happen to end with a colon. 
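One plausible sketch of such a label check (hypothetical helper code with a sample dictionary, not what docutils actually does): a paragraph is a label paragraph when it is one line, contains a colon, and the word left of the colon appears in a dictionary of known labels.

```python
# Hypothetical sketch: one-line paragraph, colon in it, and the text left
# of the colon looked up in a dictionary of known labels.
label_dict = {"Author": "author", "Version": "version", "Returns": "returns"}

def match_label(paragraph):
    if '\n' in paragraph or ':' not in paragraph:
        return None
    left, _, _rest = paragraph.partition(':')
    return label_dict.get(left.strip())

print(match_label('Author: Tibs'))     # author
print(match_label('Not a label'))      # None (no colon)
print(match_label('Unknown: thing'))   # None (not in the dictionary)
```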
The current default labels are::

    label_dict = {"Arguments":"arguments",
                  "Author":"author",
                  "Authors":"author",
                  "Dedication":"dedication",
                  "History":"history",
                  "Raises":"raises",
                  "References":"references",
                  "Returns":"returns",
                  "Version":"version",
                  }

If one is translating (slightly modified format) PEPs, then one would instead use::

    builder.label_dict = {"PEP":"pep",
                          "Title":"title",
                          "Version":"version",
                          "Author":"author",
                          "Status":"status",
                          "Type":"type",
                          "Created":"created",
                          "Post-History":"post-history",
                          "Discussions-To":"discussions-to",
                          }

I think "keep it simple" is required here - these labels are meant to be few and simple, so English words seem sensible to me. I would thus vote against underscores and against digits. Also, validation aside, I don't *use* a regular expression - I look for the right "shape" of paragraph (1 line, colon in it) and check what is to the left of the colon against the dictionary. From *my* point of view the legitimate characters idea only comes in with a validation phase (of course, it would be different for Edward).

> Basically re defines '\w' = '[0-9a-zA-Z_]'

Erm - basically it doesn't - it invokes "locales" which makes life more complex (and I have no idea what sre does about '\w'). From tony@lsl.co.uk Wed Mar 21 10:45:10 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 21 Mar 2001 10:45:10 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: Message-ID: <001d01c0b1f4$00d24250$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward Welbourne wrote:
> OK, two orthogonal questions about a verbatim fragment:
> * inline or block
> * python code or `alien text'
> giving us four kinds of `verbatim' fragment in doc-strings.
>
> As I've been understanding you, '...' is an inline alien and #...# is an inline python expression.

Correct. There are (modulo Spanish Inquisition) reasons for having the two forms:

1.
Python literals commonly include a single quote, but rarely include comments (this was the whole basis of your suggesting this notation, Eddy!). Trying to use single quotes to indicate Python literals would be a right pain. 2. I suspect that it *may* be useful to regard all Python code fragments that contain a single Python entity (be it name or function call) as potential "local links" - i.e., generate a reference from them. I know that I didn't like Ka-Ping Yee's approach of aggressively looking for all *potential* links in unmarked-up text, but once someone has indicated that something is, indeed, Python code, I feel this is a risk I'm more prepared to take. > I've been presuming that there are also > mechanisms for including (and distinguishing between) blocks whose > contents are python or alien in like manner; however, I grant > that I've only seen the '::' marker (unless >>> on each line is > the markup for python, which I won't like, given that it's on each > line) used in such a > role, and don't know whether you meant the block it introduces to be > read as alien or python. If you don't provide for this > distinction, I'm > worried. No. '::' introduces a literal "block", whose contents are not parsed. The contents may include blank lines. '>>>' at the start of a paragraph indicates that *that paragraph* is Python code. Such a paragraph ends at the next blank line (or end of file!). This is intended to allow the visual distinction of text that will be specially handled by doctest (which is now a formal part of the Python package, and whose use is To Be Encouraged (my opinion)). They are thus serving a different purpose. > [Further: I want to type > your fictional > example as:: > If the user types:: > x' ...etc... > without the blank lines you inserted in it; the '::' on the > end of each > text line, and the return to its indentation at the next, > should suffice > to enable parsers to know what I meant. 
The whole issue of *when* we start new paragraphs is likely to be a major research issue after docutils 1.0, I suspect - we already know how easy it is to boobytrap ourselves. I'll leave the rest - I have a feeling that:

a. You're arguing towards agreement (indeed, you may already agree)
b. I'm not going to introduce any form of quoting in STpy alpha1
c. Probably not in STpy 1.0, either

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 21 10:52:02 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 21 Mar 2001 10:52:02 -0000 Subject: [Doc-SIG] URLs In-Reply-To: <200103202040.f2KKe4p15488@gradient.cis.upenn.edu> Message-ID: <001e01c0b1f4$f64ad5d0$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward D. Loper wrote:
> Since [] are not allowed in URLs, and we already have expressions of the form "name":[ref], how does the following sound:
>
> Use "name":[ref] for in-line hrefs. If ref is a single token, and there is a directive of the form:
>
>     ..[ref] url
>
> Then use url as the URL; otherwise, use ref as the URL.

Inline refs were introduced deliberately to look like footnotes - that is, the rendering of the '[..]' is *meant* to be identical (so [fred] in STpy text should look like [fred] in the final HTML, TeX or whatever, modulo underlining and other indicators). Requiring *inline* refs to have funny quoted text in front of them would reduce their usefulness. So far as I'm concerned, they make sense, I've implemented them in docutils (trivial) and I *like* them (the only problem with them is the introductory '..' on the anchors, and I can't see a way around that that's simple). They match a convention people already use. 'nuff said.
As a separate issue, for *non* inline references, it would indeed be quite nice if we could delimit all URLs in some way (using '<' and '>' would actually be a lot more traditional). But I think this is way too late on the "compatibility with all other forms of ST" basis - i.e., this would be a big break with the past. *But* raise it over on the STNG side. If they went for requiring::

    "some text":<http://some.url/>
    "more text":<fred.html#label>
    "more more", <http://fred.label>
    <http://www.bare.url>

instead of::

    "some text":http://some.url/
    "more text":fred.html#label
    "more more", http://fred.label
    http://www.bare.url

*then* I would go for it (but only then). It would indeed make life a lot simpler.

Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Wed Mar 21 16:48:36 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 11:48:36 EST Subject: [Doc-SIG] Re: ping Ping In-Reply-To: Your message of "Sat, 17 Mar 2001 12:30:05 PST." Message-ID: <200103211648.f2LGmbp00795@gradient.cis.upenn.edu>

> My general feeling about all of the syntax ideas that have been going back and forth is that i'm a little afraid of their complexity. When i have a moment i'll try to get a handle on what rules are currently on the table and see how many there are, but i'll definitely want to keep them minimal.

I tend to agree about ST being too complex in some ways. I'm currently working on 2 separate PEPs, one that includes all the features that Tibs wants, and one that has only a more limited set. I think that we may have an easier time selling a simpler proposal to the Python community, but I may be wrong.. We'll wait until all the PEPs are out on the table, and discuss them, I guess.

> > > ah.
> > So: Edward, Tibs and I, who have done this week's talking, all agree on a position which puts `API docs' into the source file and puts tutorials, reference docs, etc. elsewhere.
>
> I will maintain the opposing position for now, as devil's advocate.

Good. We may need more devil's advocates as we get closer to agreement, because we'll need to be able to answer these questions when they come up on python-dev. :)

> - Just to be clear, the suggestion on the table is only to move the library ref manual into the modules, not the language reference or anything like that.

Only the ref manual, or also the howtos, tutorials, etc? And if only the ref manuals, what do we currently think belongs in the ref manuals, other than API docs?

> - If documentation for a module doesn't live in the module itself, how will a user find it?

I think that this is a question that we'd have to answer, whether we put more docs in the module or not -- how does the user find the tutorials, howtos, etc., that were *not* written by the programmer/maintainer? Currently, www.python.org does pretty well at this, but we may want to set up a more principled system.. But in any case, I think this is a separate issue..

> - Keeping modules and associated docs in the same file helps to ensure that the two are in sync when you distribute or edit the file. (It's not possible to have different versions of the code and the docs at the same time; it's less likely that someone will check in changes to one without updating the other, etc.)

2 issues: editing and distribution.

distribution -- maybe we want to turn modules into packages, and include docs in the package? There's not a lot of precedent for this in other languages though..

editing -- I think that keeping modules & docs in the same file will help keep docs in sync with modules *if* we're talking about what has been called "point-documentation"... But I don't think it'll help for howtos, tutorials, etc.
It's unreasonable to edit the tutorial every time you change the code. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 17:02:02 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 12:02:02 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Mon, 19 Mar 2001 09:53:35 GMT." <00d101c0b05a$76bb3760$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103211702.f2LH23p02270@gradient.cis.upenn.edu>

> 1. Newlines are preserved again in non-literal paragraphs (Edward Loper convinced me that the benefits outweighed the problems).
> 2. Newlines are not allowed within literal and Python literal strings.

Yay! I'll code that up in STminus002 as soon as I get a chance. (I should be done with STminus002 relatively soon).

> 3. Local references (which look like '[this]' or '[1]') are now supported. The "anchor" for a local reference must be at the start of a paragraph (in future releases I would expect it to *start* a new paragraph if at the start of a line), and looks like::
>
>     ..[this]

So... are anchors always hrefs? Or can they be generic footnotes? Or references for a references section? How should we deal with these when we're using something other than HTML (e.g., LaTeX) to render the string? If anchors can be footnotes or references, how does the renderer decide what to do with them?

> 4. List items and local references may be "empty" paragraphs, but there may still be some unresolved issues with respect to newlines - I'm not sure that::
>
>     1.
>     Some text
>
> is allowed (it probably should be, if the form with a blank line between those two lines *is* allowed).

I'll add this too. BTW, how are you currently handling things like this::

    1. some text

       some more text

Is that a list item with 2 paragraphs, or a list item with some contents and 1 subparagraph, etc? I.e., how would it get rendered in whatever XML-like thing you're using?

> 5. The RE used for detecting URLs has become more sophisticated.
> There are some associated rules

Hm.. I don't look forward to formalizing this, and trying to get STNG to agree with your regexps :)

> That approach is what I meant when I talked about "a long RE for detecting common errors", and it is a sensible approach *if one is validating* - but the results should be warnings, 'cos one of the points of ST, originally, is that users should be able to "push the corners" a bit.

Or errors, if the user asks for them to be errors. :) Note also that it should be possible to generate the "long RE expression" in a *principled* way, given a formalization, so that it will detect *all* errors, not just *common* errors.

> > But from the point of view of someone formalizing the language, saying "there's an ambiguity" is no good. I have to either explicitly say "it's illegal" (=undefined) or "xyz is the correct answer."
>
> Oh, I agree, and it's a good thing to do. But you *do* have a third option, which is the "this behaviour produces undefined results", which is not *quite* the same as "illegal".

Ok, in the formalization system I set up, I divided everything into "valid" and "undefined". I see a good argument for further dividing "undefined," though.. So I'll redefine my terms, as such:

valid -- The string has a unique, predictable result. This is the same result that it will have in all future versions.

invalid -- The string does not have a unique, predictable result.

illegal -- The string will never have a unique, predictable result.

undefined -- The string does not currently have a unique, predictable result, but it may in a future version.

Is that acceptable terminology? (I'll try to remember to stick to it) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 17:09:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 12:09:13 EST Subject: [Doc-SIG] Local References Message-ID: <200103211709.f2LH9Dp02661@gradient.cis.upenn.edu>

> 3.
Local references (which look like '[this]' or '[1]') are now supported. The "anchor" for a local reference must be at the start of a paragraph (in future releases I would expect it to *start* a new paragraph if at the start of a line), and looks like::

    ..[this]

Clarification on the syntax.. is *anything* that looks like [this] a local reference, or does it have to be preceded by "a parenthetical like"[this] or "a parenthetical and a colon like":[this]? If one of the latter, does [this] get rendered with brackets? Flagged as a warning when validating (in principle, not in current implementation)? What happens if the referent is missing? What is acceptable content for [this]? '[\w_-]+'? -Edward From ping@lfw.org Wed Mar 21 17:14:56 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 21 Mar 2001 09:14:56 -0800 (PST) Subject: [Doc-SIG] Re: ping Ping In-Reply-To: <200103211648.f2LGmbp00795@gradient.cis.upenn.edu> Message-ID:

On Wed, 21 Mar 2001, Edward D. Loper wrote:
> > - Just to be clear, the suggestion on the table is only to move the library ref manual into the modules, not the language reference or anything like that.
>
> Only the ref manual, or also the howtos, tutorials, etc? And if only the ref manuals, what do we currently think belongs in the ref manuals, other than API docs?

What i had in mind was "how to use this module". Let's look at some examples for clarification. As an extreme example, try running "perldoc CGI". CGI.pm contains about 3200 lines of code followed by 3000 lines of detailed documentation. While the module itself is indeed enormous, i think that it is useful to have all of that information about how to use the CGI module instantly available right there in CGI.pm. A more reasonable arrangement would be to split the CGI functionality into several modules, and move the relevant parts of the docs accordingly. For instance, CGI.pm currently tries to do the work of both cgi.py and HTMLgen.
But *if* CGI is going to do all that, it should all be documented in CGI.pm. Look at the sections in these examples and give an opinion on what belongs or doesn't belong with the source code:

    perldoc CGI
    perldoc CPAN
    perldoc CGI::Cookie
    perldoc Data::Dumper
    perldoc Getopt::Long
    perldoc Net::Ping
    perldoc overload

> > - If documentation for a module doesn't live in the module itself, how will a user find it?
>
> I think that this is a question that we'd have to answer, whether we put more docs in the module or not

I think it's relevant. The question is "how far is the user of a module from *some* information on how to use the module?" Doesn't matter if they don't have every article that anyone has ever written about the module -- do they have a starting point?

> editing -- I think that keeping modules & docs in the same file will help keep docs in sync with modules *if* we're talking about what has been called "point-documentation"... But I don't think it'll help for howtos, tutorials, etc. It's unreasonable to edit the tutorial every time you change the code.

If changing the code changes the behaviour of the module so that your examples don't work any more, then yeah, you'd better edit the examples. (Scenario: i've changed a method name from foo() to spam(), so in my editor i search for ".foo(" to do replacement...) It's also harder for me to change foo() to spam() in just the code, check in just that part, and say "oh, i'll change the docs later" -- because i'll be checking in a single file that's inconsistent with itself. -- ?!ng Happiness comes more from loving than being loved; and often when our affection seems wounded it is only our vanity bleeding. To love, and to be hurt often, and to love again--this is the brave and happy life. -- J. E. Buchrose From edloper@gradient.cis.upenn.edu Wed Mar 21 17:19:44 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Wed, 21 Mar 2001 12:19:44 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Mon, 19 Mar 2001 10:06:36 GMT." <00d401c0b05c$488cca00$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103211719.f2LHJjp03408@gradient.cis.upenn.edu>

> Well, prepare to be well miffed (ST has never supported differing starting and ending quotes).

So hey. Although now we have [...] (or "..."[...] or "...":[...] or whatever it really is).

> It should (eventually) in the '"..."' text, but not in the URL itself. This is actually a good reason to forbid apostrophe in URLs,

Of course, you can't reasonably forbid '#' in URLs, so you'll have to put URL recognition before inline recognition *anyway*.. :)

> and may mean I need to put the URL recognition *before* literal recognition - no, that won't work, 'cos then I couldn't say
>
>     'http://www.literal.org/'
>
> Hmm. This is a no-win situation, I'm afraid. Ah - no it's not, because I'm requiring the user to escape spaces in a URL, and not to end with "funny" characters, so it *should* actually come out in the wash - we'll need to make some careful test cases...

I think there's a serious problem here if we are allowing URLs to appear in arbitrary places. For example, consider::

    foo://no#good bar://parse#for this.

It seems perfectly reasonable for #good bar...# to be a literal.. But then it also seems reasonable for those to be urls.. Possible ways out:

1. Say that the opening '#' must have whitespace to its left, and the closing '#' must have whitespace to its right. Of course, that forbids saying things like #Object#s, but I guess I could live with that

2. Use some special demarcation for URLs! :) I'm for this, but am worried about trying to convince the STNG people, esp. if we're proposing using <..>.. Since they're currently saying that such things should be ignored. Of course, they're clearly wrong on that point, too, but it means that I'll have to argue 2 different points at once.
:) Also, if we do this, we have to be sure to stress in the PEP/ST docs that math must go in literals like: 'x*y>z'. (Of course, we'll probably want to stress that anyway). Are there any objections in principle for using <...> to delimit URLs? (Other than that it will be hard to convince STNG people). If not, I think we should start trying to convince STNG people to use <...> for URLs, and to give up on ignoring <...> tokens. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 17:48:02 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 12:48:02 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Mon, 19 Mar 2001 10:27:57 GMT." <00d501c0b05f$443acda0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103211748.f2LHm3p05195@gradient.cis.upenn.edu> > Paragraph (not distinguished normally from the other sorts, which *also* > have special names). If I had to distinguish this, I'd probably call it > a "paragraph with a blank line before it" (remember, that *might* > include the other sorts of thing, too). I think it may be useful to distinguish this (and "paragraph with a blank line before it" is definitely *not* what you want, since it leaves out what I would call a paragraph at the beginning of a document, and it could potentially include any other basic block, if it happens to have a blank line before it (which is required for headings, etc.)). > > * basic block = paragraph or list item or heading or label (or > > table?) > > Paragraph (see above) I think this is somewhat misleading/confusing.. But I guess that's up to you to decide.. > > > 3. trailing whitespace is thrown away > > > > Trailing whitespace for the string as a whole? For each basic > > block? For each line? Is this true in literal blocks? > > For each line. True in all places (you can't, in general, see them, so > there we go). > > For literal blocks, newlines are preserved, but I can't see any obvious > point in preserving trailing spaces. 
I guess that seems reasonable. Within paragraphs, do you collapse multiple spaces into one space? > > Agreed. Although how do you put something at zero indentation? > > Maybe indent from 1 space over from the preceding paragraph? > > You don't. I've never wanted to (my problems with HTML normally come > from trying to do the opposite). Hm.. I'm not sure I agree with this, but I don't think it's important enough to get hung up on. (I would argue that you should be able to put things in column 0, but that the HTML renderer should probably indent preformatted regions relative to everything else). > > > > "the following is not a url": > > That's right. In this instance. So does it get rendered as is (i.e., with two quote signs, one colon sign, a less than sign, and a greater than sign)? > I can't see, in docutils (STminus is another kettle of fish) that error > detection (apart from paragraph indentation and paragraph label > detection) is other than a bunch of heuristics, almost certainly one or > more REs, that point out *possible* problems to a user wanting > validation. So it becomes a matter of identifying the set of REs we want > to warn about. As I (think I) said earlier, it should be possible to do error detection in a principled way, given a formal definition of ST. We should be able to print out *all* problems, not just *possible* problems, if the user really wants us to. This seems very important to me if we want to allow for the possibility of competing implementations of ST. > > > > Do *quotes "have to* nest" properly with coloring? > > > > But from the point of view of formalizing things, I have two > > choices here: > > 1. say that it contains a bold region, and the quotes are just > > rendered as quotes > > 2. say that it's undefined (i.e., an invalid string). > > Undefined isn't invalid - it's undefined. At least to me, even in a > formal context, that's true (i.e., not "I don't know" but "I shan't > decide").
I'm calling undefined a subset of invalid. (invalid=illegal+undefined). > On the other hand, once I'm sure I've got the order of > markup/colourising correct, I'll be happy to regard it as so, and then > you could "freeze" it. But is that a good approach? The markup-nesting problem doesn't actually seem that difficult to me, in principle. I propose that we allow anything to nest within anything, with the restrictions: 1. nothing can nest inside a literal, inline, or href url 2. nothing can nest within itself (even with intervening levels) So the legal nestings are shown in this tree: * literal * inline * emph * literal * inline * strong * literal * inline * href name * inline * literal * href url * href name * strong * literal * inline * href url * strong * literal * inline * emph * literal * inline * href name * inline * literal * href url * href name * emph * literal * inline * href url * href name * literal * inline * strong * literal * inline * emph * inline * literal * href name * emph * literal * inline * href url Also, spaces must come between * and ** delimiters, so you can't say ***this***. (Footnote markers [like_this] would probably pattern like href urls) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 18:35:24 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 13:35:24 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Mon, 19 Mar 2001 20:30:31 GMT." Message-ID: <200103211835.f2LIZOp09967@gradient.cis.upenn.edu> [On making docs deal with inheritance] > > 2. For now, recommend that *tools* inherit documentation for a > > method if f.__doc__ == None, and don't inherit if > > f.__doc__ = '' or any other string. > > I know I'm about to vary my tune but ... someone else has been talking > persuasively out-of-band. Rather than borrowing the doc directly off > the parent ... I think the issue of whether to borrow, or point back, etc., should be one for the tools. 
Which may be a good reason for the language *not* to do anything automatic, like inheriting doc strings. There are similar questions about whether inherited methods should be listed in a separate section or not, etc. But at any rate, we should say that having f.__doc__=None indicates that inheriting docs is acceptable, and f.__doc__='' means that inheriting docs is not acceptable. Of course, all of this will be difficult to do if we're parsing the file instead of loading it as a module; but that's ok. :) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 18:49:53 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 13:49:53 EST Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Tue, 20 Mar 2001 23:23:33 GMT." Message-ID: <200103211849.f2LInrp11620@gradient.cis.upenn.edu> > OK, two orthogonal questions about a verbatim fragment: > * inline or block > * python code or `alien text' > giving us four kinds of `verbatim' fragment in doc-strings. So currently, we do have 4, but they're not exactly the 4 you listed. Instead of "python block," we have "python test case," which is used by an automated testing program. You can use these to show code & its output, but not for exceptions, and a number of other cases. I'm still not sure I like this system, but it seems somewhat reasonable. The syntax for these python test blocks is a paragraph starting with '>>>', and ending at the next blank line. It should include both the input and the output of the commands you run, although no commands should output lines starting with '>>>' or '...'. > The python/alien distinction is important, because a python fragment is > worth the renderer scanning for identifiers it knows about, so may wish > to render as xrefs to pertinent documentation I think we're going to want to be careful not to put xrefs all over in #literal# sections..
E.g., in the description of class Foo, literals that say #Foo# shouldn't link to Foo (which you are already presumably looking at). And if you talk about class #Bar# five times, there shouldn't be 5 xrefs. But we'll leave this for the tools to deal with. > (however, this is the > *only* processing the renderer should be doing to it); Well, that's not currently what's done with the python test blocks.. But that's because we're trying to be compatible with that automated testing program... (will this change if functions/methods get attributes, and test strings move out of doc strings?) > I regard layout-control as a luxury, subordinate to keeping the markup > language simple. I'll agree for now. So no backslashes, and if someone really wants to use "#" in a python literal, or "'" in a literal, then they have to use a separate block. > Out-of-source documentation should provide the means for folk to say > things the way they want and have it displayed the way they want: for > in-code ST docs, simplicity of markup language is *more important* than > making it look nice. It suffices that we ensure that the author can > express the information the reader needs. A few rare cases where this > will merely be ugly are not worth extra complexity. Hm. Mind if I quote that in my PEP? ;) > the send method writes one '#' character to #sock.control# for every > #chunk# bytes it writes to #sock.out#. > albeit I may be marking more things with #...# than I need to (am I ?). It seems to me that if we're going to use #...# for python literals, then we should really use it for them. I see a danger here of people using 'sock.out' if they don't want an xref, and #sock.out# if they do want an xref. I'm not sure that's what we want people to be doing.. But I'm not sure what the best thing to do about it is. -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 18:53:14 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Wed, 21 Mar 2001 13:53:14 EST Subject: [Doc-SIG] URLs In-Reply-To: Your message of "Wed, 21 Mar 2001 11:19:20 +0100." <3AB88028.605B88A6@lemburg.com> Message-ID: <200103211853.f2LIrEp12140@gradient.cis.upenn.edu> [On surrounding URLs with delimiters] > Sounds like a good idea, but don't you use angular brackets ? > These are recommended by the URI RFC, in wide use everywhere and > have similar properties... I anticipate problems with selling this to the STNG people. (Although maybe we don't care, because we're already incompatible with them on any string containing <...>). But I'd like to try to convince them that this is a Good Idea, and that not just passing random <...> through is also a Good Idea. So... Where do I go to do my convincing? Do I write a wiki page on the Zope site? Or can I write email somewhere? Anyone else want to help me convince them? :) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 19:03:24 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 14:03:24 EST Subject: [Doc-SIG] Tokens for labels & endnotes In-Reply-To: Your message of "Wed, 21 Mar 2001 10:32:36 GMT." <001c01c0b1f2$3ee60d80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103211903.f2LJ3Op13591@gradient.cis.upenn.edu> > I'm assuming we're talking about paragraph labels. Actually, I think we were talking about [endnotes]. But the same questions apply to labels.. > I think we should just go with the English definition of a word, which > means [-A-Za-z], and leave it at that. It is *meant* to look like a > word. Is that too anglo-centric? > I think "keep it simple" is required here - these labels are meant to be > few and simple, so English words seems sensible to me. I would thus vote > against underlines and against digits. It might be that underlines and digits are more applicable for endnotes. Some people might like this [1] or this [noam_chomsky97].
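A sketch of the permissive option floated above: letters, digits, underscores, and hyphens inside '[...]' can be captured with a single regular expression. The character class here is an assumption for illustration, not agreed ST syntax.

```python
import re

# Hypothetical endnote-token pattern: one or more word characters or
# hyphens between square brackets; accepts both [1] and [noam_chomsky97].
ENDNOTE = re.compile(r'\[([A-Za-z0-9_-]+)\]')

def find_endnotes(text):
    """Return the endnote tokens found in a piece of text."""
    return ENDNOTE.findall(text)

# find_endnotes("like this [1] or this [noam_chomsky97]")
# -> ['1', 'noam_chomsky97']
```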
> Also, validation aside, I don't *use* a regular expression - I look for > the right "shape" of paragraph (1 line, colon in it) and check what is > to the left of the colon against the dictionary. From *my* point of view > the legitimate characters idea only comes in with a validation phase (of > course, it would be different for Edward). This may be different if you want [this to not be an endnote]. > > Basically re defines '\w' = '[0-9a-zA-Z_] > > Erm - basically it doesn't - it invokes "locales" which makes life more > complex (and I have no idea what sre does about '\w'). If LOCALE and UNICODE flags aren't used when compiling a regexp, \w = [a-zA-Z0-9_] (at least according to "the python library reference manual for re":). Furthermore, it will always match '_', regardless of LOCALE and UNICODE (again, according to the ref. manual). -Edward From edloper@gradient.cis.upenn.edu Wed Mar 21 19:26:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 21 Mar 2001 14:26:13 EST Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: Your message of "Wed, 21 Mar 2001 09:14:56 PST." Message-ID: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> > I think it's relevant. The question is "how far is the user of a > module from *some* information on how to use the module?" Doesn't > matter if they don't have every article that anyone has ever > written about the module -- do they have a starting point? We all agree that *some* information on how to use the module should be included.. But the question is *what types* of information to include? I think that the in-code documentation should tend to be concise, short, and technical. It should be light on examples, and stick to defining individual elements, with a short overview to describe how the elements fit together. > If changing the code changes the behaviour of the module so that > your examples don't work any more, then yeah, you'd better edit > the examples. 
(Scenario: i've changed a method name from foo() to > spam(), so in my editor i search for ".foo(" to do replacement...) Often, the consequences of changes on the tutorial-type docs are not obvious. Having to think (hard) about whether my changes affect the tutorials/howtos/FAQs/etc. every time I make a change seems unreasonable. On the other hand, API documentation is (by definition) local -- it should be obvious what parts of the documentation need to be updated when I change the code. I guess it just seems like, if I were to have tutorials, etc. in the code myself, I would not be likely to check them every time I make a change. And I think that there are many programmers out there who are lazier than I am. > It's also harder for me to change foo() to spam() in just the code, > check in just that part, and say "oh, i'll change the docs later" -- > because i'll be checking in a single file that's inconsistent with > itself. I have a suspicion that laziness will win out, and people will just say "whatever".. If the docs are in a different file, I can do a CVS diff to see what's changed in the code since the last time I updated the docs, and thus can do updates to the documentation "in batch." -Edward From tavis@calrudd.com Wed Mar 21 20:09:37 2001 From: tavis@calrudd.com (Tavis Rudd) Date: Wed, 21 Mar 2001 12:09:37 -0800 Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> References: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> Message-ID: <01032112093702.04676@lucy> I side with Ping on this one. Just because there's some extra examples and how-to-use-me documentation in a source file doesn't mean that you're obliged to update it every time you work on the source. Ping's argument is that it is more likely that lazy programmers will update that type of information if it's stored in the same file.
If this argument wins out there wouldn't be much extra to put in the modules anyway, as Python's current 'Library Reference' docs are largely a rehash of the API information. By the way, Edward, Tibs, and Eddy, the energy you guys are putting into ST this month is impressive. You've almost doubled the activity of the previous most active month in DOC-SIG. From Edward Welbourne Wed Mar 21 20:21:19 2001 From: Edward Welbourne (Edward Welbourne) Date: Wed, 21 Mar 2001 20:21:19 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <200103211849.f2LInrp11620@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103211849.f2LInrp11620@gradient.cis.upenn.edu> Message-ID: > Hm. Mind if I quote that in my PEP? ;) coo - flattery ;^> Be my guest. > Instead of "python block," we have "python test case," Hmm. To me that's a different (complementary) thing from what I want the python block to be. A test case should be written to actually run for real; a python block should just be illustrating use of the code and might, indeed, be deliberately broken, e.g. as an accompaniment to the explanation of why something is done the slightly odd way it is, so that maintainers will realise what would go horribly wrong if they made the obvious `improvement'. Equally, plenty of the tools I write are intended to be used from within the implementations of other tools; having a test system `run' the illustrations I'd want to supply is pointless - e.g.::

    class selfRepresenting:
        def _emit(self, *bits):
            """Representation support method.

            ... for example:
                def __repr__(self):
                    return self._emit(`self.state`, 'name=' + `self.name`)
            """
            return (_fullname(self.__class__) + '(' + string.joinfields(bits, ', ') + ')')

in which the illustration only gives the __repr__ method of a class implicitly inheriting from selfRepresenting.
To turn it into a workable test which actually tests anything, you'd have to write such a class, provide .state and .name attributes for its members, instantiate this class and (possibly implicitly) call repr() on the resulting object. If you treat it, as given, as a test, all you'll do is verify that the illustrative code gets past the python parser. [Albeit finding that fragment involved trawling through a lot of my code, noticing that for the most part I do illustrations by saying `see class foo, below' or similar; and the above class is purely a sketch I don't think I use.] > In the description of class Foo, literals that say #Foo# shouldn't > link to Foo (which you are already presumably looking at). And if you > talk about class #Bar# five times, there shouldn't be 5 xrefs. But > we'll leave this for the tools to deal with. Yes, that's a tool issue: and my guess is that tool authors will agree with you - Ping, what do you do ? Tool authors should be *at liberty* to do xrefs from the contents of python literals, or (without touching the literals) to xref, in a `see also section', every identifier seen in a literal, or to ... mutatis mutandis. > I'll agree for now. So no backslashes Thank You. Now I can get some sleep at last ... Eddy. From Edward Welbourne Wed Mar 21 20:39:30 2001 From: Edward Welbourne (Edward Welbourne) Date: Wed, 21 Mar 2001 20:39:30 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <001c01c0b1f2$3ee60d80$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001c01c0b1f2$3ee60d80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > I think "keep it simple" is required here

to me that needs to include:

* case insensitive
* digits

because authors of doc-strings are going to be shocked if it behaves otherwise. The former means your dictionary-based approach is not satisfactory - string.tolower the apparent label, then check to see whether the result appears in some list (or other implementation of `collection') of known labels.
Otherwise, your builder.label_dict is going to need further entries for, at least: "Pep":"pep", "Post-history":"post-history", "Discussions-to":"discussions-to", since some folk using the keys you gave *will* use them in the forms shown; and you'll probably also need "Discussions-TO":"discussions-to", etc. Simpler: use tolower. Have canonical forms generally be in Capitalised-Word form (like RFC 822 labels). Indeed, a good way to implement the aforementioned `collection' would indeed be a mapping which is exactly the reverse of the ones you showed us - mapping from the tolower form to the canonical form for each key - so that one recognises a key using:

    try:
        canon = labels[string.tolower(text)]
    except KeyError:
        ... # it isn't a real label

I am entirely happy to have the present *actual dialects* of ST use only letters and dash; however, allow ST-generic to permit numbers, e.g. so that ST variants *can* use "rfc2954-char-set": "RFC2954-Char-Set" in their label dicts, or similar. (No, I have no idea what RFC 2984 is, nor even whether it exists.) >> Basically re defines '\w' = '[0-9a-zA-Z_] > Erm - basically it doesn't - it invokes "locales" which makes life more complex (and I have no idea what sre does about '\w'). and I can't say I care much either way, once you're allowing - in the label. The only need for _ is to separate words, and - is easier to type ;*> Eddy. From tony@lsl.co.uk Thu Mar 22 10:10:11 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:10:11 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103211702.f2LH23p02270@gradient.cis.upenn.edu> Message-ID: <001401c0b2b8$47d23920$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > (I should be done with STminus002 relatively soon). Good.
As you said to Ka-Ping Yee elsewhere, a simple and a complex choice for ST variants is a good choice to have (although I would add, of course I would, that the complexity is either inherited from STClassic, or asked for in the past iterations of this SIG). > > 3. Local references (which look like '[this]' or '[1]') are now > > supported. The "anchor" for a local reference must be at > the start of a > > paragraph (in future releases I would expect it to *start* a new > > paragraph if at the start of a line), and looks like:: > > > > ..[this] > > So... are anchors always hrefs? Or can they be generic footnotes? Or > references for a references section? How should we deal with these > when we're using something other than HTML (e.g., LaTeX) to render > the string? If anchors can be footnotes or references, how does the > renderer decide what to do with them? Erm - no, in HTML terms, anchors are names. The obvious HTML translation of the DOM tree for a local reference and anchor is:: Some text containing a local reference to [this]. [this] is the anchor. In the DOM tree, I have to decide what to put into the "reference", and at the moment I follow HTML/XML conventions and store what you see - that is, the reference element has an attribute whose content is the string '#this'. I use the same attribute name as I use for other links. The advantage of this is twofold - it means we have only one way of linking within the document (which will map easily to both HTML and to XLinks, although we are only using the simplest subset of XLinks!), and it means a user can regard:: [This] is a local reference and:: "This":#this is a local reference as the same, which isn't much use *within* a document, but is *very* useful for allowing links from outside. 
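The HTML fragments in the example above did not survive the archive. A minimal sketch of how such a reference/anchor pair might render, assuming period HTML conventions (the exact tags are an assumption, not Tibs's actual output):

```python
def render_reference(label):
    # A local reference like [this] might become a link to a same-page anchor.
    return '<a href="#%s">[%s]</a>' % (label, label)

def render_anchor(label):
    # The "..[this]" paragraph would supply the matching named anchor.
    return '<a name="%s">[%s]</a>' % (label, label)

# render_reference('this') -> '<a href="#this">[this]</a>'
# render_anchor('this')    -> '<a name="this">[this]</a>'
```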
As to using HTML/XML type links - well, we already had to choose URLs for our external links (or think of it as using simple XLinks if that makes you happier) - this makes consistent sense if we are using a DOM tree to represent our document, anyway. It makes sense to continue this for local references. A tool like TeX would need some untangling of the '#this' to just 'this' for use in its '\xref', but that's hardly difficult. > I'll add this too. BTW, how are you currently handling > things like this:: > > 1. some text > > some more text The list item is at indentation N, the next paragraph at indentation N+3, so that is a list item paragraph and its first child. The "flattening" phase will note that the first item is a list item and the second a paragraph (tags "oitem" and "para"), and bring the paragraph up to be a sibling of the list item. In summary, the initial internal structure is:: and this gets "flattened" to be:: which then gets translated into the DOM tree as elements with those tags (both will, of course, be children of a surrounding '' element). (if we had:: This is a paragraph. And so is this. then the flattening phase would say to itself "aha - a paragraph within a paragraph - presumably the user *meant* something by that", and in this case it would produce:: (clearly we don't regard a paragraph inside a paragraph as being very meaningful in any real sense, but it seems a pity to waste the indentation that the user put in so carefully, and this is the obvious meaning to take). In an HTML rendering, I would expect 'block' to become 'blockquote'.) > > 5. The RE used for detecting URLs has become more > > sophisticated. There are some associated rules > > Hm.. I don't look forward to formalizing this, and trying to get STNG > to agree with your regexps :) STNG has its own REs. They don't make much sense to me (or didn't last time I looked at them). In some cases, they just didn't work very well. Oh well. 
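The internal-structure diagrams in the flattening discussion above were lost in archiving, but the hoisting step can be sketched. The tags "oitem" and "para" come from the message; the tuple representation of nodes is invented purely for illustration:

```python
# Toy node: (tag, text, children).  flatten() hoists every descendant up
# to the top level, so a paragraph parsed as the child of a list item
# ends up as its sibling -- a simplification of the step described above.
def flatten(node):
    tag, text, children = node
    result = [(tag, text)]
    for child in children:
        result.extend(flatten(child))
    return result

tree = ('oitem', '1. some text', [('para', 'some more text', [])])
# flatten(tree) -> [('oitem', '1. some text'), ('para', 'some more text')]
```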
But I don't see why *formalising* it is a problem? > Note also that it should be possible to generate the "long RE > expression" in a *principled* way, given a formalization, so that > it will detect *all* errors, not just *common* errors. This I don't understand - I'm not sure what you mean by "in a principled way", and I'm also not sure what you mean by "all errors, not just common errors". But this will doubtless become clearer to me as STminus progresses (I begin to suspect you may regret that name some day, as it becomes more capable and more clearly sufficient-to-itself). > Ok, in the formalization system I set up, I divided everything into > "valid" and "undefined". I see a good argument for further dividing > "undefined," though.. So I'll redefine my terms, as such: > > valid -- The string has a unique, predictable result. this is the > same result that it will have in all future versions. > invalid -- The string does not have a unique, predictable result > illegal -- The string will never have a unique, > predictable result > undefined -- The string does not currently have a unique, > predictable result, but it may in a future version. > > Is that acceptable terminology? (I'll try to remember to stick to > it) I'm not sure I'd bother to separate the middle two ("never" is a big concept, and four is somehow more uncomfortable with three), but otherwise I'd be happy to go with those... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 10:15:30 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:15:30 -0000 Subject: [Doc-SIG] Local References In-Reply-To: <200103211709.f2LH9Dp02661@gradient.cis.upenn.edu> Message-ID: <001501c0b2b9$0607eb10$f05aa8c0@lslp7o.int.lsl.co.uk> > Clarification on the syntax.. 
is *anything* that looks like [this] a local reference, or does it have to be preceded by "a parenthetical like"[this] or "a parenthetical and a colon like":[this]? Anything that looks like [this] is a local reference. It should be rendered exactly as it looks in the ST text (i.e., as I said in another message, in HTML it might be:: [this] > What happens if the referent is missing? If validation is on, the user gets a warning (i.e., the implementation is expected to be able to detect this case). docutils doesn't do this yet, but it clearly should. > What is acceptable content for [this]? '[\w_-]+'? Acceptable contents is:: [-_A-Za-z] [-_A-Za-z0-9]* | [0-9]+ (i.e., there are two legitimate forms - the first is a traditional "identifier", and the second is a simple integer - this latter is included because it is a common form in text as people write it - cf: PEPs. Looking at it, I'm not sure I should allow a hyphen as the first character of the "identifier" form - that may be a typo.) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 10:29:47 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:29:47 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103211719.f2LHJjp03408@gradient.cis.upenn.edu> Message-ID: <001601c0b2bb$04ed8f80$f05aa8c0@lslp7o.int.lsl.co.uk> > > Well, prepare to be well miffed (ST has never supported differing > > starting and ending quotes). So hey. > > Although now we have [...] (or "..."[...] or "...":[...] or whatever > it really is). Oh, OK. Point taken (I'd already sensed you don't like those, but they seem to me so uncontentious). > Of course, you can't reasonably forbid '#' in URLs, so you'll have > to put URL recognition before inline recognition *anyway*..
:) No, that's not a problem. A '#' in a URL cannot be a starting or ending quote, at least not if I have the URL RE right, because it won't meet the correct conditions (for instance, it can't have a space before or after it). A little messy, but that's how ST works. > I think there's a serious problem here if we are allowing URLs to > appear in arbitrary places. For example, consider:: > > foo://no#good bar://parse#for this. > > It seems perfectly reasonable for #good bar...# to be a literal.. No, that case is unambiguous. The '#' characters are parts of URLs - they definitely are not quotes. The RE for Python literals (wait a moment whilst I reconstruct the bits together) is:: (?P ^ | [ \n] ) \# (?P [^\#\n]+ ) \# (?= [ \n] | $ | [).,;:!?"] ) So the first '#' isn't the start of a literal, because it isn't preceded by , nor is it the end of a previous literal, because it isn't followed by space or punctuation (broadly speaking). The second '#' isn't the end of a literal, because it isn't followed by space or punctuation, nor is it the start of a literal. It is *just* possible that I need to worry about the terminating punctuation and the contents of a URL (and maybe it shouldn't be #_endpunc#, but #_safe_endpunc# in the Python literal URL - that would enforce more context after the punctuation, for safety). Now, I have a sneaky feeling that you don't like that sort of approach, but so far as I can tell it fits *exactly* the "philosophy" of ST, which is to make what the user types, in general, come out as they would naively expect - I *think* a naive user would expect the above not to be doing quoting.
Of > course, that forbids saying things like #Object#s, but I > guess I could live with that As you can see, that's the approach taken - and it is taken that way to be as near identical to the way that single quote literals work as possible (which also have that same problem). > 2. Use some special demarcation for URLs! :) I'm for this, > but am worried about trying to convince the STNG people, > esp. if we're proposing using <..>.. Since they're currently > saying that such things should be ignored. Of course, they're > clearly wrong on that point, too, but it means that I'll have > to argue 2 different points at once. :) Also, if we do this, > we have to be sure to stress in the PEP/ST docs that math > must go in literals like: 'x*y>z'. (Of course, we'll probably > want to stress that anyway). I would also like to delimit URLs - it would make life so much simpler. But I also suspect the STNG people won't agree (of course, we might both be wrong!). I still don't see why 'x*y>z' *has* to go in literals, though - clearly by the current and possible future rules it would work (if we do introduce quoting characters for URLs, I would want to insist they act with the same sort of rules as literals and Python literals, so that maths would be no problem). > Are there any objections in principle for using <...> to delimit > URLs? (Other than that it will be hard to convince STNG people). > If not, I think we should start trying to convince STNG people to > use <...> for URLs, and to give up on ignoring <...> tokens. I think it would be a marvellous idea - people already are used to it in emails, and it makes life simpler all round. Yes, by all means open talks on this matter on the STNG arena. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
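Tibs's Python-literal RE, as quoted above, lost its group names in the archive. A lookaround-based approximation (my own formulation, not his exact expression) reproduces the behaviour he describes, including the unambiguous-URL case:

```python
import re

# Approximation of the '#...#' Python-literal rule: the opening '#' must
# follow start-of-text or whitespace, and the closing '#' must be
# followed by whitespace, end-of-text, or trailing punctuation.
LITERAL = re.compile(r'(?:^|(?<=[ \n]))#([^#\n]+)#(?=[ \n).,;:!?"]|$)')

# LITERAL.findall('use #Object# here')                   -> ['Object']
# LITERAL.findall('foo://no#good bar://parse#for this.') -> []
```

Neither '#' in the URL example has whitespace on the correct side, so nothing matches, just as described.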
From mal@lemburg.com Thu Mar 22 10:31:33 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 22 Mar 2001 11:31:33 +0100 Subject: [Doc-SIG] URLs References: <200103211853.f2LIrEp12140@gradient.cis.upenn.edu> Message-ID: <3AB9D485.A8FB6F35@lemburg.com> "Edward D. Loper" wrote: > > [On surrounding URLs with delimiters] > > Sounds like a good idea, but don't you use angular brackets? > > These are recommended by the URI RFC, in wide use everywhere and > > have similar properties... > > I anticipate problems with selling this to the STNG people. (Although > maybe we don't care, because we're already incompatible with them > on any string containing <...>). > > But I'd like to try to convince them that this is a Good Idea, and > that not just passing random <tags> through is also a Good Idea. If that's what they want to do, they can use the scheme delimiter (:) in URLs to make a separation between HTML-Tags and URLs. AFAIK, the colon is not allowed in HTML-Tagnames (XML is different due to the namespace notation). > So... Where do I go to do my convincing? Do I write a wiki page > on the Zope site? Or can I write email somewhere? Anyone else > want to help me convince them? :) Sending in patches usually helps ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From tony@lsl.co.uk Thu Mar 22 10:40:20 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:40:20 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103211748.f2LHm3p05195@gradient.cis.upenn.edu> Message-ID: <001701c0b2bc$7e2f03a0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D.
Loper wrote: > "paragraph with a blank line before it" is definitely *not* > what you want, since it leaves out what I would call a paragraph > at the beginning of a document, and > it could potentially include any other basic block, if it happens to > have a blank line before it (which is required for headings, etc.)). Actually, I agree with you - it's a messy term, anyway. > > > * basic block = paragraph or list item or heading or label (or > > > table?) > > > > Paragraph (see above) > > I think this is somewhat misleading/confusing.. But I guess that's > up to you to decide.. Again, I agree - it's an artefact of the internals. What about "text block"? > I guess that seems reasonable. Within paragraphs, do you collapse > multiple spaces into one space? No, within lines spaces are (very carefully) left untouched, just in case. > > > Agreed. Although how do you put something at zero indentation? > > > Maybe indent from 1 space over from the preceding paragraph? > > > > You don't. I've never wanted to (my problems with HTML normally come > > from trying to do the opposite). > > Hm.. I'm not sure I agree with this, but I don't think it's important > enough to get hung up on. (I would argue that you should be able > to put things in column 0, but that the HTML renderer should probably > indent preformatted regions relative to everything else). I couldn't see an easy and natural way around it, and I find it hard to conceive of places where *I* would not want to indent, so I gave up (the problem was actually thinking how to decide on an indentation at all, and I was quite pleased with how predictable and useful using the indentation relative to the "parent" or preceding paragraph was). > > > > > "the following is not a url": > > > > That's right. In this instance. > > So does it get rendered as is (i.e., with two quote signs, one colon > sign, a less than sign, and a greater than sign)? That's up to the renderer.
But seriously, it gets *stored* as a node of the DOM tree which has the text within quotes (i.e., the quotes are not preserved) as its text, and the URL as its 'url' attribute. Thus the ST markup (the double quotes and the colon) are not remembered. > We should be able to print out *all* problems, not just *possible* > problems, if the user really wants us to. This seems very important > to me if we want to allow for the possibility of competing >implementations of ST. I don't have a problem with telling the user what is wrong with a text, I just don't understand how to quantify that. Of course, in STminus, you have a different handle on things, but that's because you're deciding up front what is allowable and what is not. A "more traditional" ST approach doesn't know that. But being able to give the user as many warnings of problems as possible has got to be a good thing... > The markup-nesting problem doesn't actually seem that difficult to me, > in principle. I propose that we allow anything to nest > within anything, > with the restrictions: > 1. nothing can nest inside a literal, inline, or href url Agreed. But please don't call it an 'href url' - that's an HTML term! > 2. nothing can nest within itself (even with intervening levels) Pragmatically has to be true, with non-differentiated start and end quotes. These two seem to me to be the sane minimum, and thus sensible. > So the legal nestings are shown in this tree: > > * literal > * inline > * emph > * literal OK, OK, I believe you! > Also, spaces must come between * and ** delimiters, so you > can't say ***this***. Ah, but there's no reason you shouldn't be able to *say **this***, for instance (it's quite unambiguous). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
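The two nesting restrictions agreed above (nothing nests inside a literal, inline, or URL region; nothing nests within itself, even with intervening levels) are easy to state as code. Here is a toy validator - not taken from docutils or STNG, and using an invented (kind, children) tuple-tree just for illustration:

```python
# A toy validator (not from docutils or STNG) for the two nesting
# restrictions from the thread. Nodes are (kind, children) tuples;
# children are plain strings or further nodes.
ATOMIC = ("literal", "inline", "url")  # nothing may nest inside these

def check_nesting(node, open_kinds=()):
    """Raise ValueError if node violates either nesting restriction."""
    kind, children = node
    if kind in open_kinds:
        # rule 2: no region type may nest within itself, at any depth
        raise ValueError("%r may not nest within itself" % kind)
    for child in children:
        if isinstance(child, str):
            continue
        if kind in ATOMIC:
            # rule 1: literal/inline/url regions contain only plain text
            raise ValueError("nothing may nest inside %r" % kind)
        check_nesting(child, open_kinds + (kind,))
```

So a literal inside an emph passes, while emph-inside-strong-inside-emph fails even though the two emph regions are not directly nested - which is what "even with intervening levels" requires.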
From tony@lsl.co.uk Thu Mar 22 10:47:57 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:47:57 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: <200103211849.f2LInrp11620@gradient.cis.upenn.edu> Message-ID: <001801c0b2bd$8e347720$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > So currently, we do have 4, but they're not exactly the 4 you > listed. Instead of "python block," we have "python test case," > which is used by an automated testing program. You can > use these to show code & its output, but not for exceptions, No, exceptions do work now. > and a number of other cases. I'm still not sure I like this > system, but it seems somewhat reasonable. Given the attraction of doctest, it seemed sensible to allow its code blocks to be treated as such. And given it's now in the standard Python package, we'd do well not to ignore it. > The syntax for these python test blocks is a paragraph starting > with '>>>', and ending at the next blank line. It should include > both the input and the output of the commands you run, although > no commands should output lines starting with '>>>' or '...'. Well, that's not up to STpy to say - that's only true if you want to run doctest on it (and may not be the exact rules required there either). > I think we're going to want to be careful not to put xrefs all over > in #literal# sections.. E.g., in the description of class Foo, > literals that say #Foo# shouldn't link to Foo (which you are already > presumably looking at). And if you talk about class #Bar# five times, > there shouldn't be 5 xrefs. But we'll leave this for the tools to > deal with. Agreed. > Well, that's not currently what's done with the python test blocks.. > But that's because we're trying to be compatible with that automated > testing program... (will this change if functions/methods get > attributes, and test strings move out of doc strings?)
One of the points about doctest is that when you are describing how something works (i.e., this *will* be in a docstring) it is useful to *show* how it works (or, of course, doesn't). And if you're doing that, then it makes sense to check that what is typed is correct (now who could deny that?). And if you're doing that, why, you're testing! So whilst doctest does support "out of line" test strings, it will always be the case that it will run on docstrings as well - by original intent. I suggest you read the relevant chapter in the 2.1 documentation. > It seems to me that if we're going to use #...# for python literals, > then we should really use it for them. I see a danger here of people > using 'sock.out' if they don't want an xref, and #sock.out# if they > do want an xref. I'm not sure that's what we want people to > be doing.. > But I'm not sure what the best thing to do about it is. The case for how cross-referencing of Python quantities is done is not yet decided - it hasn't been discussed (again) yet. One of the ways of getting useful input to it will be to see what happens if (intelligent) guessing is done by a tool. It *may* be that we want to indicate which '#..#' things are to be cross-referenced, and there are on-board suggestions for this (the obvious one is to use '[..]' to indicate the desire - yuck - and another idea is to use something like '^#..#'). But I see this as being a *later* discussion. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 10:50:41 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:50:41 -0000 Subject: [Doc-SIG] URLs In-Reply-To: <200103211853.f2LIrEp12140@gradient.cis.upenn.edu> Message-ID: <001901c0b2bd$f04b6590$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D.
Loper wrote: > But I'd like to try to convince them that this is a Good Idea, and > that not just passing random <tags> through is also a Good Idea. > > So... Where do I go to do my convincing? Do I write a wiki page > on the Zope site? Or can I write email somewhere? Anyone else > want to help me convince them? :) I believe the correct way to do this is to put a new page under the "suggestions for things to do to STNG" page, and wait... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 10:54:00 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 10:54:00 -0000 Subject: [Doc-SIG] Tokens for labels & endnotes In-Reply-To: <200103211903.f2LJ3Op13591@gradient.cis.upenn.edu> Message-ID: <001a01c0b2be$67095890$f05aa8c0@lslp7o.int.lsl.co.uk> > > I'm assuming we're talking about paragraph labels. > Actually, I think we were talking about [endnotes]. But the same > questions apply to labels.. Erm, maybe (sorry I lost the thread) > > I think we should just go with the English definition of a > word, which > > means [-A-Za-z], and leave it at that. It is *meant* to look like a > > word. > > Is that too anglo-centric? Yes. And it will need to be fixed, but not in the first release. (this is a general point about docutils, and at the moment STpy as well, and I think it needs more input from other people at a later stage) > It might be that underlines and digits are more applicable for > endnotes. Some people might like this [1] or this [noam_chomsky97]. For labels I want to exclude '-_', but yes, for endnotes I want to include them. > If LOCALE and UNICODE flags aren't used when compiling a regexp, > \w = [a-zA-Z0-9_] (at least according to "the python library > reference manual > for re":).
> Furthermore, it will always match '_', regardless of LOCALE and > UNICODE (again, according to the ref. manual). My rather desperate hope (not having read the RE section in the new 2.1 manuals yet) is that using REs will give good leverage on the problem mentioned at the top of the email, at which point it *does* become useful to use '\w'. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From mal@lemburg.com Thu Mar 22 10:54:09 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 22 Mar 2001 11:54:09 +0100 Subject: [Doc-SIG] What docs should be in the source file? References: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> Message-ID: <3AB9D9D1.9FD986A5@lemburg.com> Just to drop in an opinion on the subject: I think almost all API related documentation should go into the source file. Concepts, graphics and other things can be kept in different files, e.g. Word files, but the API should be completely defined in the source file. This is what I was targeting with PEP 224 (attribute docstrings), but which will not happen... maybe Ping has an alternative which will let me document attributes too?! -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From tony@lsl.co.uk Thu Mar 22 11:04:06 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 11:04:06 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: Message-ID: <001b01c0b2bf$d02b7320$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Welbourne wrote: > A test case should be written to actually run > for real; a python block should just be illustrating use of > the code and > might, indeed, be deliberately broken, e.g.
as an accompaniment to the > explanation of why something is done the slightly odd way it > is, so that > maintainers will realise what would go horribly wrong if they made the > obvious `improvement'. You really should read the doctest documentation (see the chapter in the 2.1 docs for the best intro) - it *will* test broken examples as well. > Equally, plenty of the tools I write are intended to be used > from within > the implementations of other tools; having a test system `run' the > illustrations I'd want to supply is pointless - e.g.:: > > class selfRepresenting: > def _emit(self, *bits): > """Representation support method. > ... > for example: > > def __repr__(self): > return self._emit(`self.state`, > 'name=' + `self.name`) > """ > return (_fullname(self.__class__) + '(' + > string.joinfields(bits, ', ') + ')') > > in which the illustration only gives the __repr__ method of a class > implicitly inheriting from selfRepresenting. But as you've presented it, that wouldn't naturally be presented as an interactive session at all - one wouldn't write it as:: for example: >>> def __repr__(self): and so on but rather as:: for example:: def __repr__(self): and so on That's *why* the chosen "start of Python paragraph" thing is '>>>' - because it *is* what it looks like. Anyway, we probably aren't disagreeing - the "job" of the '>>>' paragraph is well delimited (and was introduced as an idea last time round the Doc-SIG loop, of course), and it is not the same as the job of the '::' paragraph. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
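Since doctest keeps coming up: the '>>>' paragraph Tibs describes is exactly what the doctest module (new to the standard library in 2.1) consumes, and, as he notes, exceptions are checked too. A minimal sketch follows - the function is invented for illustration, and it is written for a modern Python (under 2.1's integer division the first example would print 0):

```python
import doctest

def reciprocal(x):
    """Return 1/x, documented with a doctest-style '>>>' paragraph.

    >>> reciprocal(4)
    0.25
    >>> reciprocal(0)
    Traceback (most recent call last):
        ...
    ZeroDivisionError: division by zero
    """
    return 1 / x

def run_doctests(func):
    """Run the examples in func's docstring; return the failure count."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # pass globs explicitly so the examples can see the function itself
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.failures
```

Note that the exception example works by matching the "Traceback (most recent call last):" header and the final exception line, with the intervening stack elided by '...' - which is why "exceptions do work now".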
From tony@lsl.co.uk Thu Mar 22 11:10:31 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 11:10:31 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Message-ID: <001c01c0b2c0$b547a410$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Welbourne wrote: > > I think "keep it simple" is required here > to me that needs to include: > * case insensitive > * digits Unfortunately, I strongly disagree. I may be *wrong*, but I disagree. > because authors of doc-strings are going to be shocked if it behaves > otherwise. Well, it's how *I'd* expect it to work (and, of course, neither you nor I are exactly representative examples). > The former means your dictionary-based approach is not > satisfactory - string.tolower the apparent label, then check to see > whether the result appears in some list (or other implementation of > `collection') of known labels. > Otherwise, your builder.label_dict is > going to need further entries for, at least: > > "Pep":"pep", > "Post-history":"post-history", > "Discussions-to":"discussions-to", > > since some folk using the keys you gave *will* use them in the forms > shown; and you'll probably also need Erm - PEP 1 is quite clear over what it wants (it doesn't say one can use case variants, so why should one assume one can?) (((I probably won't do it unless more convinced, but of course doing a string.lower *would* be simple (before doing the dictionary lookup).))) > Have canonical forms generally be in Capitalised-Word form > (like RFC 822 > labels). Indeed, a good way to implement the aforementioned > `collection' would indeed be a mapping which is exactly the reverse of > the ones you showed us - mapping from the tolower form to the > canonical > form for each key - so that one recognises a key using: > > try: canon = labels[string.tolower(text)] > except KeyError: ... 
# it isn't a real label > > I am entirely happy to have the present *actual dialects* of > ST use only > letters and dash; however, allow ST-generic to permit numbers, e.g. so > that ST variants *can* use > "rfc2954-char-set": "RFC2954-Char-Set" > in their label dicts, or similar. > (No, I have no idea what RFC 2984 is, nor even whether it exists.) Hmm. OK - we're looking for compatibility with emails, he guessed wildly. Unfortunately, there is *no way* that I can see of unambiguously and obviously allowing paragraph labels to *start* a paragraph (it just leads to too many pitfalls - we use colons in English too freely). So trying to parse bare emails with STpy is always going to be a problem. However, the case for allowing digits is there, I suppose. I'll think about it (he begrudged). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 11:15:02 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 11:15:02 -0000 Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: <3AB9D9D1.9FD986A5@lemburg.com> Message-ID: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk>
Strangely, whilst I intensely disliked the way PEP 224 was linking strings to "coincidentally" adjacent values (not that I had a better way to do it!), only last night I found myself intensely wishing I had attribute docstrings - they would indeed make documenting things like class variables so much more pleasant - I could just move the text in the comment above the value into some other form and hey presto, documentation **adjacent to the entity documented** *and* user visible. Ho hum. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 22 13:33:10 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 08:33:10 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 10:10:11 GMT." <001401c0b2b8$47d23920$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221333.f2MDXBp17465@gradient.cis.upenn.edu> I wanted to make sure my terminology was clear, because it looked like the indentation got messed up somehow. My terms are: * valid * invalid * illegal * undefined I.e., "illegal" and "undefined" are mutually-exclusive, collectively exhaustive *subsets* of "invalid." We would use "illegal" for strings like:: * This ** is * a ** very * bad ** string * We would currently use "undefined" for strings like:: * Nesting is not yet **implemented** * Both are "invalid" for the current version of STpy, i.e., their meaning is undefined. But we *never* intend to give a meaning to the first one. Of course, an implementation can still give it a structure, if the user asks it to.. But "illegal" strings will *never* be defined under STminus. Hm.. I hope that's clearer.. :) -Edward From edloper@gradient.cis.upenn.edu Thu Mar 22 13:35:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Thu, 22 Mar 2001 08:35:35 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 10:10:11 GMT." <001401c0b2b8$47d23920$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221335.f2MDZZp17578@gradient.cis.upenn.edu> > > > 3. Local references (which look like '[this]' or '[1]') are now > > > supported. So, if I understand correctly, this ST:: This is a [test] ..[test] of local references Would be rendered in HTML as::

<p>
This is a <a href="#test">[test]</a>
</p>
<p>
<a name="test">[test]</a> of local references
</p>

? I'm not sure how you'd render it in LaTeX.. Can anchors appear anywhere in the document? Do they have to be their own paragraphs? Can anchors be treated as footnotes (e.g., by LaTeX)? What can their contents be? E.g., can they contain a list item:: ..[test] * This is a list item ? A list:: ..[test] * Item1 * Item2 ? etc. > it means a user can regard:: > > [This] is a local reference > > and:: > > "This":#this is a local reference > > as the same, which isn't much use *within* a document, but is *very* > useful for allowing links from outside. Are we expecting people to *want* to link into a document from outside? I can't see ever having any use for that when writing API docs... > A tool like TeX would need some untangling of the > '#this' to just 'this' for use in its '\xref', but that's hardly > difficult. Hm.. maybe I just don't know enough LaTeX. :) [about handling multi-paragraph list items] >> 1. some text >> >> some more text > and this gets "flattened" to be:: > > > I would argue that it would be more appropriate to use:: 1. some text some more text Also, what would your "flattening" do with:: 1. some text some more text even more indented > (if we had:: > > This is a paragraph. > > And so is this. > > then the flattening phase would say to itself "aha - a paragraph within > a paragraph - presumably the user *meant* something by that", and in > this case it would produce:: > > > > Can these nest arbitrarily deeply, if they keep indenting? > > > 5. The RE used for detecting URLs has become more > > > sophisticated. There are some associated rules > > > > Hm.. I don't look forward to formalizing this, and trying to get STNG > > to agree with your regexps :) > > STNG has its own REs. They don't make much sense to me (or didn't last > time I looked at them). In some cases, they just didn't work very well. > Oh well. Well, then, we should convince them to change them! :) > But I don't see why *formalising* it is a problem? 
It's just nice to have formalisms that don't contain big difficult-to-explain regular expressions. It makes the formalism harder to understand. > > Note also that it should be possible to generate the "long RE > > expression" in a *principled* way, given a formalization, so that > > it will detect *all* errors, not just *common* errors. > > This I don't understand - I'm not sure what you mean by "in a principled > way", and I'm also not sure what you mean by "all errors, not just > common errors". > But this will doubtless become clearer to me as STminus progresses (I > begin to suspect you may regret that name some day, as it becomes more > capable and more clearly sufficient-to-itself). Anything whose meaning is not defined by the formalism is invalid. It should be possible for a user to ask a tool to tell them if they use any invalid forms -- that way, they are guaranteed that what they have written will be interpreted as specified by the *formalism*, regardless of which implementation/tool they happen to be using (unless that tool has a bug). And given a formalism, it is possible to detect invalid forms in a principled way. -Edward From edloper@gradient.cis.upenn.edu Thu Mar 22 13:47:42 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 08:47:42 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 10:29:47 GMT." <001601c0b2bb$04ed8f80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221347.f2MDlgp18142@gradient.cis.upenn.edu> > > Of course, you can't reasonably forbid '#' in URLs, so you'll have > > to put URL recognition before inline recognition *anyway*.. :) > > No, that's not a problem. A '#' in a URL cannot be a starting or ending > quote, at least not if I have the URL RE right, because it won't meet > the correct conditions (for instance, it can't have a space before or > after it). A little messy, but that's how ST works. Hrm. 
Ok, currently STminus says that words can contain #inlines#. I would *really* appreciate it if you could either run my test cases on STpy, or at least read through them.. Because this is one of them, and I'd really like to know where else the test cases disagree with STpy. > Now, I have a sneaky feeling that you don't like that sort of approach, > but so far as I can tell it fits *exactly* the "philosophy" of ST, which > is to make what the user types, in general, come out as they would > naively expect - I *think* a naive user would expect the above not to be > doing quoting. I'm fine with having things come out exactly as the user expects, as long as we can do so safely.. So even if the user expects:: x * y to come out as an x, an asterisk, and a y, I don't think it should (under the formalism), since that's not *safe*. (any emph region later in the paragraph will seriously confuse things). > I would also like to delimit URLs - it would make life so much simpler. > But I also suspect the STNG people won't agree (of course, we might both > be wrong!). Well, I'll start putting together a case to convince them. > I still don't see why 'x*y>z' *has* to go in literals, > though - clearly by the current and possible future rules it would work > (if we do introduce quoting characters for URLs, I would want to insist > they act with the same sort of rules as literals and Python literals, so > that maths would be no problem). Ick. This makes me cringe. :) You might have noticed, but I want STminus to be "safe", in the sense that there should be no unexpected non-local dependencies. Consider your own sentence, if people think they can leave out the apostrophes:: I still don't see why x*y>z *has* to go in literals, Now, we have a bold "y>z ", and a mysterious '*' after has! Clearly not what we want. (When I say 'x*y>z' *has* to go in the literals, I mean it has to in order to be a "valid" string). 
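Edward's non-locality point is easy to demonstrate with a deliberately naive pairer. This is not STpy's or STNG's rule, just the simplest possible reading of "asterisks delimit emphasis":

```python
import re

# A deliberately naive emphasis pairer - NOT the STpy or STNG rule -
# that simply pairs '*' delimiters left to right, two at a time. It
# shows the non-local dependency: a stray asterisk early in the
# sentence shifts every later pairing.
def naive_emph(text):
    stars = [m.start() for m in re.finditer(r"\*", text)]
    return [text[a + 1:b] for a, b in zip(stars[::2], stars[1::2])]
```

On the quoted sentence this pairer emphasises "y>z " and leaves a stray '*' after "has", exactly the surprise described above; a rule that requires whitespace around delimiters (as STpy's does) avoids it, at the cost of forbidding some markup.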
> I think it would be a marvellous idea - people already are used to it in > emails, and it makes life simpler all round. Yes, by all means open > talks on this matter on the STNG arena. Will do.. -Edward From edloper@gradient.cis.upenn.edu Thu Mar 22 14:01:48 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 09:01:48 EST Subject: [Doc-SIG] URLs In-Reply-To: Your message of "Thu, 22 Mar 2001 11:31:33 +0100." <3AB9D485.A8FB6F35@lemburg.com> Message-ID: <200103221401.f2ME1np18985@gradient.cis.upenn.edu> > If that's what they want to do, they can use the scheme delimiter (:) > in URLs to make a separation between HTML-Tags and URLs. AFAIK, > the colon is not allowed in HTML-Tagnames (XML is different due to the > namespace notation). Ack. Let's not introduce even more notation, if we can help it! :) (besides which, allowing HTML in ST is very un-safe anyway..) > > So... Where do I go to do my convincing? Do I write a wiki page > > on the Zope site? Or can I write email somewhere? Anyone else > > want to help me convince them? :) > > Sending in patches usually helps ;-) Hm.. Actually, this issue is important enough to me that I'd actually be willing to go read all their code & learn it well enough to put in a patch for this. :) Maybe I'll offer that when I suggest the idea. -Edward From tony@lsl.co.uk Thu Mar 22 14:16:11 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 14:16:11 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103221347.f2MDlgp18142@gradient.cis.upenn.edu> Message-ID: <002101c0b2da$a5537a60$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > Hrm. Ok, currently STminus says that words can contain #inlines#. I believe that to be a mistake. 
I'll email you a copy of my current REs separately (no need to burden the list) - that may clarify some things (but unfortunately not all, as I don't *always* use URLs) > I would *really* appreciate it if you could either run my test cases > on STpy, or at least read through them.. I've meant to get round to this... Anyway, I've saved sttest.py to floppy, and will take it home today. > Ick. This makes me cringe. :) You might have noticed, Well, it was what I meant by my comment about not expecting you to like something! > but I want > STminus to be "safe", in the sense that there should be no unexpected > non-local dependencies. Consider your own sentence, if people think > they can leave out the apostrophes:: > > I still don't see why x*y>z *has* to go in literals, > > Now, we have a bold "y>z ", and a mysterious '*' after has! Clearly > not what we want. (When I say 'x*y>z' *has* to go in the literals, > I mean it has to in order to be a "valid" string). But by the rules of ST (well, at least of STNG when I looked at it, and I'm sure by my interpretation of the Classic rules), no we don't - we have a bold "has" and a normal font "x*y>z" - the asterisk therein doesn't meet the criteria for starting or ending emphasis. The problem, I guess, is that that seems equally clearly to me how it would (and, indeed, should) work. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 22 14:37:42 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 22 Mar 2001 14:37:42 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103221335.f2MDZZp17578@gradient.cis.upenn.edu> Message-ID: <002301c0b2dd$a708eb30$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. 
Loper wrote: > So, if I understand correctly, this ST:: > > This is a [test] > ..[test] of local references > > Would be rendered in HTML as:: > >

> <p>
> This is a <a href="#test">[test]</a>
> </p>
> <p>
> <a name="test">[test]</a> of local references
> </p>

> > ? I'm not sure how you'd render it in LaTeX.. Well, I don't trust all that whitespace after your <p>
tags, but apart from that, yes. LaTeX has its own way of doing things. The best one could do, I believe, without some non-negligible effort, would be something like:: This is a test\xref{test} \label{test} test of local references which would use numbers in place of the '{test}' in the actual document (and the "label" is invisible). To use the names, one would need to write more sophisticated code. Which is why it sometimes isn't too worthwhile worrying about renderers. Avid use of '\setcounter' would probably allow one to do something, but on the whole I would either (a) give up and let LaTeX do what it wants, or (b) use TeX and write something myself. > Can anchors appear anywhere in the document? The original intention was for their use *as* footnotey, reference things at the end. Possibly in a "Reference:" clause. But on the other hand, I don't see why they should actually be so restricted. > Do they have to be their own paragraphs? They have to occur at the start of a paragraph. They are markup, though, not structure. > Can anchors be treated as footnotes (e.g., by LaTeX)? I don't know. Probably a presentation issue. > What can their contents be? E.g., can they contain a list item:: Again, I don't know. I wouldn't be *too* upset if we said "just a simple paragraph". > Are we expecting people to *want* to link into a document from > outside? I can't see ever having any use for that when writing > API docs... I don't have a use for it, myself, directly. > > A tool like TeX would need some untangling of the > > '#this' to just 'this' for use in its '\xref', but that's hardly > > difficult. > > Hm.. maybe I just don't know enough LaTeX. :) Well, actually, I think I was misremembering (see above) - it's a while since I've used LaTeX. > I would argue that it would be more appropriate to use:: > > 1. > some text > some more text > Hmm. My original model for the DOM tree was XHTML, and that is not how that works. Doesn't mean my model is a GOOD one, mind you... 
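For what it's worth, standard LaTeX spells the pair '\label'/'\ref' (there is no standard '\xref' command, as Tibs half-remembers), and the symbolic name does come out as a number, as he says. A minimal sketch of what a renderer might emit - not docutils output, just an illustration:

```latex
% A minimal sketch: LaTeX replaces the symbolic label with the number
% of the sectional unit the \label is attached to, so named anchors
% become numeric cross-references.
\documentclass{article}
\begin{document}

This is a [test] (see section~\ref{test}).

\section{Local references}\label{test}
[test] of local references.

\end{document}
```

Using the *names* rather than numbers in the output would indeed need extra work (or the hyperref package, in later LaTeX practice), which is the "more sophisticated code" mentioned above.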
> Also, what would your "flattening" do with:: > > 1. some text > > some more text > > even more indented It would, erm, flatten the first (list item contains para), and put a block around the second (para contains para - presumably the user had a reason). Consistency, hobgoblins, etc. > Can these nest arbitrarily deeply, if they keep indenting? As above. > > STNG has its own REs. They don't make much sense to me (or > didn't last > > time I looked at them). In some cases, they just didn't > work very well. > > Oh well. > > Well, then, we should convince them to change them! :) I shan't say no... > > But I don't see why *formalising* it is a problem? > > It's just nice to have formalisms that don't contain big > difficult-to-explain regular expressions. It makes the > formalism harder to understand. Oh, big REs make anything harder to understand! > Anything whose meaning is not defined by the formalism is invalid. It > should be possible for a user to ask a tool to tell them if they use > any invalid forms -- that way, they are guaranteed that what they > have written will be interpreted as specified by the *formalism*, > regardless of which implementation/tool they happen to be > using (unless > that tool has a bug). And given a formalism, it is possible to detect > invalid forms in a principled way. Which again comes round to our difference in viewpoint or something - you want to formalise first, and that leads to knowing which documents are invalid. My approach (in this instance, I hasten to add - not in the general, nonST case) is that the user throws their text at STpy (which in practice means an implementation thereof) and sees if it comes out as they expect, with as many warnings to be given as can be if they wish them. The reason for this approach with docutils is mainly that ST doesn't *have* a formalism, and for me the best way of working out what it's meant to be doing has been to work with an implementation. 
STminus *will* have a formalism, and it may even be a formalism for STpy - but both of those are new things. Of course, I'm also biased 'cos the Doc-SIG loop tended to fall over at the "formalising the spec" stage, and STpy/docutils was my attempt to short-circuit that - it doesn't look like it'll happen this time (what is it about 2001? - the types sig is active, catalogs are coming, we've got pydoc and soon an ST of our own) Anyway, must go Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Thu Mar 22 16:20:29 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 11:20:29 EST Subject: [Doc-SIG] Re: docutils REs In-Reply-To: Your message of "Thu, 22 Mar 2001 14:16:13 GMT." <002201c0b2da$a6d93000$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221620.f2MGKTp29623@gradient.cis.upenn.edu>

01> _descriptive = """\
02> (?P # start our *item*
03> (?: # an unnamed group
04> [^\n]* # 0..n of anything but newline
05> '[^'\n]+' # a literal string, containing 1 or more chars
06> [^\n]* # 0..n of anything but newline
07> )* # end group
08>
09> | # or
10>
11> [^\n]* # 0..n of anything but newline
12>
13> (?! # negative lookahead for
14> ' # a quote
15> [^']* # 0..n of anything but quote
16>
17> [ ]+ -- [ ]+ # spaces -- spaces
18>
19> [^']* # 0..n of anything but quote
20> ' # a quote
21> ) # end of negative lookahead
22> ) # end of our *item*
23>
24> [ ]+ -- [ ]+ # spaces -- spaces
25>
26> (?P .*) # 0..n of any character
27> """

What are lines 11-21 for? The only cases I can think of that they capture (that 3-7 don't) are dubious cases like::

    bad 'apostrophe nesting -- in the key

Also, I wanted to make sure you're clear that '^' and '$' match beginning and end of LINE, not of STRING (although the latter is a subset of the former).
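(As a concrete check of the '^'/'$' question: this is what a current Python re module actually does - a quick sketch with today's module, which the 1.5.2-era documentation being argued over describes the same way:)

```python
import re

# Without re.MULTILINE, '$' anchors at end of STRING (or just before a
# trailing newline), not at the end of each LINE:
assert re.search(r'foo$', 'foo\nbar') is None
assert re.search(r'foo$', 'foo\nbar', re.MULTILINE) is not None

# '$' also matches just before a final newline...
assert re.search(r'foo$', 'foo\n') is not None
# ...whereas r'\Z' matches only at the absolute end of the string:
assert re.search(r'foo\Z', 'foo\n') is None
assert re.search(r'foo\Z', 'foo') is not None
```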
You might want to use '\Z' and '\A' to be more clear (although '\Z' still bothers me a little because it matches both '' and '\n' (but not '\n\n').. seems like it should just match ''). I don't think that STNG currently requires whitespace before *emph* or **strong** etc... that's why I coded it like I did. But I think that STpy's approach may be more reasonable.. (we should start making a list of proposed changes to STNG, in order to make STpy and STNG more compatible.. Otherwise, STminus will just end up being a big useless mess :) ).. Hm.. I guess s/we/I/ in that last parenthetical. :-/ I haven't decided yet on whether I'm happy about having this concept of "acceptable ending punctuation.." It sort of seems like *all* punctuation should be ok, or *none*.. But I'll think on it some more. (e.g., should it be ok to have a dash after an *emph region*-like this?) -Edward From edloper@gradient.cis.upenn.edu Thu Mar 22 16:54:49 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 11:54:49 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 10:40:20 GMT." <001701c0b2bc$7e2f03a0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103221654.f2MGsnp03119@gradient.cis.upenn.edu> > > I guess that seems reasonable. Within paragraphs, do you collapse > > multiple spaces into one space? > > No, within lines spaces are (very carefully) left untouched, just in > case. That seems inconsistent with stripping trailing whitespace. I want to make sure that things look "the same" whether you view them in text or in html.. So it seems like if you print out a STpy string in vt100 mode, with a 130-character-wide screen, it should collapse spaces & re-word-wrap, just like HTML/LaTeX would.. Of course, you could say that that's just a presentation issue.. But I think that for consistency, we should either: 1. Preserve spaces within lines *and* trailing whitespace.
Specify that display tools (even plaintext ones) should treat spaces as soft, and should word-wrap. 2. Remove leading/trailing whitespace, and collapse sequences of spaces to single spaces. Specify that display tools should word-wrap. (of course, you don't collapse spaces in literal regions, inline regions, literal blocks, or python test blocks.) (unless, of course, you can give me a good reason why we *should* preserve sequences of spaces. > > > > > > "the following is not a url": > > > > > > That's right. In this instance. > > > > So does it get rendered as is (i.e., with two quote signs, one colon > > sign, a less than sign, and a greater than sign)? > > That's up to the renderer. But seriously, it gets *stored* as a node of > the DOM tree which has the text within quotes (i.e., the quotes are not > preserved) as its text, and the URL as its 'url' attribute. Thus the ST > markup (the double quotes and the colon) are not remembered. But "" doesn't match the url pattern, so presumably it doesn't even get detected by the href-finding-regexp? As I understand it, you can say:: "This is a test": of StructuredText and it will be rendered (in HTML) as:: "This is a test": of StructuredText and not as:: This is a test of StructuredText > > The markup-nesting problem doesn't actually seem that difficult to me, > > in principle. I propose that we allow anything to nest > > within anything, > > with the restrictions: > > 1. nothing can nest inside a literal, inline, or href url > > Agreed. But please don't call it an 'href url' - that's an HTML term! > > > 2. nothing can nest within itself (even with intervening levels) > > Pragmatically has to be true, with non-differentiated start and end > quotes. It doesn't *have* to be true.. In principle we could allow:: *This **is *no* good** for me* But I don't think we should. > These two seem to me to be the sane minimum, and thus sensible. So we'll stick with that for now.
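(Restriction 2 just agreed - nothing may nest within itself, even with intervening levels - is cheap to check mechanically. A hypothetical sketch over a pre-tokenised stream of open/close events; the tokeniser itself is not shown:)

```python
# Hypothetical checker for "nothing can nest within itself (even with
# intervening levels)".  Events are ('open'|'close', kind) pairs.
def valid_nesting(events):
    stack = []
    for action, kind in events:
        if action == 'open':
            if kind in stack:        # re-entering an already-open region
                return False
            stack.append(kind)
        else:
            if not stack or stack.pop() != kind:
                return False
    return not stack                 # everything opened was closed

# *This **is ok** for me*  -- fine
assert valid_nesting([('open', 'emph'), ('open', 'strong'),
                      ('close', 'strong'), ('close', 'emph')])
# *This **is *no* good** for me*  -- emph re-opened inside emph
assert not valid_nesting([('open', 'emph'), ('open', 'strong'),
                          ('open', 'emph')])
```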
> > Also, spaces must come between * and ** delimiters, so you > > can't say ***this***. > > Ah, but there's no reason you shouldn't be able to *say **this***, for > instance (it's quite unambiguous). But I thought that regions had to be ended by valid punctuation or space? Does '*' count as valid punctuation, then? (Of course, I expect your regexps to change significantly when you try to do nesting...) But from a more abstract point of view, I think that '***' will end up being too confusing. I don't think it's unreasonable to require that people *say **this** *. At the very least, it seems much easier to read (for those who aren't intimately familiar with ST, i.e., our entire user base :) ) But I guess that if we are to allow it, I think '***' should only be allowed to mean "close both strong and emph" or "open both strong and emph".. So you shouldn't be able to say:: *Too***confusing** to mean:: *Too* **confusing** But just to be clear, I don't think we should allow it at all. :) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 00:25:28 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 19:25:28 EST Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Thu, 22 Mar 2001 10:47:57 GMT." <001801c0b2bd$8e347720$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230025.f2N0PSp06138@gradient.cis.upenn.edu> Tibs said: > Given the attraction of doctest, it seemed sensible to allow its code > blocks to be treated as such. And given it's now in the standard Python > package, we'd do well not to ignore it. Agreed. I think that we should reiterate in our own docs that the test cases in the doc strings should be for illustrative purposes, and that extensive unit testing should be put in __test__. Tibs also said: > So whilst doctest does support "out of line" test strings, it will > always be the case that it will run on docstrings as well - by original > intent. > Well, if you ask it nicely, it won't.
But in general it should, and certainly will by default. :) Tibs later said (in a different email): > But as you've presented it, that wouldn't naturally be presented as an > interactive session at all - one wouldn't write it as:: > > for example: > > >>> def __repr__(self): > and so on > > but rather as:: > > for example:: > > def __repr__(self): > and so on But then Eddy still wants to know whether the literal block is python code or not (for some of the same reasons that we want to have separate #...# and '...' forms, instead of just one of them). I don't see encoding this information as essential. But if we *do* want to encode it, we have to have some way of distinguishing python literal blocks from vanilla literal blocks (so we'll have 5 different literalish types: literals; inlines; literal blocks; doctest blocks; and python literal blocks). -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 00:29:28 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 19:29:28 EST Subject: [Doc-SIG] Tokens for labels & endnotes In-Reply-To: Your message of "Thu, 22 Mar 2001 10:54:00 GMT." <001a01c0b2be$67095890$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230029.f2N0TTp06311@gradient.cis.upenn.edu> Tibs hurt my poor little brain by saying: > For labels I want to exclude '-_', but yes, for labels I want to include > them. I'll assume that the second "labels" should be "local references" (like [this]). > My rather desperate hope (not having read the RE section in the new 2.1 > manuals yet) is that using REs will give good leverage on the problem > mentioned at the top of the email, at which point it *does* become > useful to use '\w'. Yes, I believe a lot of stuff will just fall out nicely with this. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 00:38:08 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Thu, 22 Mar 2001 19:38:08 EST Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: Your message of "Thu, 22 Mar 2001 11:10:31 GMT." <001c01c0b2c0$b547a410$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230038.f2N0c8p06743@gradient.cis.upenn.edu> > Edward Welbourne wrote: > > > I think "keep it simple" is required here > > to me that needs to include: > > * case insensitive > > * digits > > Unfortunately, I strongly disagree. I may be *wrong*, but I disagree. I'll assume we're still talking about labels. I think it's unreasonable to expect people to remember whether they're supposed to type "Author:" or "author:".. Is there a good reason to make it case sensitive? I can't imagine ever defining two *different* labels "Author" and "author"... Convince us that it should be case sensitive. :) -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 00:45:31 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 19:45:31 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 14:16:11 GMT." <002101c0b2da$a5537a60$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230045.f2N0jVp07081@gradient.cis.upenn.edu> > > STminus to be "safe", in the sense that there should be no unexpected > > non-local dependencies. Consider your own sentence, if people think > > they can leave out the apostrophes:: > > > > I still don't see why x*y>z *has* to go in literals, > > > > Now, we have a bold "y>z ", and a mysterious '*' after has! Clearly > > not what we want. (When I say 'x*y>z' *has* to go in the literals, > > I mean it has to in order to be a "valid" string). > > But by the rules of ST (well, at least of STNG when I looked at it, and > I'm sure by my interpretation of the Classic rules), no we don't - we > have a bold "has" and a normal font "x*y>z" - the asterisk therein > doesn't meet the criteria for starting or ending emphasis. 
The problem, > I guess, is that that seems equally clearly to me how it would (and, > indeed, should) work. Ok, so I was using different rules than you were (I was using STNG's).. So the relevant example would be:: I still don't see why x * y *has* to go in literals. I admit, that's a little more strained. But I still think there's a safety issue here.. (although there is something to be said about having "'" use the same rules as all the other delimiters (or vice versa, I guess)). I'll think on it some more. Somewhat related, do you think we should allow things like:: *two*'regions' Where 2 regions are not separated by space/punctuation? I vote no.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 01:00:32 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 22 Mar 2001 20:00:32 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Thu, 22 Mar 2001 14:37:42 GMT." <002301c0b2dd$a708eb30$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103230100.f2N10Wp07821@gradient.cis.upenn.edu> Tibs said: > > Can anchors appear anywhere in the document? > > The original intention was for their use *as* footnotey, reference > things at the end. Possibly in a "Reference:" clause. But on the other > hand, I don't see why they should actually be so restricted. I vote for only allowing them at the end, because they confuse me otherwise. :) What do other people think? > > Do they have to be their own paragraphs? > > They have to occur at the start of a paragraph. They are markup, though, > not structure. Um. I'm not sure exactly how you're using those two terms. You mean they're what I would call "local formatting" or "coloring" and not "global formatting" or "structuring"? If so, I disagree. I think they should be a special type of heading, very similar syntactically to labels. So you can say:: ..[foo] bar or:: ..[foo] para 1 para 2 > > I would argue that it would be more appropriate to use:: > > > > 1.
> > some text > > some more text > > > > Hmm. My original model for the DOM tree was XHTML, and that is not how > that works. Doesn't mean my model is a GOOD one, mind you... Hm. I'd rather use a good model. :) But how we convert it to an XML document isn't really a fundamental issue, so I'll just leave it be for now.. > Oh, big REs make anything harder to understand! Yes, but I think that more work needs to go into making formalisms easy to understand than implementations.. > Which again comes round to our difference in viewpoint or something - > you want to formalise first, and that leads to knowing which documents > are invalid. My approach (in this instance, I hasten to add - not in the > general, nonST case) is that the user throws their text at STpy (which > in practice means an implementation thereof) and sees if it comes out as > they expect, with as many warnings to be given as can be if they wish > them. The problem I have with your approach is that it assumes: 1. There is one canonical tool, or all tools work exactly the same. 2. The tools won't change over time. I think that we may be setting ourselves up for annoying problems down the road, in terms of people wanting backwards compatibility so they won't have to rewrite doc strings. Witness how much of a problem backwards compatibility can be for Python in introducing things like nested scopes.. Other people *have* successfully defined (formalized) documentation languages (javadoc, pod), so I don't see why we can't do the same, in principle.. > The reason for this approach with docutils is mainly that ST doesn't > *have* a formalism, and for me the best way of working out what it's > meant to be doing has been to work with an implementation. Which is reasonable. But I think that you should be at least working *towards* a formalism..
> Of course, I'm also biassed 'cos the Doc-SIG loop tended to fall over at > the "formalising the spec" stage, and STpy/docutils was my attempt to > short-circuit that - it doesn't look like it'll happen this time (what > is it about 2001? - the types sig is active, catalogs are coming, we've > got pydoc and soon an ST of our own) Well, hopefully this time we'll manage to stay standing. :) -Edward From Edward Welbourne Thu Mar 22 20:36:19 2001 From: Edward Welbourne (Edward Welbourne) Date: Thu, 22 Mar 2001 20:36:19 +0000 (GMT) Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103211926.f2LJQDp15504@gradient.cis.upenn.edu> Message-ID: Ping: > What i had in mind was "how to use this module". I'll come back to this: it points to a middle ground ... > As an extreme example, try running "perldoc CGI". CGI.pm contains > about 3200 lines of code followed by 3000 lines of detailed > documentation. While the module itself is indeed enormous, i think > that it is useful to have all of that information about how to use the > CGI module instantly available right there in CGI.pm. having played enough with pod to see that it would be a good tool for the kinds of in-code doc I discuss below, *but no more*, I would argue that pod is actually the perfect illustration of why I *don't* want to go this far (even allowing for CGI.pm to be chopped into little pieces). The docs one gets out of pod are just about OK for showing to techies who only really care about information content and don't mind *too much* if it's rather badly presented - as various remarks in the perlpod man page will make clear, this is intended. Using it for anything more leads to ugly docs (I shalln't be quick to forget the look of *disgust* on the face of a technical author colleague when I asked, yesterday, what to do about exactly some such docs ... 
I'm glad pod was the focus of that, else I'd have curled up and died). Ping: > ... The question is "how far is the user of a module from *some* > information on how to use the module?" Doesn't matter if they don't > have every article that anyone has ever written about the module -- do > they have a starting point? Edward & Tibs have clearly been devoting *much* effort to consideration of how to embed xrefs in the doc strings (and yes, <...> is the morally correct way to delimit URLs; but I'll come back to that in a separate e-mail), so - at least in principle - the doc string contains xrefs to all the flavours of doc that might exist in connection with the module. Even if it only documents the naked API, its xrefs are a starting point. Ping: > It's also harder for me to change foo() to spam() in just the code, > check in just that part, and say "oh, i'll change the docs later" -- > because i'll be checking in a single file that's inconsistent with > itself. Edward: > If the docs are in a different file, I can do a CVS diff to see what's > changed in the code since the last time I updated the docs, and thus > can do updates to the documentation "in batch." and, when I know several of my colleagues will be changing the same file some more in the present release cycle, batching the doc changes may well be The Right Thing To Do - especially if, as where I work, there's a separate documentation group ... and I trust their idea of what constitutes an intelligible presentation of `how to use this tool' better than I trust most techies, myself included. 
So TRTTD may well be to send an e-mail to the doc group saying `I changed module foo in this way, I *have* revised the API docs within it, I think you need to change sections 2, 7 and 11 of the refman, along with all references, in all docs, to method fudge() on class Interpolator', rather than messing up their docs (which are likely maintained in some other doc format anyway, precisely because real doc teams don't believe in the sorts of doc-tool that techies think of as the bee's knees). Furthermore, the changes to documentation, even if I draft them before the doc team goes to work, will probably need to be integrated with several other sets of changes made by colleagues whose projects impact the same source file in the same release cycle. I may need to check in my code changes to get the automatic test tools to run the right tests on all platforms (as opposed to the one or two on which I test it myself before freezing) and I may well be checking in a prototype or first draft of my changes in order to find out which irritating platform-variations are going to force me to revise my approach before I can settle the final issues of the design that goes into the release candidate; so it may not be `laziness' that I leave out my changes to the docs - it may be the prudence of `I shall almost certainly be changing this some more, and shall not know for sure how until later' which makes large amounts of effort on the docs futile. So `oh leave the docs for now' may actually be wise and prudent; and, as Edward says, I (or our doc team) can ask tools where changes to docs are needed. Indeed, in an ideal world, the doc team has taken the design spec I wrote before I began coding and is working on the user-oriented docs at the same time that I'm changing the code.
It is generally best, under *any* version-control system, to avoid having two sets of changes proceeding on the same file at the same time; and I'll be reviewing the doc team's work while the doc team review my changes to the API docs, so we do get to catch glitches. Now, back to Ping's > What i had in mind was "how to use this module". and here I'm with Ping, regarding Edward's `Only the API' line as being too purist - or confusingly phrased; I shalln't be surprised if Edward's idea of `Only the API' does include `how to use this API', so I suspect we aren't as far apart as we seem to imagine. So I have half a guess that the following might bring us closer to agreement: The python source code contains doc strings which explain how to use the code; this is expressed in ST and targeted at maintainers and interrogators - i.e. folk who are either looking at the source or playing with an object their python session has given them, whose behaviour they need to know about, ideally without being obliged to look at the implementation (even assuming they have it). Other files contain documentation of other kinds, possibly in other formats; project management and version control can be used to flag which of these will need to be changed when the code changes. The source docs cross-reference these. The source docs *should* suffice to generate (possibly crude) documentation in (at least) man and HTML formats, which should be of a good enough standard to serve as the *start-point* for writing the reference manual; indeed, if one isn't too fussed about the reference manual being beautiful, they should suffice *as* a reference manual. 
The source doc format *must* be sufficiently straightforward that

* a maintainer looking at the code *will* read and understand them without suffering eye-pain (on which HTML fails for Guido at least)

* a maintainer changing the code *will* be able to see what changes to make to the docs and *will not* be put off making those changes by doubts about how to express them

* an interrogator with a python object `in hand' can (choose their own interrogation tools and, using these) get the object to tell all they need to know to determine what it promises to do (and what it doesn't)

* the author of potential client code can ask tools to find them which source modules to consider using and can glean enough information from the docs of those modules to make informed (ideally: correct) choices.

The maintainer's needs call for simplicity of format, the interrogator's call for richness, albeit with some cross-over both ways; good tools can make a big difference to the richness (e.g. all that stuff about trawling base classes for matching methods, providing default doc strings, etc.). The client-author's needs call for standardisation (hence Tibs' work on labels). Practical experience in the field of software maintenance says unambiguously that simplicity is a very serious issue, especially if one is to have enough standardised semantic markup to ensure that tools can do a good job for the client author. A surfeit of bureaucracy *will* lead to folk changing the code without bothering to keep the source docs in sync (let alone the out-of-source ones). Equally, without suitable standardised markup, client-authors will be unable to find a good supplier of round wheels, so they *will* end up using hexagonal ones `because those are easier to knock together', which will continue to make a mess of the roads. Case in point: regexen for URLs. To meet these needs, the source docs for each method/class/...
(call it: object) *do* need to include:

* a clear statement of what the object *promises* (and doesn't)
* a clear statement of *what it's for* and *how to use it*
* references to more sophisticated docs saying everything else for as many values of `everything else' as authors can be found to write.

If we ask for more than this,

* we'll need such complexity in ST that maintainers won't, so
* it won't be realistic to expect the in-code docs to stay in sync with the code they're in, so
* we won't have the code separate from the docs that will get out of sync with it, so no-one will know which is right.

Note that disagreement between code and some docs won't trigger the `trust neither' rule provided

* it's immediately clear to the reader which one (the one in a different file from the implementation) is wrong, and
* there are *some* docs with the code which agree with it.

Ping: > - Keeping modules and associated docs in the same file helps > to ensure that the two are in sync when you distribute or > edit the file. (It's not possible to have different > versions of the code and the docs at the same time; it's > less likely that someone will check in changes to one > without updating the other, etc.) Edward: > 2 issues: editing and distribution > distribution -- maybe we want to turn modules into packages, and > include docs in the package? There's not a lot of precedent > for this in other languages though.. > editing -- ... (I already addressed editing) The distribution problem defines the boundary quite nicely: reference manuals, how-to guides and tutorials *shouldn't* change when I fix a bug, though the in-code docs might (notably for the internal method which now has to do things slightly differently so that the module actually implements its documented external API). Likewise if I totally re-implement the entire module, but preserve its API; conversely, a perfectly good module may get its reference manual massively overhauled without changing one line of the code.
Of course, a real total re-implementation will change the API, but then it'll equally be part of a `new major release' of the module, so re-writing the separate docs shouldn't seem out of place. (Indeed, a total re-write of the module reference manual will typically reveal changes needed in the API.) Furthermore, if you've got the code you need the API and an overview of how to use the module; these need to be in sync with the actual implementation you've got (and to tell you which *version* you've got). However, you'll probably only use a moderate fraction of the actual modules in your python installation, so you probably *don't* want a separate copy of the tutorial and similar `big picture' docs on every machine on which you install your python distribution; you may be happy to live with the xrefs pointing to www.python.org or you may want to have one copy of all the big picture docs on a central server shared by all pythoneers in a given team. [Which points to an issue for the URL discussion; one really does want to be able to specify URLs relative to `the root URL we selected when we installed our python doc system' which *might* be at www.python.org and *might* be on a machine on the team's local network or *might* be local to the actual machine in use; the installation process will doubtless involve verifying that this URL is accessible and *does* provide the relevant docs.] When it comes to the reference manual, there is even a case for deliberately choosing to isolate it from the source - so that, for instance, I can implement a module which will be portable between versions of python. If, in it, I rely on the in-code docs of my locally-installed re module (say) I may well write code which only works for folk using the same version of python as me. The reference manual for module re *should* tell me gotchas about `we changed this between version 1.5.2 and 2.0 of python, so beware' which (IMO) *should not* be present in the in-code docs of the module.
So I find myself increasingly confident that TRTTD is to draw a dotted line between the things which belong with the code and the things which do not; that all `big picture' docs belong in separate files; that authors of client code really *do* need to use these `big picture' docs as their primary source (for portability); and that the in-code docs should be limited to the API and an account of its proper usage. The big picture docs then get to be revised when the API changes, or when someone finds the energy to improve them, as a separate process from any changes to the code. Eddy. From tony@lsl.co.uk Fri Mar 23 10:22:59 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:22:59 -0000 Subject: [Doc-SIG] Re: docutils REs In-Reply-To: <200103221620.f2MGKTp29623@gradient.cis.upenn.edu> Message-ID: <002a01c0b383$3c18a670$f05aa8c0@lslp7o.int.lsl.co.uk>

> 01> _descriptive = """\
> 02> (?P # start our *item*
> 03> (?: # an unnamed group
> 04> [^\n]* # 0..n of anything but newline
> 05> '[^'\n]+' # a literal string, containing 1 or more chars
> 06> [^\n]* # 0..n of anything but newline
> 07> )* # end group
> 08>
> 09> | # or
> 10>
> 11> [^\n]* # 0..n of anything but newline
> 12>
> 13> (?! # negative lookahead for
> 14> ' # a quote
> 15> [^']* # 0..n of anything but quote
> 16>
> 17> [ ]+ -- [ ]+ # spaces -- spaces
> 18>
> 19> [^']* # 0..n of anything but quote
> 20> ' # a quote
> 21> ) # end of negative lookahead
> 22> ) # end of our *item*
> 23>
> 24> [ ]+ -- [ ]+ # spaces -- spaces
> 25>
> 26> (?P .*) # 0..n of any character
> 27> """
>
> What are lines 11-21 for? The only cases I can think of that
> they capture (that 3-7 don't) are dubious cases like::
>
> bad 'apostrophe nesting -- in the key

I can't offhand remember - the RE growed until it appeared to work, and some of it appeared to rely on the "fuzzy" handling that REs appear (to me) to do in balancing the greediness of different bits of the RE.
It's possible it's skeletal remains which should be excised, I suppose. > Also, I wanted to make sure you're clear that '^' and '$' match > beginning and end of LINE, not of STRING (although the latter is > a subset of the former). Not according to the RE documentation in the Python 1.5.2 reference manual, they don't - that's quite clear in saying start and end of STRING, and recognition of newlines is only in MULTILINE mode. > I don't think that STNG currently requires whitespace before > *emph* or **strong** etc... that's why I coded it like I did. I kept STNG REs around as comments for "of interest" reasons, but personally found them less than useful, so basically have worked from scratch and the ST "documentation". So it's quite possible they're different. > But I think that STpy's approach may be more reasonable.. > (we should start making a list of proposed changes to STNG, > in order to make STpy and STNG more compatible.. Otherwise, > STminus will just end up being a big useless mess :) ).. Well, no, I wouldn't say that. > Hm.. I guess s/we/I/ in that last parenthetical. :-/ my preferred option! > I haven't decided yet on whether I'm happy about having this > concept of "acceptable ending punctuation.." It sort of seems > like *all* punctuation should be ok, or *none*.. I'm not *too* happy about it myself, and actually it's a string that's '%' included into the RE texts where it's needed - this means that (a) it's easy to change, but (b) it should be the same in all places - I thought consistency was a Good Idea. > But I'll > think on it some more. (e.g., should it be ok to have a dash after > an *emph region*-like this?) That looks wrong to me - but then you can see how I use dashes in plain text! There *are* some conventions on how one uses punctuation - for instance, 'this ,' looks wrong to almost everyone. ST just enforces some of them (this is, of course, yet another class of things to consider warning people about). 
Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 10:34:29 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:34:29 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103221654.f2MGsnp03119@gradient.cis.upenn.edu> Message-ID: <002b01c0b384$d7153480$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > No, within lines spaces are (very carefully) left untouched, just in > case. > > That seems inconsistent with stripping trailing whitespace. Actually, I agree, although I hadn't thought of it until you said so. > I want > to make sure that things look "the same" whether you view them in > text or in html.. So it seems like if you print out a STpy string > in vt100 mode, with a 130-character-wide screen, it should collapse > spaces & re-word-wrap, just like HTML/LaTeX would.. Of course, you > could say that that's just a presentation issue.. But I think that > for consistency, we should either: > > 1. Preserve spaces within lines *and* trailing whitespace. > Specify that display tools (even plaintext ones) should treat > spaces as soft, and should word-wrap. I remember when most tools would show trailing whitespace visibly. Those days appear to have gone a long time ago. I'd oppose retaining trailing whitespace. > 2. Remove leading/trailing whitespace, and collapse sequences > of spaces to single spaces. Specify that display tools > should word-wrap. (of course, you don't collapse spaces in > literal regions, inline regions, literal blocks, or python > test blocks.) Word wrapping is a presentation issue - if the renderer is generating etext, or STNG, then it may make sense to *not* word wrap. > (unless, of course, you can give me a good reason why we *should* > preserve sequences of spaces.) No. It's only laziness.
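[The per-line stripping being agreed here is trivial to state in code - a sketch of the behaviour under discussion, not docutils' actual implementation:]

```python
def strip_trailing(text):
    """Strip trailing whitespace from every line, leaving leading
    indentation and internal spacing untouched (a sketch of the
    behaviour discussed above, not docutils' own code)."""
    return "\n".join(line.rstrip() for line in text.splitlines())

# An invisible trailing space would otherwise defeat things like a
# trailing '::' (or a backslash line-continuation in example code):
assert strip_trailing("a literal block:: \n    code") == \
       "a literal block::\n    code"
```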
So far as I can see, it doesn't actually make any difference in any circumstance, so there's no point in my bothering to remove the spaces internally - I don't care about them, whereas I *have* to care about leading spaces, and I remove trailing spaces out of kindness (so that '::' works better, for instance - it doesn't suffer from the "oh dear, that backslash didn't continue my Python line because there's an invisible space after it" problem). > > > > > > > "the following is not a url": > > > > > > > > That's right. In this instance. > > > > > > So does it get rendered as is (i.e., with two quote > signs, one colon > > > sign, a less than sign, and a greater than sign)? > > > > That's up to the renderer. > > But "" doesn't match the url pattern, so presumably it doesn't > even get detected by the href-finding-regexp? As I understand it, > you can say:: Sorry - I slipped into "if were a meta-rendition of a URL" mode. You're right, it would be stored "as is". > > > 2. nothing can nest within itself (even with intervening levels) > > > > Pragmatically has to be true, with non-differentiated start and end > > quotes. > > It doesn't *have* to be true.. In principle we could allow:: > > *This **is *no* good** for me* I suppose so, for one definition of how one would parse it (he said grudgingly). > But I don't think we should. Luckily we agree! > > These two seem to me to be the sane minimum, and thus sensible. > > So we'll stick with that for now. > > > > Also, spaces must come between * and ** delimiters, so you > > > can't say ***this***. > > > > Ah, but there's no reason you shouldn't be able to *say > **this***, for > > instance (it's quite unambiguous). > > But I thought that regions had to be ended by valid punctuation or > space? Does '*' count as valid punctuation, then? (Of course, > I expect your regexps to change significantly when you try to do > nesting...) Hmm. Well, it works: text: **SS*ee*** --> rendering SS*ee* Hmm.
> But from a more abstract point of view, I think that '***' will end > up being too confusing. I don't think it's unreasonable to require > that people *say **this** *. At the very least, it seems much > easier to read (for those who aren't intimately familiar with ST, > i.e., our entire user base :) ) But I've already noted you have a cavalier attitude to extra spaces (pace your HTML) - and I'm not convinced on the "easier to read". Ho hum. > But I guess that if we are to allow it, I think '***' should only > be allowed to mean "close both strong and emph" or "open both > strong and emph".. So you shouldn't be able to say:: > > *Too***confusing** I just tried it - docutils does: text: *ee***SS** --> ee***SS* which seems reasonable enough. > to mean:: > > *Too* **confusing** The latter is clearer, certainly! > But just to be clear, I don't think we should allow it at all. :) I *think* it will all come out in the wash, myself. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 10:39:13 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:39:13 -0000 Subject: [Doc-SIG] backslashing In-Reply-To: <200103230025.f2N0PSp06138@gradient.cis.upenn.edu> Message-ID: <002c01c0b385$80d1a8a0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > Agreed. I think that we should reiterate in our own docs that > the test cases in the doc strings should be for illustrative > purposes, and that extensive unit testing should be put in > __test__. Well, I think we should say what they *are*, which is that they are strings that represent Python code that doctest will happily find (modulo the place the docstring is) and process - that's *not* the same as "illustrative", which implies "not very important" - the point is that they may well be pedagogic... 
On the pursuit of extensive unit testing and where strings should go, I think we should be silent, and leave it to doctest and the unit test software (whichever it is - pyunit?) to say... > But then Eddy still wants to know whether the literal block is python > code or not (for some of the same reasons that we want to have > separate #...# and '...' forms, instead of just one of them). > > I don't see encoding this information as essential. But if we *do* > want to encode it, we have to have some way of distinguishing > python literal blocks from vanilla literal blocks (so we'll have > 5 different literalish types: literals; inlines; literal blocks; > doctest blocks; and python literal blocks). That way lies madness, 'cos what about C code, oh, and maybe some Haskell is very important, and... I think this is too big a task for ST itself - maybe a later job for @ escapes (ducks and covers). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 10:41:49 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:41:49 -0000 Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103230038.f2N0c8p06743@gradient.cis.upenn.edu> Message-ID: <002d01c0b385$dd989300$f05aa8c0@lslp7o.int.lsl.co.uk> I wrote: > > Unfortunately, I strongly disagree. I may be *wrong*, but I > disagree. Edward D. Loper wrote: > I'll assume we're still talking about labels. Yes. > I think it's unreasonable to expect people to remember > whether they're supposed to type "Author:" or "author:".. > Is there a good reason to make it case sensitive? I can't > imagine ever defining two *different* labels "Author" and > "author"... Convince us that it should be case sensitive.
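[For what it's worth, the case-insensitive matching Edward argues for costs almost nothing to implement - a sketch, with a purely hypothetical label set:]

```python
KNOWN_LABELS = {"author", "version", "history"}   # hypothetical label set

def match_label(word):
    """Return the canonical lowercase label name, or None."""
    name = word.rstrip(":").lower()
    return name if name in KNOWN_LABELS else None

# "Author:" and "author:" then pick out the same label:
assert match_label("Author:") == match_label("author:") == "author"
assert match_label("Banana:") is None
```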
:) Unfortunately, I remember too many arguments where I start out disagreeing vehemently and end up having to give in for the sake of "reasonableness" (cursed are they who see both sides of the argument...). My argument for case sensitivity is that "it damn well should be like that" (all those careless programmers, mutter, mutter). Which probably means I'll need to give in. Oh well. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 10:49:25 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 10:49:25 -0000 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103230045.f2N0jVp07081@gradient.cis.upenn.edu> Message-ID: <000001c0b386$ed3dbe10$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > So the relevant example would be:: > > I still don't see why x * y *has* to go in literals. I agree that's a problem paragraph.
Edward Loper wrote: > I vote for only allowing them at the end, because they confuse > me otherwise. :) What do other people think? Well, I know David Goodger was trying to invent something *like* these anchors to allow him to navigate round a document. I suspect it's something I'm agnostic on, and would thus not choose to pronounce on... > > > Do they have to be their own paragraphs? > > > > They have to occur at the start of a paragraph. They are > markup, though, not structure. > > Um. I'm not sure exactly how you're using those two terms. You mean > they're what I would call "local formatting" or "coloring" and not > "global formatting" or "structuring"? Yes. But I agree it's a dodgy issue distinguishing a paragraph that has an item that can only occur at the start from an element that is "named" by that occurrence. It may well be they *should* be more structurey, if they are required so to act (and that would also make it easier to allow them to start a new paragraph, which we *might* want to do). > > Hmm. My original model for the DOM tree was XHTML, and that > > is not how that works. Doesn't mean my model is a GOOD one, mind you... > > Hm. I'd rather use a good model. :) But how we convert it to an > XML document isn't really a fundamental issue, so I'll just leave > it be for now.. XML isn't, but the choice of DOM is - the reason for choosing DOM was so that an interface between parser and user could be established that was well understood, and could be manipulated easily with available Python tools. The "magic" behind the DOM creation could then be done by any of a series of tools, provided they all produced the same sort of DOM tree (i.e., use the same or similar DTD). I thus see STpy as mapping directly to a DTD (or XML-Schema, or name your poison). Given DOM, XHTML seems a natural example to choose (although I do find it a bit odd in places). The DOM thing is an important point... (I'll have to make sure the PEP stresses that).
> The problem I have with your approach is that it assumes: > 1. There is one canonical tool, or all tools work exactly > the same. > 2. The tools won't change over time Which is what the DOM thing is partly meant to address - but I see you are talking about a different bit of the "approach". > I think that we may be setting ourselves up for annoying problems > down the road, in terms of people wanting backwards compatibility > so they won't have to rewrite doc strings. Witness how much of a > problem backwards compatibility can be for Python in introducing > things like nested scopes.. > > Other people *have* successfully defined (formalized) documentation > languages (javadoc, pod), so I don't see why we can't do the same, > in principle.. No, our problem is that ST *is* defined, but informally. Retrofitting a formalism onto an informal standard *is* a problem, 'cos people have different understandings of how that informal standard will work. As such, docutils takes the "code it and see how it works" approach (Python as formalism), whilst you're taking the "think about it hard and see what it should do" approach (more traditional formalism). But we're both stuck with the format *essentially* already being defined for us. > > The reason for this approach with docutils is mainly that ST doesn't > > *have* a formalism, and for me the best way of working out what it's > > meant to be doing has been to work with an implementation. > > Which is reasonable. But I think that you should be at least working > *towards* a formalism.. Well, give me a chance to finish writing the documentation! Seriously, STpy *is* defined, but it is done in less formal language than EBNF. docutils has been used to inform me as to what sensible decisions and behaviour might be, but it is not the definition. I assume that someone could take STpy and produce EBNF from it. I hope. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley.
- Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Fri Mar 23 11:24:32 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 11:24:32 -0000 Subject: [Doc-SIG] What docs should be in the source file? In-Reply-To: Message-ID: <000601c0b38b$d55c83d0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Welbourne wrote a long missive on the subject, to which I add: what he said. Being me, I can't refrain from a couple of comments: > The maintainer's needs call for simplicity of format, the > interrogator's call for richness, albeit with some cross-over > both ways; good tools can make a big difference to the richness > (e.g. all that stuff about trawling base classes for matching > methods, providing default doc strings, etc.). The > client-author's needs call for standardisation (hence Tibs' work > on labels). (although I may be working myself towards an argument against them, on the "simplicity" stance - we'll see) > Practical experience in the field of software maintenance says > unambiguously that simplicity is a very serious issue, > especially if one is to have enough standardised semantic > markup to ensure that tools can do a good job for the client > author. And this, of course, is why Edward Loper and I are having such a long discursion on the SIG, and particularly why Edward Loper keeps pushing for more formalism and less complexity - he rightly worries that too much complexity will make our markup too difficult to use, and too ambiguous to work with, whilst I fret about users typing things that they feel *should* work... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
From tony@lsl.co.uk Fri Mar 23 11:39:27 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 11:39:27 -0000 Subject: [Doc-SIG] Terminology (was RE: formalizing Structured Text) In-Reply-To: <200103221333.f2MDXBp17465@gradient.cis.upenn.edu> Message-ID: <000701c0b38d$ea90e960$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > I wanted to make sure my terminology was clear, because it looked > like the indentation got messed up somehow. My terms are: > * valid > * invalid > * illegal > * undefined Ah - that makes more sense. I wondered if that had happened. In that case, I agree, and will try to conform myself. Now - earlier on you called me to task on my naming of paragraphs and so on (correctly so). I started to work up a list of "common terms", so we could reach agreement (I surely need better terms, as we saw) but ran out of time. Could I ask you to run something together to float on the list? I think (from inaccurate memory) we have something like: text block -- one of the many sorts of paragraph paragraph -- a "vanilla" text block list item -- a text block that starts a list item Python block -- a literal text block introduced by '>>>' literal block -- a literal text block introduced by '::', may contain blank lines markup -- the result of colourising literal string -- what goes within single quotes Python string -- what goes within '#..#' emphasised text -- what goes within '*..*' strong text -- what goes within '**..**' hmm - shift between 'text' and 'string' is clumsy, but may be justified - they *are* strings, sort of. URL -- shorthand (inaccurate) for a URI quoted string -- something in '".."' paragraph label -- we must have a better name for this anchor -- a '..[anchor]' thingy - maybe we actually have an "anchor block" as well localref -- or "local reference" - refers to an anchor, looks like '[this]' Please criticise! 
Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Fri Mar 23 13:38:59 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 08:38:59 EST Subject: [Doc-SIG] ST and DOM Message-ID: <200103231339.f2NDd0p12208@gradient.cis.upenn.edu> So I was just looking through the XHTML DTD, and it doesn't really seem like what we want. But Tibs's points about the DTD representation being important as a well-defined interface to ST are well-taken.. Thus, I'd like to hash out some of the involved issues so I can put the appropriate stuff in my PEP. :) For now, I want to *only* consider global formatting. We'll get to local formatting (=colorising) later. :) There are 2 basic types of global formatting element: basic elements (which are atomic, as far as global formatting goes); and hierarchical elements (which are not). I really think that the DOM tree should capture the *structure* of the formatted string.. To me, that means that it's weird to have elements that define a list item as "a text block that *starts* a list item"... Anyway, I propose that we use something similar to the following scheme: Basic units:: Hierarchical units:: Some notes on this scheme.. Some of these might end up getting changed.. * labelsection can only appear at top-level * anchorsection can only appear at top-level, and after all other elements of structuredtext. * list items may not contain sections; but they can contain just about anything else (except top-level-only things). * anchor sections may not contain sections; but they can contain just about anything else (except top-level-only things). * labelsections can contain anything except top-level-only things. However, particular labels may place further restrictions on their contents..
Now, this is not meant to be a final DTD.. For example, it might make sense to split list, listitem, and bullet into 3: dlist, olist, ulist, etc.. But does this *overall* structure seem reasonable? For comparison, Tibs has a DTD at the bottom of , although I'm not sure if it's up-to-date. It seems to go against some of the things he's been saying on doc-sig lately.. (??). -Edward From tony@lsl.co.uk Fri Mar 23 14:21:22 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 14:21:22 -0000 Subject: [Doc-SIG] ST and DOM In-Reply-To: <200103231339.f2NDd0p12208@gradient.cis.upenn.edu> Message-ID: <000d01c0b3a4$89730980$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > So I was just looking through the XHTML DTD, and it doesn't really > seem like what we want. It is a bit odd, isn't it. > But Tib's points about the DTD representation > being important as a well-defined interface to ST are well-taken.. Good. > Thus, I'd like to hash out some of the involved issues so I can > put the appropriate stuff in my PEP. :) I think that we should agree to agree on a DTD - that has advantage for us in that we can both use the knowledge gained/shared, and it has *definite* advantages for (a) people deciding which PEP they want (if not both) and (b) tool users trying to take advantage of either/both of our packages. We might even get STNG to agree... Is this actually a separate PEP altogether? ("Doc-SIG - the PEP producer") > For now, I want to *only* consider global formatting. We'll get to > local formatting (=colorising) later. :) Reasonable. So we're defining "text blocks" and the structure above them. (for those who don't know it, the major oddity of the XHTML DTD is that it *doesn't* draw this distinction, so one gets the strange sort of concept of: contains: <#text node> which is *distinctly* odd to someone trying to work with a non-XML document, and is one (although not the major) reason why I made my internal datastructure non-DOM). 
> There are 2 basic types of global formatting element: basic > elements (which are atomic, as far as global formatting goes); > and hierarchical elements (which are not). OK - that's how I normally think too. But that distinction comes for free with using a DTD, really. > I really think that the DOM tree should capture the *structure* of > the formatted string.. To me, that means that it's weird to have > elements that define a list item as "a text block that *starts* > a list item"... Anyway, I propose that we use something similar to > the following scheme: Agreed. Some additional elements are needed for callable object docstrings, though - informally, one also needs the "funcdesc" (apologies for the poor name) which is made up of a "signature" and an optional "summary-description" - for instance:: function(fred[,boolean]) -> integer -- This is silly. or function(fred[,boolean]) -> boolean This is silly. (the two examples are identical in "meaning"). This is *important* for docstrings, and should not be forgotten now if we are tailoring a solution for such. Maybe they should be "callable", "callable_signature", "callable_summary" (or maybe one can elide the "callable" on the sub-elements.) The following is probably wrong (and the names are too long!): > Basic units:: > > > > > > > > > Hierarchical units:: > > literalblock | doctestblock | > labelsection)*, > anchorsection*)> > (section | paragraph | list | > literalblock | doctestblock)+)> > > (paragraph | list | > literalblock | doctestblock)*)> > (paragraph | list | > literalblock | doctestblock)*)> > (section | paragraph | list > literalblock | doctestblock)+)> > > Some notes on this scheme.. Some of these might end up getting > changed.. > * labelsection can only appear at top-level Needs debating - I don't necessarily disagree, though. > * anchorsection can only appear at top-level, and after all > other elements of structuredtext. I probably disagree. Probably.
> * list items may not contain sections; but they can contain > just about anything else (except top-level-only things). I *do* agree (I too dislike sections in list items!) > * anchor sections may not contain sections; but they can > contain just about anything else (except top-level-only > things). Makes sense. > * labelsections can contain anything except top-level-only > things. However, particular labels may place further > restrictions on their contents.. Agreed. I would personally prefer to lose "bullet" as such, and retain only "key" or "description" for descriptive lists. I do not wish the renderer to take the bullet (or number sequence) as anything other than a hint, and thus I think it should be an attribute, not an element... Also to be reserved for future consideration: it seems natural to me to build a DOM tree that represents the whole module or package that is being dealt with, and "blat it out" in one go to the final format. This allows one to handle cross-referencing within a package (validate it, that is), rearrange the tree *as a whole*, and so on. So we will also want (optional) infrastructure *above* what you have defined. I would propose that we have a toplevel node called something like "document" (heh, its traditional), and appropriate nodes allowed below that called "module", "function", "class" and "method", with other appropriate nodes and attributes for storing the useful information one might want to cache thereon. This is how docutils currently works (well, more or less). But as I said, for future consideration. > Now, this is not meant to be a final DTD.. For example, it might > make sense to split list, listitem, and bullet into 3: dlist, olist, > ulist, etc.. But does this *overall* structure seem reasonable? I think it probably does make such sense (I'd prefer it that way). But I agree, it's a good start. 
Do we have anyone around, listening, who actually knows how one is *meant* to design a *good* DTD (i.e., I'm sure we can come up with something workable, but are there conventions, known boobytraps, etc., that we can be helped with to get something really good?) > For comparison, Tibs has a DTD at the bottom of > , > although I'm not sure if it's up-to-date. It seems to go against > some of the things he's been saying on doc-sig lately.. (??). It's very old, it was very preliminary, and it's just plain wrong. So ignore it. (main task this weekend: rewrite STpy.html, possibly to be preempted by all the "real life" things I also have to do...) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Which is safer, driving or cycling? Cycling - it's harder to kill people with a bike... My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Fri Mar 23 14:28:14 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 09:28:14 EST Subject: [Doc-SIG] Re: docutils REs In-Reply-To: Your message of "Fri, 23 Mar 2001 10:22:59 GMT." <002a01c0b383$3c18a670$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103231428.f2NESEp14932@gradient.cis.upenn.edu> [question about big long re for descr list items] > I can't offhand remember - the RE grew until it appeared to work, and > some of it appeared to rely on the "fuzzy" handling that REs appear (to > me) to do in balancing the greediness of different bits of the RE. It's > possible it's skeletal remains which should be excised, I suppose. Hm.. I bet that's how the STNG REs got where they are today! ;) If you get a chance, could you try taking those lines out, and see if it still passes your test cases? > Not according to the RE documentation in the Python 1.5.2 reference > manual, they don't - that's quite clear in saying start and end of > STRING, and recognition of newlines is only in MULTILINE mode. Hm. You're right. I was confused. I wonder why I was.
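[The anchoring behaviour being untangled here is easy to check against a modern Python - a quick sketch; the semantics are the same as in 1.5.2, though nowadays `\Z` is the strict end-of-string anchor:]

```python
import re

# '$' (without MULTILINE) matches at the end of the string, and also
# just before a trailing newline -- which is why re.match('$', '\n')
# succeeds:
assert re.match('$', '\n') is not None
assert re.match(r'\Z', '\n') is None   # \Z anchors strictly to the end

# '^' and '$' only become per-line anchors under re.MULTILINE:
text = "first line\nsecond line"
assert re.search(r"^second", text) is None
assert re.search(r"^second", text, re.MULTILINE) is not None
assert len(re.findall(r"line$", text, re.MULTILINE)) == 2
```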
Oh well, I still don't like the fact that '$' matches '\n'. >>> re.match('$', '\n') > > we should start making a list of proposed changes to STNG, > > in order to make STpy and STNG more compatible.. > Well, no, I wouldn't say that. Ho hum. My list of things to do grows by one. :) > > I haven't decided yet on whether I'm happy about having this > > concept of "acceptable ending punctuation.." It sort of seems > > like *all* punctuation should be ok, or *none*.. > > I'm not *too* happy about it myself, and actually it's a string that's > '%' included into the RE texts where it's needed - this means that (a) > it's easy to change, but (b) it should be the same in all places - I > thought consistency was a Good Idea. I definitely agree that, if you do have it, using '%' to splice it in is the Right Thing to do. And that way we can one day try replacing it with an RE for all punctuation, and see how that affects our test cases. :) > > (e.g., should it be ok to have a dash after > > an *emph region*-like this?) > > That looks wrong to me - but then you can see how I use dashes in plain > text! Ok. Bad example. How about saying e-*mail* to put stress on the "mail" part, or *bad*-ass... > There *are* some conventions on how one uses punctuation - for instance, > 'this ,' looks wrong to almost everyone. ST just enforces some > of them (this is, of course, yet another class of things to consider > warning people about). Well, we're not in the business of enforcing punctuation use, so when we can get away with it reasonably, we should let them do whatever they want.. The problem is deciding how it interacts with markup.. -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 15:27:05 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 10:27:05 EST Subject: [Doc-SIG] ST and DOM In-Reply-To: Your message of "Fri, 23 Mar 2001 14:21:22 GMT." 
<000d01c0b3a4$89730980$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103231527.f2NFR5p19280@gradient.cis.upenn.edu> > I think that we should agree to agree on a DTD I'll agree, sort of.. One of the PEPs I'm writing has reduced functionality, so its DTD will be a subset of the agreed-upon DTD (in some sense, anyway).. > Is this actually a separate PEP altogether? ("Doc-SIG - the PEP > producer") Hm. I think you're getting a bit PEP-happy. But I'll address that issue later.. > > For now, I want to *only* consider global formatting. We'll get to > > local formatting (=colorising) later. :) > > Reasonable. So we're defining "text blocks" and the structure above > them. Well, almost but not *quite*.. For example, I'd say that the following is one text block:: label: paragraph But it's still got global formatting within it.. > > There are 2 basic types of global formatting element: basic > > elements (which are atomic, as far as global formatting goes); > > and hierarchical elements (which are not). > > OK - that's how I normally think too. But that distinction comes for > free with using a DTD, really. I don't see how it comes free.. You can choose to draw the lines where you want.. (e.g., you were saying that anchors were local formatting). I used the following heuristic to divide things up: * Choose the smallest set of hierarchical elements such that: * paragraph is a basic element. * anything that can contain a basic element or a hierarchical element is a hierarchical element. > Agreed. Some additional elements are needed for callable object > docstrings, though - informally, one also needs the "funcdesc" > (apologies for the poor name) which is made up of a "signature" and an > optional "summary-description" - for instance:: > > function(fred[,boolean]) -> integer -- This is silly. > > or > > function(fred[,boolean]) -> boolean > > This is silly. I disagree. Isn't this the whole point of inspect? To get that information? Why include it in the doc string?
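[Edward's point - that the signature belongs to the object, not to the doc string - is exactly what the inspect module serves; in a modern Python the lookup is one call. A sketch; the function itself is invented:]

```python
import inspect

def function(fred, boolean=False):
    "This is silly."
    return int(bool(fred) or boolean)

# The signature can be recovered from the object itself, so writing it
# into the doc string only risks the two drifting apart:
assert str(inspect.signature(function)) == "(fred, boolean=False)"
assert inspect.getdoc(function) == "This is silly."
```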
That just seems to make things very prone to errors. What happens if the signature doesn't match the real signature? etc. > > * labelsection can only appear at top-level > > Needs debating - I don't necessarily disagree, though. I have trouble thinking of what it would mean for labelsections to appear deep within a docstring. > > * anchorsection can only appear at top-level, and after all > > other elements of structuredtext. > > I probably disagree. Probably. I think that if we want anchors to be available anywhere in a docstring, then we need to change them to be local markup, allow them *anywhere* that normal local markup is allowed, and have them be invisible. We would probably also have to change the notation for them. Then, if you want to do an endnote, you just include an anchor at the beginning of the footnote.. Something like:: '[foo]' Foo is a dummy word. Where '' is whatever syntax we decide to use for anchors. I'm not saying this is a *good* thing to do, but I like it better than allowing anchors, as they are currently defined, to appear anywhere. That just seems like a hack. And I don't think the meaning will be obvious to someone reading the plaintext who's not familiar with ST (which it *should* be). > > * list items may not contain sections; but they can contain > > just about anything else (except top-level-only things). > > I *do* agree (I too dislike sections in list items!) The only potential problem I can see is people wanting to use sections in DL items under label sections.. (e.g., when describing a parameter). But I don't think we should let them! :) > Also to be reserved for future consideration: it seems natural to me to > build a DOM tree that represents the whole module or package that is > being dealt with, and "blat it out" in one go to the final format. This allows one to handle cross-referencing within a package (validate it, that is), rearrange the tree *as a whole*, and so on.
So we will also want (optional) infrastructure *above* what you have defined. > > I would propose that we have a toplevel node called something like > "document" (heh, it's traditional), and appropriate nodes allowed below > that called "module", "function", "class" and "method", with other > appropriate nodes and attributes for storing the useful information one > might want to cache thereon. I think we still need a "structuredtext" element (or something similar), and a distinct "module" element.. the reason being that the "structuredtext" element can contain labeled sections, but a module shouldn't.. Instead, it should contain author sections and version sections etc.. So I think we should have 2 separate *top level* interfaces, which share a bunch of stuff: * the "structuredtext" top-level element is produced when we parse any random ST string, without knowing what it represents. * The docstring top-level elements, like "module" and "function" The first would be produced by a parser; and the second by a docstring tool. I took some of your comments into account, and came up with this revised DTD. The same caveats apply to this one that applied to the last one. :) Basic blocks:: Hierarchical blocks:: Docstrings:: ... Note that the description element does *not* include labelsection elements... I said that ordered list bullets are required.. is that reasonable? Should they be '#IMPLIED' instead? -Edward From tony@lsl.co.uk Fri Mar 23 16:45:05 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Fri, 23 Mar 2001 16:45:05 -0000 Subject: [Doc-SIG] ST and DOM In-Reply-To: <200103231527.f2NFR5p19280@gradient.cis.upenn.edu> Message-ID: <001201c0b3b8$9cecc190$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > Agreed.
Some additional elements are needed for callable object > > docstrings, though - informally, one also needs the "funcdesc" > > (apologies for the poor name) which is made up of a > "signature" and an > > optional "summary-description" - for instance:: > > > > function(fred[,boolean]) -> integer -- This is silly. > > > > or > > > > function(fred[,boolean]) -> boolean > > > > This is silly. > > I disagree. Isn't this the whole point of inspect? To get that > information? Why include it in the doc string? That just seems > to make things very prone to errors. What happens if the > signature doesn't match the real signature? etc. I believe it is by fiat of the BDFL, in fact - this is a factlet he wishes to have inserted into callable object docstrings. Things like IDLE will use it to produce a pop-up when you type the name of a callable. Technically, it is *not* necessarily the same information you get from the Python code. The first part (signature) declares information you don't get therefrom (the return value) and also uses human readable text for the values, which might arguably be different than their names. The second part is meant to be the "traditional one line summary" of what the callable does. But as I understand it, we are considered unlikely to get a PEP accepted if we don't cater for it. Damn - I can't offhand see a reference to it in my saved Doc-SIG emails, but I'm sure *someone* said words to that effect (someone, are you listening?). > > > * labelsection can only appear at top-level > > > > Needs debating - I don't necessarily disagree, though. > > I have trouble thinking of what it would mean for labelsections > to appear deep within a docstring. I'm not convinced it *does* mean anything, but it still feels like it's not proven yet... > > > * anchorsection can only appear at top-level, and after all > > > other elements of structuredtext. > > > > I probably disagree. Probably.
> > I think that if we want anchors to be available anywhere > in a docstring, then we need to change them to be local > markup, allow them *anywhere* that normal local markup is > allowed, and have them be invisible. We would probably > also have to change the notation for them. Then, if you > want to do an endnote, you just include an anchor at > the beginning of the footnote.. Something like:: > > '[foo]' Foo is a dummy word. > > Where '' is whatever syntax we decide to use for anchors. > > I'm not saying this is a *good* thing to do, but I like > it better than allowing anchors, as they are currently defined, > to appear anywhere. That just seems like a hack. And I don't > think the meaning will be obvious to someone reading the > plaintext who's not familiar with ST (which it *should* be). Hmm. Maybe that's a vote for saying we'll deal with anchors as they are now (and make them into anchorsections, as you wish) and defer the other issue - yes, I could go for that, especially as anchors-as-they-are-now is what was discussed and requested (once upon a time) whereas generic anchor points wasn't. > > > * list items may not contain sections; but they can contain > > > just about anything else (except top-level-only things). > > > > I *do* agree (I too dislike sections in list items!) > > The only potential problem I can see is people wanting to > use sections in DL items under label sections.. (e.g., > when describing a parameter). But I don't think we should > let them! :) I agree - that's normally a "presentation" issue, anyway (as in fighting the default presentation of browsers). And if people jump up and down about it too much, we can always change our minds later on. The rest of your email I am resolutely putting aside (i.e., printing) to look at in more detail later on. But this is Good Stuff - I think we're getting somewhere very useful. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. 
Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From fdrake@acm.org Fri Mar 23 17:43:55 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 23 Mar 2001 12:43:55 -0500 (EST) Subject: [Doc-SIG] Doc/ tree frozen for 2.1b2 release Message-ID: <15035.35675.217841.967860@localhost.localdomain> I'm freezing the doc tree until after the 2.1b2 release is made. Please do not make any further checkins there. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@localhost.localdomain Fri Mar 23 19:11:52 2001 From: fdrake@localhost.localdomain (Fred Drake) Date: Fri, 23 Mar 2001 14:11:52 -0500 (EST) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010323191152.3019628995@localhost.localdomain> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Documentation for the second beta release of Python 2.1. This includes information on future statements and lexical scoping, and weak references. Much of the module documentation has been improved as well. From edloper@gradient.cis.upenn.edu Fri Mar 23 23:14:41 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 18:14:41 EST Subject: [Doc-SIG] docstring signatures Message-ID: <200103232314.f2NNEfp25992@gradient.cis.upenn.edu> Guido, On doc-sig, we're trying to put together some standards/conventions for writing documentation strings, to propose in a PEP. (These conventions could then be used by all manner of docstring-related tools). Tibs said he thought that you wanted to require that such conventions include a "signature" for docstrings of callable objects, such as:: def primes(n): """ primes(n) -> lst -- Return a list of all primes in the range [2,n]. """ ... or:: def primes(n): """ primes(n) -> lst Return a list of all primes in the range [2,n]. """ ... 
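A convention like the one above is mechanically checkable against the live function. Here is a rough sketch using the modern `inspect` module (the `check_signature` helper and its regex are invented for illustration, not part of any proposal on the table):

```python
import inspect
import re

def primes(n):
    """primes(n) -> lst -- Return a list of all primes in the range [2,n]."""
    return [p for p in range(2, n + 1)
            if all(p % d for d in range(2, p))]

# Invented helper: pull the "name(args) -> result" line off the front of
# a docstring and compare it with what inspect reports.
SIG_RE = re.compile(r'^\s*(?P<name>\w+)\((?P<args>[^)]*)\)\s*->\s*\w+')

def check_signature(func):
    match = SIG_RE.match(func.__doc__ or '')
    if match is None:
        return None  # no signature line to check
    claimed = [a.strip() for a in match.group('args').split(',') if a.strip()]
    real = list(inspect.signature(func).parameters)
    return match.group('name') == func.__name__ and claimed == real
```

Here `check_signature(primes)` returns True; a docstring claiming `primes(x)` instead would make it return False, which is exactly the signature-disagrees-with-inspect worry raised in this thread.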
However, it was unclear to me whether that would be affected any by the introduction of tools like inspect.py and pydoc.py's help function. In particular, much of the "signature" information can be obtained by calls to inspect methods; and there is a question of what to do if the "signature" disagrees with inspect. When designing our docstring conventions, should we include signatures, like the one given? Or can we feel free to put information about what is returned by the function, etc., in other places (e.g., under a "Returns: " section)? If you do want us to include signatures, is there somewhere where what they should look like is defined (e.g., whether you should say "primes(n) -> lst" or "primes(int) -> list")? -Edward From edloper@gradient.cis.upenn.edu Fri Mar 23 23:42:57 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 18:42:57 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Fri, 23 Mar 2001 10:34:29 GMT." <002b01c0b384$d7153480$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103232342.f2NNgvp27539@gradient.cis.upenn.edu> > I remember when most tools would show trailing whitespace visibly. Fine by me, as long as we explicitly say that all spaces in text (not in literals) are soft. It seems like the parser *should* reduce sequences of multiple spaces, but I'll live if it doesn't (c.f., XML parsers are required to reduce sequences of multiple spaces in attribute strings like this: ''). > Word wrapping is a presentation issue - if the renderer is generating > etext, or STNG, then it may make sense to *not* word wrap. Yes, but the reader should understand that their text *can* get word-wrapped at (non-literal) spaces. > > (unless, of course, you can give me a good reason why we *should* > > preserve sequences of spaces. > > No. It's only laziness. Ok. Well, I'll be happier if parsers strip that whitespace eventually.. But I won't worry about it for now. :) [Tibs discusses ***] Ok. 
So, on further thought, *** can be given consistent meaning (assuming a left-to-right-style parsing)::

    CURRENT CONDITION  |  Meaning
    Emph? | Strong?    |
    ------+------------+--------------------------
    no    | no         | start both strong & emph
    no    | yes        | end strong, start emph
    yes   | no         | end emph, start strong
    yes   | yes        | end both strong & emph

If you do give '***', that is the meaning it should receive. Note that '****' shouldn't ever really have a meaning. I guess I'll just have to wait for your nested-coloring regexps. :) (But I still think that '***' is potentially confusing to readers, and that's a Bad Thing). -Edward From Edward Welbourne Fri Mar 23 19:35:06 2001 From: Edward Welbourne (Edward Welbourne) Date: Fri, 23 Mar 2001 19:35:06 +0000 (GMT) Subject: [Doc-SIG] suggestions for a PEP In-Reply-To: <200103211835.f2LIZOp09967@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103211835.f2LIZOp09967@gradient.cis.upenn.edu> Message-ID: Me, then Edward: >> I know I'm about to vary my tune but ... someone else has been talking >> persuasively out-of-band. Rather than borrowing the doc directly off >> the parent ... > I think the issue of whether to borrow, or point back, etc., should > be one for the tools. Which may be a good reason for the language > *not* to do anything automatic, like inheriting doc strings. There > are similar questions about whether inherited methods should be listed > in a separate section or not, etc. OK, sounds like more convergence of opinions. And it fits with the benevolent dictat ;^> > But at any rate, we should say that having f.__doc__=None indicates > that inheriting docs is acceptable, and f.__doc__='' means that > inheriting docs is not acceptable. (minor request: #f.__doc__ is None# in preference to use of ==) > Of course, all of this will be difficult to do if we're > parsing the file instead of loading it as a module; but that's ok. :) No harder for your parser than for Guido's ;^> Eddy.
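The truth table for '***' reads as a two-bit state machine: close whatever is open, open whatever was closed. A throwaway sketch (the `triple_star` helper and its event names are invented here, not part of anyone's parser):

```python
def triple_star(emph, strong):
    """Return (markup events, new state) for a '***' token.

    emph and strong are booleans: are we currently inside *emph* /
    **strong** text?  Both flags flip on every '***'.
    """
    events = []
    # First close whatever is currently open...
    if emph:
        events.append('end-emph')
    if strong:
        events.append('end-strong')
    # ...then open whatever was closed.
    if not strong:
        events.append('start-strong')
    if not emph:
        events.append('start-emph')
    return events, (not emph, not strong)
```

Running it over the four states reproduces the table row for row, e.g. `triple_star(False, True)` yields `['end-strong', 'start-emph']`.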
From Edward Welbourne Fri Mar 23 20:44:49 2001 From: Edward Welbourne (Edward Welbourne) Date: Fri, 23 Mar 2001 20:44:49 +0000 (GMT) Subject: [Doc-SIG] URLs In-Reply-To: <3AB88028.605B88A6@lemburg.com> (mal@lemburg.com) References: <200103202040.f2KKe4p15488@gradient.cis.upenn.edu> <3AB88028.605B88A6@lemburg.com> Message-ID: OK, lots of stuff here and I'm a bit lost so I'm going to think out loud at you, so you have a good chance of spotting where my confusion diverges from what you thought you were saying. If I'm confused, how confused are the lurkers ? It makes sense to provide for a bibliographic definition mechanism for defining short names for use in xrefs in terms of full URLs (ideally with some form of commentary). As I understand it, this is what the > there is a directive of the form: > > ..[ref] url bit is about. How about providing for the url to be followed by arbitrary text to be presented, in the `See also' or biblio section as a description of the relevant xref ? This then makes it possible, as discussed, to use [ref] in the body of paragraphs as a link. This is the `in the style of bibliographic citation' idiom that I gather STNG folk are wedded to, and I see every reason to honour their choice. I don't understand the use of "some text":[ref] with the above reading of [ref], since the citation idiom calls for [ref] to be the text that appears in the output, so aren't you throwing away the "some text", so what's it for. However, I can see a use for "anchor text": as a good way to say, inline, that you want the given anchor text to appear (with its "quotes" stripped) as the text of a link to the given URL. I can see how it might be desirable to use "anchor text":[ref] as a short-hand for the above but requesting [ref]'s URL as the URL, provided ref is the subject of a '..[ref] url' directive. In which case inline [ref] is implicitly a short-hand for "[ref]":[ref] - i.e. 
use '[ref]' as the text of a link to the url specified in the '..[ref] url' directive. Indeed, I'd be tempted to at least allow the '..[ref] url' directive to enclose url in <...> for the sake of similarity. The only difference from Edward's > Use "name":[ref] for in-line hrefs. If ref is a single token, and > there is a directive of the form: > > ..[ref] url > > Then use url as the URL; otherwise, use ref as the URL. is then that the `otherwise use ref as the URL' fuzziness gets blown away: we get Inline use of "text":[ref] is then a link, with text "text", to the url specified elsewhere by a '..[ref] url optional comments' directive; inline use of simply [ref] is equivalent to "[ref]":[ref] Inline use of "text": is a link, with text "text", to the specified url, without recourse to an '..' directive. anything else vaguely resembling these is just a lump of text with some surprising uses of punctuation. This gets us the asked-for win in terms of letting URLs end in . or appear at the end of a sentence (or both) without ambiguity, while also gaining the asked-for parsability win *and* saving the `if that happens this otherwise the other' gumbo Edward was giving. Furthermore, use of # in a URL will now be within <...>, so we get spared various parser uglies. If I've understood what STNG does (which is a big if, as it's all by inference from what I think you guys are saying), this either removes or simplifies the problem of persuading the STNG folk, since it no longer clashes with the [ref] forms they're used to, and probably makes their lives a lot easier when it comes to parsing the "text":url idioms Tibs lists. And the above is manifestly simpler and more intuitive IMO ;^> > It would indeed make life a lot simpler. ;^) Tibs: > Inline refs were introduced deliberately to look like footnotes aside: [blah] is surely what *bibliographic citations* look like, not *footnotes* in any typesetting idiom I've ever met. But you meant that, I presume. (not sure who): > 3. 
Local references (which look like '[this]' or '[1]') are now ... > ..[this] ah, so a paragraph starting (or preceded by a line of form ?) '..[this]' is implicitly an accompanied by a:: ..[this] <#something random> directive somewhere in the docstring ? Thus enabling xrefs to that para from within the document using [this] or "anchor text":[this]. And "something random" is putatively "this", I suppose, in which case we've also enabled "anchor text":<#this> Sounds good. > Clarification on the syntax.. is *anything* that looks like [this] a > local reference, or does it have to be preceded by "a parenthetical > like"[this] or "a parenthetical and a colon like":[this]? erm ... any use of [ref] is either just some text with funny punctuation or using the same name, 'ref', as some particular '..' directive. What problem is there in distinguishing ? Is it the fact that the generated page, in which the <#this> anchor is defined, may be made of several doc strings, so that you don't *know* whether there's a ..[this] in one of the other doc strings making up the page ? If one of the latter, does [this] get rendered with brackets? Flagged as a warning when validating (in principle, not in current implementation)? > If one of the latter, does [this] get rendered with brackets? Flagged > as a warning when validating (in principle, not in current > implementation)? either way, [this] gets rendered with brackets: either it's being made to look like a citation, to the URL specified for '..[this]' to refer to, or it's a lump of random text (about which a doc tool may wish to generate a warning, at least if 'this' matches the label-spec). > What is acceptable content for [this]? '[\w_-]+'? Hmm.
Well, ideally we'd support standard citation forms, which would include '[this, that, other]', to be treated like '[this], [that], [other]' but with the excess punctuation ditched (this *is* a standard usage of the citation idiom being mimicked, after all; used when what was said just before it is backed up by three separate texts elsewhere). This can only sensibly be applied to '[refs]' forms, not to '"text":[refs]' forms, for obvious reasons. We'd still be using '[\w_-]+' for the names specified in a '..[ref]' definition, but using '[\w_-]+(, [\w_-]+)*' as the contents of a [...] used as an inline link. But, that aside, and allowing we might insist on the `excess punctuation' being given explicitly (for simplicity/unambiguity), [\w_-]+ sounds like a reasonable deal, albeit I might ditch _ and, in any case, really just ask for the same regex as we use for Labels ... One might plausibly want to allow '&' in ref names (within [...], as opposed to within where, obviously, they're allowed) because of all those papers and books by two authors whose names are the standard way to refer to the book, e.g. ..[K&L] Kaye and Laby, Tables of Physical and Chemical Constants, pub. Longman Scientific and Technical (ignoring the questions of whether the scheme ISBN is implemented yet; pretend the fake ISBN URL were replaced with a suitable URL on Longman's (or some online bookstore's) web site.) But, again, we could demand simplicity and insist on [KandL] without doing anyone any real harm. >> I think we should just go with the English definition of a word, >> which means [-A-Za-z], and leave it at that. It is *meant* to look >> like a word. > Is that too anglo-centric? 
(modulo inclusion/exclusion of _ which I don't care about) No, it's ASCII-centric and we're really working inside ASCII, so it's appropriate; except that I'd want to include digits, at least for [citations] and I'd argue that we should anticipate folk wanting to use python identifiers here (when, e.g., the relevant python object is defined in some other module and the author doesn't want to rely on vagaries of the doc-tool's relationship with include directives), hence requiring _ and digits; i.e. I agree with Edward's > ... underlines and digits are more applicable for endnotes. > Some people might like this [1] or this [noam_chomsky97]. I'd go for either: * citation names are [\d\w_-]+ read case-sensitively * doc-string labels are [a-z\d-]+ once passed through string.lower or * both kinds are [\d\w_-]+ read case-insensitively (here using \w_ purely to keep out of arguments about whether \w includes _ already) without noticeable preference, and accept that all ST-generic doc-string labels are expected to be Anglic words, hence not to *exercise* the \d allowed in the label spec, but to *allow* \d in labels for the sake of ST-specific dialects which may well want, e.g., to use a number in a label. (By the way - Edward, some of your sentences end .. others end in a single . - why ? i.e. is there a reason other than bouncy fingers ?) Eddy. From edloper@gradient.cis.upenn.edu Sat Mar 24 03:44:03 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 23 Mar 2001 22:44:03 EST Subject: [Doc-SIG] anchors and local references In-Reply-To: Your message of "Fri, 23 Mar 2001 20:44:49 GMT." Message-ID: <200103240344.f2O3i3p09660@gradient.cis.upenn.edu> > OK, lots of stuff here and I'm a bit lost so I'm going to think out loud > at you, so you have a good chance of spotting where my confusion > diverges from what you thought you were saying. If I'm confused, how > confused are the lurkers ?
I think part of that confusion comes from me talking without actually knowing what I'm talking about. :) So, I'll go ahead and try to give a brief description of where anchors and local references currently stand (I think), so everyone will be clear on it: /======================================================================\ FUNCTION Anchors and local references are used to add bibliographic references and endnotes to StructuredText. Local references are used to refer to the references/endnotes, and anchors are used to write the references/endnote. (n.b.: we may want to change these terms, because the words themselves suggest something more general). SYNTAX Local references look like [this]. They are normally used either for bibliographic reference, like this: [eddy00]; or for endnotes, like this: [1]. Local references can appear anywhere in the text of a paragraph or list item (and possibly other places, like in a heading). Anchors look like this:: ..[this] This is the anchor for the reference '[this]' The form "..[name]" patterns syntactically almost exactly like the form "name:". In other words, you can do the following:: ..[anchor1] Anchors may span multiple lines. ..[anchor2] Anchors may contain multiple paragraphs. Or even lists. etc. The name of a local reference/anchor should be a single word, but can contain a few punctuation marks (&, _, -, maybe others). The exact contents of a name is yet to be determined, but we can tentatively say it's something like: '[\w&_-]+|[\d]+' Anchors must be the last top-level elements of a StructuredText string. SEMANTICS When a StructuredText string is displayed, a local reference should appear as it does in plaintext. However, it may also be linked in some way to its anchor (e.g., with an href in HTML).
For example, in HTML, '[this]' would be rendered as::

    <a href="#this">[this]</a>

When anchors are displayed, their name should be displayed as some type of heading or list bullet, and their contents should be listed under that section or in that list item. For example, it might be sensible to render anchors using DL's in HTML. Also, if local references are linked to anchors, then the anchor should include the target for the link. So::

    ..[this] anchor

Might be rendered in HTML as::

    <dl>
      <dt><a name="this">this</a></dt>
      <dd>anchor</dd>
    </dl>
..[eddy00] email from Edward Welbourne, received Fri, 23 Mar 2001 20:44:49 +0000 (GMT). ..[1] It may make sense to say that we should use numbers for endnotes and words for bibliographic entries, but we won't say that for now. \======================================================================/ Note that this is *not* used for out-of-line URI references, which is what I thought it was for at one point. Hmm.. hopefully that helped clarify things a little. I'll have a better explanation once I'm done with my PEP. :) > aside: [blah] is surely what *bibliographic citations* look like, > not *footnotes* in any typesetting idiom I've ever met. But *endnotes* do look like that in some typesetting idioms.. > Hmm. Well, ideally we'd support standard citation forms, which would > include '[this, that, other]', to be treated like '[this], [that], > [other]' Fine with me, if others also want it. Of course, I also wouldn't feel bad about making people type [this], [that], [other]. > One might plausibly want to allow '&' in ref names I agree. > (By the way - Edward, some of your sentences end .. others end in a > single . - why ? i.e. is there a reason other than bouncy fingers ?) No, it's not bouncy fingers.. I'm not sure, exactly.. it's an idiom I only use in emails. I think I use it where I would pause slightly longer if I were speaking. But I'd have to go read my own emails over to figure it out for sure. Sometimes I'll even end my sentences with three periods... :) -Edward.. From edloper@gradient.cis.upenn.edu Sat Mar 24 17:39:00 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 24 Mar 2001 12:39:00 EST Subject: [Doc-SIG] Formalizing ST Message-ID: <200103241739.f2OHd0p13768@gradient.cis.upenn.edu> Hmm.. So I'm starting to think that EBNF really isn't the best formalism for capturing global formatting. It works great for local formatting (=coloring), but it just doesn't do a good job of capturing indentation-related constraints..
So I'm thinking of turning STminus into a two-part formalism: one part to describe global formatting, and one to describe local formatting. EBNF or EBNFla would be used for local formatting, but I'm not sure what to use for global formatting. Any ideas? -Edward From guido@digicool.com Sat Mar 24 19:01:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 24 Mar 2001 14:01:05 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Fri, 23 Mar 2001 18:14:41 EST." <200103232314.f2NNEfp25992@gradient.cis.upenn.edu> References: <200103232314.f2NNEfp25992@gradient.cis.upenn.edu> Message-ID: <200103241901.OAA27542@cj20424-a.reston1.va.home.com> > Guido, > > On doc-sig, we're trying to put together some standards/conventions > for writing documentation strings, to propose in a PEP. (These > conventions could then be used by all manner of docstring-related > tools). Tibs said he thought that you wanted to require that such > conventions include a "signature" for docstrings of callable objects, > such as:: > > def primes(n): > """ > primes(n) -> lst -- Return a list of all primes in the range [2,n]. > > > """ > ... > > or:: > def primes(n): > """ > primes(n) -> lst > > Return a list of all primes in the range [2,n]. > > > """ > ... Hm, strange. Tibs must have been channeling someone else. I've used this style of docstrings for C functions, where there's no good way to find out the arguments, but not on Python functions and methods. > However, it was unclear to me whether that would be affected any by > the introduction of tools like inspect.py and pydoc.py's help > function. In particular, much of the "signature" information can be > obtained by calls to inspect methods; and there is a question of what > to do if the "signature" disagrees with inspect. > > When designing our docstring conventions, should we include > signatures, like the one given? 
Or can we feel free to put > information about what is returned by the function, etc., in other > places (e.g., under a "Returns: " section)? Sure. > If you do want us to include signatures, is there somewhere > where what they should look like is defined (e.g., whether > you should say "primes(n) -> lst" or "primes(int) -> list")? > > -Edward PS. Don't spend too much time trying to make StructuredText or some variation thereof work. In my experience with systems that use ST (like ZWiki), it sucks. There basically are two options I like: nicely laid out plain text, or a real markup language like Latex or DocBook. --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Sat Mar 24 19:52:39 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 24 Mar 2001 14:52:39 EST Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Sat, 24 Mar 2001 14:01:05 EST." <200103241901.OAA27542@cj20424-a.reston1.va.home.com> Message-ID: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> > Don't spend too much time trying to make StructuredText or some > variation thereof work. In my experience with systems that use ST > (like ZWiki), it sucks. There basically are two options I like: > nicely laid out plain text, or a real markup language like Latex or > DocBook. > > --Guido van Rossum (home page: http://www.python.org/~guido/) Hm. I guess I should have thought to ask the BDFL about all this before now. :) Makes me wonder if he'll like/accept *any* of the stuff we've been talking about. But it's interesting to hear that Guido is ok with a real markup language. So are there any vocal opponents of using a real markup language on doc-sig right now? (Assuming that Guido doesn't want us to use something like ST).. Of course, on the other hand, if we can clean ST up enough, and make it formal, maybe he'll be ok with it. I'm going to put my PEP on hold for now, until we figure this stuff out..
(if anyone wants to see what I've written so far, though, I'll be happy to send you a copy -- just email me). I'm also thinking of putting together a "minimal" ST-like language, that would include markup for: * lists * emph * literals (one type, probably using '#' as delimiters) * urls (using '<>' delimiters) * literal blocks But maybe we'd be better off just using XML.. :) or something like javadoc ('@param(x) foo..', etc.).. -Edward From edloper@gradient.cis.upenn.edu Sat Mar 24 20:07:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 24 Mar 2001 15:07:38 EST Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Sat, 24 Mar 2001 14:01:05 EST." <200103241901.OAA27542@cj20424-a.reston1.va.home.com> Message-ID: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> > PS. Don't spend too much time trying to make StructuredText or some > variation thereof work. In my experience with systems that use ST > (like ZWiki), it sucks. There basically are two options I like: > nicely laid out plain text, or a real markup language like Latex or > DocBook. Unfortunately, it's a bit late for that (there's been a lot of work this month put into trying to get a ST variant to work). I don't mind not using ST, but before we start working on other formatting conventions, I thought I should get a better idea of what you would or would not like, so we don't end up spending another month on something you won't like. ;) Here's the abstract from the PEP I've been putting together. It should give you a good idea of what we're trying to accomplish, at least: Python documentation strings provide a convenient way of associating documentation with many types of Python objects. However, there are currently no widespread conventions for how the information in documentation strings should be formatted. As a result, it is very difficult to write widely-applicable tools for processing documentation strings. 
Such tools would be useful for a variety of tasks, such as: * Converting documentation to HTML, LaTeX, or other formats. * Browsing documentation within python. * Ensuring that documentation meets specific requirements. This PEP proposes that the Python community adopt a well-defined set of conventions for writing "formatted documentation strings." These conventions can then be relied upon when writing tools to process formatted documentation strings. Note that some Python programs may choose not to use formatted documentation strings. For example, programs like Zope [1] have used documentation strings for purposes other than strict documentation, and it would be inappropriate to expect them to change how they use documentation strings. Also, some programmers may prefer to write plaintext documentation strings. Also, here are the "design goals" I defined: The following goals guided this PEP's design of conventions for writing formatted documentation strings. * Intuitiveness: The meaning of a well-formed formatted documentation string should be obvious to a reader, even if that reader is not familiar with the formatting conventions. * Ease of use: If the formatting conventions are to be accepted by the Python community, then it must be easy to write formatted documentation strings. * Formality: The formatting conventions must be formally specified. A formal specification allows different tools to interpret formatted documentation strings consistently, and allows for "competing, interoperable implementations," as specified in PEP 1 [5]. * Expressive Power: The formatting conventions must have enough expressive power to allow users to write the API documentation for any python object.
The following "secondary design goals" follow directly from the primary design goals, but are important enough to deserve separate mention: * Simplicity: The formatting conventions should be as simple as is practical, and there should be minimal interaction between different aspects of the formatting conventions. This goal derives from intuitiveness and ease of use. * Safety: No well-formed formatted documentation string should result in unexpected formatting. This goal derives from intuitiveness. So the question then is what sort of markup language we should define. I'd be quite happy to use something like Javadoc uses (but with a more restricted set of acceptable XML elements), but other people think that it's too hard to read/write... I'm also curious why you don't like ST-like markups. We've been putting a fair amount of work into formalizing it & making sure it's "safe" (e.g., it's an error to say *x**b*c**d*). If we can successfully do both, would that alleviate some of your concerns about ST? Any info you can give on what you would like to see come out of this project (or pointers to info) would be most appreciated. -Edward From guido@digicool.com Sat Mar 24 20:37:52 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 24 Mar 2001 15:37:52 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Sat, 24 Mar 2001 15:07:38 EST." <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> Message-ID: <200103242037.PAA28743@cj20424-a.reston1.va.home.com> > > PS. Don't spend too much time trying to make StructuredText or some > > variation thereof work. In my experience with systems that use ST > > (like ZWiki), it sucks. There basically are two options I like: > > nicely laid out plain text, or a real markup language like Latex or > > DocBook. > > Unfortunately, it's a bit late for that (there's been a lot of work > this month put into trying to get a ST variant to work). 
I don't > mind not using ST, but before we start working on other formatting > conventions, I thought I should get a better idea of what you would > or would not like, so we don't end up spending another month on > something you won't like. ;) > > Here's the abstract from the PEP I've been putting together. It > should give you a good idea of what we're trying to accomplish, at > least: > > Python documentation strings provide a convenient way of > associating documentation with many types of Python objects. > However, there are currently no widespread conventions for how the > information in documentation strings should be formatted. Not true. Most of the standard library uses the same convention, and even if it's not quite written down, it wouldn't be hard to figure out what it is. Also, my Python Style Guide (http://www.python.org/doc/essays/styleguide.html) has quite a bit of guidance. > As a > result, it is very difficult to write widely-applicable tools for > processing documentation strings. Again not true. Ping's pydoc does quite well second-guessing the existing conventions. > Such tools would be useful for > a variety of tasks, such as: > > * Converting documentation to HTML, LaTeX, or other formats. > * Browsing documentation within python. > * Ensuring that documentation meets specific requirements. > > This PEP proposes that the Python community adopt a well-defined > set of conventions for writing "formatted documentation strings." > These conventions can then be relied upon when writing tools to > process formatted documentation strings. > > Note that some Python programs may choose not to use formatted > documentation strings. For example, programs like Zope [1] have > used documentation strings for purposes other than strict > documentation, and it would be inappropriate to expect them to > change how they use documentation strings. Also, some programmers > may prefer to write plaintext documentation strings. 
Zope's a red herring (they are trying to get away from this Bobo-ism). Very often we read docstrings as part of the source code, and there plaintext is best, given the state of the art in text editors. > Also, here are the "design goals" I defined: > > The following goals guided this PEP's design of conventions for > writing formatted documentation strings. > > * Intuitiveness: The meaning of a well-formed formatted > documentation string should be obvious to a reader, even if > that reader is not familiar with the formatting conventions. > > * Ease of use: If the formatting conventions are to be > accepted by the Python community, then it must be easy to > write formatted documentation strings. Of course. This is all apple pie and motherhood. Nobody will want documentation that's unintuitive or hard to use! > * Formality: The formatting conventions must be formally > specified. A formal specification allows different tools to > interpret formatted documentation strings consistently, and > allows for "competing, interoperable implementations," as > specified in PEP 1 [5]. Yes, this is important. But when we choose plaintext, we don't need much of a formal specification! > * Expressive Power: The formatting conventions must have > enough expressive power to allow users to write the API > documentation for any python object. I've never found that plaintext got in the way of my expressiveness. > The following "secondary design goals" follow directly from the > primary design goals, but are important enough to deserve separate > mention: > > * Simplicity: The formatting conventions should be as simple > as is practical, and there should be minimal interaction > between different aspects of the formatting conventions. > This goal derives from intuitiveness and ease of use. More motherhood. > * Safety: No well-formed formatted documentation string should > result in unexpected formatting. This goal derives from > intuitiveness. This is a good one. ST loses big here!
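Guido's "safety" point, together with the *x**b*c**d* example from earlier in the thread, can be illustrated with a small checker that rejects ambiguous emphasis markup instead of guessing. The pairing rules below are illustrative assumptions, not STpy's actual ones:

```python
def check_emph(text):
    """Return True only if every '*' in text reads unambiguously as an
    opening or a closing emphasis marker.

    Sketch only: openers must follow whitespace, closers must precede
    whitespace or punctuation.  Markup like *x**b*c**d* is rejected
    rather than silently guessed at.
    """
    expecting_open = True
    for i, ch in enumerate(text):
        if ch != '*':
            continue
        before = text[i - 1] if i else ' '
        after = text[i + 1] if i + 1 < len(text) else ' '
        if expecting_open:
            # an opener needs whitespace before it and a word after it
            ok = before.isspace() and not after.isspace() and after != '*'
        else:
            # a closer needs a word before it, whitespace/punctuation after
            ok = not before.isspace() and (after.isspace() or after in '.,;:!?')
        if not ok:
            return False
        expecting_open = not expecting_open
    return expecting_open  # a dangling opener is also an error
```

A validator like this fails loudly on ambiguous input, which is exactly the property the "Safety" goal asks for.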
> So the question then is what sort of markup language we should define. > I'd be quite happy to use something like Javadoc uses (but with a more > restricted set of acceptable XML elements), but other people think > that it's too hard to read/write... I thought Javadoc was geared too much towards generating HTML; we should not focus too much on HTML. > I'm also curious why you don't like ST-like markups. We've been > putting a fair amount of work into formalizing it & making sure it's > "safe" (e.g., it's an error to say *x**b*c**d*). If we can > successfully do both, would that alleviate some of your concerns > about ST? The proponents of ST (that I've talked to) seem to believe that it's unnecessary to tell the users what the exact rules are. This, plus numerous bugs in the ST implementation and the context in which it is used, continuously bite me. E.g. if a paragraph starts with a word followed by a period, the word is replaced with "1.". If I use "--" anywhere in the first line of a paragraph it is turned into a ... style construct. There's no easy way to escape HTML keywords. In general, when you *don't* want something to have its special effect, there's no way to escape it. There's no way that I know of to create a bulleted item consisting of several paragraphs. The reliance on indentation levels to detect various levels of headings never works for me. > Any info you can give on what you would like to see come out > of this project (or pointers to info) would be most appreciated. A lot of effort has gone into creating a body of documentation for the standard library *outside* the source code. It is rich in mark-up, indexed, contains many examples, well maintained, and is generally considered high quality documentation. I don't want to have to redo the work that went into creating this. It should be easier to combine code and documentation for 3rd party modules, and there should be an easier way to integrate such documentation into (a version of) the standard documentation. But I disagree with the viewpoint that documentation should be maintained in the same file as source code. (If you believe the argument that it is easier to ensure that it stays up to date, think again. This never worked for comments.) --Guido van Rossum (home page: http://www.python.org/~guido/) From Edward Welbourne Sat Mar 24 16:08:21 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:08:21 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <001b01c0b2bf$d02b7320$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001b01c0b2bf$d02b7320$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > You really should read the doctest documentation (see the chapter in the > 2.1 docs for the best intro) - it *will* test broken examples as well erm. You've missed what I was trying to say.
I was considering the case of a piece of code which isn't consistent with what's actually implemented, given in order to explain *why* it's implemented the way it *is* rather than the way someone might think it *should be*; the illustrated code is showing what would happen if things were done the way the alleged *should be* would force them to be done. If the doctest tool can manage to run the illustrative fragment against an imaginary implementation, we can all retire and leave the industry to it - it's AI complete already. > But as you've presented it, that wouldn't naturally be presented as an > interactive session at all - one wouldn't write it as:: > > for example: > > >>> def __repr__(self): > and so on > > but rather as:: > > for example:: > > def __repr__(self): > and so on > > That's *why* the chosen "start of Python paragraph" thing is '>>>' - > because it *is* what it looks like. again, you've missed my point. I was in no way suggesting that my fragment be treated as part of an interactive session; I was, indeed, bemoaning the fact that if I try to supply that fragment, I must supply it wrongly marked up, in one of the two ways illustrated in your response: either as a test case, which is wrong, or as a verbatim block of *alien text*, which is wrong. At least, assuming '::' introduces a verbatim block of alien text. If it introduces a python verbatim block, then I need to know how to insert an alien verbatim block, because I believe I should be able to distinguish the two kinds of verbatim block.
Just to be clear: the test-case markup mechanism is a totally different beast from the four varieties of `literal' I was describing, which come in four varieties because there are two independent binary choices: * is the fragment python or alien * do we inline it or is it a block We have #python.code('inlines')#, we have 'alien inlines' but we have only one variety of verbatim block (assuming you'll let me ignore the test-case inline which isn't any of the cases being considered). Eddy. From Edward Welbourne Sat Mar 24 15:49:30 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 15:49:30 +0000 (GMT) Subject: [Doc-SIG] Tokens for labels & endnotes In-Reply-To: <001a01c0b2be$67095890$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001a01c0b2be$67095890$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: erm ... > For labels I want to exclude '-_', but yes, for labels I want to > include them. the second use of `labels' was meant to be `endnotes' or citations ? Eddy. From Edward Welbourne Sat Mar 24 16:27:46 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:27:46 +0000 (GMT) Subject: [Doc-SIG] documenting class attributes In-Reply-To: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Interesting. I have one variety of class where I manage to do this, but it's not generally helpful ... in a base class, Lazy, an idiom is introduced where, for any 'name' not ending in an _ or starting with more than one _, a method called _lazy_get_name_ will be used to supply the value of name (the first time it's asked for: it's then stored in __dict__); this has become my standard way to document attributes (because, of course, I make attributes lazy wherever possible). A few attributes get to be specified in __init__'s docs, because it's saying how it'll initialise them from its inputs. 
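The Lazy idiom Eddy describes above can be sketched roughly as follows. The _lazy_get_name_ naming and the caching in __dict__ come from the posting; the __getattr__ machinery and the Circle example are illustrative guesses, not Eddy's actual code:

```python
class Lazy:
    """Base class: a missing attribute 'name' is computed on first
    access by a method _lazy_get_name_ and then stored in the instance
    __dict__, so the getter's docstring doubles as documentation for
    the attribute.  (A sketch; Eddy's version also filters names by
    their leading/trailing underscores.)
    """
    def __getattr__(self, name):
        try:
            getter = getattr(type(self), '_lazy_get_%s_' % name)
        except AttributeError:
            raise AttributeError(name)
        value = getter(self)
        self.__dict__[name] = value   # cache: __getattr__ won't fire again
        return value

class Circle(Lazy):
    def __init__(self, radius):
        self.radius = radius

    def _lazy_get_area_(self):
        """The circle's area, computed lazily from its radius."""
        return 3.14159 * self.radius ** 2
```

After the first access to .area the value sits in the instance dictionary, so later reads are plain attribute lookups.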
But generally, the only way to document attributes is using a descriptive list in the class docstring ... which isn't > ... **adjacent to the entity documented** *and* user visible. but then I'm a bit skeptical about the line where it gets set being the right place to document it, if only because the attribute may get set in any method, so how do I know where to look for this line ... In some sense, `adjacent to the entity' is meaningless for a python object's attributes, the best you can do is `adjacent to a line of code which happens to set it' or similar. (If not, please illustrate.) The other `solution' I've used (in places) is to have the attribute actually be a sophisticated object carrying around a doc string but managing to pretend to be the value we wanted for the attribute; again, not generally workable. The right place for attribute description is in the typedef, and python doesn't make us do those; which only really leaves the class docstring. What's wrong with the class docstring ? Eddy. From Edward Welbourne Sat Mar 24 16:31:45 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:31:45 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103221347.f2MDlgp18142@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103221347.f2MDlgp18142@gradient.cis.upenn.edu> Message-ID: > I still don't see why x*y>z *has* to go in literals, > > Now, we have a bold "y>z ", and a mysterious '*' after has! Clearly I don't see why. The * doesn't have a magic meaning *unless* it appears at a word boundary with space or punctuator the other side of it. So the * in x*y>z isn't magic. Is it ? Likewise, x * y > z (which is what I'm far more likely to type) and, if the author *does* go typing x *y > z they've only themselves to blame, and the means to fix it is easy. Eddy. 
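Eddy's word-boundary rule above (the '*' in x*y>z isn't magic) could be captured with a regex along these lines; the pattern and the group name 'emph' are illustrative assumptions, not docutils' actual rules:

```python
import re

# '*' is only markup when it sits at a word boundary: the opener must
# follow whitespace (or start of text), the closer must precede
# whitespace or a punctuator.  So the '*' inside x*y>z stays plain text.
EMPH = re.compile(r"""
    (?:(?<=\s)|^)                    # opener only after whitespace
    [*](?P<emph>\S(?:[^*]*\S)?)[*]   # no space just inside the stars
    (?=$|[\s.,;:!?)])                # closer only before space/punctuation
""", re.VERBOSE)
```

Under this rule "so *this* works" is emphasized, while x*y>z and "x *y > z" simply fail to match, which is the behaviour Eddy argues for.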
From Edward Welbourne Sat Mar 24 16:39:02 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:39:02 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <002301c0b2dd$a708eb30$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <002301c0b2dd$a708eb30$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: >> Are we expecting people to *want* to link into a document from >> outside? I can't see ever having any use for that when writing >> API docs... > > I don't have a use for it, myself, directly. erm. The documentation of class Foo overrides a method of Foo's base, explains the difference in Foo's version's docstring, but needs to refer to the bit of the base's implementation in which is explained the hideous and hairy reason why certain bits have to behave the way they do. The base's implementation's docstring gave that portion a named anchor to which derived classes' docstrings could refer. Eddy. From Edward Welbourne Sat Mar 24 17:15:19 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:15:19 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <000101c0b388$f9778e20$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <000101c0b388$f9778e20$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > As such, docutils takes the "code it and see how it works" approach > (Python as formalism), whilst you're taking the "think about it hard > and see what it should do" approach (more traditional formalism). heh. The IETF approach and the IEEE/ISO one. Eddy. From Edward Welbourne Sat Mar 24 16:53:55 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 16:53:55 +0000 (GMT) Subject: [Doc-SIG] Re: docutils REs In-Reply-To: <002a01c0b383$3c18a670$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <002a01c0b383$3c18a670$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Edward, then Tony: >> think on it some more. (e.g., should it be ok to have a dash after >> an *emph region*-like this?) 
> That looks wrong to me - but then you can see how I use dashes in plain > text! oh, be more imaginative. After an explanation of the difficulties (in some context) of Doing The Right Thing when an empty list is supplied for some parameter (say):: However, for *non*-empty lists, none of the above matters. Eddy. From Edward Welbourne Sat Mar 24 17:02:18 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:02:18 +0000 (GMT) Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <002b01c0b384$d7153480$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <002b01c0b384$d7153480$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > Ah, but there's no reason you shouldn't be able to *say **this***, for > instance (it's quite unambiguous). I'm glad you're both only discussing this hypothetically, then, and both don't want to allow *** at all. If `unambiguous' was all it took, ***this*** would be unambiguous, too - this is emphasised *and* strong. > up being too confusing. I don't think it's unreasonable to require > that people *say **this** *. At the very least, it seems much ditto. And I see no sense in which the space makes it easier to read. and I'd argue that in '*this*', each ' is adjacent to a word-boundary, namely the end of the emphasised word this, and likewise that in ***this*** the outer *...* or **...** abuts the ends of the word obtained by colouring the word using the inner one. Eddy. 
From Edward Welbourne Sat Mar 24 17:07:21 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:07:21 +0000 (GMT) Subject: [Doc-SIG] backslashing In-Reply-To: <002c01c0b385$80d1a8a0$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <002c01c0b385$80d1a8a0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Edward, then Tony: >> we have to have some way of distinguishing python literal blocks from >> vanilla literal blocks (so we'll have 5 different literalish types: >> literals; inlines; literal blocks; doctest blocks; and python literal >> blocks). > > That way lies madness, 'cos what about C code, oh, and maybe some > Haskell is very important, and... No, only python is special. All other literals are aliens and shall not be distinguished. Just as with 'verbatim alien' and #verbatim.python# It *does* matter to distinguish between verbatim python and verbatim alien text, partly because we might want to (after the fashion of doctest) verify that an alleged python verbatim *does* at least parse and partly because the renderer may well wish to make some identifiers in it be hyperlinks; whereas verbatim alien text should be simply echoed (subject to stripping its indentation down to the level of its context, naturally). Eddy. From Edward Welbourne Sat Mar 24 17:41:01 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:41:01 +0000 (GMT) Subject: [Doc-SIG] ST and DOM In-Reply-To: <200103231527.f2NFR5p19280@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103231527.f2NFR5p19280@gradient.cis.upenn.edu> Message-ID: > Docstrings:: > > > > ... needs: in some sense (and I'm assuming function's declaration includes parameter specs, which will look a lot like the attributes section of a class; each will be dlist in one guise or another). > I said that ordered list bullets are required.. is that > reasonable? Should they be '#IMPLIED' instead? 
If we want to let the renderer decide whether to use numbers, letters, etc. I imagine we'll need #IMPLIED but don't know DTDs well enough to be sure. Eddy. From Edward Welbourne Sat Mar 24 17:46:17 2001 From: Edward Welbourne (Edward Welbourne) Date: Sat, 24 Mar 2001 17:46:17 +0000 (GMT) Subject: [Doc-SIG] ST and DOM In-Reply-To: <001201c0b3b8$9cecc190$f05aa8c0@lslp7o.int.lsl.co.uk> (tony@lsl.co.uk) References: <001201c0b3b8$9cecc190$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: > Technically, it is *not* necessarily the same information you get from > the Python code. e.g. >>> print range.__doc__ range([start,] stop[, step]) -> list of integers Return a list containing an arithmetic progression of integers. range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0. When step is given, it specifies the increment (or decrement). For example, range(4) returns [0, 1, 2, 3]. The end point is omitted! These are exactly the valid indices for a list of 4 elements. how, after all, would source code manage to express that ? One can't have an optional first argument, yet require the second: one can only do it by faking it in the way one interprets one's arguments. Eddy. From pf@artcom-gmbh.de Sun Mar 25 09:42:47 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Sun, 25 Mar 2001 11:42:47 +0200 (MEST) Subject: [Doc-SIG] Formalizing ST In-Reply-To: <200103241739.f2OHd0p13768@gradient.cis.upenn.edu> from "Edward D. Loper" at "Mar 24, 2001 12:39: 0 pm" Message-ID: Hi, Edward D. Loper schrieb: > Hmm.. So I'm starting to think that EBNF really isn't the best > formalism for capturing global formatting. Hmmmm..... I think I have to disagree. What is global formatting? Did you ever have a look at the Python/Grammar/Grammar file, which is basically EBNF and uses the special tokens INDENT and DEDENT? I was thinking of something like this for ST:

    structured_text:       (headed_section | colored_text)*
    colored_text:          (colored_text_line NEWLINE | bullet_list | table | ....)
    colored_text_line:     ....
    bullet_list:           bullet_item bullet_item*
    bullet_item:           ('*'|'-'|'o') colored_text_line [NEWLINE indented_colored_text]
    indented_colored_text: INDENT colored_text+ DEDENT
    headed_section:        headline section_body
    headline:              colored_text_line NEWLINE ('-'|'=')* NEWLINE

This is rather sketchy and not well thought out, but you might get the basic idea. Maybe even 'pgen' from the Python distribution can be reused for parsing formalized ST? For simple implementation experiments John Aycock's Spark might be an alternative. Look at the tokenize module from the Standard library. I wish I would have the time to do this myself. Unfortunately I have not. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mal@lemburg.com Sun Mar 25 13:22:34 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 25 Mar 2001 15:22:34 +0200 Subject: [Doc-SIG] documenting class attributes References: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <3ABDF11A.447718C5@lemburg.com> Edward Welbourne wrote: > > Interesting. I have one variety of class where I manage to do this, but > it's not generally helpful ... in a base class, Lazy, an idiom is > introduced where, for any 'name' not ending in an _ or starting with > more than one _, a method called _lazy_get_name_ will be used to supply > the value of name (the first time it's asked for: it's then stored in > __dict__); this has become my standard way to document attributes > (because, of course, I make attributes lazy wherever possible). A few > attributes get to be specified in __init__'s docs, because it's saying > how it'll initialise them from its inputs.
> but then I'm a bit skeptical about the line where it gets set being the > right place to document it, if only because the attribute may get set in > any method, so how do I know where to look for this line ... > > In some sense, `adjacent to the entity' is meaningless for a python > object's attributes, the best you can do is `adjacent to a line of code > which happens to set it' or similar. (If not, please illustrate.) > > The other `solution' I've used (in places) is to have the attribute > actually be a sophisticated object carrying around a doc string but > managing to pretend to be the value we wanted for the attribute; again, > not generally workable. > > The right place for attribute description is in the typedef, and python > doesn't make us do those; which only really leaves the class docstring. > > What's wrong with the class docstring ? It doesn't support class inheritance, that is, overriding attributes with new meanings does not work and you also have no chance to build a complete list of all interface attributes. PEP 224 tried to address this problem. The good thing about the solution proposed in PEP 224 is that it doesn't break any working code and uses the same intuitive syntax as class doc-strings themselves. Still, it was rejected, so I'm not trying to get that approach into the core anymore. If anybody has a better idea, please speak up... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sun Mar 25 13:35:21 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 25 Mar 2001 15:35:21 +0200 Subject: [Doc-SIG] Re: docstring signatures References: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> Message-ID: <3ABDF419.79566421@lemburg.com> "Edward D. Loper" wrote: > > > Don't spend too much time trying to make StructuredText or some > > variation thereof work.
In my experience with systems that use ST > > (like ZWiki), it sucks. There basically are two options I like: > > nicely laid out plain text, or a real markup language like Latex or > > DocBook. ST may suck, but it still provides a good compromise between readable source code level documentation and a machine parseable format. Just for reference, here's part of a javadoc-string with real markup:

    /**
     * Constructs a BigDecimal object from a
     * BigInteger, with scale 0.
     *
     * Constructs a BigDecimal which is the exact decimal
     * representation of the BigInteger, with a scale of
     * zero. The value of the BigDecimal is identical to the value
     * of the BigInteger. The parameter must not be null.
     *
     * The BigDecimal will contain only decimal digits,
     * prefixed with a leading minus sign (hyphen) if the
     * BigInteger is negative. A leading zero will be
     * present only if the BigInteger is zero.
     *
     * @param bi The BigInteger to be converted.
     */

Python doc-string should maintain the same level of elegance as the rest of the language, IMHO. > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > Hm. I guess I should have thought to ask the BDFL about all this > before now. :) Makes me wonder if he'll like/accept *any* of the > stuff we've been talking about. But it's interesting to hear > that Guido is ok with a real markup language. > > So are there any vocal opponents of using a real markup language > on doc-sig right now? (Assuming that Guido doesn't want us > to use something like ST).. Here's one ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From Edward Welbourne Sun Mar 25 10:04:13 2001 From: Edward Welbourne (Edward Welbourne) Date: Sun, 25 Mar 2001 11:04:13 +0100 (BST) Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> (edloper@gradient.cis.upenn.edu) References: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> Message-ID: > So are there any vocal opponents of using a real markup language on > doc-sig right now? (Assuming that Guido doesn't want us to use > something like ST).. Yes (on that assumption). As Guido said: > ... There basically are two options I like: nicely laid out plain > text, or a real markup language like Latex or DocBook. and the first option is blatantly the correct one for doc strings (as in: you won't see enough folk using LaTeX systematically and you won't see *anyone* using DocBook). Oh, I think I'm meant to say `IMO' at about this point. A while ago Guido was reasonably emphatic against HTML (in doc-strings).
I suspect I'd written more HTML-based docstrings by then than everyone else put together, and when I turned them into something resembling a proto-ST, I was happier with the result and glad that Guido had rejected HTML. (Sorry, Tibs, I might not have converted all of them ...) > Of course, on the other hand, if we can clean ST up enough, and > make it formal, maybe he'll be ok with it. Yes. While I'm still giggling (largely due to released stress) about Guido's magnificent intervention, it doesn't explicitly rule out `were ST a real markup language it would be OK', only you *really* want to talk to Guido about what `real markup language' means in this context. Clearly he'll allow that indentation-based structure is real structure (since python *is* a real programming language), but equally clearly he's not enamoured of the ST family. This *might* just be because they're all so damnably ad hoc, in which case your clean-up project *might* be a winner; but please have a chat with Guido some how. What *are* his objections to the ST family ? Edward [tweaked] by Eddy in the process of echoing: > I'm also thinking of putting together a "minimal" ST-like language, > that would include markup for: > * lists [three flavours ?] > * emph [presumably no strong] > * urls (using '<>' delimiters) > * [inline] literals (one type, probably using '#' as delimiters) > * literal blocks [presumably one type, again] Some of us would use it if it were * very simple (you're doing fairly well above) * so nearly plain text that just printing it verbatim would work fine but then at least one of us is of debatable sanity ;^> Something in the spirit of ST but done properly would have a better chance than something striving to be ST without its warts, IMO. > But maybe we'd be better off just using XML.. :) IIRC, Guido's reasons against HTML in doc strings will take out XML also. But ask him. > or something like javadoc ('@param(x) foo..', etc.).. 
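The "minimal" ST-like inline markup echoed above ('*emph*', '#' inline literals, '<>' urls) is small enough to sketch a scanner for. Only the delimiter choices come from the thread; the exact matching rules below are illustrative assumptions:

```python
import re

# Alternation over the three proposed inline constructs.  Each branch
# captures into a differently named group so lastgroup tells us which
# construct matched.
INLINE = re.compile(r"""
      [*](?P<emph>[^*\s][^*]*)[*]       # *emphasized text*
    | [#](?P<literal>[^#]+)[#]          # #inline literal#
    | <(?P<url>[a-z]+://[^<>\s]+)>      # <http://...>
""", re.VERBOSE)

def scan(text):
    """Split text into ('text'|'emph'|'literal'|'url', content) pairs."""
    out, pos = [], 0
    for m in INLINE.finditer(text):
        if m.start() > pos:                  # plain text before the markup
            out.append(('text', text[pos:m.start()]))
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    if pos < len(text):
        out.append(('text', text[pos:]))
    return out
```

Printing the plain text of such a string verbatim still reads fine, which is the "so nearly plain text" property asked for above.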
OK, ignorance speaks: what's javadoc like, could it be classed as a `real markup language', are there compatibility issues (like it depending on the code it decorates being in a language which uses punctuation, rather than indentation, to delimit structure), where's a good URL for an idiot introduction, why aren't we using it already ? Don't feel that only Edward is allowed to answer those, folks ;^> I suspect he'll say `don't know' to at least the last, and several of you will give better answers than him on the rest. Clearly a markup language specified by someone we can't persuade to change has one humungous advantage: I'd never again spend an entire month's free time fretting about whether some proposed changes were a good idea and how to make them better. If, say, we used javadoc we'd just be stuck with whatever Sun have specified, so even if we don't like some bits of it we'd just knuckle down and get over it. There may be better things for us to channel our energy towards ... especially if no cousin of ST is ever going to find favour with Guido. Eddy. -- Not that I'm grumbling, just knackered, at least largely for other reasons. From edloper@gradient.cis.upenn.edu Sun Mar 25 15:45:40 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 25 Mar 2001 10:45:40 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Sat, 24 Mar 2001 16:31:45 GMT." Message-ID: <200103251545.f2PFjep11824@gradient.cis.upenn.edu> > I don't see why. The * doesn't have a magic meaning *unless* it appears > at a word boundary with space or punctuator the other side of it. So > the * in x*y>z isn't magic. Is it ? Well, that's not quite the environment that Tibs was checking for.. According to the STpy regexps, you *just* need a space to the left/right of the '*'.. so you can say * big * to mean *big*. It might be that we'd want to change that. 
But there's an argument to be made for having the environments in which emph can start/end be the same as the environments that literal can start/end in. So the question then is whether we want to allow things like ' x '. And actually, thinking about it now, I don't see why we would want to.. So maybe we *should* change to your rules.. something like:

    \s (?P
        [*]           # open delimiter
        (?! [\s\n])   # first char can't be sp
        [^*]*         # contents
        [^*\s\n]      # last char can't be sp
        [*])          # close delimiter

Or some cleaner version of that.. (I used the '(?!' so you can have emph regions with only 1 char in them.)

Another idea I've been toying with (in my more restricted version of ST) is to *only* allow a *single* word to be emphasized. If you want to emphasize multiple words you have to *do* *it* *like* *this*. That seems much safer/more local/etc.. And I can't think of the last time I tried to emphasize more than 2 words at once anyway. *It just looks weird, and is hard to read, if you try to emphasize a big region.*

-Edward

From edloper@gradient.cis.upenn.edu Sun Mar 25 15:49:13 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 25 Mar 2001 10:49:13 EST Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: Your message of "Sat, 24 Mar 2001 17:02:18 GMT." Message-ID: <200103251549.f2PFnEp11969@gradient.cis.upenn.edu>

> I'm glad you're both only discussing this hypothetically, then, and both
> don't want to allow *** at all.

Um. *I* don't want to allow '***' at all, but I think Tibs does.

-Edward

From edloper@gradient.cis.upenn.edu Sun Mar 25 15:55:59 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 25 Mar 2001 10:55:59 EST Subject: [Doc-SIG] ST and DOM In-Reply-To: Your message of "Sat, 24 Mar 2001 17:46:17 GMT." Message-ID: <200103251555.f2PFtxp12310@gradient.cis.upenn.edu>

> > Technically, it is *not* necessarily the same information you get from
> > the Python code.
> > e.g.
> > >>> print range.__doc__ > range([start,] stop[, step]) -> list of integers Yeah.. that doc string always bothered me. Makes me think I'm using some language other than Python. :) When I first looked at that signature, I really couldn't tell whether calling it with 2 parameters would give it a stop & a step, or a start & a step... And it seems like you should be able to at *least* tell that type of info from the signature. But, that said, I see your point. -Edward From edloper@gradient.cis.upenn.edu Sun Mar 25 16:02:39 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 25 Mar 2001 11:02:39 EST Subject: [Doc-SIG] Re: Doc-SIG digest, Vol 1 #301 - 11 msgs In-Reply-To: Your message of "Sat, 24 Mar 2001 23:29:02 EST." Message-ID: <200103251602.f2PG2dp12575@gradient.cis.upenn.edu> > >> Are we expecting people to *want* to link into a document from > >> outside? I can't see ever having any use for that when writing > >> API docs... > > > > I don't have a use for it, myself, directly. > > erm. The documentation of class Foo overrides a method of Foo's base, > explains the difference in Foo's version's docstring, but needs to refer > to the bit of the base's implementation in which is explained the > hideous and hairy reason why certain bits have to behave the way they > do. The base's implementation's docstring gave that portion a named > anchor to which derived classes' docstrings could refer. It seems to me that this is asking for broken pointers, etc., within our docs, the next time someone updates the base implementation's docs.. API doc strings really shouldn't be that long anyway, so I don't feel so bad about referring someone to an entire docstring.. 
-Edward From Edward Welbourne Sun Mar 25 12:57:44 2001 From: Edward Welbourne (Edward Welbourne) Date: Sun, 25 Mar 2001 13:57:44 +0100 (BST) Subject: [Doc-SIG] documenting class attributes In-Reply-To: <3ABDF11A.447718C5@lemburg.com> (mal@lemburg.com) References: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> <3ABDF11A.447718C5@lemburg.com> Message-ID: >> What's wrong with the class docstring ? > It doesn't support class inheritance, that is overriding attributes > with new meanings does not work and you also have to chance to > build a complete list of all interface attributes. but, if Tony and Edward manage to formalise the structure of the class docstring enough that it has an attributes section, tools can auto-trawl base classes to achieve these desiderata. Each class effectively supplies a mapping from names of attributes it defines to attribute docstrings (extracted from the class' description of the attribute); judicious munging and mangling should then suffice to build up, for each class, a mapping from names of attributes it has (whether it defines them, redefines them or just inherits them) to their docstrings. Job Done. > If anybody has a better idea, please speak up... will the above do ? If not, why not ? I mean, aside from depending on a suitably well-formalised and widely used `Attributes:' section in the class doc-string, and the possibility of the whole ST project being torpedoed by Guido ... Eddy. From tony@lsl.co.uk Mon Mar 26 09:14:21 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 10:14:21 +0100 Subject: [Doc-SIG] formalizing StructuredText In-Reply-To: <200103232342.f2NNgvp27539@gradient.cis.upenn.edu> Message-ID: <001f01c0b5d5$247ce400$f05aa8c0@lslp7o.int.lsl.co.uk> Gosh, it's been a busy weekend. Let's assume, for the moment, that this dicussion is still worth pursuing! Edward D. Loper wrote: > Fine by me, as long as we explicitly say that all spaces in text > (not in literals) are soft. 
> It seems like the parser *should* reduce sequences of multiple spaces,
> but I'll live if it doesn't (c.f., XML parsers are required to reduce
> sequences of multiple spaces in attribute strings like this: '').

I'm actually happy either way - I think STpy (in draft) currently says that trailing spaces may be lost and that spaces in (not literal) text may be conflated, which leaves it open. I would easily be convinced that those "may"s should be "shall"s... (thinks for 30 seconds) - OK, I shall make it so. Spaces in non-literal text shall be "reduced". They are already "soft".

> [Tibs discusses ***]
>
> Ok. So, on further thought, *** can be given consistent meaning
> (assuming a left-to-right-style parsing):
>
>   CURRENT CONDITION  | Meaning
>    Emph? | Strong?   |
>   -------+-----------+--------------------------
>    no    |   no      | start both strong & emph
>    no    |   yes     | end strong, start emph
>    yes   |   no      | end emph, start strong
>    yes   |   yes     | end both strong & emph
>
> If you do give '***', that is the meaning it should receive. Note
> that '****' shouldn't ever really have a meaning.

OK.

> I guess I'll just have to wait for your nested-coloring regexps. :)
> (But I still think that '***' is potentially confusing to readers,
> and that's a Bad Thing).

This is difficult for me, as I will type it with little-or-no thought (which we all know, of course, is not the same as reading it easily!). I think this is a "debate it later" topic (assuming we *have* a later).

(of course, Eddy may be unhappy, since he said:

> I'm glad you're both only discussing this hypothetically, then,
> and both don't want to allow *** at all. If `unambiguous' was
> all it took, ***this*** would be unambiguous, too - this is
> emphasised *and* strong.

because I can't see what is *wrong* with '***this***' myself - but I note that Edward agrees with Eddy on objecting to it.)
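Edward's truth table can be rendered directly as a small state function, parsing left to right (an illustration of the proposed rule only, not real parser code; the event names are made up):

```python
def triple_star(emph_open, strong_open):
    """What a '***' token does, given which regions are currently open,
    per the truth table under discussion (left-to-right parsing)."""
    if not emph_open and not strong_open:
        return ["start-strong", "start-emph"]
    if strong_open and not emph_open:
        return ["end-strong", "start-emph"]
    if emph_open and not strong_open:
        return ["end-emph", "start-strong"]
    return ["end-emph", "end-strong"]

# '***this***' comes out as emphasised *and* strong: the first token
# opens both regions, the second (both now open) closes both.
events = triple_star(False, False) + triple_star(True, True)
print(events)
```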
Meanwhile, Edward Loper continues to worrit at the "quoted text for emphasis" problem: > Another idea I've been toying with (in my more restricted version > of ST) is to *only* allow a *single* word to be emphasized. If > you want to emphasize multiple words you have to *do* *it* *like* > *this*. That seems much safer/more local/etc.. And I can't > think of the last time I tried to emphasize more than 2 words > at once anyway. *It just looks weird, and is hard to read, if > you try to emphasize a big region.* I agree that it is difficult to "see" a large text emphasised, but I also don't think I would be happy having to emphasis individual words in that style. I believe I *do* emphasise more than one word on occasions, but don't actually *know* (and certainly one word cases *do* predominate). Right - that's that thread, on to the next Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Mar 26 09:19:00 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 10:19:00 +0100 Subject: [Doc-SIG] anchors and local references In-Reply-To: <200103240344.f2O3i3p09660@gradient.cis.upenn.edu> Message-ID: <002001c0b5d5$cb403c60$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper provided a summary with which I, for one, agree (can I steal the text, please?!) and then said, in answer to Eddy's: > > Hmm. Well, ideally we'd support standard citation forms, > which would > > include '[this, that, other]', to be treated like '[this], [that], > > [other]' > > Fine with me, if others also want it. Of course, I also wouldn't > feel bad about making people type [this], [that], [other]. 
I also agree that the comma-separated form is nice, but for the moment would prefer to leave it alone (just *too many things* to do) (heh - I emphasised more than one word, Edward!).

> > One might plausibly want to allow '&' in ref names
>
> I agree.

Me too, now it's been pointed out - I'll go with Edward's summary's list of valid characters, I think.

Tibs

-- 
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From tony@lsl.co.uk Mon Mar 26 09:24:13 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 10:24:13 +0100 Subject: [Doc-SIG] backslashing In-Reply-To: Message-ID: <002101c0b5d6$859de440$f05aa8c0@lslp7o.int.lsl.co.uk>

Edward Welbourne wrote:
> Just to be clear: the test-case markup mechanism is a totally
> different beast from the four varieties of `literal' I was
> describing, which come in four varieties because there are two
> independent binary choices:
> * is the fragment python or alien
> * do we inline it or is it a block
> We have #python.code('inlines')#, we have 'alien inlines' but we have
> only one variety of verbatim block (assuming you'll let me ignore the
> test-case inline which isn't any of the cases being considered).

Then, in those terms, yes. Not for any deep philosophical reasons, but just because doctest exists.

Tibs

-- 
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From mal@lemburg.com Mon Mar 26 09:26:05 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Mon, 26 Mar 2001 11:26:05 +0200 Subject: [Doc-SIG] documenting class attributes References: <001d01c0b2c1$5753dd00$f05aa8c0@lslp7o.int.lsl.co.uk> <3ABDF11A.447718C5@lemburg.com> Message-ID: <3ABF0B2D.3AB8FD33@lemburg.com> Edward Welbourne wrote: > > >> What's wrong with the class docstring ? > > > It doesn't support class inheritance, that is overriding attributes > > with new meanings does not work and you also have to chance to > > build a complete list of all interface attributes. > > but, if Tony and Edward manage to formalise the structure of the class > docstring enough that it has an attributes section, tools can auto-trawl > base classes to achieve these desiderata. Each class effectively > supplies a mapping from names of attributes it defines to attribute > docstrings (extracted from the class' description of the attribute); > judicious munging and mangling should then suffice to build up, for each > class, a mapping from names of attributes it has (whether it defines > them, redefines them or just inherits them) to their docstrings. > Job Done. Some issues: 1. the documentation is separated from the attribute definition -- minor issue, but still important (methods are not documented in the class doc-string either) 2. you'll have to reformat all class doc-strings to make the new feature available (I think common use is to simply add hash comments just before or after the attribute definitions -- these should be easily reusable) 3. we'd need a runtime tool to extract the information from the class doc-string > > If anybody has a better idea, please speak up... > will the above do ? > If not, why not ? > > I mean, aside from depending on a suitably well-formalised and widely > used `Attributes:' section in the class doc-string, and the possibility > of the whole ST project being torpedoed by Guido ... I wasn't under the impression that Guido was trying to torpedo the ST approach. Where did you get that idea ? 
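Eddy's auto-trawling scheme can be sketched in a few lines. The helpers here are hypothetical: `own_attribute_docs` stands in for whatever tool parses an `Attributes:` section out of a class docstring, and the `_attribute_docs` dicts simulate its output:

```python
def own_attribute_docs(cls):
    # Hypothetical stand-in: pretend a docstring parser has already
    # turned each class's 'Attributes:' section into a dict on the class.
    return getattr(cls, "_attribute_docs", {})

def all_attribute_docs(cls):
    """Map every attribute the class has -- defined, redefined or merely
    inherited -- to its docstring, letting derived classes override."""
    docs = {}
    for base in reversed(cls.__mro__):  # base classes first, so the
        docs.update(own_attribute_docs(base))  # most-derived doc wins
    return docs

class Base:
    _attribute_docs = {"x": "why x behaves hideously", "y": "plain old y"}

class Derived(Base):
    _attribute_docs = {"x": "x, but different here"}

print(all_attribute_docs(Derived))
```

Derived's entry for "x" overrides Base's, while "y" is inherited untouched, which is exactly the mapping Eddy describes.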
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From tony@lsl.co.uk Mon Mar 26 10:14:51 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 11:14:51 +0100 Subject: [Doc-SIG] Re: docstring signatures (answer to Guido) In-Reply-To: <200103242037.PAA28743@cj20424-a.reston1.va.home.com> Message-ID: <002201c0b5dd$985856e0$f05aa8c0@lslp7o.int.lsl.co.uk> Oh dear. Well, thanks to Edward for finally getting the BDFL's opinion on the "top of the callable docstring" thing - wish I could find the reference where it was claimed to be needed. Maybe I was remembering a dream, or living in a parallel world, or something. I'll happily drop them from the spec (and stop using them!). I *am* a bit disturbed, though, as to whether Guido has decided against an ST approach *in all circumstances* (I'm sure he was less negative on previous rounds of the Doc-SIG, but I'm not entirely prepared to trust my memory at this stage). I do know that *I* want some "light formatting" solution, and so have other participants of the SIG - and given this is Python-land, I don't believe that's because we're trying to "nanny" people, I think it's because we want it *for ourselves*. I hope the rest of this email doesn't come across as ranting against Guido. But I *do* feel he's being a little bit unfair... In response to Edward Loper, Guido wrote: > Not true. Most of the standard library uses the same convention, and > even if it's not quite written down, it wouldn't be hard to figure out > what it is. Hmm. But this is where we *came* from, initially, surely - an attempt to figure out what people *actually* write down. Asking people to conform to a convention that is *not* evident explicitly somewhere is, well, a bit unfair (I include me in "people" here, by the way). 
> > As a result, it is very difficult to write widely-applicable > > tools for processing documentation strings. > > Again not true. Ping's pydoc does quite well second-guessing the > existing conventions. This has been a great source of argument on Doc-SIG in the past - "quite well" is not the goal that some of us wanted. But it's still not the only reason why many of us want ST - we actually want to have some markup in the text for all sorts of reasons. > > * Safety: No well-formed formatted documentation > > string should result in unexpected formatting. > > This is a good one. ST loses big here! I *do* feel that it is a *leetle bit* (excuse the sarcasm) unfair to judge the STpy and STminus works on the basis of a tool/specification that they are not. As far as I can tell, STClassic (the implementation) is *not* a very good example of how to do it (and that's meant to be english understatement). And that seems to be what Guido is basing this statement on. > The proponents of ST (that I've talked to) seem to believe that it's > unnecessary to tell the users what the exact rules are. Yes, but that wasn't *us* - we're proponents of (a form of) ST as well. But obviously just not *those* proponents. > This, plus numerous bugs in the ST implementation and the context > in which it is used, continuously bite me. Again, it's surely a bit unfair to say (as this does) "an implementation of an ancestor specification sucked, so what you're doing does as well". > E.g. if a paragraph starts with a word > followed by a period, the word is replaced with "1.". I agree that's loony. But it's not what is being proposed. > If I use "--" anywhere in the first line of a paragraph > it is turned into a
... style construct. Well, ' -- ' in our version - predicated surely on the idea that most people don't use double hyphens in plain text (which I happen to believe as well), whereas the: something -- some text about it style is fairly easy to spot. > There's no easy way to escape HTML keywords. A problem of *that* specification, not of STpy or STminus (and *aggressively* not so). We do *not* weld ourselves to HTML as an output format, nor indeed XML, and thus '<' and '>' are not treated specially at all. > In general, when you *don't* want something to have its > special effect, there's no way to escape it. A problem, agreed - but we've actively been worrying about this, and looking for *specific cases* where this causes a problem, to see if we can work around it. I'd be very interested to know which cases cause Guido problems (and if they're artefacts of the earlier specifications, or something we can use as examples of problems for ourselves). > There's no way that I know of to create a bulleted item > consisting of several paragraphs. This is a lunacy of the implementation Guido's been using, I would say. > The reliance on indentation levels to detect various levels of > headings never works for me. Well, I don't like it either. It won't stop me writing a PEP, though, which does the same thing (and is, of course, pretty close to being written in STpy/STminus). For what it's worth, there *are* proposals to fix that (the section = indentation thingy), but they're not worth pursuing until we have something available to talk about, which is what we've been trying to do. > > Any info you can give on what you would like to see come out > > of this project (or pointers to info) would be most appreciated. > > A lot of effort has gone into creating a body of documentation for the > standard library *outside* the source code. It is rich in mark-up, > indexed, contains many examples, well maintained, and is generally > considered high quality documentation. 
I don't want to have to redo > the work that went into creating this. Of course not. We're not attempting to change that (at least Edward and I are not). > It should be easier to combine code and documentation for 3rd party > modules, and there should be an easier way to integrate such > documentation into (a version of) the standard documentation. But I > disagree with the viewpoint that documentation should be maintained in > the same file as source code. (If you believe the argument that it > is easier to ensure that it stays up to date, think again. This never > worked for comments.) I'm afraid you (Guido) are conflating two different arguments. The argument by Ka-Ping Yee that the *whole* documentation for a module should live in the file for that module is (a) a different thread, and (b) one I argue strongly against. I hope that Guido isn't already decided on this issue, once and for all. The consensus of the Doc-SIG, over the years (many of whom have been people who Guido knows and has much more reason to respect the opinion of than me) has been that we need a way of formatting docstrings, and that it should be an etext derivative. I believe that we (on the Doc-SIG) have been producing a *better* etext derivative, whilst still trying to stay at least partially compatible with the sibling STNG project. *Should* we have been more radical and just broken with STNG entirely? It would have made life somewhat simpler... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
From tony@lsl.co.uk Mon Mar 26 10:14:53 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 11:14:53 +0100 Subject: [Doc-SIG] Re: docstring signatures (small implementation) In-Reply-To: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> Message-ID: <002301c0b5dd$99a26310$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > So are there any vocal opponents of using a real markup language > on doc-sig right now? (Assuming that Guido doesn't want us > to use something like ST).. I suspect they will have self-deselected. Actually, I don't remember seeing anyone against markup in Doc-SIG - people who want *heavyweight* markup, yes, but not people who want none. > Of course, on the other hand, if we can clean ST up enough, and > make it formal, maybe he'll be ok with it. I hope so - I am a bit worried. I still want it for myself, of course (and so have a lot of other people) - so we may just need to "rally the troops". > I'm going to put my PEP on hold for now, until we figure this stuff > out.. (if anyone wants to see what I've written so far, though, > I'll be happy to send you a copy -- just email me). I'd like a copy, of course (!) > I'm also thinking of putting together a "minimal" ST-like language, > that would include markup for: > * lists > * emph > * literals (one type, probably using '#' as delimiters) > * urls (using '<>' delimiters) > * literal blocks I think that would be a good thing. > But maybe we'd be better off just using XML.. :) or something like > javadoc ('@param(x) foo..', etc.).. We *went round* this loop at least twice before, and it doesn't fly. People won't do it. Eddy says: > Something in the spirit of ST but done properly would have a better > chance than something striving to be ST without its warts, IMO. I must admit I would have been happier in many ways if we could drop some of the inheritance from STClassic (and compatibility intents with STNG). 
Tibs

-- 
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

From gward@mems-exchange.org Mon Mar 26 14:26:26 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 09:26:26 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <200103241952.f2OJqdp19400@gradient.cis.upenn.edu>; from edloper@gradient.cis.upenn.edu on Sat, Mar 24, 2001 at 02:52:39PM -0500 References: <200103241901.OAA27542@cj20424-a.reston1.va.home.com> <200103241952.f2OJqdp19400@gradient.cis.upenn.edu> Message-ID: <20010326092625.C10678@mems-exchange.org>

On 24 March 2001, Edward D. Loper said:
> Hm. I guess I should have thought to ask the BDFL about all this
> before now. :) Makes me wonder if he'll like/accept *any* of the
> stuff we've been talking about. But it's interesting to hear
> that Guido is ok with a real markup language.
>
> So are there any vocal opponents of using a real markup language
> on doc-sig right now? (Assuming that Guido doesn't want us
> to use something like ST)..

I've been lurking throughout this whole thread (and occasionally leaning on the "D" key... sorry guys... ;-), mainly because it sounds like you're on the right track but you're doing the boring plodwork. Thank you, keep it up, etc. etc. However, I would just like to state for the record that I am not -0, or -1, but more like -1e6 on putting a "real" markup language in docstrings, assuming that the set of "real" markup languages is limited to {Tex-like languages, SGML-like languages}. I consider both to be misbegotten freaks that completely ignore the human factors of writeability and readability.

> Of course, on the other hand, if we can clean ST up enough, and
> make it formal, maybe he'll be ok with it.
>
> I'm going to put my PEP on hold for now, until we figure this stuff out..
(if anyone wants to see what I've written so far, though,
> I'll be happy to send you a copy -- just email me).
>
> I'm also thinking of putting together a "minimal" ST-like language,
> that would include markup for:
> * lists
> * emph
> * literals (one type, probably using '#' as delimiters)
> * urls (using '<>' delimiters)
> * literal blocks

Larry Wall has been there and done that: "man perlpod" if you're on a properly administered Unix system. ;-) POD is really easy to write, and pretty easy to read (human) and parse (software). The high-level POD syntax (where /\n\n=[A-Z]+ .*\n\n/ denotes a section delimiter) is closely tied to Perl and irrelevant to Python, since Python already has a way of saying "this text is documentation for module/class/function 'foo'". But the within-paragraph markup convention -- "this is a C<...>, this is B<...>" -- is pretty easy and useful. Like ST, it could stand a bit of formalization, although that has improved greatly in recent years with Brad Appleton's Pod::Parser family of modules.

Although I've never used ST, my understanding is that ST and POD are pretty semantically similar, and with very similar goals: easy-to-write, easy-to-read, minimal markup that's "good enough" for generating man pages, HTML documents, and plain text. They both suffer from lack of formalization, although I think POD is better nowadays.

> But maybe we'd be better off just using XML.. :) or something like
> javadoc ('@param(x) foo..', etc.)..

If I ever see XML in a Python docstring, I think I'll go running back to Perl. ;-) XML is many things to many people, but it most certainly is not fit for human consumption.

OTOH, I quite like Javadoc's "@param", "@return" syntax. It's easy to write, easy to read, easy to parse, and just formal enough so that doc tools can make sense of it. It might be more Pythonic to spell those "param:", "returns:", though.
;-) Greg From tony@lsl.co.uk Mon Mar 26 14:26:45 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 15:26:45 +0100 Subject: [Doc-SIG] Re: docstring signatures (and my memory) In-Reply-To: <002201c0b5dd$985856e0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <002701c0b600$c8e303f0$f05aa8c0@lslp7o.int.lsl.co.uk> I wrote: > Oh dear. Well, thanks to Edward for finally getting the BDFL's opinion > on the "top of the callable docstring" thing - wish I could find the > reference where it was claimed to be needed. Unfortunately, it's not trivial to search the doc-sig archives, but a download and a grep later, I find: > Date: Sun, 28 Nov 1999 16:57:03 -0800 (Pacific Standard Time) > From: David Ascher > To: doc-sig@python.org > Subject: [Doc-SIG] docstring grammar > > For compatibility with Guido, IDLE and Pythonwin (and increasing the > likelihood that the proposal will be accepted by GvR), the > docstrings of callables must follow the following convention > established in Python's builtins: > > >>> print len.__doc__ > len(object) -> integer ...rest of explanation omitted... Which was written last time round the Doc-SIG loop. So it wasn't me that was channeling Guido wrongly, after all. (it's very strange going back in Doc-SIG history so deeply - tempting to browse for too long...) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
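The convention David Ascher describes, with the first line of a callable's docstring holding a "name(args) -> result" signature, is mechanically easy to second-guess. A sketch (my own, not any real tool's code):

```python
import re

# "name(args) -> result", the first-line convention quoted above.
SIG = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s*->\s*(?P<result>.+)$")

def docstring_signature(obj):
    """Return the parsed signature from the docstring's first line,
    or None if the docstring doesn't follow the convention."""
    first = (obj.__doc__ or "").split("\n", 1)[0].strip()
    m = SIG.match(first)
    return m.groupdict() if m else None

def fake():
    "fake(x[, y]) -> int\n\nA hypothetical callable following the convention."

print(docstring_signature(fake))
# {'name': 'fake', 'args': 'x[, y]', 'result': 'int'}
```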
From gward@mems-exchange.org Mon Mar 26 14:39:06 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 09:39:06 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <200103242037.PAA28743@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Sat, Mar 24, 2001 at 03:37:52PM -0500 References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> <200103242037.PAA28743@cj20424-a.reston1.va.home.com> Message-ID: <20010326093905.D10678@mems-exchange.org>

On 24 March 2001, Guido van Rossum said:
> It should be easier to combine code and documentation for 3rd party
> modules, and there should be an easier way to integrate such
> documentation into (a version of) the standard documentation. But I
> disagree with the viewpoint that documentation should be maintained in
> the same file as source code. (If you believe the argument that it
> is easier to ensure that it stays up to date, think again. This never
> worked for comments.)

From direct personal experience (15 or so Perl modules, some on CPAN and some not, comprising ~10k LoC), I *do* think that intermingling code and documentation makes it easier to update them together. Note that it does not make it *absolutely* easy or painless or automatic; any of those are bogus arguments. But having code and docs in the same file definitely makes life *less* painful.

Greg

From tony@lsl.co.uk Mon Mar 26 14:53:17 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 26 Mar 2001 15:53:17 +0100 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <20010326092625.C10678@mems-exchange.org> Message-ID: <002a01c0b604$7dd26a00$f05aa8c0@lslp7o.int.lsl.co.uk>

Greg Ward wrote:
> I've been lurking throughout this whole thread (and
> occasionally leaning on the "D" key... sorry guys... ;-)

no, sounds sensible to me - I hate to think what it must "sound" like to anyone else listening to our, erm, interchanges.
> However, I would just like to state for the record that I am > not -0, or -1, but more like -1e6 on putting a "real" markup > language in docstrings, assuming that the set of "real" markup > languages is limited to {Tex-like languages, SGML-like languages}. Well, I think that's taken to be the general meaning of "real" in this sort of context. These days, in this forum, I prefer the term "heavyweight". > I consider both to be misbegotten freaks > that completely ignore the human factors > of writeability and readability. Ah - but they are "misbegotten freaks" that *deliberately* ignore the human factors of etc. TeX because it was originally aimed at people with great motivation to use it for its original purposes (and when there was no alternative), but not *really* for "human beings" in the aggregate. SGML/XML/etc because they're not meant for humans to read/write. Despite the fact some of us do. Personally, I used to be +10 for formal markup, but am now (somewhat reluctantly) -1 against it (see, my response got more moderate!). > Larry Wall has been there and done that: "man perlpod" if you're on a > properly administered Unix system. ;-) Hmm. Actually, our sysadmin really likes perl. And he's a friend. > POD is really easy to write, and > pretty easy to read (human) and parse (software). Hmm. Personally I think it has all the disadvantages for reading that things like XML do, but with none of the advantages of *being* something like XML. Of course, reactions differ. > Althought I've never used ST, my understanding is that ST and > POD are pretty semantically similar, and with very similar goals: > easy-to-write, easy-to-read, minimal markup that's "good enough" for > generating man pages, HTML documents, and plain text. Probably so - they're doubtless as similar as Perl and Python (which is quite similar, of course, compared to many other languages). > They both suffer from lack of > formalization, although I think POD is better nowadays. 
And if Edward Loper finishes his task (heh, even if I finish mine) will be a lot better for STpy too. > XML is many things to many people, but it most > certainly is not fit for human consumption. It would be amazing if it were - I'm currently working with (well, thinking about, some of the time) *quite large* XML documents (well, actually, they represent geographic data) and one would be surprised if a human ever tried to look at it. > OTOH, I quite like Javadoc's "@param", "@return" syntax. > It's easy to write, easy to read, easy to parse, and just > formal enough so that doc tools can make sense of it. > It might be more Pythonic to spell those "param:", > "returns:", though. ;-) Erm, "Arguments:" and "Returns:" (the last is an "I think", 'cos I don't tend to use it) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From gward@mems-exchange.org Mon Mar 26 15:16:24 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 10:16:24 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <002a01c0b604$7dd26a00$f05aa8c0@lslp7o.int.lsl.co.uk>; from tony@lsl.co.uk on Mon, Mar 26, 2001 at 03:53:17PM +0100 References: <20010326092625.C10678@mems-exchange.org> <002a01c0b604$7dd26a00$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <20010326101624.B10802@mems-exchange.org> On 26 March 2001, Tony J Ibbs (Tibs) said: > > POD is really easy to write, and > > pretty easy to read (human) and parse (software). > > Hmm. Personally I think it has all the disadvantages for reading that > things like XML do, but with none of the advantages of *being* something > like XML. Of course, reactions differ. The differences are subtle, but they're enough: in POD, tags are shorter (one character), only used intra-paragraph (ie. 
there aren't tags for sections or indented code chunks -- that's more-or-less implicit), and there's far less tendency to nest them. That makes a *huge* difference for human readability/writeability. > Probably so - they're doubtless as similar as Perl and Python (which is > quite similar, of course, compared to many other languages). Yup. In my fairly uninformed opinion, it seems like the main difference between POD and ST is spelling. One uses B<bold> and C<code>, the other uses *bold* and 'code'. I prefer POD's slightly more in-your-face and less ambiguous markup, but that's mainly because I have experience with it and I know I like it. I'm sure I could come to like ST in time, too. ;-) > > OTOH, I quite like Javadoc's "@param", "@return" syntax. > > It's easy to write, easy to read, easy to parse, and just > > formal enough so that doc tools can make sense of it. > > It might be more Pythonic to spell those "param:", > > "returns:", though. ;-) > > Erm, "Arguments:" and "Returns:" (the last is an "I think", 'cos I don't > tend to use it) More trivial spelling differences. I don't much care how it's spelled, but I like the idea of a tiny bit of formal markup to say, "this is a function return value", "this is a function argument", "this is an instance attribute", "this is a class attribute", etc. Trailing colon is definitely more Pythonic than leading "@", though! Greg From guido@digicool.com Mon Mar 26 15:31:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 26 Mar 2001 10:31:02 -0500 Subject: [Doc-SIG] Re: docstring signatures (answer to Guido) In-Reply-To: Your message of "Mon, 26 Mar 2001 11:14:51 +0100." <002201c0b5dd$985856e0$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002201c0b5dd$985856e0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103261531.KAA06371@cj20424-a.reston1.va.home.com> > Oh dear.
Well, thanks to Edward for finally getting the BDFL's opinion > on the "top of the callable docstring" thing - wish I could find the > reference where it was claimed to be needed. Maybe I was remembering a > dream, or living in a parallel world, or something. I'll happily drop > them from the spec (and stop using them!). You probably saw the docstrings on some *extension* modules, where the signature is generally included. > I *am* a bit disturbed, though, as to whether Guido has decided against > an ST approach *in all circumstances* (I'm sure he was less negative on > previous rounds of the Doc-SIG, but I'm not entirely prepared to trust > my memory at this stage). That's because now I have actual experience using ST (in ZWiki). > I do know that *I* want some "light formatting" solution, and so have > other participants of the SIG - and given this is Python-land, I don't > believe that's because we're trying to "nanny" people, I think it's > because we want it *for ourselves*. > > I hope the rest of this email doesn't come across as ranting against > Guido. But I *do* feel he's being a little bit unfair... > > In response to Edward Loper, Guido wrote: > > Not true. Most of the standard library uses the same convention, and > > even if it's not quite written down, it wouldn't be hard to figure out > > what it is. > > Hmm. But this is where we *came* from, initially, surely - an attempt to > figure out what people *actually* write down. Asking people to conform > to a convention that is *not* evident explicitly somewhere is, well, a > bit unfair (I include me in "people" here, by the way). You cut out the part where I pointed out that it *is* explicit -- in the style guidelines, which haven't been challenged. > > > As a result, it is very difficult to write widely-applicable > > > tools for processing documentation strings. > > > > Again not true. Ping's pydoc does quite well second-guessing the > > existing conventions. 
> > This has been a great source of argument on Doc-SIG in the past - "quite > well" is not the goal that some of us wanted. But it's still not the > only reason why many of us want ST - we actually want to have > some markup in the text for all sorts of reasons. There are only two choices. Either you have markup or you don't. If you design a markup system, it should be complete, and allow full control over the lay-out -- including full control in cases where you *don't* want the special characters to have effect. ST is neither markup nor "not markup", and that's why it fails, in my view. > > > * Safety: No well-formed formatted documentation > > > string should result in unexpected formatting. > > > > This is a good one. ST loses big here! > > I *do* feel that it is a *leetle bit* (excuse the sarcasm) unfair to > judge the STpy and STminus works on the basis of a tool/specification > that they are not. As far as I can tell, STClassic (the implementation) > is *not* a very good example of how to do it (and that's meant to be > english understatement). And that seems to be what Guido is basing this > statement on. Sure. As I'm not a subscriber to this list, I was not aware of those, and nobody has bothered to forward me a pointer to a specification. (I typed them into Google but got no useful hits.) > > The proponents of ST (that I've talked to) seem to believe that it's > > unnecessary to tell the users what the exact rules are. > > Yes, but that wasn't *us* - we're proponents of (a form of) ST as well. > But obviously just not *those* proponents. Then you did a poor job of distinguishing yourself. Thanks for clarifying. > > This, plus numerous bugs in the ST implementation and the context > > in which it is used, continuously bite me. > > Again, it's surely a bit unfair to say (as this does) "an implementation > of an ancestor specification sucked, so what you're doing does as well". Well, you have associated yourself with it by choosing the same moniker. 
I see that you are trying to dissociate yourself now. OK, I'll give it a shot. But show me the specs please! > > E.g. if a paragraph starts with a word > > followed by a period, the word is replaced with "1.". > > I agree that's loony. But it's not what is being proposed. > > > If I use "--" anywhere in the first line of a paragraph > > it is turned into a > > ... > > ... style construct. > > Well, ' -- ' in our version - predicated surely on the idea that most people don't use double hyphens in plain text (which I happen to believe as well), whereas the: > > something -- some text about it > > style is fairly easy to spot. Then we disagree -- I use double hyphens in text *all the time* -- and I know I'm not alone. Unless I misunderstand what you propose. > > There's no easy way to escape HTML keywords. > > A problem of *that* specification, not of STpy or STminus (and > *aggressively* not so). We do *not* weld ourselves to HTML as an output > format, nor indeed XML, and thus '<' and '>' are not treated specially > at all. > > In general, when you *don't* want something to have its > > special effect, there's no way to escape it. > > A problem, agreed - but we've actively been worrying about this, and > looking for *specific cases* where this causes a problem, to see if we > can work around it. I'd be very interested to know which cases cause > Guido problems (and if they're artefacts of the earlier specifications, > or something we can use as examples of problems for ourselves). Without knowing your ruleset I can't know what the problems are, of course. > > There's no way that I know of to create a bulleted item > > consisting of several paragraphs. > > This is a lunacy of the implementation Guido's been using, I would say. I would hope so. > > The reliance on indentation levels to detect various levels of > > headings never works for me. > > Well, I don't like it either. It won't stop me writing a PEP, though, > which does the same thing (and is, of course, pretty close to being > written in STpy/STminus). Ah, but the "pretty close" is exactly what's wrong. PEPs are written in plain text, and there's not enough information to know when to interpret characters as markup and when not to. E.g.
a PEP describing ST would be filled with examples of ST markup -- if the PEP is written in plaintext, these examples don't need any special quoting, but if it is written in ST, they must be quoted. (And don't tell me to put all examples in literal blocks -- inline examples are essential.) The PEP-to-HTML processor uses only one strict rule, and a few heuristics: it uses unindented text for headings (but it doesn't have multiple heading levels), and it turns things looking like URLs into hyperlinks. But other than that it doesn't use any markup characters, and even line breaks in the original text are honored exactly in the HTML. > For what it's worth, there *are* proposals to fix that (the section = > indentation thingy), but they're not worth pursuing until we have > something available to talk about, which is what we've been trying to > do. Sorry, you've lost me here. > > > Any info you can give on what you would like to see come out > > > of this project (or pointers to info) would be most appreciated. > > > > A lot of effort has gone into creating a body of documentation for the > > standard library *outside* the source code. It is rich in mark-up, > > indexed, contains many examples, well maintained, and is generally > > considered high quality documentation. I don't want to have to redo > > the work that went into creating this. > > Of course not. We're not attempting to change that (at least Edward and > I are not). OK. The factions in the doc-sig are hard to keep apart for an outsider. At the conference I met some people who wanted to prescribe that all module documentation be maintained in the source code file. I find that insanity. > > It should be easier to combine code and documentation for 3rd party > > modules, and there should be an easier way to integrate such > > documentation into (a version of) the standard documentation. But I > > disagree with the viewpoint that documentation should be maintained in > > the same file as source code. 
(If you believe the argument that it > > is easier to ensure that it stays up to date, think again. This never > > worked for comments.) > > I'm afraid you (Guido) are conflating two different arguments. The > argument by Ka-Ping Yee that the *whole* documentation for a module > should live in the file for that module is (a) a different thread, and > (b) one I argue strongly against. I'm so glad to hear that. See above. :-) > I hope that Guido isn't already decided on this issue, once and for all. No, I'm always open to reasonable input. I'm waiting for your spec so I can form an opinion on it. > The consensus of the Doc-SIG, over the years (many of whom have been > people who Guido knows and has much more reason to respect the opinion > of than me) has been that we need a way of formatting docstrings, and > that it should be an etext derivative. (But I've never agreed so far, and my one prolonged experience with an ST-based system makes me hate every bit of it. It could be that implementation though.) > I believe that we (on the > Doc-SIG) have been producing a *better* etext derivative, whilst still > trying to stay at least partially compatible with the sibling STNG > project. I don't think that compatibility with a possibly broken alternative ought to be constraining you. > *Should* we have been more radical and just broken with STNG entirely? > It would have made life somewhat simpler... I don't know much about STNG either -- I always thought that it was an idea for a project to fix the problems with ST, but not anything concrete. I am not aware of any part of Zope actually using STNG, so I doubt that interoperability can be a big issue. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Mar 26 15:36:42 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 26 Mar 2001 10:36:42 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Mon, 26 Mar 2001 09:39:06 EST." 
<20010326093905.D10678@mems-exchange.org> References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> <200103242037.PAA28743@cj20424-a.reston1.va.home.com> <20010326093905.D10678@mems-exchange.org> Message-ID: <200103261536.KAA06434@cj20424-a.reston1.va.home.com> > From direct personal experience (15 or so Perl modules, some on CPAN and > some not, comprising ~10k LoC), I *do* think that intermingling code and > documentation makes it easier to update them together. > > Note that it does not make it *absolutely* easy or painless or automatic; > any of those are bogus arguments. But having code and doccs in the same > file definitely makes life *less* painful. What argues against this for me is the existence of highly tuned language-specific editing modes in Emacs and many other text editors; these rarely do a good job on hybrids. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@mems-exchange.org Mon Mar 26 15:42:37 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 10:42:37 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <200103261536.KAA06434@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Mar 26, 2001 at 10:36:42AM -0500 References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> <200103242037.PAA28743@cj20424-a.reston1.va.home.com> <20010326093905.D10678@mems-exchange.org> <200103261536.KAA06434@cj20424-a.reston1.va.home.com> Message-ID: <20010326104236.D10854@mems-exchange.org> On 26 March 2001, Guido van Rossum said: > > From direct personal experience (15 or so Perl modules, some on CPAN and > > some not, comprising ~10k LoC), I *do* think that intermingling code and > > documentation makes it easier to update them together. > > > > Note that it does not make it *absolutely* easy or painless or automatic; > > any of those are bogus arguments. But having code and doccs in the same > > file definitely makes life *less* painful. 
> > What argues against this for me is the existence of highly tuned > language-specific editing modes in Emacs and many other text editors; > these rarely do a good job on hybrids. Absolutely true. But python-mode already does a poor job of handling docstrings -- or in fact any interesting[1] triple-quoted string. Personally, I wouldn't mind my docstrings being entirely one colour (say, the "string" colour) and my Python code being colourized properly -- but even without any formal markup in docstrings, Emacs can't handle that, so how can adding a markup syntax make things worse? Greg [1] Eg. def foo (bar, baz): """Returns 'bar' if 'baz' is "foo", or 'baz' if 'bar' is "foo".""" This is legitimate plain-text, and probably pretty close to legit ST, but stuff like this throws python-mode for a loop. I do this kind of markup all the time myself. (Although not that kind of semantics, thankfully. ;-) From guido@digicool.com Mon Mar 26 16:08:01 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 26 Mar 2001 11:08:01 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Mon, 26 Mar 2001 10:42:37 EST." <20010326104236.D10854@mems-exchange.org> References: <200103242007.f2OK7cp20052@gradient.cis.upenn.edu> <200103242037.PAA28743@cj20424-a.reston1.va.home.com> <20010326093905.D10678@mems-exchange.org> <200103261536.KAA06434@cj20424-a.reston1.va.home.com> <20010326104236.D10854@mems-exchange.org> Message-ID: <200103261608.LAA06633@cj20424-a.reston1.va.home.com> > > > Note that it does not make it *absolutely* easy or painless or automatic; > > > any of those are bogus arguments. But having code and doccs in the same > > > file definitely makes life *less* painful. > > > > What argues against this for me is the existence of highly tuned > > language-specific editing modes in Emacs and many other text editors; > > these rarely do a good job on hybrids. > > Absolutely true. 
But python-mode already does a poor job of handling > docstrings -- or in fact any interesting[1] triple-quoted string. Personally, > I wouldn't mind my docstrings being entirely one colour (say, the "string" > colour) and my Python code being colourized properly -- but even without any > formal markup in docstrings, Emacs can't handle that, so how can adding a > markup syntax make things worse? (IDLE does docstrings right, by the way.) Emacs also has sophisticated Latex and XML modes, which are however useless for Latex or XML embedded in docstrings. That's why I prefer to have the docs in a separate file -- so I can have a separate mode to help me edit it. --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Mon Mar 26 16:18:53 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 26 Mar 2001 11:18:53 EST Subject: [Doc-SIG] Where to go from here Message-ID: <200103261618.f2QGIrp12424@gradient.cis.upenn.edu> Guido's input has raised some questions about whether we're going in the right direction... But he's made it clear that he'll keep an open mind, and seriously consider any real specs we come up with. So I propose we do the following: - continue trying to come up with a concrete, formal spec - drop the idea of maintaining compatibility with STNG, for now. Once STNG sees how much cooler our markup language is, we can convert them. ;) Maybe we should even come up with a new name, so that other people who have become embittered with STclassic won't take it out on us. :) - STminus will focus purely on coming up with a formal description, and drop its goals of unifying STNG/STpy. - Focus on the goal of making a *real* markup language that is *lightweight* and simple to read/write. - Once we have a real specification (hopefully in a couple weeks), we can talk to Guido/others about whether it's acceptable. 
It's unreasonable to expect Guido to make judgements when the ST stuff is in the state of flux it's in now. Dropping STNG compatibility will allow us to consider a number of options that I hadn't brought up before.. For example, I think we might want to replace '--' with '---' as the description list indicator, since people *do* use '--' in text (I know I do, and apparently Guido does too). And I think we should drop 'o' as a bullet character. etc.. As for colorization, java-mode does just fine colorizing javadoc comments, so I don't see how it's a problem *in principle*, just a problem of someone figuring out what to tell emacs (I'm sure emacs could be told to colorize triple-quoted-strings correctly if someone really wanted to figure out how to.. I've just been using the work-around of backslashing all double quotes in triple-quoted-strings, which doesn't affect their value, and makes them colorize correctly) -Edward From pf@artcom-gmbh.de Mon Mar 26 17:22:45 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 26 Mar 2001 19:22:45 +0200 (MEST) Subject: [Doc-SIG] Where to go from here In-Reply-To: <200103261618.f2QGIrp12424@gradient.cis.upenn.edu> from "Edward D. Loper" at "Mar 26, 2001 11:18:53 am" Message-ID: Hi Edward, > - continue trying to come up with a concrete, formal spec That would be very nice. > - drop the idea of maintaining compatibility with STNG, for Yes. *Some* ideas from ST are good. Let's drop all the others. Especially heading recognition in ST sucks. > Dropping STNG compatibility will allow us to consider a number > of options that I hadn't brought up before.. For example, I think > we might want to replace '--' with '---' as the description list > indicator, since people *do* use '--' in text (I know I do, and > apparently Guido does too). And I think we should drop 'o' as > a bullet character. etc.. I think a description list can be dropped altogether. At least for the time being a bullet list will be enough.
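(An aside on Edward's colorization workaround above: it is safe because backslash-escaping a double quote never changes a Python string's value, only its spelling in source. A minimal check, with illustrative strings of my own:)

```python
# Edward's workaround, sketched: inside a Python string literal the
# escape \" denotes exactly the same character as a bare ", so
# backslashing every double quote in a triple-quoted string changes
# only how some editors colorize it, never the string's value.
plain = """Returns 'bar' if 'baz' is "foo"."""
escaped = """Returns 'bar' if 'baz' is \"foo\"."""
same = plain == escaped  # True
```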
enumerated lists: ...hmmm... I think we can also live without them for a try. I think we should aim for a *very* minimalistic set of features and people may then add other things later on: * emphasizing of *single* words. * section headings (marked up through underlining with a line of hyphens or '=' and preceded by a blank line). * bullet item lists (which may be nested through indentation). * References to URLs, to mail addresses and to Python objects. * preformatted paragraphs for code examples, tables and such: (every paragraph with mixed indentation or which starts with the patterns '>>>' or '+--' should be left alone. Only properly aligned normal text paragraphs should be allowed for reformatting.) Then let's try to implement this minimal set and plug this into Ping's pydoc and see what comes out when running this on existing sources. Of course this will never be able to replace external documentation written with a powerful markup system like LaTeX. But it would make Ping's marvelous pydoc an even more worthwhile tool for all this useful version 0.x.y stuff written in Python, which comes without documentation for prime time. Just my 2 pfennig, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From guido@digicool.com Mon Mar 26 06:34:09 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 26 Mar 2001 01:34:09 -0500 Subject: [Doc-SIG] Where to go from here In-Reply-To: Your message of "Mon, 26 Mar 2001 19:22:45 +0200." References: Message-ID: <200103260634.BAA00807@cj20424-a.reston1.va.home.com> > > Dropping STNG compatibility will allow us to consider a number > > of options that I hadn't brought up before.. For example, I think > > we might want to replace '--' with '---' as the description list > > indicator, since people *do* use '--' in text (I know I do, and > > apparently Guido does too.
And I think we should drop 'o' as > > a bullet character. etc.. > I think a description list can be dropped altogether. Yes! They are darn ugly in HTML anyway. > At least for the time being a bullet list will be enough. Agreed. > enumerated lists: ...hmmm... I think we can also live without > them for a try. In any case, ST++ shouldn't go and rewrite the item numbers. The requirement that the input is also readable without processing means that the author ought to put the proper numbers in there by hand anyway, so all ST++ needs to do is recognize them and give the paragraph the proper indent/spacing. > I think we should aim for a *very* minimalistic set of features > and people may then add other things later on: > * emphasizing of *single* words. > * section headings (marked up through underlining with a line of > hyphens or '=' and preceded by a blank line). > * bullet item lists (which may be nested through indentation). > * References to URLs, to mail addresses and to Python objects. > * preformatted paragraphs for code examples, tables and such: > (every paragraph with mixed indentation or which starts with > the patterns '>>>' or '+--' should be left alone. Only properly > aligned normal text paragraphs should be allowed for reformatting.) Please do look at the conventions in MoinMoin as another example! > Then let's try to implement this minimal set and plug this into Ping's > pydoc and see what comes out when running this on existing sources. > Of course this will never be able to replace external documentation > written with a powerful markup system like LaTeX. But it would > make Ping's marvelous pydoc an even more worthwhile tool for all this > useful version 0.x.y stuff written in Python, which comes without > documentation for prime time. > > Just my 2 pfennig, Peter Surely you mean 0.02 Euro.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@lfw.org Mon Mar 26 12:39:23 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 26 Mar 2001 04:39:23 -0800 (PST) Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: <20010326104236.D10854@mems-exchange.org> Message-ID: This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. --0-752675476-985610362=:570 Content-Type: TEXT/PLAIN; charset=US-ASCII On Mon, 26 Mar 2001, Greg Ward wrote: > I wouldn't mind my docstrings being entirely one colour (say, the "string" > colour) and my Python code being colourized properly -- but even without any > formal markup in docstrings, Emacs can't handle that, so how can adding a > markup syntax make things worse? [...] > def foo (bar, baz): > """Returns 'bar' if 'baz' is "foo", or 'baz' if 'bar' is "foo".""" > > This is legitimate plain-text, and probably pretty close to legit ST, but > stuff like this throws python-mode for a loop. This is pretty surprising. I use Vim and it has never had any trouble with colourizing this kind of stuff. It was even very easy to classify docstrings (based on their position in the code) separately from ordinary literal strings. Python's syntax makes this easy; you just look for a colon at the end of the previous line. Surely it must be possible for Emacs to do the same, since elisp is so much more powerful than the pattern language Vim uses for configuring colourization modes -- it's annoying to have Python code littered with all of these font-lock hints. For reference i've attached my Python syntax-highlighting file for Vim. It has served me quite well over the years. 
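(Ping's rule of thumb above, "you just look for a colon at the end of the previous line", can be sketched as a tiny line-based check. This is an illustrative toy classifier of my own, not what the attached Vim syntax file literally does:)

```python
import re

def is_docstring_position(prev_line):
    # Ping's heuristic: a triple-quoted string is (probably) a docstring
    # when the previous line ends with a colon, i.e. it opens a
    # def/class/etc. block. A trailing comment after the colon is allowed.
    return bool(re.search(r":\s*(#.*)?$", prev_line))

# The heuristic separates docstrings from ordinary string literals:
assert is_docstring_position("def foo(bar, baz):")
assert is_docstring_position("class Widget:  # frobbable")
assert not is_docstring_position("x = 'just a string'")
```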
-- ?!ng --0-752675476-985610362=:570 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="python.vim" Content-Transfer-Encoding: BASE64 Content-Disposition: attachment; filename="python.vim"

[base64-encoded attachment "python.vim" (Ka-Ping Yee's Vim syntax file for Python) omitted]

--0-752675476-985610362=:570-- From gward@mems-exchange.org Mon Mar 26 21:10:49 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 26 Mar 2001 16:10:49 -0500 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: ; from ping@lfw.org on Mon, Mar 26, 2001 at 04:39:23AM -0800 References: <20010326104236.D10854@mems-exchange.org> Message-ID: <20010326161048.C11145@mems-exchange.org> On 26 March 2001, Ka-Ping Yee said: > This is pretty surprising. I use Vim and it has never had any trouble > with colourizing this kind of stuff. It was even very easy to classify > docstrings (based on their position in the code) separately from ordinary > literal strings. Python's syntax makes this easy; you just look for a > colon at the end of the previous line. Surely it must be possible for > Emacs to do the same, since elisp is so much more powerful than the > pattern language Vim uses for configuring colourization modes -- it's > annoying to have Python code littered with all of these font-lock hints. [...off-topic and getting worse...] You know, the more I think about it, the more I think Emacs is the Perl or TeX of editors: hairy, overgrown, and too big and complicated for any ordinary mortal to grasp.
Probably the fact that Elisp is so much more powerful is part of the reason that most Emacs modes just can't seem to get it right -- probably Elisp is *too* powerful (or *too* complicated, take your pick). IOW, Elisp is the problem, not the solution. Amusing anecdote [even more off-topic]: one of the stated reasons that the package delimiter changed from ' to :: in Perl 5 was so that Emacs wouldn't get confused. (Although why $foo'bar was ever considered a good way to denote variable 'bar' in package 'foo' is beyond me...) Back to your ordinary doc-sig... and turn off your flamethrowers, I'll probably keep using Emacs until they pry the keyboard from my cold, dead fingers... (or until somebody invents Pymacs ;-) Greg From Juergen Hermann" Message-ID: On Mon, 26 Mar 2001 15:53:17 +0100, Tony J Ibbs (Tibs) wrote: >Erm, "Arguments:" and "Returns:" (the last is an "I think", 'cos I don't >tend to use it) Hmmm, I think you mean "Argumente:" and "Rückgabewert:". :> Ciao, Jürgen From tony@lsl.co.uk Tue Mar 27 08:41:37 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 27 Mar 2001 09:41:37 +0100 Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Message-ID: <001001c0b699$bc997900$f05aa8c0@lslp7o.int.lsl.co.uk> Juergen Hermann wrote: > On Mon, 26 Mar 2001 15:53:17 +0100, Tony J Ibbs (Tibs) wrote: > >Erm, "Arguments:" and "Returns:" (the last is an "I think", 'cos I > > don't tend to use it) > > Hmmm, I think you mean "Argumente:" and "Rückgabewert:". :> Strangely enough, the reason docutils (my STpy implementation) uses a dictionary to "translate" these terms is so that non-English writers can have something sensible as an alternative (although probably not in the alpha). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
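(The dictionary-based "translation" of docstring section labels that Tibs describes could look roughly like the sketch below. The table contents and function name are hypothetical illustrations, not docutils' actual API:)

```python
# Hypothetical lookup table mapping localized section labels to the
# canonical English labels a docstring processor would act on.
LABEL_TRANSLATIONS = {
    "Argumente": "Arguments",
    "Rückgabewert": "Returns",
}

def canonical_label(label):
    # Unknown (including English) labels pass through unchanged.
    return LABEL_TRANSLATIONS.get(label, label)

assert canonical_label("Argumente") == "Arguments"
assert canonical_label("Rückgabewert") == "Returns"
assert canonical_label("Arguments") == "Arguments"
```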
From tony@lsl.co.uk Tue Mar 27 09:58:50 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 27 Mar 2001 10:58:50 +0100 Subject: [Doc-SIG] Where to go from here In-Reply-To: <200103261618.f2QGIrp12424@gradient.cis.upenn.edu> Message-ID: <001101c0b6a4$860c51e0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward Loper wrote: > Guido's input has raised some questions about whether we're going > in the right direction... To say the least. One can't help feeling that he might have objected sooner if he were going to object so much (like, for instance, last time round the loop when ST was decided on). Grump, moan, whine. > But he's made it clear that he'll keep an open mind, > and seriously consider any real specs we come up with. So long as we gird our loins and don't just give up - I was feeling pretty dispirited about all of this yesterday (I'm sorry, Guido, but to be told "this must be bad because it shares part of the name of something else" is not polite - it's exactly like saying one had a bad experience with regexp so one doesn't like sre). Anyway, putting my rational head back on: > So I propose we do the following: > > - continue trying to come up with a concrete, formal spec Agreed. I *still* don't think we're far off one, Guido's pessimism despite (and he still hasn't had a chance to *look* at what we've been doing). > - drop the idea of maintaining compatibility with STNG, for > now. We'd all like that. It disturbs me a little that we can consider it so easily, though, given how powerful the arguments for keeping STClassic and STNG compatibility were in the past. > Once STNG sees how much cooler our markup language > is, we can convert them. ;) Maybe we should even come up > with a new name, so that other people who have become > embittered with STclassic won't take it out on us. :) and so Guido won't be prejudiced because of the name (sorry, I'll try to stop grumping). Of course, the obvious name would be "pydoc", but that's rather taken... 
Perhaps we should choose "pytext", by analogy with the grandfather format, setext. > - STminus will focus purely on coming up with a formal > description, and drop its goals of unifying STNG/STpy. Makes sense. So URIs are now delimited by '<..>', yes? > - Focus on the goal of making a *real* markup language that is > *lightweight* and simple to read/write. But you are not going to get a real markup language (for any sense of "real" that I understand) if your start and end delimiters are the same - and I don't see how we can compromise on that. We still have the problem that if we don't have *really* lightweight markup, people won't do it, and that something akin to what people do in email (i.e., something like ST/STpy, I'm afraid) is the best bet for that - unless you're proposing to start the discussion that started back in 1997 all over again. > - Once we have a real specification (hopefully in a couple > weeks), we can talk to Guido/others about whether it's > acceptable. It's unreasonable to expect Guido to make > judgements when the ST stuff is in the state of flux it's > in now. I agree we need to have a specification - that's what we've been working towards. But I think the correct route is still a PEP, and Guido is only one of the people who vote on PEPs. He's certainly the *most important* person, but I can't (I refuse to) believe that he doesn't change his mind on occasion. I am *very* scared that the last time round the Doc-SIG loop we got this close to having something that worked, and got kiboshed by Spam8. Let's (please) not let that happen again. > Dropping STNG compatibility will allow us to consider a number > of options that I hadn't brought up before.. For example, I think > we might want to replace '--' with '---' as the description list > indicator, since people *do* use '--' in text (I know I do, and > apparently Guido does too). And I think we should drop 'o' as > a bullet character. etc.. I agree with dropping 'o' - can we add '+' as an alternative? 
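[Editor's note: the "name -- description" description-list form under discussion can be recognized with a couple of lines of Python. This is only an illustrative sketch with invented names (`DESC_ITEM`, `parse_desc_item`), not code from docutils or STpy.]

```python
import re

# Illustrative sketch (invented names, not docutils/STpy code):
# recognize description-list items of the form "name -- description",
# e.g.  "real -- the real part (default 0.0)".
DESC_ITEM = re.compile(r"^\s*(?P<name>\S+)\s+--\s+(?P<desc>.+)$")

def parse_desc_item(line):
    """Return (name, description) for a ' -- ' item, or None."""
    m = DESC_ITEM.match(line)
    return (m.group("name"), m.group("desc")) if m else None
```

Switching the delimiter to ' --- ', as proposed, would only mean changing the literal in the pattern.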
Personally I dislike ' --- ' as the descriptive list delimiter, but not enough to jump up and down too much - and since you're also wanting to use '--' as a hyphen (presumably actually an m-dash), I'll go for it. Descriptive lists *are* meant to stand out, after all. One problem is that Guido's style guide suggests using ' -- ' for descriptive lists:: Keyword arguments: real -- the real part (default 0.0) imag -- the imaginary part (default 0.0) so we'd need to get him to change that (presumably not a problem if the PEP were accepted). > As for colorization, java-mode does just fine colorizing javadoc > comments, so I don't see how it's a problem *in principle*, just > a problem of someone figuring out what to tell emacs (I'm sure > emacs could be told to colorize triple-quoted-strings correctly > if someone really wanted to figure out how to.. See http://www.python.org/emacs/python-mode/faq.html for an explanation of the problems. > I've just been using the work-around of backslashing > all double quotes in triple-quoted-strings, which > doesn't affect their value, and makes them colorize > correctly) Hmm. Ugly, but worth mentioning as a tip (although single quotes can cause problems too). Peter Funk suggested: > Especially heading recognition in ST sucks. I dislike intensely headers in STClassic and STNG. We *might* be able to ignore the problem if we are simply addressing docstrings. Alternatively, there are two other ways to do it (one lightweight and hacky, the other heavyweight and, well, different): 1. Assume that a heading will be underlined. Text after a heading need not be indented any more than it normally would. I think this was suggested by David Goodger. For instance:: This is heading 1 ================= This text is within "heading 1"'s section. This is subheading 2! --------------------- Which introduces a subsection. This is subsubheading 3... ~~~~~~~~~~~~~~~~~~~~~~~~~~ And that is surely enough depth to satisfy anyone using docstrings... 2. 
Provide "proper" sectioning commands - for instance:: Section 1: Its title And some text Subsection: It can decide its number One would also provide other appropriate "names" for sections. Option 1 seems to me more appropriate for docstrings, option 2 for longer texts - so since we're working on docstrings, I'd go for option 1. Details to be worked out are whether one needs to get the number of underline characters right or not (!). Peter Funk also wrote: > I think, a description list can be dropped altogether. > At least for the time being a bullet list will be enough. > enumerated lists: ...hmmm... I think we can also live without > them for a try. No and no. I vehemently disagree. And Guido's suggestion that: > Yes! They are darn ugly in HTML anyway. is just plain silly - for a start, that's almost entirely down to using default settings with poor browsers (OK, IE and Netscape!), and secondly it's *definitely* controllable by writing extra HTML code, or (horrors) using style sheets. Let's not let the presentation of one format drive our whole effort. I *do* sort-of agree with Guido's point that it is a bad thing to lose the "number" from an enumeration, though - the reason for my making this optional in STpy was purely that I think it may be difficult in HTML (and that's somewhat more than a presentation issue). But I've always worried about it, because of one's wish to refer back to list items by sequence number in the surrounding text. I think this one needs thinking about. Guido said: > Please do look at the conventions in MoinMoin as another example! Hmm. Last time I looked at MoinMoin I got no further than the "traditional" use of multiple quotes to mean different things, and gave up (despite the fact I rather like how it looks through a browser). Hmm. It's a mishmash of odds and ends, not designed to be read *as text*. 
I'm a bit disturbed that Guido refers us to this as something worth following up, since it seems to miss the point of what we're trying to do (which is *not*, in the first instance, at least, to support a Wiki). (From a very quick scan: possibly good ideas: they allow internal indentation in a paragraph to have meaning. Not sure what *use* it is, but it's fun. good ideas for a Wiki: they allow one to "optimise" URIs that end in .jpg or .gif so that the image is included instead. Not so useful in docstrings) Peter Funk continued: > I think we should aim for *very* minimalistic set of features > and people may then add other things later on: > * emphasizing of *single* words. Edward had suggested that. I emphasise more than one word too often in my writing to be happy with that, and I don't see it as being a problem (if you're working from ideas of how STClassic does it, please don't!). > * section headings (marked up through underlining with a line of > hyphens or '=' and preceded by a blank line). Ah - I hadn't read that far - we agree, more or less. > * bullet item lists (which may be nested through indentation). Modulo that, I want all three list types. > * References to URLs, to Mailaddresses and to Python objects. "Mailaddresses"? I assume you mean "mailto:", which is just one form of URI. We had been leaving references to Python objects to later as slightly harder and a "pydoc" type issue (one of the reasons for marking up Python words inline is to make this easier to get right). > * pre formatted paragraphs for code examples, tables and such: > (every paragraph with mixed indentation or which starts with > the patterns '>>>' or '+--' should be left alone. Only properly > aligned normal text paragraphs should be allowed for reformatting. This is too complex. The concepts we already had in STpy were enough - i.e., using the '::' idea to introduce literal blocks and '>>>' to introduce doctest blocks. 
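[Editor's note: the rule Tibs restates here ('::' at the end of a paragraph introduces a literal block, and '>>>' opens a doctest block) is simple enough to sketch in a few lines of Python. This is a toy illustration, not the docutils implementation; `classify` is an invented name.]

```python
# Toy sketch of the classification rule discussed above: a paragraph
# starting with '>>>' is a doctest block, a paragraph whose
# predecessor ends in '::' is a literal block, everything else is
# ordinary text to be reflowed.
def classify(paragraphs):
    kinds = []
    literal_next = False
    for para in paragraphs:
        if para.lstrip().startswith(">>>"):
            kinds.append("doctest")
        elif literal_next:
            kinds.append("literal")
        else:
            kinds.append("text")
        # A trailing '::' marks the *next* paragraph as literal.
        literal_next = para.rstrip().endswith("::")
    return kinds
```

The point of the sketch is how little machinery the STpy rule needs, compared with guessing from mixed indentation.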
Trying to "guess" based on mixed indentation is too error prone (gods, is that error prone!), and I for one refuse to countenance needing to put some strange characters in front of literal text just to make it literal - yuck. > Then let's try to implement this minimal set and plug this > into Ping's pydoc and see what comes out, if running this > on existing sources. Well, we just about *have* an implementation of our format (if people didn't keep arguing/discussing it would probably have been out already!). And there *is* precedent for changing the existing source documentations, if we need to (although my own experience with a few modules so far is that this is not a problem). *I* would prefer to keep a more complex thing that includes '[localrefs]' ('cos they're useful - and if we can't handle a PEP we're not doing too well). The "labelled paragraphs" thing is too useful for me to want to give up immediately, as well. I'm undecided about whether we need two forms of emphasis - Edward, what is your feeling, should we drop '*..*'? I get left with: * Structuring is still available by indentation, as before, although its use is less important (and to those who've only just joined us, don't worry, it does come out in the wash, honest) * Blank lines delimit paragraphs, as before * List items start paragraphs, as before * Headings are done by underlining (three levels of heading) * doctest blocks are introduced by >>> * literal blocks are introduced by :: on the previous paragraph, as before * Emphasis is by *..* (dropping **..**, which makes life simpler). Obviously, one can't nest *..* inside *..* * Literals are '..' 
and #..# (as before) * URIs are delimited by <..> * URI "text" is done as "some text": - note that under the new scheme we can safely allow optional whitespace after the colon, which allows friendlier typing * Standalone URIs (e.g., ) get rendered as they look * Local references are like [this], and refer to '..[this]' anchors as before (which need to be at the start of a paragraph - an anchor at the start of a line should also start a new paragraph) * Labelled paragraphs are available - for instance:: Author: Guido Arguments: #fred# --- the fred'th dimension as before. I think these are too useful to drop. Edward - have I forgotten anything? What do *you* think? You propose continuing with STminus (under a new name - pytext-slim?) with proper formal definition and minimal markup. I see that as a continuation of what was being done, but without the ST constraints. I would propose amending docutils to implement my proposals above (modulo keeping compatibility with your work), and rewrite STpy.html as pytext.html (pytext-fat.html?), describing it. Mainly because it's just about there, and will provide a "test bed" to allow people to think about which features are actually wanted. That gives us two strings to our bow, which I think is still a good idea. I propose that we still maintain output to a DOM tree, as discussed elsewhere. Meanwhile, I think we should have hope that we can convince Guido that he (honestly) was using a bad tool, and that he shouldn't judge "superficially similar" things by said tool. [[[I'm still yea-close to an alpha release of docutils. It's not put much farther off by the new considerations, so I still propose to put together a PEP for an altered version. But it's not going to be this week. I may be able to take some time off next week, which would help.]]] Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Well we're safe now....thank God we're in a bowling alley. - Big Bob (J.T. Walsh) in "Pleasantville" My views! Mine! Mine! 
(Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Wed Mar 28 11:01:47 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Wed, 28 Mar 2001 12:01:47 +0100 Subject: [Doc-SIG] New document - pytext-fat In-Reply-To: <001101c0b6a4$860c51e0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <001101c0b776$7ba21e60$f05aa8c0@lslp7o.int.lsl.co.uk> Hmm - either our email reception has gone wrong, or everyone is quietly working away in the background (or catching up on sleep). I spent last night (sort of literally) and a bit of this morning writing a first draft of the "fat" pytext specification. It can be found at: I apologise for the mistakes that are bound to be therein - I haven't exactly had time to reread it, and it was written in two long sessions, mostly from memory. It gets a bit thin towards the end, in the colourisation section, due to time constraints. I definitely don't guarantee to have got the DOM definitions right - Edward and I were working towards agreement on that, and I just didn't have time to refer to what we'd done so far. I propose to amend docutils to support the format documented in fat.html, with command line options to allow "obvious" experimentation, so that people can play with different approaches, and also enable/disable things (so, for instance, one should be able to disable localrefs, anchors and labels, if one wishes). (The main thing to add is the proposed new header syntax). That seems a useful "niche" for docutils to fill, whilst Edward works on pytext-slim and the corresponding tool. I hope to expand on *reasons* for decisions at some point, and I have a sneaky feeling that an annotated "history" of the Doc-SIG might be a useful resource apres-PEP - I'll volunteer to try to get round to that as well, since I were a participant for much of it (it's basically a cut-viciously-and-paste-and-edit job on the Doc-SIG archive). 
Sleepily, Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From edloper@gradient.cis.upenn.edu Wed Mar 28 15:30:34 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 28 Mar 2001 10:30:34 EST Subject: [Doc-SIG] Re: docstring signatures In-Reply-To: Your message of "Sat, 24 Mar 2001 15:37:52 EST." <200103242037.PAA28743@cj20424-a.reston1.va.home.com> Message-ID: <200103281530.f2SFUYp29461@gradient.cis.upenn.edu> > > * Intuitiveness: [...] > > * Ease of use: [...] > > Of course. This is all apple pie and motherhood. nobody will want > documentation that's unintuitive or hard to use! Not necessarily -- Many people use Javadoc, and it's not "easy to use"; and I would argue that LaTeX is not "intuitive," as I defined it.. These design goals have specific consequences, like ruling out "heavy-weight" formalisms... > > * Expressive Power: The formatting conventions must have > > enough expressive power to allow users to write the API > > documentation for any python object. > > I've never found that plaintext got in the way of my expressiveness. It depends on whether you want to express things to people or to computer tools. People are very good at reading plaintext docstrings, and getting the appropriate info out of them. But that doesn't mean it's easy to write a tool to do the same.. I believe that formatting conventions should have enough expressive power, for example, to distinguish between regions that can be rendered in non-monospaced font, & word wrapped, and those that should be rendered as "literal." > > * Simplicity: [...] > More motherhood. Again, I disagree. Neither HTML nor LaTeX is *simple*. > > * Safety: No well-formed formatted documentation string should > > result in unexpected formatting. This goal derives from > > intuitiveness. 
> This is a good one. ST loses big here! Well, at least STclassic and STNG. That's the reason why I would vehemently oppose using them. > I thought Javadoc was geared too much towards generating HTML; we > should not focus too much on HTML. It was initially geared towards generating HTML, although they have tools to render it in LaTeX, emacs info files, framemaker, etc. Most of these tools work by requiring that you not use arbitrary HTML tags in your docs, but just limit yourself to a limited set of tags (usually 15-50 tags, depending on the tool). Also, there are 2 orthogonal features of Javadoc: 1. their ability to use HTML in comments (which I don't think we should adopt) 2. their ability to mark special values, using forms like: @param(x) description of x... @throw(y) description of when y is thrown.. I think we should have something like (2), although it might be more pythonic to do something like: argument x: description of x throw y: description of y or: arguments: x -- description of x y -- description of y (incidentally, this use is the main reason that I support DLs in documentation strings..) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 28 15:34:54 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 28 Mar 2001 10:34:54 EST Subject: [Doc-SIG] Formalizing ST In-Reply-To: Your message of "Sun, 25 Mar 2001 11:42:47 +0200." Message-ID: <200103281534.f2SFYsp29847@gradient.cis.upenn.edu> > > Hmm.. So I'm starting to think that EBNF really isn't the best > > formalism for capturing global formatting. > > Hmmmm..... I think I have to disagree. What is global formatting? > Did you ever have a look at the Python/Grammar/Grammar file, which > is basically EBNF and uses the special Tokens INDENT and DEDENT? > [more info, pointers] Thanks for the pointers. Something like this might work, although I'm still not sure how it will work for literal blocks. But I'll figure that out. I was trying to express things in straight EBNF, not using magic tokens like INDENT and DEDENT, but maybe those will help. This may end up requiring that paragraphs be indented reasonably.. (currently, you can indent paragraphs like this if you want:) This paragraph is indented 8 spaces because its first line is indented 8 spaces, regardless of the subsequent indentations. Most of us think that this is not a useful feature. Of course, we may be ditching indentation-structure, anyway (at least for anything other than lists, and maybe literal blocks..) 
:) -Edward From edloper@gradient.cis.upenn.edu Wed Mar 28 15:44:01 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 28 Mar 2001 10:44:01 EST Subject: [Doc-SIG] using the same delimiter on the left and right.. Message-ID: <200103281544.f2SFi2p01299@gradient.cis.upenn.edu> Tibs seems to have this strange notion that "real" markup languages don't use the same characters for left and right delimiters.. :) But almost any markup I can think of does... '$' or '$$' in LaTeX for math mode.. '"' in XML/HTML for attribute values.. etc. I think that what makes delimiters in ST seem like not-real-markup is that they are context-dependent. E.g., "'" in the middle of a word is different from "'" at the beginning of a word. So.. let's change that. Let's make all of our delimiters into real delimiters, that can only be used for delimiting (or maybe also for bullets, in the case of '*'). We could switch our "literal" delimiter to "`". So then we would have the following reserved characters, that may not appear in text without being quoted somehow: '<' left delimiter for URLs '>' right delimiter for URLs '#' delimiter for inlines '`' delimiter for literals '*' delimiter for emph, maybe for strong. '::' marker for literal regions Then the only context-dependent characters that remain would be start-list-item characters.. And if we wanted to, we could use '* ' at the beginning of any list item, since it's reserved anyway... something like: * this is an unordered list item *1. this is an ordered list item Well.. I'm not sure whether we'd want to do that or not.. We may be happy with just using '1.' and assuming that no one will start a line with a number that ends a sentence.. But I think that reserving the delimiter characters might still be a good idea.. Does this sound like a reasonable direction to go? It at least seems to me to be closer to a "real" markup language.. 
-Edward From edloper@gradient.cis.upenn.edu Wed Mar 28 15:55:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 28 Mar 2001 10:55:35 EST Subject: [Doc-SIG] What's important in a docstring markup language? Message-ID: <200103281555.f2SFtZp02258@gradient.cis.upenn.edu> So, I was thinking about how to make ST more formal, more like a real markup language, and possibly more lightweight.. And that led me to think about what I *really* want in a markup language for docstrings.. The following is an ordered list of the features I think it should have. I.e., I think features earlier on the list are more important, and ones later are less important. I would personally be happy to draw a line anywhere after 4, and forget about anything under the line. :) 1. The ability to distinguish text that can be rendered with soft spaces and word wrapped from text that should be rendered in monospace with whitespace preserved (i.e., the ability to distinguish natural language from everything else). This includes both inline literals and literal blocks 2. The ability to label the semantic content of parts of descriptions, eg., as the return value or as a description of an argument. 3. The ability to properly handle doctest blocks (this is a high priority, because these have become standard) 4. Unordered lists 5. Ordered lists 6. Sections 7. Hierarchical sections 8. The ability to mark a word as emphasized 9. URLs 10. The ability to mark regions as emphasized 11. The ability to mark regions as strong 12. Footnotes/endnotes 13. Internal anchors/references to parts of a docstring Does this correspond more-or-less with other people's priorities? What markup do people feel is essential for documenting the APIs of python objects? -Edward From guido@digicool.com Wed Mar 28 16:59:18 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 28 Mar 2001 11:59:18 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. 
In-Reply-To: Your message of "Wed, 28 Mar 2001 10:44:01 EST." <200103281544.f2SFi2p01299@gradient.cis.upenn.edu> References: <200103281544.f2SFi2p01299@gradient.cis.upenn.edu> Message-ID: <200103281659.LAA09792@cj20424-a.reston1.va.home.com> > So then we would have the following > reserved characters, that may not appear in text without > being quoted somehow: > '<' left delimiter for URLs > '>' right delimiter for URLs > '#' delimiter for inlines > '`' delimiter for literals > '*' delimiter for emph, maybe for strong. > '::' marker for literal regions Yuck. Most of these (except '::') are quite commonly used for other purposes, and occur frequently in examples. I prefer markup languages with very few special characters, e.g. a GNU doc standard whose name I don't recall, which only uses @; or Perl's POD, which seems to get away with making only a letter followed by '<' special. LaTeX has at least three special characters ('\', '{', '}'), and in some contexts more, and that's already a pain. XML with '<' and '&' is borderline for me. > Then the only context-dependent characters that remain would > be start-list-item characters.. And if we wanted to, we could > use '* ' at the beginning of any list item, since it's > reserved anyway... something like: > > * this is an unordered list item > *1. this is an ordered list item This is OK, although I like the single hyphen form better. > Well.. I'm not sure whether we'd want to do that or not.. We > may be happy with just using '1.' and assuming that no one will > start a line with a number that ends a sentence.. That was ST's original sin. > But I > think that reserving the delimiter characters might still be > a good idea.. > > Does this sound like a reasonable direction to go? It > at least seems to me to be closer to a "real" markup language.. I can't endorse this yet. --Guido van Rossum (home page: http://www.python.org/~guido/) From Juergen Hermann" Message-ID: On Wed, 28 Mar 2001 10:34:54 EST, Edward D. 
Loper wrote: > This paragraph is indented 8 spaces because its first > line is indented 8 spaces, regardless of the subsequent > indentations. Most of us think that this is not > a useful feature. > >Of course, we may be ditching indentation-structure, anyway (at >least for anything other than lists, and maybe literal blocks..) I know this comes a little late, but have you guys considered wiki markup and the python code that exists for it? Many things that are/were fishy in ST are clearly defined in common wiki markups. Ciao, Jürgen From akuchlin@mems-exchange.org Wed Mar 28 18:54:35 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 28 Mar 2001 13:54:35 -0500 Subject: [Doc-SIG] Graphics in the docs Message-ID: Two questions forwarded from someone who's working on a HOWTO: >How do I include graphics in the HOWTO? > ( I'd like to show the result of my hello, world, etc..., >I have managed to include that in the HTML output but not in PDF... ) >Can I boldface certain parts of python code ( to mark >differences...) >in the verbatim environment... ( I don't know how to do it though) Fred, any suggestions? (I certainly have no objection to doing either of those things in the HOWTOs. They'd make conversion to DocBook or other DTD complicated, but I don't think the HOWTOs are prime candidates for conversion to begin with.) --amk From pf@artcom-gmbh.de Wed Mar 28 18:51:45 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 28 Mar 2001 20:51:45 +0200 (MEST) Subject: [Doc-SIG] Formalizing ST In-Reply-To: from Juergen Hermann at "Mar 28, 2001 8:10:59 pm" Message-ID: Hi, Juergen Hermann points to MoinMoin/parser/wiki.py: > I know this comes a little late, but have you guys considered wiki > markup and the python code that exists for it? Many things that > are/were fishy in ST are clearly defined in common wiki markups. Last weekend I installed MoinMoin 0.8 here on a server in our company's intranet and played around with the markup. 
Wiki markup contains some clever ideas but IMO this is not really intuitive markup useful for Python inline doc strings. For example, headlines in MoinMoin wiki markup are entered as: = This is a very important H1 chapter headline = == This is a slightly less important H2 section headline == === This is a least of all important H3 subsection headline === This sucks IMO, since it emphasizes the unimportant headings in favour of the important ones, when viewing the text in an editor. I would prefer to use indentation to markup different levels (borrow this idea from ST) and use simple underlining for marking single lines as headings: This is a very important H1 chapter headline -------------------------------------------- This is a slightly less important H2 section headline ----------------------------------------------------- This is a least of all important H3 subsection headline ------------------------------------------------------- IMO the ''' and '' for mixing ''italics'' and '''bold''' are also unreadable in text editors. They conflict with Python's triple quotes used in docstrings, BTW. I like the *emphasize* proposed in this group and by ST better. However the url detection without requiring '<' and '>' delimiters around the http:// ... string is a nice feature of MoinMoin markup. Ping has implemented something similar in pydoc already and this works just fine. I have a similar feeling with the email address recognition (MoinMoin uses the regular expression [-\w._+]+\@[\w.-]+ for that). The use of `{{{' and `}}}' is also too heavy for the purpose. pydoc currently assumes that everything but URLs is preformatted literal material and uses a fixed font to display everything. I believe we can loosen this a bit and consider every paragraph with mixed indentation as literal material. So only a paragraph with properly indented lines will be recognized as a text paragraph and possibly reformatted. About lists and numbered lists I'm still not sure what I would like. 
A bullet item list (LaTeX itemize) seems to be enough for most cases. A few days ago Guido gave a similar statement. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From klm@digicool.com Wed Mar 28 20:35:45 2001 From: klm@digicool.com (Ken Manheimer) Date: Wed, 28 Mar 2001 15:35:45 -0500 (EST) Subject: [Doc-SIG] going awry Message-ID: Darn. We've had a number of occasions where the doc-sig has launched into an effort to formulate doc string conventions, and took a turn to invent a new language - which gets lost in the ether. You seemed to be getting some good progress on fixing the flaws in an existing language - structured text - but it sounds eerily like you're heading towards throwing that out the window, and inventing a new language. I think that's a shame. I think a large part of guido's objections to structured text have to do with battling painful implementation bugs, part to do with lack of predictability, and part to do with an expectation that rich-text markup style is going to take over the world, even day-to-day communications. I think the implementation-specific problems can be fixed by the efforts we were seeing. I think that's now in danger of being derailed, to be replaced by another (how many years has this happened?) invent-our-own. (Perhaps i'm overstating it - maybe what's happening now is more about trimming down from a successful example, which should not be near as prone to getting off track.) I think the expectation for use of rich-text markup style is misguided. There may be tons of day-to-day email out there in html format - but i'd lay high odds that, excluding marketing spam, the vast majority uses no markup at all. (When really in on-the-fly communication mode, regular people just type, they don't use menus. They may resort to *punctuation* to express formatting, but they'll rarely resort to codes. 
IE and netscape may package people's messages in delightful mime plain-text/html packages, but i expect that the vast majority of the time it's unnecessary.) I hope, if you do try to invent a new language, you'll exploit some of the economies and principles that structured text has demonstrated... Ken klm@digicool.com From Lucas.Bruand@ecl2002.ec-lyon.fr Wed Mar 28 21:35:50 2001 From: Lucas.Bruand@ecl2002.ec-lyon.fr (Lucas Bruand) Date: Wed, 28 Mar 2001 23:35:50 +0200 Subject: [Doc-SIG] (no subject) Message-ID: In Documenting Python, it is written on page 11: > \url{url} > A URL (or URN). The URL will be presented as text. In the HTML and PDF formatted versions, the URL will >also be a hyperlink. This can be used when referring to external resources. Note that many characters are special >to LaTeX and this macro does not always do the right thing. In particular, the tilde character ('~') is mis-handled; >encoding it as a hex-sequence does work, use '%7e' in place of the tilde character. I don't understand what I should exactly write instead of tilde (because %7e counts as a remark, nor does \symbol{"7e} work). Thanks in advance for helping a beginner in LaTeX, Lucas Bruand From mal@lemburg.com Thu Mar 29 08:21:33 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 29 Mar 2001 10:21:33 +0200 Subject: [Doc-SIG] going awry References: Message-ID: <3AC2F08C.6B1D0E3B@lemburg.com> Ken Manheimer wrote: > ... > I think a large part of guido's objections to structured text have to do > with battling painful implementation bugs, part to do with lack of > predictability, and part to do with an expectation that rich-text markup > style is going to take over the world, even day-to-day communications. Please let me get this straight: as far as I understood Guido's post, he only mentioned that rich text markup didn't work out in his projects -- he never ruled out rich text markup for general use, so I suspect all this confusion to be based on a misunderstanding. 
There are people out there who use rich text markup in doc-strings today, so I don't think that we should stop talking about a standard for a format. Ideally, there should be an interface for extracting information from single doc-strings or maybe even modules which then lets everybody plug in their own favourite doc-string parser. We already have tons of different auto-doc tools out there in the Python universe -- the problem with most of them is that they do not allow for parser plugins. This should change, IMHO. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/ From pf@artcom-gmbh.de Thu Mar 29 08:48:02 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 29 Mar 2001 10:48:02 +0200 (MEST) Subject: [Doc-SIG] f(...) vs. f (...) inconsistency Message-ID: Hi, this is a very tiny issue, but it has bugged me over the years: In his Styleguide Guido wrote: """I **hate** whitespace in the following places: [...] Immediately before the open parenthesis that starts the argument list of a function call, as in spam (1). Always write this as spam(1).""" I agree 100% with this. On the other hand I very often cut'n'paste between the library reference manual pages and an open editor window. Unfortunately in the library reference there are spaces between function names and the opening parenthesis, which I always have to remove manually. How come? Should the library documentation be fixed in this regard? Regards, Peter From tony@lsl.co.uk Thu Mar 29 09:08:48 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:08:48 +0100 Subject: [Doc-SIG] What's important in a docstring markup language? In-Reply-To: <200103281555.f2SFtZp02258@gradient.cis.upenn.edu> Message-ID: <001e01c0b82f$dd7af350$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > The following is an ordered list of the features I think it should > have.
I.e., I think features earlier on the list are more important, > and ones later are less important. I would personally be happy > to draw a line anywhere after 4, and forget about anything under > the line. :) > > 1. The ability to distinguish text that can be rendered with soft > spaces and word wrapped from text that should be rendered in > monospace with whitespace preserved (i.e., the ability to > distinguish natural language from everything else). > This includes both inline literals and literal blocks > 2. The ability to label the semantic content of parts of > descriptions, e.g., as the return value or as a description > of an argument. > 3. The ability to properly handle doctest blocks (this is a high > priority, because these have become standard) > 4. Unordered lists > 5. Ordered lists I think those are in my top six as well. > 6. Sections In docstrings, which are typically short, I think these are at the end of the list > 7. Hierarchical sections Almost no need for these in docstrings, and I think 2 (or at worst 3) levels is more than enough > 8. The ability to mark a word as emphasized I don't want 8 at all, I want 10. > 9. URLs Quite important if this is meant to be "joined up" documentation (to use a horrible buzz phrase our politicians seem addicted to). It is very important to be able to reference documentation elsewhere, and having them "clickable" in derived formats that support that ability is important too. > 10. The ability to mark regions as emphasized See above. I would place this in my top half a dozen. > 11. The ability to mark regions as strong Don't care - in docstrings, I think one form of emphasis is enough (ref fat.html) > 12. Footnotes/endnotes Useful, very useful, but not essential > 13. Internal anchors/references to parts of a docstring A frill, a frippery, something to forget until someone comes up with a use case. > Does this correspond more-or-less with other people's priorities?
> What markup do people feel is essential for documenting the APIs of > python objects? So I guess my list is: 1. "plain" text versus "literal" text (as you say, inline and by the block) 2. "Python" inline versus "other literal" inline. 3. Emphasis (so I don't have to SHOUT A LOT) 4. doctest blocks, 'cos they're easy and useful 5. Lists - all three types - I use all three a lot 6. Label blocks (those things like "Arguments" and so on, which contain a particular sort of information) Those six are my core. Their order isn't too important, 'cos I want all of them (I guess 6 is slightly less important) 7. URI detection. This lives by itself, and nearly makes the "prime" list. 8. Footnotes/endnotes 9. Headers and sections I'm not sure about the order of those last two. Mind you, I think this is a useful exercise (with luck, it won't tell us anything new, of course!). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 09:16:31 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:31 +0100 Subject: [Doc-SIG] going awry In-Reply-To: <3AC2F08C.6B1D0E3B@lemburg.com> Message-ID: <002101c0b830$f18af290$f05aa8c0@lslp7o.int.lsl.co.uk> M.-A. Lemburg wrote: > Please let me get this straight: as far as I understood Guido's > post, he only mentioned that rich text markup didn't work out > in his projects -- he never ruled out rich text markup for general > use, so I suspect all this confusion to be based on a > misunderstanding. Guido said: > PS. Don't spend too much time trying to make StructuredText or some > variation thereof work. In my experience with systems that use ST > (like ZWiki), it sucks. There basically are two options I like: > nicely laid out plain text, or a real markup language like Latex or > DocBook.
That seems to be a downer for anyone trying to produce a docstring markup format. Of course, I think he has been soured by one particular implementation of something that wasn't what we were proposing (and given he seems to like MoinMoin which has an even more ad-hoc approach to text and what it "means", I don't *quite* understand why he's so down on even STClassic). > Ideally, there should be an interface for extracting > information from single doc-strings or maybe even modules which > then lets everybody plug in their own favourite doc-string parser. > > We already have tons of different auto-doc tools out there in > the Python universe -- the problem with most of them is that they > do not allow for parser plugins. This should change, IMHO. HappyDoc seems to be the leader here - I keep mentioning it partly because it seems to be under active development, partly because the author is (at least in principle) interested in the result of what we're doing, and partly because it aims to use plugins for both parsing the text and producing the output. But there is a problem with *recognising* what markup is used in what docstring, if there isn't one standard! Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 09:16:48 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:48 +0100 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: <200103281544.f2SFi2p01299@gradient.cis.upenn.edu> Message-ID: <002201c0b830$fb538670$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > Tibs seems to have this strange notion that "real" markup > languages don't use the same characters for left and right > delimiters..
:) Erm, yes, now you point it out I clearly had my stupid hat on (I wish I could lose that damn thing, it's so embarrassing). > I think that what makes delimiters > in ST seem like not-real-markup is that they are > context-dependent. E.g., "'" in the middle of a word is different > from "'" at the beginning of a word. Well, personally (despite anything I might have said before now) I'm going to start declaring that the ST family *is* real markup (thus defining "real" appropriately, of course). I've begun to think that otherwise it sounds silly. (some of you may want to skip the rant that follows to see more interesting stuff - I'll put a '****** BACK TO NORMAL ******' delimiter at the end so you can scan down...) What I'm clearly striving after is some way of describing what makes ST, etc., different. Thinking about this, we have: * The SGML/XML family * The TeX family * The Runoff family * The Pod family OK. The SGML family originated in a need to mark up data as to its *meaning*, pure and simple. This later got spread to trying to use the meaning of a term to decide how to present it (which gives us HTML, sort of), and that becomes a slippery slope. The TeX family originates in the need to drive the precise typesetting of particular parts of the text, whilst producing good general, predictable typesetting for the rest of the text. It is important to remember that when using a TeX-related tool, the *intention* is that if it doesn't look good when formatted, then it should be rewritten (and indeed, that may mean writing different words to say the same thing). Because the meaning of a term often drives how it is to be typeset (especially in maths, its original target), the use of TeX for semantic markup arises. The Runoff family was a simpler variant on the TeX idea, which wanted to produce computer manuals, and so on. There's generally less control over meaning, more interest in presentation.
It's not clear to me if troff and so on belong to the TeX family or the Runoff family. The Pod family is, maybe, if it exists, the family of marking up docstrings. Edward Welbourne has talked about this in an earlier email. Basically, the aim is to produce something more useful than plain text (but not of a quality to stop a technical documentor wincing), leaving the original, marked up, text still useful *as such*. Eddy also comments that if someone using ST (in his comment) is spending too much time worrying about markup, then they're not spending enough time working on more important things. Both the TeX family and the SGML family care about formalisms, a lot. They each have their own elegances which they are striving for. The Runoff family hasn't *heard* of elegance. And the Pod family are after practicality. I think *we* are *not* in the TeX or SGML families. We are in the "pragmatic solution to a specific problem" space, and if formalism helps with that, then that's a Good Thing, but we shouldn't strain after theoretical purity lest we stray from practical usefulness (heh, I've been pulled up on the list in the past for exactly that). Sorry - back to the normal argument again... ****** BACK TO NORMAL ****** > Let's make all of our delimiters into real delimiters, > that can only be used for delimiting (or maybe also for > bullets, in the case of '*'). We could switch our "literal" > delimiter to "`". So then we would have the following > reserved characters, that may not appear in text without > being quoted somehow: > '<' left delimiter for URLs > '>' right delimiter for URLs > '#' delimiter for inlines > '`' delimiter for literals > '*' delimiter for emph, maybe for strong. > '::' marker for literal regions I hadn't thought of using backtick as a literal delimiter. In the context of docstrings, I can't see why it wouldn't work - hmm, this is a `literal` - yep, that works for me (does the resonance with Python backtick work?).
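[As a minimal sketch of the backtick-literal idea above (an illustration only, not code from docutils or any proposal on the list), recognition really is a one-line RE once the backtick is a dedicated delimiter that never appears in ordinary prose:]

```python
import re

# Illustrative sketch: treat `...` as an inline literal, as proposed
# above.  Because the backtick is reserved purely for delimiting, the
# RE needs no context rules - unlike "'", which also appears mid-word.
LITERAL = re.compile(r"`([^`]+)`")

def find_literals(text):
    """Return the inline literal spans found in a paragraph."""
    return LITERAL.findall(text)

print(find_literals("hmm, this is a `literal` - yep, and `another one`"))
```

[The simplicity is the point: a reserved delimiter makes the parser trivial, at the cost of requiring the character to be quoted when it is wanted literally.]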
It frees up both sorts of "normal" quote, which is good, and only inconveniences people like Eddy who insist on typing `both sorts of single quote' (TeX users, the lot of them). And it means I can type "'cos" without worrying (or 'plane or 'phone if I want to appear old-fashioned). And if those *are* the delimiters, then it *would* work to expect them to be quoted when they occurred - neat. Just goes to prove why we keep Edward around on the list (please add a <grin> here). Does that mean we allow things like "there is a hard-space` `here"? It would be quite a neat thing to allow... > Then the only context-dependent characters that remain would > be start-list-item characters.. And if we wanted to, we could > use '* ' at the beginning of any list item, since it's > reserved anyway... something like: > > * this is an unordered list item > *1. this is an ordered list item > > Well.. I'm not sure whether we'd want to do that or not.. As I say elsewhere, this was considered in an earlier round, and in the end dropped. Personally, I think we're doing OK with the list forms we already had. > Does this sound like a reasonable direction to go? Well, I like it. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 09:16:52 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:52 +0100 Subject: [Doc-SIG] using the same delimiter on the left and right..
In-Reply-To: <200103281659.LAA09792@cj20424-a.reston1.va.home.com> Message-ID: <002401c0b830$fdd85c90$f05aa8c0@lslp7o.int.lsl.co.uk> Guido van Rossum wrote (in response to Edward Loper): > > So then we would have the following > > reserved characters, that may not appear in text without > > being quoted somehow: > > '<' left delimiter for URLs > > '>' right delimiter for URLs > > '#' delimiter for inlines > > '`' delimiter for literals > > '*' delimiter for emph, maybe for strong. > > '::' marker for literal regions Hmm. Using backtick for literals might work quite well - what was ST's reason for not so doing, I wonder? > > Yuck. Most of these (except '::') are quite commonly used for other > purposes, and occur frequently in examples. And the problem with that is? > I prefer markup languages with very few special characters, > e.g. a GNU doc standard whose name I don't recall (texinfo) > which only uses @; or Perl's POD, which seems to get > away with making only a letter followed by '<' special. Yes, but they are not applicable by the "heavyweight markup won't fly" rule (which has been a principle of the Doc-SIG since 1997, and which you yourself defended, and which I used to oppose until it was explained Very Gently and Lots of Times to me why it was important). Texinfo (and there are other more modern examples) is still "formal markup to produce a document", where the markup has equal status with the text, and is expected to intrude. People will not want to write it in docstrings. So we'd lose. Pod is used successfully in the Perl world, and is a clear winner there. I find it intensely unreadable, as a lightweight format. One of the precepts of the whole Doc-SIG/docstring thing has been that "marked up" text must be readable *as text*. I'll say again what I seem to keep saying recently - that means that email is a sensible sort of model.
If we can successfully parse something close to what people type in email, then we're onto a winner, in terms of getting people to use it. > Latex has at least three special characters ('\', '{', '}'), > and in some contexts more, and that's already a pain. > XML with '<' and '&' is borderline for me. We already have existing dictatorial fiat (first in 1997, reiterated by you again recently) against LaTeX and SGML/HTML/XML. That's a Good Thing, since the Doc-SIG as a whole has (each time round the loop) agreed that all of these are non-flyers for docstring markup. Their individual deficiencies (if so they be - that's a matter for argument elsewhere) are thus not relevant. > > Then the only context-dependent characters that remain would > > be start-list-item characters.. And if we wanted to, we could > > use '* ' at the beginning of any list item, since it's > > reserved anyway... something like: > > > > * this is an unordered list item > > *1. this is an ordered list item > > This is OK, although I like the single hyphen form better. There was a proposal last time round the loop to start all list item "sequences" with a special character (debate obviously ensued on which). It was dropped as a proposal (I can't remember which side of the debate I started out on - Doc-SIG has had a history of changing my mind towards the consensus by reasoned debate - don't you just hate it when that happens?). On the whole, I oppose it now. It makes it easier for a parser, and much harder for a human being, to write text. > > Well.. I'm not sure whether we'd want to do that or not.. We > > may be happy with just using '1.' and assuming that no one will > > start a line with a number that ends a sentence.. > > That was ST's original sin. Is it a sin? I don't believe that you will get a markup system (*whatever* its conventions) that doesn't have *some* nooks and crannies where the user may not type.
And if we're worried about (important, yes) fringe cases like that, why not make the implementation (note, not the spec) able to give a warning if it looks like the user might have done that (after all, ending a sentence, in *most* cases, can be spotted due to punctuation, so it should, often, be feasible). > I can't endorse this yet. I am worried that you, Guido, are coming into a debate which you have not participated in (note - *that* is not a criticism - there are other important things I'd like you to have been spending your time on) and putting down some ground rules which *appear* to contradict group-wisdom, as derived over the years. I'm a bit uncomfortable with having to attempt to "channel" the results of that, given I tend to be opinionated anyway, but even so. The Doc-SIG has had a disturbing habit of getting *very close* to a product, and then just petering out. This seems to partially correlate to the aftermath of a Spam meeting (frustrating if one couldn't be there), although for entirely different reasons each time, I believe (i.e., that's hopefully a red herring). I'd be very interested to know what you consider your "sticking points" on this to be - it may be that they are nothing we would worry about, it may be that they are issues we've already argued around in the past. For one thing, I'd appreciate *someone* explaining to me, slowly and with illustrations, just what is wrong with having context-sensitive markup *in docstrings* (not in abstract large documents marked up for typesetting (a la TeX), not in data specifications marked up for detailed content retrieval (a la SGML), but in docstrings marked up for humans to read the markup as text, and for software to retrieve some extra information for slightly improved presentation and for slightly improved information extraction). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L.
Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 09:16:55 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:55 +0100 Subject: [Doc-SIG] going awry In-Reply-To: Message-ID: <002501c0b830$ff8ee630$f05aa8c0@lslp7o.int.lsl.co.uk> Ken Manheimer wrote: > Darn. We've had a number of occasions where the doc-sig has > launched into an effort to formulate doc string conventions, > and took a turn to invent a new language - which gets lost in > the ether. You seemed to be getting some good progress on > fixing the flaws in an existing language - structured text > - but it sounds eerily like you're heading towards throwing > that out the window, and inventing a new language. I think > that's a shame. I make this the third time round. It normally falls apart soon after a Spam meeting, which is *very* frustrating for those of us who can't get to them (and were involved in the debate that seemed to be so productive just before the meeting). > I think a large part of guido's objections to structured text > have to do with battling painful implementation bugs, part to > do with lack of predictability, and part to do with an expectation > that rich-text markup style is going to take over the world, > even day-to-day communications. I tend to agree. I'm also disturbed that we seem to have to rehash some of the same arguments each time round the loop - it gets rather wearing. Although Guido's points are undoubtedly sensible, they are *also* points that have been made at least twice before. It doesn't help that I don't understand (myself) why people object to context sensitivity in markup in something like ST - what on earth is wrong with a single quote in the middle of a word being different than a single quote at the start of a word, or just before punctuation?
We're all good at reading text - that means we don't even *see* such constructs AS SUCH - they're part of the "scanning interface" we run over lines. I mean, a person presented with 'this isn't difficult' doesn't have a particular problem with discerning that the middle quote is different than the others, and whilst I wouldn't propose *allowing* that as a quoted string in our format, it's nastier than the fringe cases people *are* worrying about. > I think the implementation-specific problems can be fixed by > the efforts we were seeing. There are some specific things about ST that *would* be nice to fix, and being free to do that (by dictatorial fiat) is a Good Thing. But I think throwing out the whole thing is not - it's been 5 years, dammit. > (Perhaps i'm overstating it - maybe what's happening now is more about > trimming down from a successful example, which should not be > near as prone to getting off track.) Maybe. I suspect this week will tell. > I think the expectation for use of rich-text markup style is > misguided. It's meant to look like the sort of thing that one already sees people typing in email, to my mind. That means that '*' is a natural character for emphasis, a quote (of some sort) needs using for quoting (and that really means single quote, since it's less used for speech), list items need to *look* like list items (although there's some freedom for playing with that), and so on. > I hope, if you do try to invent a new language, you'll > exploit some of the economies and principles that > structured text has demonstrated... Personally, I don't think ST is far off. I think that it has been trapped by some early assumptions - big deal. More in other messages... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
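[The context-sensitive rule being defended above is easy to state mechanically. Here is a toy sketch (my own, not ST's actual rules): a single quote only *opens* an inline span at the start of a word and only *closes* one at the end of a word, so the apostrophe in "isn't" is simply never a delimiter.]

```python
import re

# Illustrative sketch of context-sensitive quoting (not ST's real
# grammar): an opening "'" must follow whitespace or start-of-text,
# and a closing "'" must precede whitespace, punctuation, or
# end-of-text.  A quote in the middle of a word matches neither rule.
INLINE = re.compile(r"(?:(?<=\s)|^)'([^']+)'(?=[\s.,;:!?]|$)")

def find_inline(text):
    """Return the quoted spans a context-sensitive parser would see."""
    return INLINE.findall(text)

print(find_inline("he said 'hello' and left"))
print(find_inline("'this isn't difficult'"))
```

[Note that the second example yields no span at all: the mid-word apostrophe stops the naive match, which is exactly the kind of fringe case the discussion above is about.]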
From tony@lsl.co.uk Thu Mar 29 09:16:50 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 10:16:50 +0100 Subject: [Doc-SIG] Formalizing ST In-Reply-To: Message-ID: <002301c0b830$fcafe220$f05aa8c0@lslp7o.int.lsl.co.uk> Peter Funk wrote: > Last weekend I installed MoinMoin 0.8 here on a Server in our > company's intranet and played around with the markup. Wiki > markup contains some clever ideas but IMO this is not really > intuitive markup useful for Python inline doc strings. I've read the markup documentation on several Wikis (including CLisp, which is fascinating), and none of them are interested in human readable markup - they're all really interested in presenting web pages. > I would prefer to use indentation to markup different levels (borrow > this idea from ST) and use simple underlining for marking single > lines as headings: Indentation for structure is contentious with many people, and whilst it *sounds* like a good idea (especially to Python people) many object to ending up with the bulk of their text indented. > However the url detection without requiring '<' and '>' > delimiters around the http:// ... string is a nice feature > of MoinMoin markup. You haven't been following the me and Edward Loper (and Edward Welbourne) flurry of emails over recent weeks, have you? The trouble with finding *bare* URIs in a text document written by humans with punctuation is that, in the general case, you can't do it. For instance, a URI is allowed to end with a dot ('.'). So how do you cope with a sentence that ends with http://www.tibsnjoan.co.uk/. Is that last dot part of the URI or not? There are other issues about what can go inside the URI, as well. Yes, people can come up with ad-hoc solutions (docutils/stpy.py works reasonably well), but they are ad-hoc and not guaranteed to work. This disturbs some people (I'm not *too* fussed, but then I'd err on the side of detecting *too many* URIs, I think, which I know would upset some people).
The *only* safe way (and note that this is an option in MoinMoin also) is to delimit the URIs with some mechanism, and '<..>' is at least a fairly traditional solution. > Ping has implemented something similar in pydoc already > and this works just fine. See above - it's "modulo just fine" I'm afraid (Ping is happy with approximate solutions that find too many instances - somewhat more than myself - so *of course* pydoc does what it does (and of course it should)). > I have a similar feeling with the email address recognition Erm - email addresses should be presented as URIs, honest. > About lists and numbered lists I'm still not sure what I would like. > A bullet item list (LaTeX itemize) seems to be enough for most cases. No, that is not sufficient. There are too many of us who *want* (no, *need*) more sorts of list (believe me, I've been using a too-simple internal markup tool for C function header comments for years, and it has only one type of list, delimited by '@' - it's not sufficient - people end up writing lists out "by hand", which rather circumvents the point). > A few days ago Guido gave a similar statement. I'm not sure he exactly said that, but if he did, he was wrong (it *is* possible, he just normally uses the time machine to go back and alter the records after he changes his mind). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Thu Mar 29 11:08:37 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 12:08:37 +0100 Subject: [Doc-SIG] Formalizing ST In-Reply-To: Message-ID: <002c01c0b840$9a88acc0$f05aa8c0@lslp7o.int.lsl.co.uk> Peter Funk wrote: > Tony J Ibbs (Tibs) schrieb: > > You haven't been following the me and Edward Loper (and Edward > > Welbourne) flurry of emails over recent weeks, have you?
> > Yes, I did, but refused to jump in: Admirable restraint! (and brave man for following it all) > I believe it was somewhat theoretic. Well, a mixture of theory and pragmatism, jumbled up to be hard to distinguish. > In practice I have never ever seen a URL ending with a > period. Please give real world evidence of some useful URL. Oh, I'm not convinced that we couldn't manage with the ad-hoc use of REs that I, Ka-Ping Yee, and others have been managing with. But Edward *is* unhappy with it, and that has become significant to me as he has shown good "design sense" in other places. Obvious URIs that fail the test are "." and ".." (both perfectly legal "local" references within an HTML document, and certainly possible things for someone to want to use in a docutils context, I'd have thought - particularly in a package's __init__.py docstring). > IMO it wouldn't hurt, if detection fails in this case. The problem isn't with detection *failing*, it's partly to do with excessive detection (i.e., the pragmatic schemes generally try to over-identify URIs, just in case), but *mainly* due to a worry about explaining to a user what they can type that will work, before they type it. An explanation that goes: "type your URI, but if it ends in one of these characters, you'll have to escape it, or something, and by the way *this* ad-hoc list of characters inside your URI also needs escaping" doesn't seem to be attractive to Edward (put that way, who can blame him), whereas it's very easy to say: "if you want your URI to be recognised, highlighted as such, and with a link if the application supports it, just put '<' and '>' round it, like you're used to seeing in email headers" and expect people to remember it. We might even be able to allow *spaces* in a URI with the '<..>' scheme, which is seriously neat.
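[The '<..>' scheme argued for above really does dissolve the trailing-dot problem. A toy sketch (mine, not docutils code): everything between the angle brackets is taken verbatim as the URI, so a trailing '.' or even an embedded space is unambiguous, and sentence punctuation outside the brackets is never swallowed.]

```python
import re

# Illustrative sketch of '<..>'-delimited URI recognition (not from
# docutils or pydoc): the delimiters, not punctuation heuristics,
# decide where the URI ends.
DELIMITED_URI = re.compile(r"<([^<>]+)>")

def find_uris(text):
    """Return the URIs a '<..>'-based recogniser would extract."""
    return DELIMITED_URI.findall(text)

# The trailing dot stays inside the URI; the sentence's full stop
# stays outside - no heuristic needed.
print(find_uris("See <http://www.tibsnjoan.co.uk/.> for details."))
```

[Bare-URI recognisers have to guess whether that final '.' belongs to the URI; here the writer has already said so.]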
Edward and I will both grumble at it being optional - he for formal reasons, and I 'cos it's wasted mind space remembering optional things when you don't need to, and it gives you the worst of both worlds. > This will work just fine in at least 99.8 % of all cases. Hmm. I'd vote for either ad-hoc recognition or '<..>', and Edward makes a good case for using the latter if we're starting "from scratch". (Note that, despite your attempts to throw oars into our works (!), I'm glad to see that at least one of the disputative regulars from last time round the Doc-SIG loop is listening - please feel free to correct my "historical comments" if you think I'm getting them wrong.) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From pf@artcom-gmbh.de Thu Mar 29 11:13:21 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 29 Mar 2001 13:13:21 +0200 (MEST) Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) In-Reply-To: <002301c0b830$fcafe220$f05aa8c0@lslp7o.int.lsl.co.uk> from "Tony J Ibbs (Tibs)" at "Mar 29, 2001 10:16:50 am" Message-ID: Hi, I wrote: > > A bullet item list (LaTeX itemize) seems to be enough for most cases. [...] > > A few days ago Guido gave a similar statement. Tony J Ibbs (Tibs) replied: > I'm not sure he exactly said that, but if he did, he was wrong (it *is* > possible, he just normally uses the time machine to go back and alter > the records after he changes his mind). I would love to watch the time machine altering the doc-sig archive on python.org and make this email non-existent in a parallel universe. :-) I meant the following 2 emails written by Guido: In http://mail.python.org/pipermail/doc-sig/2001-March/001584.html Guido replied to an email from me: > > I think, a description list can be dropped altogether. > > Yes!
They are darn ugly in HTML anyway. > > > At least for the time being a bullet list will be enough. > > Agreed. Later in http://mail.python.org/pipermail/doc-sig/2001-March/001595.html he wrote as a reply to mailto:edloper%40gradient.cis.upenn.edu: > > Well.. I'm not sure whether we'd want to do that or not.. We > > may be happy with just using '1.' and assuming that no one will > > start a line with a number that ends a sentence.. > > That was ST's the original sin. IMO these are pretty clear statements. If INDENT and DETENT tokens are part of a upcoming EBNF docstring grammar, I think it might be possible to come up with rules for ordered and descriptive lists later on, which will not suffer from ST patterns which trigger in error. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From tony@lsl.co.uk Thu Mar 29 12:23:35 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 29 Mar 2001 13:23:35 +0100 Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) In-Reply-To: Message-ID: <003101c0b84b$137b7590$f05aa8c0@lslp7o.int.lsl.co.uk> There seem to be a lot of follow-ups in the headers - I've left them intact just in case. Apologies to anyone who would prefer I hadn't... Peter Funk wrote: > I meant the following 2 EMails written by Guido: In > http://mail.python.org/pipermail/doc-sig/2001-March/001584.html > Guido replied on an email from me: > > > I think, a description list can be dropped alltogether. > > > > Yes! They are darn ugly in HTML anyway. Sigh. Judging a construct by how IE and Netscape present it is not a very good way to do it. Full stop. (analogy: Renoire artistically judging a subject by how my five-year-old renders it) (I *assume* that's what he meant - if he actually meant what he *said*, which is that `
<dl><dt>..<dd>..</dl>
      ` is ugly, then I really despair. 'cos, like, who could care - only people like me actually *read* HTML). Besides, we've no requirement *at all* to accept that presentation - the descriptive list is the *internal* construct, how that gets turned into (for instance) HTML is in our control (well, the tool writer's control), and it's only after that the browser gets its hands on it. Even before style sheets this was a valid way around the problem (one could, for instance, use tables, or bullet lists with the description formatted as the first paragraph - you get the idea). And with style sheets the document creator gets a *lot* of latitude, even if a standard construct like `
<dl>
      ` *is* used. As I believe I've said elsewhere, I think Guido must have been having a bad week - it doesn't sound like the BDFL I've learnt to trust to miss the abstraction and focus on the (particular) implementation. Dammit, even in his "style sheet" (why won't he finish that?) he uses a descriptive list! (if it looks like a fish and walks like a fish, it can ride a bicycle like a fish, or something like that.) > > > At least for the time being a bullet list will be enough. > > > > Agreed. No. And I will keep fighting this, as I'm sure will other people (other people, anyone, please). After all, that's why we have the SIG! Guido is allowed to be human. He is allowed to be wrong. He is allowed to be *misinformed*. And he is definitely allowed to be convinced of a different opinion. He just gets the final overriding vote (on a PEP - which we haven't produced yet), and it is an item of faith that he only uses that "in extremis". > Later in > http://mail.python.org/pipermail/doc-sig/2001-March/001595.html > he wrote as a reply to mailto:edloper%40gradient.cis.upenn.edu: > > > Well.. I'm not sure whether we'd want to do that or not.. We > > > may be happy with just using '1.' and assuming that no one will > > > start a line with a number that ends a sentence.. > > > > That was ST's original sin. Again, sigh. This is the same (beloved, of course) person who was proposing we look at MoinMoin for ideas, which (for good reasons in context) uses a *horrible* hodge podge of markup mechanisms. And he castigates the ST family for this. (and it's by far not the *worst* "sin" in ST's books, surely) Whatever markup scheme we adopt, I can guarantee you it will have infelicities - especially if it "reads" like more-or-less natural text. People will have to know about those infelicities. As I said, a bad week. 
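Tibs's point above - that the descriptive list is an *internal* construct, and how it gets turned into HTML is the tool writer's choice - can be sketched in a few lines of Python (the function names and the (term, description) node shape here are hypothetical, not from any docutils release): the same pairs can be emitted as a standard definition list or, for browsers that render `<dl>` poorly, as a bullet list with the term leading each item.

```python
def as_dl(pairs):
    """Render (term, description) pairs as a standard HTML definition list."""
    out = ["<dl>"]
    for term, desc in pairs:
        out.append("  <dt>%s</dt>" % term)
        out.append("  <dd>%s</dd>" % desc)
    out.append("</dl>")
    return "\n".join(out)

def as_bullets(pairs):
    """Alternative presentation of the *same* internal construct:
    a bullet list with the term in bold at the head of each item."""
    out = ["<ul>"]
    for term, desc in pairs:
        out.append("  <li><b>%s</b> -- %s</li>" % (term, desc))
    out.append("</ul>")
    return "\n".join(out)

pairs = [("spam", "a canned meat product"), ("eggs", "laid by chickens")]
```

Either renderer can be swapped in without touching the parser, which is exactly the separation being argued for.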
> If INDENT and DETENT tokens are part of a > upcoming EBNF docstring grammar, I think it might be > possible to come up with rules for ordered and descriptive > lists later on, which will not suffer from ST patterns > which trigger in error. I'm not convinced, myself, because we probably *can't* mandate exactly how people lay out paragraphs. For instance, consider the variation in:: Some text. This is more of the same. Some text. This is more of the same. Some text. This is more of the same. and (addressing lists themselves) in: Some text. 1. This is a list and I continue here. Some text. 1. This is a list and I continue here. Some text. 1. This is a list and I continue here. Some text. <> I don't see how we can stop people doing any of those (I bet if our format *tries* it will be either ignored or not used "properly"). That's one of the reasons I advocate ignoring indentation within paragraphs. The *only* way round that would be to require blank lines in front of list items, and that's a no-no for other reasons (well, we discussed that last time round the Doc-SIG loop). Besides, if people *really* find this a problem, we will just need to make sure that the tool implementing the spec looks out for possible problem cases, and that it can *warn* the document writer they may have a problem. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
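The warn-don't-forbid approach Tibs describes can be sketched as a small checker (hypothetical code, not part of any docutils release): treat "1." as starting an ordered list item only at the top of a block, i.e. after a blank line, and emit a warning when the same pattern appears mid-paragraph, where it is probably a sentence that happens to end in a number.

```python
import re

ITEM_RE = re.compile(r'^(\s*)(\d+)\.\s+')

def find_list_items(text):
    """Return probable ordered-list items as (line_no, number) pairs,
    plus warnings for ambiguous matches inside a paragraph."""
    items, warnings = [], []
    prev_blank = True  # the start of the text counts as a block boundary
    for i, line in enumerate(text.splitlines(), 1):
        m = ITEM_RE.match(line)
        if m:
            if prev_blank:
                # safe: the item starts a new block
                items.append((i, int(m.group(2))))
            else:
                warnings.append(
                    "line %d: %r looks like a list item inside a "
                    "paragraph; did a sentence end in a number?"
                    % (i, m.group(0)))
        prev_blank = not line.strip()
    return items, warnings

sample = """Some text.
1. This is a list
   and I continue here.

1. A real item after a blank line."""
items, warnings = find_list_items(sample)
```

Line 2 of the sample triggers a warning rather than silently becoming a list item, while line 5, which follows a blank line, is accepted.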
From fdrake@cj42289-a.reston1.va.home.com Thu Mar 29 13:01:26 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Thu, 29 Mar 2001 08:01:26 -0500 (EST) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010329130126.C3EED2888E@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ For Peter Funk: Removed space between function/method/class names and their parameter lists for easier cut & paste. This is a *tentative* change; feedback is appreciated at python-docs@python.org. Also added some new information on integrating with the cycle detector and some additional C APIs introduced in Python 2.1 (PyObject_IsInstance(), PyObject_IsSubclass()). From fdrake@acm.org Thu Mar 29 13:01:32 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 29 Mar 2001 08:01:32 -0500 (EST) Subject: [Doc-SIG] f(...) vs. f (...) inconsistency In-Reply-To: References: Message-ID: <15043.12844.679546.355978@cj42289-a.reston1.va.home.com> Peter Funk writes: > this is a very tiny issue, but it bugged over the years: > > In his Styleguide Guido wrote: > """I **hate** whitespace in the following places: [...] > Immediately before the open parenthesis that starts the argument > list of a function call, as in spam (1). Always write this as spam(1).""" > I agree to 100% with this. > On the other hand I very often cut'n'paste between the library reference manual > pages and an open editor window. Unfortunately in the library reference > there are spaces between function names and the opening parenthesis, which > I always have to remove manually. > > How come? Should the library documentation be fixed in this regard? I think that's a historical artefact, but it may make sense to keep it to ease readability. I'm publishing a version of the development documentation that makes this change, and am requesting feedback to python-docs. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From klm@digicool.com Thu Mar 29 16:10:24 2001 From: klm@digicool.com (Ken Manheimer) Date: Thu, 29 Mar 2001 11:10:24 -0500 (EST) Subject: [Doc-SIG] using the same delimiter on the left and right.. Message-ID: Tibs wrote: > Guido van Rossum wrote (in response to Edward Loper): > > > Well.. I'm not sure whether we'd want to do that or not.. We > > > may be happy with just using '1.' and assuming that no one will > > > start a line with a number that ends a sentence.. > > > > That was ST's the original sin. > > Is it a sin? I don't believe that you will get a markup system > (*whatever* its conventions) that doesn't have *some* nooks and crannies > where the user may not type. And if we're worried about (important, yes) This may have to do with an egregious wart in STclassic - the implementation is ridiculously too loose about what it accepts for the ordered list cue. It does *not* constrain to digits, nor to single characters! Here are two examples that are translated into ordered list elements::

  Huh. This isn't *supposed* to be an ordered list element!

  Mr. Ken Manheimer would proudly like to present another vapid example.

I can understand how unpleasant little bugs like this would inspire loathing and fear in the hearts of STclassic users. You actually have to change your sentence structure to work around it! (Perhaps i should have spent more time fixing STclassic bugs when i was working on WikiForNow (some revisions of ZWiki, which uses STclassic). However, time was extremely limited, and STNG is in the wings, so effort expended on STclassic seemed unworthwhile. *When* STNG is coming out of the wings is another story, though - with all the important things pending for Zope and for zope.org, it hasn't been high enough priority - the devil's bargain, etc. Sigh.) Still, some may just not consider punctuation-style cues for markup to be acceptable.
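Ken's complaint can be reproduced with two toy regexes (an approximation of the behaviour he describes, not STclassic's actual pattern): a cue that accepts any token before the dot swallows both of his example sentences, while a cue constrained to digits takes only the genuine item.

```python
import re

# Illustrative only: STclassic's real cue is implemented differently,
# but the effect Ken describes is the same.
LOOSE_CUE  = re.compile(r'^\s*\S+\.\s+')   # any token + '.' starts an item
STRICT_CUE = re.compile(r'^\s*\d+\.\s+')   # digits only

lines = [
    "Huh. This isn't *supposed* to be an ordered list element!",
    "Mr. Ken Manheimer would proudly like to present another vapid example.",
    "1. A genuine ordered list item.",
]

# the loose cue matches all three lines; the strict cue only the last
loose_hits  = [l for l in lines if LOOSE_CUE.match(l)]
strict_hits = [l for l in lines if STRICT_CUE.match(l)]
```

Constraining the cue to digits is a one-character change to the pattern, which is why the wart reads as an implementation bug rather than a flaw in punctuation-style cues as such.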
That would be a shame - i think for situations like docstrings and brief, day-to-day content, such limited-scope, dirt-simple markup is the right way to go, if implemented well... Ken klm@digicool.com From guido@digicool.com Thu Mar 29 16:28:03 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 11:28:03 -0500 Subject: [Doc-SIG] What's important in a docstring markup language? In-Reply-To: Your message of "Thu, 29 Mar 2001 10:08:48 +0100." <001e01c0b82f$dd7af350$f05aa8c0@lslp7o.int.lsl.co.uk> References: <001e01c0b82f$dd7af350$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291628.LAA18924@cj20424-a.reston1.va.home.com> > > 9. URLs > > Quite important if this is meant to be "joined up" documentation (to use > a horrible buzz phrase our politicians seem addicted to). It is very > important to be able to reference documentation elsewhere, and having > them "clickable" in derived formats that support that ability is > important too. IMO, URLs don't need any special markup. They can just be recognized in the text and automatically highlighted. Lots of tools processing plain text do this (including the FAQ wizard, which has a trick or two to make this work reliably even when there's punctuation following the URL). --Guido van Rossum (home page: http://www.python.org/~guido/) From dgoodger@atsautomation.com Thu Mar 29 16:46:24 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 29 Mar 2001 11:46:24 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. Message-ID: Tony J Ibbs (Tibs) wrote: > Edward D. Loper wrote: > > We could switch our "literal" delimiter to "`". ... > > I hadn't thought of using backtick as a literal delimiter. Just a little reminder here, guys. Although I haven't had time (or energy) to participate in the recent voluminous discussions (I've been lurking), I did present some carefully thought-out arguments about the above topic (and many others) back in November. Have you read them yet? 
(Based on several recent posts, including the ones referenced above, it seems you haven't. Nudge, nudge. :) See: - A Plan for Structured Text http://mail.python.org/pipermail/doc-sig/2000-November/001239.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2000-November/001240.html - reStructuredText: Revised Structured Text Specification http://mail.python.org/pipermail/doc-sig/2000-November/001241.html Specifically, backticks ("`") are good for inline literals. Single quotes ("'") are bad, because we use 'em too much in all contexts (apostrophes, Python strings, prose quotations and nested quotations [Bruce said, "'Hot enough to boil a monkey's bum,' Her Majesty said, and smiled quietly to herself."], and the British seem to prefer them over double-quotes [in novels at least]). Hash marks ("#") are unbearably ugly. Eddy W.'s statements notwithstanding, in an agile (new P.C. term for "lightweight"; see http://www.agilealliance.org) markup scheme, we don't need all four of (inline, block) x (alien text, Python code); I think that's up to the tool to deal with. /DG From guido@digicool.com Thu Mar 29 16:48:36 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 11:48:36 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: Your message of "Thu, 29 Mar 2001 10:16:52 +0100." <002401c0b830$fdd85c90$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002401c0b830$fdd85c90$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291648.LAA18948@cj20424-a.reston1.va.home.com> > > Yuck. Most of these (except '::') are quite commonly used for other > > purposes, and occur frequently in examples. > > And the problem with that is? That they will frequently need to be escaped in order to prevent special interpretation. Note that an escape character was absent from the list -- that's a big mistake, I think! > > I prefer markup languages with very few special characters, > > e.g. 
a GNU doc standard whose name I don't recall > > texinfo > > > which only uses @; or Perl's POD, which seems to get > > away with making only a letter followed by '<' special. > > Yes, but they are not applicable by the "heavyweight markup won't fly" > rule (which has been a principle of the Doc-SIG since 1997, and which > you yourself defended, and which I used to oppose until it was explained > Very Gently and Lots of Times to me why it was important). Well, after using ST, I'm not so sure I agree with that rule any more. I think HTML is too heavy, but I think reserving a dozen or so characters for special purposes is also wrong. > Texinfo (and there are other more modern examples) is still "formal > markup to produce a document", where the markup has equal status with > the text, and is expected to intrude. People will not want to write it > in docstrings. So we'd lose. But isn't this exactly what Javadoc does? > Pod is used successfully in the Perl world, and is a clear winner there. > I find it intensely unreadable, as a lightweight format. I haven't seen too much POD, so you may be right there. Is it worse than Latex? > One of the precepts of the whole Doc-SIG/docstring thing has been that > "marked up" text must be readable *as text*. I'll say again what I seem > to keep saying recently - that means that email is a sensible sort of > model. If we can successfully parse something close to what people type > in email, then we're onto a winner, in terms of getting people to use > it. Watch out though. As soon as you're getting into heuristics too much, our ways part. I want very clear, exact and predictable rules. > > Latex has at least three special characters ('\', '{', '}'), > > and in some contexts more, and that's already a pain. > > XML with '<' and '&' is borderline for me. > > We already have existing dictarotial fiat (first in 1997, reiterated by > you again recently) against LaTeX and SGML/HTML/XML. 
That's a Good > Thing, since the Doc-SIG as a whole has (each time round the loop) > agreed that all of these are non-flyers for docstring markup. Their > individual deficiencies (if so they be - that's a matter for argument > elsewhere) are thus not relevant. Sure. Though I've got a feeling that I'm disagreeing with "the doc-sig as a whole" a lot. Maybe I should just withdraw (again) from this whole discussion and let you all decide what you like, as long as it doesn't have to be used for the standard library? > > > Then the only context-dependant characters that remain would > > > be start-list-item characters.. And if we wanted to, we could > > > use '* ' at the beginning of any list item, since it's > > > reserved anyway... something like: > > > > > > * this is an unordered list item > > > *1. this is an ordered list item > > > > This is OK, although I like the single hyphen form better. > > There was a proposal last time round the loop to start all list item > "sequences" with a special character (debate obviously ensued on which). > It was dropped as a proposal (I can't remember which side of the debate > I started out on - Doc-SIG has had a history of changing my mind towards > the consensus by reasoned debate - don't you just hate it when that > happens?). > > On the whole, I oppose it now. It makes it easier for a parser, and much > harder for a human being, to write text. I think whitespace (a blank line and/or indentation) should be enough to recognize the start of a list. > > > Well.. I'm not sure whether we'd want to do that or not.. We > > > may be happy with just using '1.' and assuming that no one will > > > start a line with a number that ends a sentence.. > > > > That was ST's the original sin. > > Is it a sin? I don't believe that you will get a markup system > (*whatever* its conventions) that doesn't have *some* nooks and crannies > where the user may not type. 
And if we're worried about (important, yes) > fringe cases like that, why not make the implementation (note, not the > spec) able to give a warning if it looks like the user might have done > that (after all, ending a sentence, in *most* cases, can be spotted due > to punctuation, so it should, often, be feasible). Well, it would be all right if it only recognized numbers after a blank line. It's a pain if it latches on any "^\d+\." in the middle of a text block, because (in my experience) that's never a numbered item, it always just happens to be a sentence ending in a number. > > I can't endorse this yet. > > I am worried that you, Guido, are coming into a debate which you have > not participated in (note - *that* is not a criticism - there are other > important things I'd like you to have been spending your time on) and > putting down some ground rules which *appear* to contradict > group-wisdom, as derived over the years. I'm a bit uncomfortable with > having to attempt to "channel" the results of that, given I tend to be > opinionated anyway, but even so. Well, you (as a group) asked me my opinion, which I gave. If you don't like it, fine, I'll bail out again, I *do* have other things to do. Also note that I repeatedly requested to see the spec you (again as a group) had arrived at, and nobody has pointed me to it. Given that the doc-sig has been going around in circles since 1997, I worry that it's never going to reach a conclusion -- with or without my involvement. > The Doc-SIG has had a disturbing habit of getting *very close* to a > product, and then just petering out. This seems to partially correlate > to the aftermath of a Spam meeting (frustrating if one couldn't be > there), although for entirely different reasons each time, I believe > (i.e., that's hopefully a red herring). Lots of things get a jolt of energy at a Python conference (can we stop calling them spams?) and then peter out. The types-sig has seen this phenomenon too.
I guess it's because real life takes over after a while. > I'd be very interested to know what you consider your "sticking points" > on this to be - it may be that they are nothing we would worry about, it > may be that they are issues we've already argued around in the past. Show me your spec and I'll review it. You can't expect me to lay out ground rules without knowing where your thinking is going. > For one thing, I'd appreciate *someone* explaining to me, slowly and > with illustrations, just what is wrong with having context-sensitive > markup *in docstrings* (not in abstract large documents marked up for > typesetting (a la TeX), not in data specifications marked up for > detailed content retrieval (a la SGML), but in docstrings marked up for > humans to read the markup as text, and for software to retrieve some > extra information for slightly improved presentation and for slightly > improved information extraction). I believe the problem is with the required preciseness of docstrings. Docstrings are not like email, where the reader can usually guess what you meant despite typos and transmission glitches. Imagine a docstring describing a regular expression-like language. Can you see the damage that could be done by inadvertently changing all double backslashes into single backslashes, or interpreting *...* as bold (hence dropping the *s)? There are lots of situations like this. (E.g. I recently noticed that Ping made some docstring a raw string because it contained examples involving \r and \n.) Every character counts, and so does every bit of whitespace -- at least sometimes, and the docstring processor can't be smart enough to always know when. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:05:04 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:05:04 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. 
In-Reply-To: Your message of "Thu, 29 Mar 2001 10:16:48 +0100." <002201c0b830$fb538670$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002201c0b830$fb538670$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291705.MAA19008@cj20424-a.reston1.va.home.com> > The Runoff family was a simpler variant on the TeX idea, which wanted to > produce computer manuals, and so on. There's generally less control over > meaning, more interest in presentation. It's not clear to me if troff > and so on belong to the TeX family or the Runoff family. TeX and troff are both very old and have the same amount of control over the typesetter. However, the TeX *language* is more flexible (maybe too flexible) and hence could more easily beget LaTeX, which is a member of the SGML family: it cares about meaning, not markup. There are languages like that in the troff family too: the -man macros (generally try to) specify meaning, not markup. Troff lost to TeX because it was too concerned with machine efficiency in the 16-bit days, and restricted its language to 2-character identifiers. > The Pod family is, maybe, if it exists, the family of marking up > docstrings. Edward Welbourne has talked about this in an earlier email. > Basically, the aim is to produce something more useful than plain text > (but not of a quality to stop a technical documentor wincing), leaving > the original, marked up, text still useful *as such*. Eddy also comments > that if someone using (in his comment, ST) is spending too much time > worrying about markup, then they're not spending enough time working on > more important things. Objection. I'm not "worrying too much about markup". I'm worrying quite a lot, based upon hard-earned experience, that the heuristics of the ST family introduce unwanted markup and drop characters that are essential for the documentation. > > Let's make all of our delimiters into real delimiters, > > that can only be used for delimiting (or maybe also for > > bullets, in the case of '*'). 
We could switch our "literal" > > delimiter to "`". So then we would have the following > > reserved characters, that may not appear in text without > > being quoted somehow: > > '<' left delimiter for URLs > > '>' right delmiter for URLs Unneeded -- URLs can be recognized easily, and we don't have to drop any characters when we add a link. > > '#' delimiter for inlines > > '`' delimiter for literals > > '*' delimiter for emph, maybe for strong. We only need one of these (emph or strong). > > '::' marker for literal regions We need an escape too. > I hadn't thought of using backtick as a literal delimiter. In the > context of docstrings, I can't see why it wouldn't work - hmm, this is a > `literal` - yep, that works for me (does the resonance with Python > backtick work?). >From a readability perspective, I'd prefer `symmetric quotes'. > It frees up both sorts of "normal" quote, which is > good, and only inconveniences people like Eddy who insist on typing > `both sorts of single quote' (TeX users, the lot of them). And it mean I > can type "'cos" without worrying (or 'plane or 'phone if I want to > appear old-fashioned). Maybe it would work if the quotes were left in the output text? That way at least if a stray backtick is mistaken for markup, it's still clear that there was a backtick in the docstring source. > And if those *are* the delimiters, then it *would* work to expect them > to be quoted when they occurred - neat. Just goes to prove why we keep > Edward around on the list (please add a here). > > Does that mean we allow things like "there is a hard-space` `here"? It > would be quite a neat thing to allow... This looks like horrible abuse to me. --Guido van Rossum (home page: http://www.python.org/~guido/) From Juergen Hermann" Message-ID: On Thu, 29 Mar 2001 11:28:03 -0500, Guido van Rossum wrote: >IMO, URLs don't need any special markup. They can just be recognized >in the text and automatically highlighted. 
Lots of tools processing >plain text do this (including the FAQ wizard, which has a trick or two >to make this work reliably even when there's punctuation following the >URL). +1 (actually, you kicked me in the right direction to improve MoinMoin's code in that respect ;). I think we should go the plain text route, with _conservative_ regexes (i.e. a sane implementation) and not too fancy markup (Tony's list). The main thing to consider in a first implementation is that we do not paint ourselves into a corner (like using too much markup characters that'll make it hard to keep the "plain readable text" idea). If people want STNG in docstrings, plug in a parser for it. On the problem of deciding what parser to use, I propose to add some hint on a per module basis (mixing several docstring styles per module would be a silly, unsupported idea). Either by a magic variable in the module, or a magic comment, or some hint in the module's docstring. Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From guido@digicool.com Thu Mar 29 17:08:56 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:08:56 -0500 Subject: [Doc-SIG] going awry In-Reply-To: Your message of "Thu, 29 Mar 2001 10:16:55 +0100." <002501c0b830$ff8ee630$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002501c0b830$ff8ee630$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291708.MAA19032@cj20424-a.reston1.va.home.com> > It doesn't help that I don't understand (myself) why people object to > context sensitivity in markup in something like ST - what on earth is > wrong with a single quote in the middle of a word being different than a > single quote at the start of a word, or just before punctuation? We're > all good at reading text - that means we don't even *see* such > constructs AS SUCH - they're part of the "scanning interface" we run > over lines.
I mean, a person presented with 'this isn't difficult' > doesn't have a particular problem with discerning that the middle quote > is different than the others, and whilst I wouldn't propose *allowing* > that as a quoted string in our format, it's nastier than the fringe > cases people *are* worrying about. Have you tried to use ST to document a language that happens to place a special meaning on most of the ST special characters? (Like ST itself. :-) It's horrid unless the rules are very clear and simple, and there's a really easy way to turn ST's heuristics off -- and not just in literal blocks (which are only half the solution). > > I think the implementation-specific problems can be fixed by > > the efforts we were seeing. > > There are some specific things about ST that *would* be nice to fix, and > being free to do that (by dictatorial fiat) is a Good Thing. But I think > throwing out the whole thing is not - it's been 5 years, dammit. You know, that *could* mean that the problem is simply intractable, and that we'd all do better by admitting that the only two real options are real plain text or real markup... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:20:06 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:20:06 -0500 Subject: [Doc-SIG] Formalizing ST In-Reply-To: Your message of "Thu, 29 Mar 2001 10:16:50 +0100." <002301c0b830$fcafe220$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002301c0b830$fcafe220$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291720.MAA19114@cj20424-a.reston1.va.home.com> > Indentation for structure is contentious with many people, and whilst it > *sounds* like a good idea (especially to Python people) many object to > ending up with the bulk of their text indented. What worked against this specific ST feature is that in ZWikis, you end up editing sizeable documents in a text box in Netscape, which has no support for auto-indentation. 
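Earlier in the thread Guido argues that bare URLs need no delimiters because tools like the FAQ wizard simply recognize them, scanning up to whitespace and trimming punctuation from the back. A minimal sketch of that rule (a reimplementation of the behaviour as described, not the FAQ wizard's actual code):

```python
import re

# scan to whitespace: everything after http:// or https:// that isn't a space
URL_RE = re.compile(r'https?://\S+')

def find_urls(text):
    """Find bare URLs, then trim trailing punctuation that almost
    certainly belongs to the sentence rather than the URL."""
    urls = []
    for m in URL_RE.finditer(text):
        urls.append(m.group(0).rstrip(".,;:!?)'\""))
    return urls

text = ("See http://www.tibsnjoan.co.uk/. There is also "
        "http://www.python.org/~guido/, of course.")
```

On this input the trailing full stop and comma are correctly treated as sentence punctuation, which is the heuristic's whole bet: URLs *can* end in punctuation, but in practice they almost never do.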
> > However the url detection without requiring '<' and '>' > > delimiters around the http:// ... string is a nice feature > > of MoinMoin markup. > > You haven't been following the me and Edward Loper (and Edward > Welbourne) flurry of emails over recent weeks, have you? > > The trouble with finding *bare* URIs in a text document written by > humans with punctuation is that, in the general case, you can't do it. > For instance, a URI is allowed to end with a dot ('.'). So how do you > cope with a sentence that ends with http://www.tibsnjoan.co.uk/. Is that > last dot part of the URI or not? There are other issues about what can go > inside the URI, as well. Yes, people can come up with ad-hoc solutions > (docutils/stpy.py works reasonably well), but they are ad-hoc and not > guaranteed to work. This disturbs some people (I'm not *too* fussed, but > then I'd err on the side of detecting *too many* URIs, I think, which I > know would upset some people). The FAQ wizard uses a simple and sufficient rule, which almost never misfires: it scans up to whitespace, and then trims punctuation characters from the back. While URLs certainly *can* end in punctuation, I have never seen URLs that *did*. Invariably, a trailing period or comma is part of the sentence, not part of the URL. > The *only* safe way (and note that this is an option in MoinMoin also) > is to delimit the URIs with some mechanism, and '<..>' is at least a > fairly traditional solution. Which unfortunately means you would have to escape each < or > that was not meant to be a URL delimiter. These occur frequently in Python code samples (``if i < 10: print i'') but also, and I would say more frequently, in any documentation that describes XML or HTML samples. I find the ability to write "
<BR> and <br>
      are equivalent in HTML but not in XHTML" more important than the ability to mark URLs unambiguously, given the success rate of existing heuristics there. > > Ping has implemented something similar in pydoc already > > and this works just fine. > > See above - it's "modulo just fine" I'm afraid (Ping is happy with > approximate solutions that find too many instances - somewhat more than > myself - so *of course* pydoc does what it does (and of course it > should)). That's a new meaning of "modulo". :-) > > I have a similar feeling with the email address recognition > > Erm - email addresses should be presented as URIs, honest. Yeah, right. Tough luck getting people to add mailto: to their address. Be practical, and add a hyperlink to anything that looks like an email address -- if you don't eat any characters that were present in the source, soemwhat overzealous recognition won't hurt. > > About lists and numbered lists I'm still not sure what I would like. > > I bullet item list (LaTeX itemize) seems to be enough for most cases. > > No, that is not sufficient. There are too many of us who *want* (no, > *need*) more sorts of list (believe me, I've been using a too-simple > internal markup tool for C function header comments for years, and it > has only one type of list, delimited by '@' - it's not sufficient - > people end up writing lists out "by hand", which rather circumvents the > point). What exactly is lacking in that tool? Nested lists? We can do those. Numbered lists? We don't need autonumbered lists, so we can require that the numbers are already in the source. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:26:00 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:26:00 -0500 Subject: [Doc-SIG] Formalizing ST In-Reply-To: Your message of "Thu, 29 Mar 2001 12:08:37 +0100." 
<002c01c0b840$9a88acc0$f05aa8c0@lslp7o.int.lsl.co.uk> References: <002c01c0b840$9a88acc0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291726.MAA19161@cj20424-a.reston1.va.home.com> > Obvious URIs that fail the test are "." and ".." (both perfectly legal > "local" references within an HTML document, and certainly possible > things for someone to want to use in a docutils context, I'd have > thought - particularly in a package's __init__.py docstring). You can always append a "/" to URLs ending in "." or "..". In fact that's recommended practice anyway -- otherwise you incur an extra server roundtrip since most servers give you a 301 or 302 redirect with an appended slash if you give a directory URL without trailing slash -- this is to make relative URLs work. > > IMO it wouldn't hurt, if detection fails in this case. > > The problem isn't with detection *failing*, it's partly to do with > excessive detection (i.e., the pragmatic schemes generally try to > over-identify URIs, just in case), but *mainly* due to a worry about > explaining to a user what they can type that will work, before they type > it. > > An explanation that goes: > > "type your URI, but if it ends in one of > these characters, you'll have to escape > it, or something, and by the way *this* > ad-hoc list of characters inside your > URI also needs escaping" Practical URLs don't end in punctuation. Show me a website whose URLs do and I'll change my mind, but I bet you can't find one. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:42:07 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:42:07 -0500 Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) In-Reply-To: Your message of "Thu, 29 Mar 2001 13:23:35 +0100." 
<003101c0b84b$137b7590$f05aa8c0@lslp7o.int.lsl.co.uk> References: <003101c0b84b$137b7590$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291742.MAA19253@cj20424-a.reston1.va.home.com> > > Guido replied on an email from me: > > > > I think, a description list can be dropped altogether. > > > > > > Yes! They are darn ugly in HTML anyway. > > Sigh. Judging a construct by how IE and Netscape present it is not a > very good way to do it. Full stop. (analogy: Renoire artistically > judging a subject by how my five-year-old renders it) Given that IE and Netscape are what 99% of our users use to view documentation, I don't see why this argument is ruled out. Remember, practicality beats purity. You seem to argue for the purity side of things. And I don't believe the Renoir (sp!) reference is relevant at all. IE and NS are not your five-year-old. They are the front page of your city's newspaper. > Besides, we've no requirement *at all* to accept that presentation - the > descriptive list is the *internal* construct, how that gets turned into > (for instance) HTML is in our control (well, the tool writer's control), > and it's only after that the browser gets its hands on it. Even before > style sheets this was a valid way around the problem (one could, for > instance, use tables, or bullet lists with the description formatted as > the first paragraph - you get the idea). And with style sheets the > document creator gets a *lot* of latitude, even if a standard construct > like `<dl>` *is* used. Except that the problem is that typically there's *no* decent-looking way to present such lists. > As I believe I've said elsewhere, I think Guido must have been having a > bad week - it doesn't sound like the BDFL I've learnt to trust to miss > the abstraction and focus on the (particular) implementation. I can do without this particular abstraction. > Dammit, even in his "style sheet" (why won't he finish that?) Lack of time, like so many other things. > he uses a > descriptive list! (if it looks like a fish and walks like a fish, it can > ride a bicycle like a fish, or something like that.) Note that I don't need semantic mark-up for my descriptive lists -- the English language, punctuation and bulleted lists do all that I want. An argument against too much abstraction in the current discussion: the core idea of ST is that the source looks sufficiently like the output to be readable without any processing. I'd rather not have a tool that tries to extract an abstraction and lays it out completely differently in another medium, because that means what I think looks right in plain text will suddenly look wrong on the user's screen. Giving the renderer too much freedom IMO makes it harder for the author to do the right thing. Really, I wish we could use WYSIWYG for docstrings -- that would be so much better! But program text editors don't allow that yet... :-( > No. And I will keep fighting this, as I'm sure will other people (other > people, anyone, please). After all, that's why we have the SIG! So please write down exactly which forms of lists you want. > Guido is allowed to be human. He is allowed to be wrong. He is allowed > to be *misinformed*. And he is definitely allowed to be convinced of a > different opinion. I am so glad you aren't telling me what to think or do. > He just gets the final overriding vote (on a PEP - which we haven't > produced yet), and it is an item of faith that he only uses that "in > extremis". 
Well, actually, if I vote your PEP down, that doesn't have to stop you from using it anyway in your own code. And if you can convince enough other users to follow your conventions, I may be convinced. This is different than a language change, where I really *do* have the last word! > Whatever markup scheme we adopt, I can guarantee you it will have > infelicities - especially if it "reads" like more-or-less natural text. > People will have to know about those infelicities. Except if it *is* plain text. > As I said, a bad week. For you, or for me? :-) > and (addressing lists themselves) in: > > Some text. > 1. This is a list > and I continue here. > > Some text. > 1. This is a list > and I continue here. > > Some text. > 1. This is a list > and I continue here. > > Some text. > < but indented a little bit>> Or my preference: Some text. 1. This is a list item continued here. 2. This is the second item. > I don't see how we can stop people doing any of those (I bet if our > format *tries* it will be either ignored or not used "properly"). That's > one of the reasons I advocate ignoring indentation within paragraphs. > The *only* way round that would be to require blank lines in front of > list items, and that's a no-no for other reasons (well, we discussed > that last time round the Doc-SIG loop). Hm, I must've missed that. It seems reasonable enough to me (I wrote the above before reading on). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 17:53:59 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 12:53:59 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: Your message of "Thu, 29 Mar 2001 11:10:24 EST." References: Message-ID: <200103291753.MAA19334@cj20424-a.reston1.va.home.com> > Still, some may just not consider punctuation-style cues for markup to be > acceptable. 
That would be a shame - i think for situations like > docstrings and brief, day-to-day content, such limited-scope, dirt-simple > markup is the right way to go, if implemented well... I'm concerned about the attitude that docstrings are stuff you shouldn't care too much about. Python docstrings are used to document constructs in a programming language. Precision is of the essence. We *need* to be able to control every single character of the output. I think the most important requirements are to be able to indicate what is free-flowing, normal text and what isn't (either inline example text or larger blocks of literal text). Everything else is secondary. --Guido van Rossum (home page: http://www.python.org/~guido/) From klm@digicool.com Thu Mar 29 17:59:52 2001 From: klm@digicool.com (Ken Manheimer) Date: Thu, 29 Mar 2001 12:59:52 -0500 (EST) Subject: [Doc-SIG] going awry In-Reply-To: <200103291708.MAA19032@cj20424-a.reston1.va.home.com> Message-ID: On Thu, 29 Mar 2001, Guido van Rossum wrote: > Have you tried to use ST to document a language that happens to place > a special meaning on most of the ST special characters? (Like ST > itself. :-) It's horrid unless the rules are very clear and simple, > and there's a really easy way to turn ST's heuristics off -- and not > just in literal blocks (which are only half the solution). I think it makes sense to have an easy way (a really easy way:) to turn ST interpretation off for arbitrary extents - something like shell hereis, perhaps. It's also interesting to focus on using STwhatever to describe STwhatever. There's a bit of a scope question in the latter, though - would such a document be larger/more comprehensive than the kinds of things we're concerned with in docstrings? I don't know. I think with reasonable escapes it could be easy, though. 
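A here-document-style off switch of the sort Ken suggests could be prototyped in a few lines. This is only a sketch: the ".. raw-on"/".. raw-off" delimiters are invented for illustration, since no actual syntax was agreed on the list.

```python
import re

# Hypothetical delimiters -- stand-ins for whatever "turn ST off"
# markers might eventually be chosen.
RAW_RE = re.compile(r'^\.\. raw-on\n(.*?)^\.\. raw-off\n',
                    re.MULTILINE | re.DOTALL)

def protect_raw_regions(text):
    """Pull delimited raw regions out of the text before any ST pass,
    replacing each with an opaque placeholder so no markup heuristics
    fire inside it.  Returns (processed_text, saved_regions)."""
    saved = []
    def stash(match):
        saved.append(match.group(1))
        return '\x00RAW%d\x00' % (len(saved) - 1)
    return RAW_RE.sub(stash, text), saved
```

After the ST pass runs over the processed text, the saved regions would be spliced back in verbatim, untouched by any interpretation.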
(Re escapes - i'd like to see such things done keeping jim's original intent that the motivations for structured text gestures make sense in the context of the raw text as well as for their interpretation. Eg, a hereis style delimiter that looks like: ... [Unformatted passage follows, until "End of unformatted passage"] *text fragment* indicates emphasis formatting [End of unformatted passage] For want of any insight on a double-duty formalism for single-character escapes, i'd be inclined to go with '\' or character doubling...) > > There are some specific things about ST that *would* be nice to fix, and > > being free to do that (by dictatorial fiat) is a Good Thing. But I think > > throwing out the whole thing is not - it's been 5 years, dammit. > > You know, that *could* mean that the problem is simply intractable, > and that we'd all do better by admitting that the only two real > options are real plain text or real markup... Look at the history. The problems have come up in coming to agreement about a reasonably scoped effort for a lightweight language - and then in avoiding the temptation to invent a new markup language from scratch. (It's lightweight, it must be easy to formulate, right?-) Recently we *did* seem to actually be making progress! There was some kind of agreement about where to start, with a leg-up on a viable though crufty language, and some genuine progress towards rectifying the problems! (Thanks, thanks, thanks, edward and tony!!) I hope those efforts keep on track. (I'm not sure what documentation you have and haven't had identified - i don't have the URL for edward's STminus EBNF specification, or tibs' stpy site - i'm hoping someone will chime in with them, in case those are what you need...) Ken klm@digicool.com From dgoodger@atsautomation.com Thu Mar 29 18:07:15 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 29 Mar 2001 13:07:15 -0500 Subject: [Doc-SIG] f(...) vs. f (...) 
inconsistency Message-ID: > and am requesting feedback to python-docs. I think fn() looks fine, no readability problems. For copy-and-paste it's a win vis-a-vis Python's styleguide. David Goodger, Systems Administrator & Programmer Automation Tooling Systems Inc., Advanced Systems 730 Fountain Street, Building 3, Cambridge, Ontario, Canada N3H 4R7 direct: +1-519-653-4483 ext. 7121 fax: +1-519-650-6695 e-mail: dgoodger@atsautomation.com From guido@digicool.com Thu Mar 29 18:07:32 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 13:07:32 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: Your message of "Thu, 29 Mar 2001 11:46:24 EST." References: Message-ID: <200103291807.NAA19426@cj20424-a.reston1.va.home.com> David Goodger writes: > - A Plan for Structured Text > http://mail.python.org/pipermail/doc-sig/2000-November/001239.html > > - Problems With StructuredText > http://mail.python.org/pipermail/doc-sig/2000-November/001240.html > > - reStructuredText: Revised Structured Text Specification > http://mail.python.org/pipermail/doc-sig/2000-November/001241.html What I like most about this is that it is a full specification! The first one I've seen that's exact enough to be criticized and to be understood. I think you may be going overboard with features, but I like many of your ideas, both about heuristics for implicit markup (e.g. sections) and about the tokens you use for explicit markup (using ".."). I also like that you define the escaping mechanism upfront. (Using \ to escape means that we're going to have to make our docstrings raw strings. Big deal. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Mar 29 18:22:28 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 29 Mar 2001 13:22:28 -0500 Subject: [Doc-SIG] New document - pytext-fat In-Reply-To: Your message of "Wed, 28 Mar 2001 12:01:47 +0100." 
<001101c0b776$7ba21e60$f05aa8c0@lslp7o.int.lsl.co.uk> References: <001101c0b776$7ba21e60$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200103291822.NAA19637@cj20424-a.reston1.va.home.com> I just noticed in yesterday's mail: > I'm reading this now. I take back my complaints that nobody sent a spec my way -- at least until I'm done reading. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From pf@artcom-gmbh.de Thu Mar 29 18:26:32 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 29 Mar 2001 20:26:32 +0200 (MEST) Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) In-Reply-To: <200103291742.MAA19253@cj20424-a.reston1.va.home.com> from Guido van Rossum at "Mar 29, 2001 12:42: 7 pm" Message-ID: Hi, Guido van Rossum: [...about writing ordered lists in plain text style...] > Or my preference: > > Some text. > > 1. This is a list item > continued here. > > 2. This is the second item. What do people (and Guido?) think of the following style for ordered lists? I've seen this used in EMail and News quite often: Some text. (1) This is an ordered list item continued here. (2) This is the second item. (3) Items may consist of several paragraphs. As long as the paragraphs keep proper indentation. This is text following the list. Of course this would require that indentation plays an important role in an upcoming text structure grammar. The parser should always fall back into literal (aka preformatted) paragraph mode on any material which violates the rather strict grammar rules. Only text paragraphs which are properly (equally) indented plain text and contain no hyphens at the end of lines should be allowed for reformatting in proportional fonts and with new line breaks. I disagree in this respect with Tony and maybe others in this SIG. Allowing free form paragraphs like this one here for reformatting is simply too dangerous. 
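Peter's parenthesized-number style is easy to recognize mechanically. A minimal sketch of just the label-matching step (the pattern and function names here are illustrative, not anything agreed on the list):

```python
import re

# Matches labels in the "(1) item text" style, capturing the indent,
# the number, and the first line of item text.
ITEM_RE = re.compile(r'^(\s*)\((\d+)\)\s+(.*)$')

def find_ordered_items(lines):
    """Yield (indent, number, text) for each line that looks like a
    '(n)'-labelled ordered-list item.  Continuation lines are left to
    the caller, which would apply the indentation rules sketched above."""
    for line in lines:
        m = ITEM_RE.match(line)
        if m:
            yield len(m.group(1)), int(m.group(2)), m.group(3)
```

A full parser would additionally check that the numbers are consecutive and the indents consistent before committing to a list interpretation.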
Mandating the rule that normal text paragraphs must be aligned on the left side seems to be a very reasonable restriction to me. Otherwise they should come out in fixed font and people will see what they are used to before the advent of clever text structure recognition tools. The IMO disclaimer applies here as well. It was not my intention to throw oars into Tony's and Edward's work. Best regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From dgoodger@atsautomation.com Thu Mar 29 18:42:03 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 29 Mar 2001 13:42:03 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. Message-ID: Thanks for the encouragement, Guido! Guido van Rossum writes: > I think you may be going overboard with features I agree, and I intend to pare it down to the bare essentials. For example, descriptive lists are problematic. In the past I *have* noticed your [and others'] use of ' -- ' for em-dashes. According to the Chicago Manual of Style, you're not supposed to use spaces on either side of em-dashes, but people do use this construct and we've gotta live with it. Trying to enforce rules on people for a supposedly 'transparent' markup system like ST is ass-backwards. The markup must abide by common usage, not the other way around. That's the strongest argument against using single-quotes for inline literals I know of. We can use `backticks` or `symmetric quotes' or *both*! (I see problems with symmetric, like: "string assignment: `s = 'this is a string''". Single-quotes are just too common in all contexts, IMHO.) > I also like that you define the escaping mechanism > upfront. (Using \ to escape means that we're going to have to make > our docstrings raw strings. Big deal. :-) Anti-escape-mechanism people claim that it's not needed. 
They say backslashes are hard to use because of overloading (ya gotta double 'em up sometimes). But if they're not needed, why complain about how difficult it is to use them? And the only people who will actually use them (in order to document REs or ST itself) ought to know about raw strings anyhow. /DG From dgoodger@atsautomation.com Thu Mar 29 19:15:08 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 29 Mar 2001 14:15:08 -0500 Subject: On ordered Lists (was RE: [Doc-SIG] Formalizing ST) Message-ID: People will write ordered/enumerated lists in all different ways. I think that it's folly to try to come up with the "perfect" format for an enumerated list, and limit the docstring parser's recognition to that one format only. Why limit ourselves to RE-processing here? We're writing software here! Using Python! Rather, recognize a variety of formats as potential enumerated lists, and decide based on the labels. If a "potential enumerated list item" (PELI) is labeled with a "1.", and is immediately followed by a PELI labeled with a "2.", we've got an enumerated list. Or "A" followed by "B". Or "i)" followed by "ii)", etc. We wouldn't have any problem with any of these, even without requiring blank lines: "That bird wouldn't *voom* if you put 10000 volts through it!" 1 is all I need. Mr. Creosote. Whatever gave you such an idea? A murderer? No, not I. I'd never hurt a fly! The chances of a PELI labeled with "2" after "1 is all I need", or "II." after the last example, are acceptably small. /DG From tim.one@home.com Thu Mar 29 19:40:53 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 29 Mar 2001 14:40:53 -0500 Subject: POD (was RE: [Doc-SIG] using the same delimiter on the left and right..) In-Reply-To: <200103291648.LAA18948@cj20424-a.reston1.va.home.com> Message-ID: [Tony J Ibbs (Tibs)] > Pod is used successfully in the Perl world, and is a clear winner there. > I find it intensely unreadable, as a lightweight format. 
[Guido] > I haven't seen too much POD, so you may be right there. Is it worse > than Latex? Well, you can include LaTeX sections in POD, so formally I guess it can't be better . Here's the POD spec: http://www.cpan.org/doc/manual/html/pod/perlpod.html It's smaller than half the msgs in this debate <0.9 wink -- but it's really not enough of "a spec" to answer all practical questions>. Do note that perlpod.html was generated from a POD doc, though. Short course: the input is broken into paragraphs (via blank-line separation). A paragraph then falls into one of three categories: verbatim, command or ordinary text. Verbatim is like a Python raw-string: *nothing* about it is altered. A paragraph is verbatim iff its first line begins with whitespace. If there isn't a space or tab in the first line, it's a command paragraph or plain text. A command paragraph begins with "=" immediately followed by an identifier. There are two commands (=head1 and =head2) for headings; three for dealing with lists (=item, =over, =back); three for embedding docs in formats other than POD (like HTML or LaTeX, or verbatim text that doesn't happen to begin with whitespace; =for, =begin, =end); and a couple for telling the Perl compiler where POD sections begin and end (=pod, =cut). That's it for commands. Everything else is ordinary text. There are 8 inline markup gimmicks, of the form "X<" text goes here ">" where X is a single character, covering italics, bold, text with non-breaking spaces, literal code, cross-reference links, filenames, index entries, and Z<> for a zero-width character. Also entity-like "&" escapes. In practice, I rarely see escapes other than C, and it's *nice* to have a wholly unambiguous way to include code snippets inline. 
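The three-way paragraph split Tim describes is simple enough to state directly in code. This is a minimal sketch of the classification rule only; real POD translators of course do far more (list nesting, inline X<...> markup, output generation):

```python
import re

def classify_pod_paragraphs(text):
    """Split POD source on blank lines and label each paragraph as
    'verbatim' (first line starts with whitespace), 'command' (starts
    with '=' plus an identifier), or 'ordinary' (everything else)."""
    result = []
    for para in re.split(r'\n\s*\n', text):
        if not para.strip():
            continue
        first_line = para.splitlines()[0]
        if first_line[:1] in (' ', '\t'):
            kind = 'verbatim'   # passed through untouched, like a raw string
        elif re.match(r'=\w+', first_line):
            kind = 'command'    # =head1, =head2, =over, =item, =back, =cut, ...
        else:
            kind = 'ordinary'   # prose, subject to inline markup
        result.append((kind, para))
    return result
```

The appeal of the scheme is visible even in this sketch: the paragraph type is decided by the first character or two, with no lookahead and no ambiguity.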
The list gimmicks (=over, =item, =back) are visually jarring the first times you see them; in return, you never get a list by mistake; OTOH, if you want numbered lists, you supply the numbers yourself; on the fourth hand, if you want unusual list item bullets or numbering, you just type what you want. It's easy to use, and is mostly idiot-proof. OTOH, it's not *obvious* at first glance, particularly not the list stuff. But it's a matter of no more than two minutes to *learn* the list conventions, and then they're easy too. pod-is-a-lot-more-pythonic-than-perl-ly y'rs - tim From gward@mems-exchange.org Thu Mar 29 19:55:51 2001 From: gward@mems-exchange.org (Greg Ward) Date: Thu, 29 Mar 2001 14:55:51 -0500 Subject: [Doc-SIG] using the same delimiter on the left and right.. In-Reply-To: <200103291648.LAA18948@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Mar 29, 2001 at 11:48:36AM -0500 References: <002401c0b830$fdd85c90$f05aa8c0@lslp7o.int.lsl.co.uk> <200103291648.LAA18948@cj20424-a.reston1.va.home.com> Message-ID: <20010329145551.A13751@mems-exchange.org> On 29 March 2001, Guido van Rossum said: > > Texinfo (and there are other more modern examples) is still "formal > > markup to produce a document", where the markup has equal status with > > the text, and is expected to intrude. People will not want to write it > > in docstrings. So we'd lose. > > But isn't this exactly what Javadoc does? I dunno about everyone else, but my objection to Javadoc is that it's not really a markup language -- it just uses HTML and throws in the @returns/@throws/etc thingy because those are useful things when documenting Java code. (And would be in Python code, too.) IOW, Javadoc is easy to turn into HTML, but (I expect) difficult to turn into anything else, unless you restrict the set of tags allowed. It sounds like there's no One True Javadoc parser, which is probably a PITA. > > Pod is used successfully in the Perl world, and is a clear winner there. 
> > I find it intensely unreadable, as a lightweight format. > > I haven't seen too much POD, so you may be right there. Is it worse > than Latex? Dunno who you were quoting there, but I strongly disagree with "intensely unreadable". Judge for yourself; here's a snippet of POD documentation for a C library I wrote: """ =head1 NAME bt_input - input/parsing functions in B library =head1 SYNOPSIS [...] =head1 DESCRIPTION The functions described here are used to read and parse BibTeX data, converting it from raw text to abstract-syntax trees (ASTs). =over 4 =item bt_set_stringopts () void bt_set_stringopts (bt_metatype_t metatype, ushort options); Set the string-processing options for a particular entry metatype. This affects the entry post-processing done by C, C, and C. If C is never called, the four metatypes default to the following sets of string options: BTE_REGULAR BTO_CONVERT | BTO_EXPAND | BTO_PASTE | BTO_COLLAPSE BTE_COMMENT 0 BTE_PREAMBLE 0 BTE_MACRODEF BTO_CONVERT | BTO_EXPAND | BTO_PASTE For example, bt_set_stringopts (BTE_COMMENT, BTO_COLLAPSE); will cause the library to collapse whitespace in the value from all comment entries; the AST returned by one of the C functions will reflect this change. """ "man perlpod" for the rules. The main things to know: * indentation means verbatim * C<> is code, B<> is bold, I<> is italics If this keeps up, I'll write a proposal for a POD dialect for documenting Python. The "=foo" headers would disappear for sure -- they're ugly, and that syntax is part of both Perl's parser and every POD parser. Yech. Just for fun, here's some more POD, this time from a Perl module I wrote: """ =head1 DESCRIPTION F provides a handful of otherwise unclassifiable utility routines. Don't go looking for a common thread of purpose or operation---there isn't one! =over 4 =item timestamp ([TIME]) Formats TIME in a complete, unambiguous, ready-to-sort fashion: C. 
TIME defaults to the current time; if it is supplied, it should be a time in the standard C/Unix representation: seconds since 1970-01-01 00:00:00 UTC, as returned by Perl's built-in C