[Doc-SIG] docstring grammar

Edward Welbourne <eddyw@lsl.co.uk>
Wed, 1 Dec 1999 19:55:47 +0000


David said:
> Or not.  Luckily I think that issue can be left to the 'bibliography
> engine', just like the bullet processing can be left to the 'list engine'.

Yup.  We've explored more than enough of the territory towards each of
these: working out what to do with the loose ends is now down to the
level where I'll trust whoever *does the work* to implement a tool that
`does something sensible'.  Then we can take that sensible behaviour,
abstract it away from the reference implementation and call it a
docstring spec ;^)

Skip said:
>     len(o:sequence) -> IntType
no, yuck, don't do it.

Pack that information into the argument sections by all means; but the
way for that one-liner to `name' the arguments should be about getting
across the `what does this argument mean' information.  Being told

    transcribe(s1:stream, s2:stream)

doesn't tell me the thing I really want to know, where

    transcribe(source, target)

tells me the only thing I really care about (given that the arguments
section will say that source and target are streams - aka file
descriptors - and I probably found the function in a module which
defines tools for manipulating streams so this part is obvious).  (I'd
have called those arguments (from, to) but for the keyword ...)

Crucially, transcribe(source, target) looks just like a real call of the
function and is archetypical among calls of the function.
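
As a rough illustration of that point (a sketch only: the function body
and the layout of the arguments section are assumptions, since no section
format has been agreed in this thread), the one-liner carries the
meaningful names and the type information moves into the arguments
section:

```python
import io


def transcribe(source, target):
    """transcribe(source, target) -> int -- copy everything from source to target

    Arguments:
        source -- a readable text stream (anything with a read() method)
        target -- a writable text stream (anything with a write() method)

    Returns the number of characters copied.
    """
    # Hypothetical body, just enough to make the example runnable.
    data = source.read()
    target.write(data)
    return len(data)


# The call looks just like the first line of the doc string.
source = io.StringIO("hello, doc-sig\n")
target = io.StringIO()
copied = transcribe(source, target)
```

A reader who sees only the first line of the doc string already knows
which argument is read and which is written; the types are one glance
further down.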

David said:
> I'd like to finalize the top-level structure, get it in front of GvR's
> eyeballs, and then we can tackle each subtopic (so far: list processing,
> reference handling, signature, mandatory keywords, keyword registration
> process, multilingual keyword support, etc.) at a later date.
Yes please.

Tibs said, of the Example:/>>> debate:
> No - keep the keyword. ... (a) I like it ... (b)... non-test Python
> script ... (c)... 'logical' subdivision ...
> unless he means "for humans to parse"

OK, start with the last: Tibs, you observed a while back that the human
brain holds up to 7 (or is it 12 ?) things at the same time.  That's the
`for humans to parse' constraint.  We want to keep to a minimum the set
of keywords a programmer needs to be familiar with to be able to get
pay-back from using the document format.

(a) I'll merrily vote contrary to you and hope that we cancel out.
    Then we can come to the observation that Tim Peters seems to want to
    be able to do >>> without the keyword.  I know you'll stop arguing.
(b) so ? we're bound to have *some* form of construct equivalent to
    <PRE>, so the non-test pieces can be indented with that, leaving the
    more common `this is what would really happen, try it and see'
    flavour of embedded code (which Tim's tool will duly verify ;^) to
    be written the way the pythoneer wanted to write it.
(c) A bunch of lines sharing initial indent and mostly starting >>> form
    a logical sub-division just as long as the audience know to
    recognise it as such - and the audience here consists of pythoneers,
    so we will recognise it.
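
To make (b) and (c) concrete, here is a minimal sketch that uses the
standard library's doctest module as a stand-in for the verifying tool
(an assumption; the thread does not name it).  The function double and
its doc string are invented for the illustration.  No keyword introduces
the examples, yet the parser finds them from the prompt alone:

```python
import doctest

DOCSTRING = """double(n) -> number -- return twice n

No keyword introduces the examples below; the prompt itself is the marker.

    >>> double(2)
    4
    >>> double('ab')
    'abab'
"""


def double(n):
    # Hypothetical function, present only so the examples can be run.
    return n + n


parser = doctest.DocTestParser()

# Each Example pairs the source after the prompt with the output expected.
examples = parser.get_examples(DOCSTRING)
sources = [example.source for example in examples]
wants = [example.want for example in examples]

# Run the examples against the real function and count the failures.
test = parser.get_doctest(DOCSTRING, {'double': double}, 'double', None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
```

Prose and literal blocks without a prompt are ignored by the parser, so
non-test code can sit in whatever <PRE>-like construct is eventually
chosen.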


Various folk discussed language.  My ha'p'n'rth on that would go for a
variable in the module namespace, nominally

__language__ = 'en:UK'	# expect English spellings, like colour, sulphur

and I'd vote for the *default* to be Dutch (to encourage US anglophones
to get used to admitting that they speak 'en:US' or whatever it's called)
though I realise I might have to live with 'en:US'.

Why, you might ask, do I want it in the module namespace ?

So that the contents of the doc string are *all* in the same language:
it'd just be perverse to have an anglophone keyword (Language) as the
one keyword which we don't translate, in doc-strings; and the magic
names in a module's namespace, like the reserved words of the language
(*outside* the doc string), are already condemned to monolingualism, so
we might as well leverage their sacrifice to enable the purity of the
doc strings.
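
A doc tool could pick the variable up along these lines.  This is a
sketch under assumptions: the helper name docstring_language is invented
here, and 'en:US' stands in for a default that has not been settled:

```python
import types

DEFAULT_LANGUAGE = 'en:US'   # assumed default; nothing has been agreed


def docstring_language(module, default=DEFAULT_LANGUAGE):
    """Return the language a module declares for its doc strings."""
    # __language__ is the proposed module-level name; absent means default.
    return getattr(module, '__language__', default)


# Two throwaway modules standing in for real ones.
british = types.ModuleType('british')
british.__language__ = 'en:UK'
undeclared = types.ModuleType('undeclared')

declared_language = docstring_language(british)
fallback_language = docstring_language(undeclared)
```

Because the lookup happens outside the doc string, the tool knows which
keyword table to use before it parses a single line of documentation.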

Either that or do something entertaining which involves looking for a
match to:

<keyword meaning `language'>: <language in which that keyword means `language'>

and I'll be immensely impressed if you can make that work.
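
For what it's worth, the naive version of that match is a table lookup.
The sketch below is hypothetical throughout: the table, the header
pattern and the name detect_language are all invented for illustration,
and a real tool would need a far longer table plus a way of resolving
words shared between languages:

```python
import re

# Hypothetical table: each language's word for `language', paired with
# that language's own name for itself.
SELF_NAMING = {
    'Language': 'English',
    'Sprache': 'Deutsch',
    'Lingua': 'Italiano',
}

HEADER = re.compile(r'^\s*(\w+):\s*(\w+)\s*$')


def detect_language(line):
    """Return the language a header line declares about itself, else None."""
    match = HEADER.match(line)
    if match is None:
        return None
    keyword, language = match.groups()
    # Accept the line only when keyword and language belong together.
    if SELF_NAMING.get(keyword) == language:
        return language
    return None
```

Note that 'Language: French' is rejected by this check, since the
keyword is English while the language named is not.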

Of course, *within* the selected language, I'd be more than happy to
watch (if anyone can be bothered to implement) something along the lines
of (with __language__ set to an English variant)

"""blah(burble) -> wibble -- rumbles

... in English ...

Translation:
    Language: French
    ... en Francais ...

    Traduction:			# (perverse but legitimate ...)
        Langue: Allemande
        ... im Deutsch ...	# (no, really, I'm just guessing)

Translation:
    Language: Norse
    ... p&aring; Norsk ...

etc.
"""

in which Translation, Language and the language selected are given in
the host __language__, but the rest of each translation block is in the
guest language (if you see what I mean).  But, as the Norse case
illustrates, how will docstrings cope with encoding languages which need
more than ASCII provides ?  I've defaulted to borrowing HTML's character
entities for this, but I'll bet a Norse author would get swiftly fed up
with doing that ...
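
On the reading side, at least, the entity approach is cheap.  A minimal
sketch, assuming a doc tool written for a Python with the standard html
module and assuming that entities are the agreed escape (neither is
settled here):

```python
import html

# The Norse line from the example above, written with an HTML entity.
ascii_text = '... p&aring; Norsk ...'

# html.unescape turns named and numeric character references into characters.
decoded = html.unescape(ascii_text)
```

The stored doc string stays pure ASCII and only the rendered form needs
the wider character set; the burden of typing the entities still falls
on the author.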

However, this is yet more gratuitous over-complete specification ...
We have, collectively, said enough that I'd trust any of the assembled
folk (and, for that matter, any lurkers we may have) to take David's
revised grammar (due some time soon ?) and implement it; whatever they
come up with, I'm sure I'll be much happier with it than with what we
have now.

	Eddy.
--
Celui qui parle trois langues s'appelle un trilangue.
Celui qui parle deux langues s'appelle un bilangue.
Mais celui qui parle seulement une langue s'appelle un anglophone.
				-- Quebecois joke.