[Doc-SIG] backslashing

Edward Welbourne <eddy@chaos.org.uk>
Tue, 20 Mar 2001 23:23:33 +0000 (GMT)


> So are you saying we'd have 2 different kinds of literal blocks?  We
> hadn't really discussed that before..  I think that just having
> literals, inlines, and literal blocks is probably enough, but if you
> want to make a case, go ahead. :)

OK, two orthogonal questions about a verbatim fragment:
   * inline or block
   * python code or `alien text'
giving us four kinds of `verbatim' fragment in doc-strings.

As I've been understanding you, '...' is an inline alien and #...# is an
inline python expression.  I've been presuming that there are also
mechanisms for including (and distinguishing between) blocks whose
contents are python or alien in like manner; however, I grant that I've
only seen the '::' marker (unless >>> on each line is the markup for
python, which I won't like, given that it's on each line) used in such a
role, and don't know whether you meant the block it introduces to be
read as alien or python.  If you don't provide for this distinction, I'm
worried.

The python/alien distinction is important, because a python fragment is
worth the renderer scanning for identifiers it knows about, so may wish
to render as xrefs to pertinent documentation (however, this is the
*only* processing the renderer should be doing to it); an alien fragment
*should not* be so scanned; it is `truly verbatim' and any similarity
between sub-texts of it and anything the doc-system knows about *should*
be presumed to be fortuitous - otherwise, we have to put in mechanisms
for enabling the author of the doc-string to, somehow, indicate `no,
really, this *is not* a use of the python identifier which happens to be
spelt the same as it' in a verbatim text, which must abrogate its
verbatimness.
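
A minimal sketch of the distinction argued for above, assuming a renderer that is told which kind of fragment it has been handed. The names `KNOWN`, `render_fragment` and the `<name -> target>` output form are hypothetical, invented for illustration; nothing here is an agreed part of the markup or of any doc tool.

```python
import re

# Hypothetical table of identifiers the doc-system knows about,
# mapped to made-up documentation targets.
KNOWN = {"send": "ftp.html#send", "chunk": "ftp.html#chunk"}

# Crude identifier pattern; a real tool would tokenise properly, since
# this also matches names inside string literals and comments.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_fragment(text, kind, known=KNOWN):
    """Render a verbatim fragment; kind is 'python' or 'alien'."""
    if kind == "alien":
        # Truly verbatim: any resemblance to a known name is fortuitous.
        return text
    if kind != "python":
        raise ValueError("unknown fragment kind: %r" % (kind,))

    def xref(match):
        name = match.group(0)
        target = known.get(name)
        # Cross-referencing is the only processing applied to python text.
        return "<%s -> %s>" % (name, target) if target else name

    return IDENT.sub(xref, text)


print(render_fragment("send(chunk)", "python"))
print(render_fragment("send(chunk)", "alien"))
```

The alien branch returns its input untouched, so an author never needs a way to say `no, this is not that identifier' inside it.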

While I recognise that the inline/block distinction would ideally pander
to one's natural desire to have the text flow nicely, I consider this a
layout issue, not a markup one.  [Further: I want to type your fictional
example as::
    If the user types::
        x'
    then the system should print the value of::
        x'(a)
    and return the value of::
        x'(b)
without the blank lines you inserted in it; the '::' on the end of each
text line, and the return to its indentation at the next, should suffice
to enable parsers to know what I meant.  Note that I am treating the
given fragments as alien verbatim; the user is clearly typing at a
prompt which is not a python prompt; and x'(a), x'(b) are presumably
reading x' as `the derivative' of some entity named by x.]
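
The claim that a trailing '::' plus indentation suffices can be sketched as follows. This is an illustration under stated assumptions, not a proposed parser: `split_blocks` and `indent_of` are hypothetical names, indentation is assumed to use spaces only, and a blank line is treated as ending a block.

```python
def indent_of(line):
    """Count leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def split_blocks(lines):
    """Split lines into ('text', line) and ('literal', [lines]) items.

    A text line ending in '::' opens a literal block made of the
    following lines that are indented further than it; the block ends
    at the first line that returns to that indentation (or less).
    """
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        out.append(("text", line))
        i += 1
        if line.rstrip().endswith("::"):
            base = indent_of(line)
            block = []
            while (i < len(lines) and lines[i].strip()
                   and indent_of(lines[i]) > base):
                block.append(lines[i])
                i += 1
            if block:
                out.append(("literal", block))
    return out


example = [
    "If the user types::",
    "    x'",
    "then the system should print the value of::",
    "    x'(a)",
    "and return the value of::",
    "    x'(b)",
]
for kind, content in split_blocks(example):
    print(kind, content)
```

No blank lines are needed for the three text lines and three literal blocks to be told apart.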

I regard layout-control as a luxury, subordinate to keeping the markup
language simple.  I want to be sure that if something appears in a
verbatim fragment, at least when I'm inside an r"""...""" string, one
can cut-and-paste the fragment into whatever alien context it belongs in
and have it be exactly the right thing; and the formatted output of the
given raw string should display the relevant fragment verbatim.  This
seems more important than being able to inline the tiny proportion
(namely the cases using the inline-delimiter) of the uses one has for
fragments.

> I think of '...' and #...# as used for in-line literals.  I.e., you
> can include them in sentences.  Literal blocks are used for blocks
> that are separated off from the rest of the code.
(final word, `code': I presume you meant `text').
To me, this is a layout distinction, not a semantic one.  Consequently,
> I don't think it's ... reasonable to force people to put what really
> *should* be an in-line literal into a literal block.
I don't see how `should' can ever be real here.

As I understand it, at worst one has to put up with `oh dear, this is
going to be ugly, ho hum' rather than `this is going to mean something
different'.  Obviously, block means something different *to the doc
tools* than inline means; but the only meaning I care about is the
information content the reader gets hold of at the end.

> even if we ignore '\', doc writers have to think harder than they
> should if they want to use backslashes in their docs. :)
As long as they use r'...', and don't want to end their string in an odd
number of backslashes, I see no problem: please give an example.
Raw strings are either invalid or read exactly the way they appear.
No thought is required.

>>> r'\'
  File "<stdin>", line 1
    r'\'
       ^
SyntaxError: invalid token
>>> 
so one cannot end a raw string in a single backslash; but
>>> r'\\'
'\\\\'
>>> r'\''
"\\'"
>>> r""" \" """
' \\" '
>>> r""" \' """
" \\' "
>>> r'''\''''
"\\'"
>>> r'\n'
'\\n'
>>> 
anything it doesn't reject, it preserves faithfully.
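
The same checks can be run as a script rather than at the prompt. This sketch assumes a current Python 3 interpreter, whose error message for the rejected literal differs from the one in the transcript above; `is_valid_literal` is a hypothetical helper, not a standard function.

```python
def is_valid_literal(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<doc>", "eval")
    except SyntaxError:
        return False
    return True


# Each raw string holds exactly the characters typed between the quotes.
samples = [r'\\', r'\'', r""" \" """, r""" \' """, r'\n']
for s in samples:
    print(len(s), s)

# A raw string ending in one backslash is rejected outright ...
print(is_valid_literal("r'\\'"))
# ... while one ending in two backslashes is accepted and kept as typed.
print(is_valid_literal("r'\\\\'"))
```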


>>> Not even to save vertical space ;^|
> If it were just an issue of saving vertical space, I'd agree.
sorry, I didn't make myself clear.

Having to add vertical space in these cases is going to annoy *me*,
despite which I would rather endure this annoyance than have the entire
inline mechanism be (IMO) broken.

In the relevant cases, the resulting document, once rendered, will split
things into several paragraphs separated by displays, where I would,
indeed, prefer to have said the same thing in a single paragraph; this
*will* annoy me and will clearly annoy other folk more than the extra
vertical space in the source file, but please understand that for me
it's the other way round and *despite that* I'm arguing for you to
oblige me to break up the text into blocks.  Even if you *do* insist on
me putting back the blank lines I took out of your fictitious example.
(Though I'll be objecting to that independently.)

> We could also discuss ways of indicating that one-line literal blocks
> are "really" inlines (::: or some such), but I'm currently loath to
> make ST even more complex. :)
the complexity argument is *exactly* the one I'm focussed on.

Out-of-source documentation should provide the means for folk to say
things the way they want and have it displayed the way they want: for
in-code ST docs, simplicity of markup language is *more important* than
making it look nice.  It suffices that we ensure that the author can
express the information the reader needs.  A few rare cases where this
will merely be ugly are not worth extra complexity.

> I assume that "script.write..." is in an r"..." string, 
that was, indeed, my intent; given which,
> You might mean two things, here. 
only meaning 1 is, to my mind, a credible candidate.
Meaning 2 doesn't read my text verbatim.

> you wouldn't normally include the string you gave in a sentence.
OK, so I was trying to be realistic, which made my text long.  How about
(from the docs of an imaginary ftp.py)::
    the send method executes::
        sock.control.write("#")
    after every #chunk# bytes of data have been written to #sock.out#
in which I would normally have wanted to inline the code fragment, but
the presence of a # in it conflicts with the #...# delimiter.  One could
make up shorter examples; but, at least in the present case, one can
side-step the problem::
    the send method writes one '#' character to #sock.control# for every
    #chunk# bytes it writes to #sock.out#.
albeit I may be marking more things with #...# than I need to (am I ?).
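
To make the delimiter clash concrete, here is a sketch of a naive left-to-right scanner for the two inline forms. It assumes '...' marks alien text and #...# marks python, with no escaping at all; `INLINE` and `inline_fragments` are hypothetical names, and a real scanner would also have to cope with apostrophes in ordinary prose.

```python
import re

# Either '...' (alien) or #...# (python), whichever delimiter comes first.
INLINE = re.compile(r"'([^']*)'|#([^#]*)#")


def inline_fragments(text):
    """Return (kind, content) pairs for each inline fragment found."""
    out = []
    for m in INLINE.finditer(text):
        if m.group(1) is not None:
            out.append(("alien", m.group(1)))
        else:
            out.append(("python", m.group(2)))
    return out


# Inlining the code fragment: the '#' inside it ends the fragment early.
clash = 'the send method executes #sock.control.write("#")# after every #chunk# bytes'
print(inline_fragments(clash))

# The rephrased sentence scans the way its author meant it.
reworded = ("the send method writes one '#' character to #sock.control# "
            "for every #chunk# bytes it writes to #sock.out#.")
print(inline_fragments(reworded))
```

In the first sentence the scanner pairs the wrong delimiters and never sees `chunk` as a fragment; in the second every fragment comes out intact.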

Indeed, in the small proportion of situations where I can realistically
believe in needing to escape the delimiter of an inline fragment, I am
inclined to suggest the author think a bit about whether there isn't
some other way of phrasing the text so as to avoid the problem.  [It's a
bit like how `politically correct' folk used to spend time and effort
trying to get us all to settle on a gender-neutral non-neuter pronoun,
but grown-up folk have simply learned to side-step the problem - partly
by reviving the pronoun `one', partly by avoiding constructions which
oblige us to use pronouns in the places where Anglic's provision of them
is unfaithful to the author's intent.]  If no such rephrasing is
possible, the worst we impose on them is that they have to break out
into a block structure; which won't *look* right, but will none the less
express the information they intended to express.

Now, with alien verbatim (as opposed to python verbatim), I realise
there is a problem; alien text can have absolutely anything in it, so
alien fragments using the delimiter can't be relied on to be rare.  Yet,
if we're to serve it up verbatim, we should serve it up verbatim.  If we
can't do it *verbatim*, at least inside `raw' strings, then we should
scrap the verbatim inline mechanism.  The proposed `fix' breaks its
verbatimness, i.e. fixes one thing while breaking another.
That's not an acceptable fix.

	Eddy.