[Doc-SIG] Idea: make double-space between sentences meaningful

David Goodger goodger at python.org
Mon May 17 16:56:07 EDT 2004


Beni Cherniavsky wrote:
 > Some formats (notably LaTeX) support the typographical convention
 > (of some languages, e.g. English but not French IIRC) of putting a
 > bigger space after the end of a sentence than between words.  LaTeX
 > tries to guess intelligently but can fail.  Its guessing can be
 > explicitly overridden [1]_.

If the input text contains double spaces, the output will too, in all
the existing core writers.  For example:

     $ cat ds
     A sentence.  Another.
     New line.

     New paragraph.  Another sentence.
     $ rst2html.py ds
     ...
     <p>A sentence.  Another.
     New line.</p>
     <p>New paragraph.  Another sentence.</p>
     ...

No writer so far does any whitespace normalization [*]_, so whatever
whitespace is in the input (spaces, newlines) comes out as-is in the
output.  LaTeX, like any other back-end formatter, is free to treat
this whitespace as significant.

.. [*] The parser converts tabs to spaces.
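
As a rough illustration of that footnote, tab conversion can be pictured
with Python's built-in ``str.expandtabs``.  The helper name
``convert_tabs`` and the tab width of 8 are assumptions made for this
sketch; it is not a copy of the parser's actual code.

```python
def convert_tabs(line, tab_width=8):
    """Expand tabs to spaces up to the next tab stop (hypothetical helper)."""
    # str.expandtabs pads each tab to the next multiple of tab_width;
    # runs of ordinary spaces are left exactly as they were typed.
    return line.expandtabs(tab_width)


if __name__ == "__main__":
    # The double space after "sentence." survives; only the tab changes.
    print(repr(convert_tabs("A sentence.  Another.\tNew line.")))
```

Only tabs are touched here, so a double space typed after a sentence
passes through unchanged.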

 > Currently, reST provides no way to convey this information to the
 > output format.  Producing high-quality output requires this
 > information.

How should this information be conveyed?  Please provide examples.

 > There already exists an obvious convention supported by programs
 > (e.g. Emacs) for representing it in plain text: just use a double
 > space after the end of a sentence.  I propose to make this official
 > for reStructuredText: more than one space between words after
 > punctuation [2]_ signifies a sentence end [3]_.

It seems to me that Docutils and reStructuredText already support this
standard, simply by not messing with whitespace unduly.  I don't see
how making this "official" would benefit users of reStructuredText.
Do we want the reST parser to be in the business of guessing sentence
endings?
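
To make the question concrete, here is a minimal sketch of what
"guessing sentence endings" from double spaces might look like.  The
pattern and the function name ``sentence_end_offsets`` are hypothetical,
and the rule (closing punctuation, optional closing quote or bracket,
then two or more spaces) is only one possible reading of the proposal.

```python
import re

# Assumed heuristic, not part of Docutils: ., ! or ? optionally followed
# by closing quotes/brackets, then at least two spaces, then more text.
SENTENCE_END = re.compile(r'[.!?]["\')\]]*( {2,})(?=\S)')


def sentence_end_offsets(text):
    """Return the offset where each inter-sentence gap begins."""
    return [match.start(1) for match in SENTENCE_END.finditer(text)]


if __name__ == "__main__":
    sample = ("A sentence.  Another.\nNew line.\n\n"
              "New paragraph.  Another sentence.")
    print(sentence_end_offsets(sample))
```

Note what the sketch leaves out: a sentence that ends at a line break
("Another." above) has no trailing spaces to inspect, so a rule based
purely on double spaces cannot see it.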

How would we represent it internally?  Please show a doctree
implementing your proposal.

 > It's a good bet that anybody who cares about it in his LaTeX output
 > also cares about his source, but it's a good idea to make this a
 > parser option (defaulting off?)...

How would the parser option behave; what would it do?

 > It is even possible, if desired, to support this in HTML output,
 > using some hack (``&nbsp;`` won't do because we *want* it to be
 > breakable - it's even better there; perhaps ``<span
 > class="sentence-end"> </span>`` with appropriate CSS?).

&nbsp; followed by a regular space works OK.  Something like a
&thinsp; (U+2009) followed by a regular space ought to work, but
doesn't.
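
A minimal sketch of the ``&nbsp;`` plus regular space approach, assuming
a writer wanted to apply it to paragraph text.  The function name
``widen_sentence_gaps`` and the detection pattern are hypothetical; no
existing writer does this.

```python
import html
import re

# Assumed rule: two or more spaces directly after ., ! or ? and before
# further text mark a sentence gap.
GAP = re.compile(r'(?<=[.!?]) {2,}(?=\S)')


def widen_sentence_gaps(text):
    """Escape text for HTML, then widen each detected sentence gap."""
    escaped = html.escape(text, quote=False)
    # A no-break space followed by a regular space renders wider than a
    # single space, and the line can still break at the regular space.
    return GAP.sub('&nbsp; ', escaped)


if __name__ == "__main__":
    print(widen_sentence_gaps("A sentence.  Another."))
```

The regular space is kept after the entity so that the browser still has
a place to wrap the line.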

-- David Goodger


