[Doc-SIG] Hello and feedback

David Goodger goodger@python.org
Sat, 15 Mar 2003 10:55:09 -0500


[David Priest]
> I'm feeling that I've perhaps pissed in someone's cornflakes, but
> I'm going to respond anyways.  If I've offended, I apologize
> profusely: no offense was intended.

No offense perceived or taken.  We're just debating the technical
issues; nothing personal.  Sorry if it seemed too blunt, but email
tends to look that way.  I don't have the time to be super-friendly or
the inclination to sprinkle smilies throughout.  When discussing
issues via email, one needs a thick skin.  Assume that the writer is
smiling continuously, trying to help, which I was and am.

[David Goodger]
>> XML character entities (&#xNNNN;) are unknown to reStructuredText and
>> are the wrong way to do it.  Docutils is correct in substituting
>> "&amp;" in HTML output for every "&" in the input file (how could it
>> tell the difference between *using* an XML character entity and just
>> *talking* about one?).

> The charent pattern can be detected easily enough, and the "&" encoding
> skipped for those entities.  If you want to talk about an entity directly,
> literalizing it would do the trick.  In all other cases the
> ampersand can be safely encoded.

By "literalizing", do you mean ``inline literals`` or literal blocks?
That's not always acceptable.  I might want to say

    The '&amp;' entity is used by HTML and XML to
    represent the '&' character.

I shouldn't have to use inline literals here.
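To make the ambiguity concrete, here is a minimal Python sketch of two escaping strategies: escaping every "&" (what the text above says Docutils does) and the proposed alternative of passing anything entity-shaped through untouched.  The names escape_all() and escape_except_entities() are hypothetical, for illustration only; this is not Docutils code.

```python
import re

# Anything shaped like a character entity: &name;  &#NNN;  &#xHHHH;
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);")


def escape_all(text):
    """Hypothetical helper: escape every '&' (and '<', '>') for HTML output."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_except_entities(text):
    """Hypothetical helper: pass entity-shaped runs through, escape the rest."""
    out = []
    pos = 0
    for match in ENTITY.finditer(text):
        out.append(escape_all(text[pos:match.start()]))
        out.append(match.group(0))  # left exactly as typed
        pos = match.end()
    out.append(escape_all(text[pos:]))
    return "".join(out)


source = "The '&amp;' entity represents the '&' character."
print(escape_all(source))
# The '&amp;amp;' entity represents the '&amp;' character.
print(escape_except_entities(source))
# The '&amp;' entity represents the '&amp;' character.
```

A browser shows the first result as the author typed it.  The second renders as "The '&' entity represents the '&' character.", so the entity the author was *talking about* is lost unless it is marked up as a literal.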

Docutils uses Unicode internally, and I don't see a need for it to
grow a character entity subsystem.  So far, you're the only one who
has asked for one, and that's not convincing enough.  I suspect that
you may be asking Docutils to cover a deficiency in your toolset, or
there's a misunderstanding.  Please answer these questions from my
last message to help clear this up:

    Why are you unable to insert the actual, encoded characters into
    the text?  What *are* you able to insert?  What encoding are your
    files using?  What platform (OS, editor, etc.)?  It could be that
    you *can* insert real characters but don't know it.

> But the substitution table using "proper" characters is good enough,
> although I'm not entirely sure that every back-end's output will be
> able to deal with two-byte Unicode.

HTML can handle UTF-8.  XML uses Unicode internally and assumes UTF-8
or UTF-16 unless told otherwise.  As for back-ends, that's a Writer
issue.  If output format X can't handle Unicode, then the X format's
Writer needs to encode those characters or signal an error.  TeX can't
handle the &#xNNNN; form.
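As a rough illustration of the two Writer options mentioned above (encode the characters, or signal an error), Python's codec error handlers already cover both cases for an output encoding that lacks a character.  This is a generic sketch, not Docutils Writer code; encode_for_output() is a hypothetical helper.

```python
def encode_for_output(text, encoding, strict=False):
    """Hypothetical helper: encode text for an output format's encoding.

    strict=True:  raise UnicodeEncodeError, i.e. signal an error.
    strict=False: replace unencodable characters with decimal &#NNN;
                  references, which only makes sense for HTML/XML output.
    """
    errors = "strict" if strict else "xmlcharrefreplace"
    return text.encode(encoding, errors)


text = "na\u00efve caf\u00e9"
print(encode_for_output(text, "utf-8"))   # b'na\xc3\xafve caf\xc3\xa9'
print(encode_for_output(text, "ascii"))   # b'na&#239;ve caf&#233;'

try:
    encode_for_output(text, "ascii", strict=True)
except UnicodeEncodeError as exc:
    print("cannot encode", repr(exc.object[exc.start:exc.end]), "as ASCII")
```

A TeX Writer would need its own mapping instead (for example \'{e} for "é"), since "xmlcharrefreplace" produces HTML/XML syntax that TeX doesn't understand.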

Here's an alternative for you.  If you want to use &whatever; XML
character entities in your source, just put a simple filter into your
tool chain that converts those entities into UTF-8.  Something like::

    charents2utf8 input.txt | docutils/tools/html.py > output.html

There must be filters like that "out there".  If not, it wouldn't be
hard to write one, I think.  A codec would do just as well, but Python
doesn't come with such a codec (more's the pity).
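A minimal sketch of such a filter in Python, assuming UTF-8 input and that the entities of interest are numeric references (&#NNNN; and &#xNNNN;) plus the HTML 4 named entities listed in the standard library's html.entities.name2codepoint.  The names charents2utf8() and filter_stream() are hypothetical, borrowed from the pipeline above; unknown names are left untouched rather than guessed at.

```python
import re
from html.entities import name2codepoint

# Decimal (&#233;), hexadecimal (&#xE9;) or named (&eacute;) entities.
ENTITY = re.compile(r"&(?:#([0-9]+)|#[xX]([0-9A-Fa-f]+)|([A-Za-z][A-Za-z0-9]*));")


def _replace(match):
    dec, hexa, name = match.groups()
    if dec is not None:
        codepoint = int(dec)
    elif hexa is not None:
        codepoint = int(hexa, 16)
    else:
        codepoint = name2codepoint.get(name)
    # Unknown names, out-of-range values and surrogates stay as typed.
    if (codepoint is None or not 0 < codepoint <= 0x10FFFF
            or 0xD800 <= codepoint <= 0xDFFF):
        return match.group(0)
    return chr(codepoint)


def charents2utf8(text):
    """Hypothetical filter: replace character entities with real characters."""
    return ENTITY.sub(_replace, text)


def filter_stream(infile, outfile):
    """Read UTF-8 bytes from infile, write the converted UTF-8 bytes to outfile."""
    text = infile.read().decode("utf-8")
    outfile.write(charents2utf8(text).encode("utf-8"))
```

A script that calls filter_stream(sys.stdin.buffer, sys.stdout.buffer) would slot into the pipeline above.  It converts every entity-shaped run, "&amp;" included, so text that talks about an entity would have to be written as "&amp;amp;".  (Current Python 3 releases also provide html.unescape(), which performs a broader, HTML5-style version of this conversion.)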

I just realized the front-ends don't have support for explicit
stdin/stdout with "-" arguments.  I'll add that soon.
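The usual convention is that a file argument of "-" means stdin when reading and stdout when writing.  A minimal sketch of how a front-end could resolve such arguments; open_arg() is a hypothetical helper, not the Docutils front-end API.

```python
import sys


def open_arg(name, mode="r", encoding="utf-8"):
    """Hypothetical helper: return a text stream for a file argument.

    "-" or None means sys.stdin when reading and sys.stdout when writing;
    any other name is opened as a regular file.
    """
    if name is None or name == "-":
        return sys.stdin if "r" in mode else sys.stdout
    return open(name, mode, encoding=encoding)
```

For example, open_arg("-") returns sys.stdin and open_arg("-", "w") returns sys.stdout.  Callers should avoid closing the returned stream in that case.  Python's argparse.FileType implements the same "-" convention for argparse-based tools.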

[re: interpreted text roles]
>> There's absolutely no need for any regex substitution here.  That's
>> exactly what the Writers are for.  The ":gui:`File`" input may
>> become "<gui>File</gui>" in the internal document tree.  The HTML
>> Writer would write it with bold (or, better yet, with <span
>> class="gui">, made bold by the stylesheet).  The DocBook Writer
>> would write it with <guilabel>.

> The problem with implementing it as a Writer is that Writers don't
> travel with the source text files.  If I send you a ReST file with
> :gui: roles in it, what's your Docutils installation going to do
> with it?

The set of roles built into Docutils itself will grow.  If the growth
proves to be unlimited or unmanageable, there will have to be an
alternative.  If those roles are not handled by the default Docutils
set, then they can be local to your installation.  They wouldn't be
portable, true; there's only so far a "standard" can go and we can't
please everybody 100%.

There's also this alternative:

>> There has been some discussion about parameterizing the interpreted
>> text system somehow, to avoid proliferation of element types (gui,
>> keypress, etc.).  No decision or action yet.

See the Doc-SIG thread, "master plan for interpreted text?" from last
month.

> If "parameterizing the interpreted text system" means that simple
> role substitutions -- the kind that can be handled by regex -- can
> be placed within the source text files, great!  It makes the source
> text more portable.

But I doubt it will take the form of "regex substitutions".  That's
just too low-level, IMHO.
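For comparison, a raw regex substitution of the kind proposed above might look like the sketch below (regex_gui() is a hypothetical name).  It handles the simple case.  One reason such substitutions are low-level, offered as an illustration rather than as the reasoning intended above, is that they see characters, not document structure.

```python
import re

# Hypothetical raw-text substitution for a :gui: role.
GUI_ROLE = re.compile(r":gui:`([^`]+)`")


def regex_gui(text):
    """Rewrite every :gui:`...` in the raw text as <gui>...</gui>."""
    return GUI_ROLE.sub(r"<gui>\1</gui>", text)


print(regex_gui("Choose :gui:`File`, then :gui:`Save`."))
# Choose <gui>File</gui>, then <gui>Save</gui>.

# The regex doesn't know that the indented line below is a literal block
# meant to *show* the markup, so it rewrites that too.
example = "Mark up GUI labels like this::\n\n    :gui:`File`\n"
print(regex_gui(example))
# -> the literal block now contains <gui>File</gui>
```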

> If the role can't be handled by a regex, then of course it's going
> to require a Writer.  (Although... if one could embed a Python
> script... but, no.  That's verging on silly.)

Have you read PEP 256 & 258 yet?  Please do.  They explain the
Docutils architecture and the purpose of the components (Writer,
Reader, Parser, etc.).
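The split of responsibilities the PEPs describe, applied to the :gui: example quoted earlier, can be sketched in a few lines: the parser records *what* the text is, and each Writer decides how to render it.  The classes below (Gui, Text, HTMLWriter, DocBookWriter) and write() are a toy illustration with hypothetical names, not the Docutils node or Writer APIs.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Gui:
    """Toy document-tree node: text marked with the :gui: role."""
    text: str


@dataclass
class Text:
    """Toy document-tree node: plain text."""
    text: str


class HTMLWriter:
    def visit(self, node):
        if isinstance(node, Gui):
            # Semantic class only; making it bold is the stylesheet's job.
            return f'<span class="gui">{escape(node.text)}</span>'
        return escape(node.text)


class DocBookWriter:
    def visit(self, node):
        if isinstance(node, Gui):
            return f"<guilabel>{escape(node.text)}</guilabel>"
        return escape(node.text)


def write(tree, writer):
    """Render a flat list of toy nodes with the given writer."""
    return "".join(writer.visit(node) for node in tree)


# What a parser might produce for "Open the :gui:`File` menu."
tree = [Text("Open the "), Gui("File"), Text(" menu.")]
print(write(tree, HTMLWriter()))     # Open the <span class="gui">File</span> menu.
print(write(tree, DocBookWriter()))  # Open the <guilabel>File</guilabel> menu.
```

Supporting another output format means adding another Writer; the source text and the tree stay the same.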
 
> And if this email has been scrunched into a blob, I apologize.
 
Try leaving a space on each blank line.
I.e., [return] [space] [return].  I've done that just above.

-- David Goodger    http://starship.python.net/~goodger

Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv