rst and pypandoc

Mon Mar 2 10:09:17 EST 2015

alb wrote:

> Hi Steven,
> 
> Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> []
>> Since \r is an escape character, that will give you carriage return
>> followed by "ef{fig:abc".
>> 
>> The solution to that is to either escape the backslash:
>> 
>> i = '\\ref{fig:abc}'
>> 
>> 
>> or use a raw string:
>> 
>> i = r'\\ref{fig:abc}'

Dave has corrected my typo in the above: it should be r'\ref'. The whole
point of raw strings is that you don't need to escape the backslashes.
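
To make the difference concrete, here is a small sketch comparing the three
spellings. The variable names (escaped, raw, broken) are mine, chosen only
for illustration:

```python
# Three ways of writing the LaTeX command \ref{fig:abc} as a Python literal.
escaped = '\\ref{fig:abc}'   # backslash escaped by hand
raw = r'\ref{fig:abc}'       # raw string: the backslash is taken literally
broken = '\ref{fig:abc}'     # \r is parsed as a carriage return

# The escaped and raw spellings produce the same 13-character string.
assert escaped == raw
assert len(raw) == 13

# The unescaped spelling loses the backslash and the "r".
assert broken[0] == '\r'
assert broken[1:] == 'ef{fig:abc}'
```

The first two give identical strings; only the third one goes wrong.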

> ok, maybe I wasn't clear from the very beginning, but searching for a
> solution is a journey that takes time and patience.
> 
> The wrongly named variable i (as noted below) contains the *i*nput of
> my text which is supposed to be restructured text. The output is what
> pypandoc spits out after conversion:

Ah, well that's not a bad convention for small utility functions, but I
wouldn't want single-letter names to be used in anything bigger than, say,
a dozen lines. Having i for input and o for output right next to each other
helps too. But you're still swimming against the convention that i means an
integer. Whether you decide it is worth going against that convention in
your own code is up to you, but when asking for help, it is worth your
while to be as unsurprising and conventional as you can manage.

> i = "\\begin{tag}{%s}{%s}\n %s\n \\end{tag}" % (some, restructured, text)
> o = pypandoc.convert(i, 'latex', format='rst')
> 
> Now if i contains some inline text, i.e. text I do not want to convert
> in any other format, I need my text to be formatted accordingly in order
> to inject some escape symbols in i.
> 
> Rst escapes with "\", but unfortunately python also uses "\" for escaping!

Yes, but only in string literals. In Python source code, "\r" makes a
carriage return, but when reading from the keyboard (say, using the
raw_input function), from a file, or anything other than a string literal,
a string consisting of "\r" is just backslash-r.
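
A rough way to convince yourself of that, assuming an in-memory io.StringIO
object is an acceptable stand-in for a real file (the names fake_file,
from_file and literal are just for this sketch):

```python
import io

# Simulate a file containing the four characters: backslash, r, e, f.
# chr(92) is the backslash, so no escape sequence is involved here.
fake_file = io.StringIO(chr(92) + u"ref")
from_file = fake_file.read()

# The same characters typed into source code as a normal string literal:
literal = "\ref"   # here \r is turned into a carriage return

# Data that was read in keeps its backslash ...
assert len(from_file) == 4
assert from_file == r"\ref"

# ... while the literal has been processed by the compiler.
assert len(literal) == 3
assert literal[0] == "\r"
```

Escape processing happens when Python compiles the literal, not when the
string is used later.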

So, worst case, you can always assemble your strings like this:

backslash = chr(92)
i = ((backslash + "begin{tag}{%s}{%s}\n %s\n " + backslash + "end{tag}")
        % (some, restructured, text))

although that is a PITA.

I recommend using raw triple-quoted strings, which also avoids the need for
\n escapes:

i = r"""\begin{tag}{%s}{%s}
 %s
 \end{tag}""" % (some, restructured, text)
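
As a sanity check, here is a sketch showing that all three ways of building
the template give the same string. The values assigned to some, restructured
and text are placeholders I made up for the example:

```python
# Placeholder values, purely for illustration.
some, restructured, text = "a", "b", "c"

# 1. Ordinary literal with doubled backslashes and \n escapes.
escaped = "\\begin{tag}{%s}{%s}\n %s\n \\end{tag}" % (some, restructured, text)

# 2. Assembled from chr(92). The parentheses around the concatenation matter:
#    % binds more tightly than +, so without them only "end{tag}" is formatted.
backslash = chr(92)
assembled = (backslash + "begin{tag}{%s}{%s}\n %s\n " + backslash
             + "end{tag}") % (some, restructured, text)

# 3. Raw triple-quoted literal: real newlines, single backslashes.
raw = r"""\begin{tag}{%s}{%s}
 %s
 \end{tag}""" % (some, restructured, text)

assert escaped == assembled == raw
```

One caveat: the % operator still treats the template as a format string, so
a literal percent sign in it has to be written as %%, raw string or not.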

>> Can you show what you are doing? Escaping the backslash with another
>> backslash does work:
>> 
>> py> for c in '\\ref':
>> ...     print(c, ord(c))
>> ...
>> \ 92
>> r 114
>> e 101
>> f 102
>> 
>> so either you are doing something wrong, or the error lies elsewhere.
> 
> As said above, the string is converted by pandoc first and then printed.
> At this point the escaping becomes tricky (at least to me).
> 
> In [17]: inp = '\\ref{fig:abc}'

If you print inp at this point, you should see that it contains exactly what
you expect: backslash, R E F etc.

> In [18]: print pypandoc.convert(inp, 'latex', format='rst')
> ref\{fig:abc\}

and now the backslash is gone, and the braces are escaped. This suggests
that the problem lies with pypandoc. Perhaps you need to add extra
backslashes, so that pypandoc will convert a double-backslash to a single
one. Consult your pypandoc documentation, and try this:

inp = '\\\\ref{fig:abc}'  # That's FOUR backslashes, to get \\

# or as a raw string:

inp = r'\\ref{fig:abc}'
assert inp[0] == inp[1] == chr(92)
out = pypandoc.convert(inp, 'latex', format='rst') 
print out, out == r"\ref\{fig:abc\}"

-- 
Steven