[Doc-SIG] Re: support for translations in reStructuredText

David Goodger goodger@users.sourceforge.net
Sun, 30 Dec 2001 15:40:58 -0500


thomas@co-buero.de recently wrote to me:
> Hello,
>
> I'm thinking about multilingual texts, where the source for every language
> will be in one source file. I would like to know, wether
> there exist some type of
> markup in reStructuredText, which could support something like this (which I
> would love to have for example in MoinMoin to support
> multilanguage Wikis).
>
> To explain, what I mean, take a look at this example:
>
> -----------
>
> :lang:en
>
> Headline
> {de}Überschrift
> ===============
>
> This is the first line of the first paragraph.
> {de} Dies ist die erste Zeile des ersten Paragraphen.
> {sp} ....
>
> {de} Diese Zeile gibt es nur auf Deutsch
>
> {en} This line exists only in english language.
>
> ----
>
> For output this text should be rendered according to user preferences,
> showing only the language best matching the users skills.
>
>
> I'm sorry in case this is not the right place to discuss this topic and in
> this case would be glad to know about the right one.
> Thank You,
> Thomas Kalka

Hi Thomas,

Doc-SIG is the best place to discuss this topic.

I don't think anybody has addressed multilingual texts in reStructuredText
yet, so the field is wide open. I don't see any reason why it cannot be
done, with a little bit of work.

When I worked for an SGML data processing company in Tokyo, I dealt with
multilingual documents. We would receive a document in one language
(typically English) and prepare it for translation (by professional
translators) into one or more other languages. The solution we came up with
involved marked sections. I wrote a program (in my Perl days) which was fed
the document and a description of the translation unit granularity
(typically, paragraph-equivalent elements).

Given an original text like::

    <p>This is a paragraph in <i>English</i></p>

The program would produce something like this (there may be syntax errors)::

    <p>
    <![ %english; [
    This is a paragraph in <i>English</i>
    ]]>
    <![ %chinese; [
    <i></i>
    ]]>
    </p>

The translators would fill in the "%chinese;" section, using and rearranging
any inline tags present. When processing, we would set the "english"
parameter entity to "INCLUDE" to get the English version, or set "chinese"
to "INCLUDE" to get the Chinese version (default for parameter entities was
"IGNORE"). This data-level approach was moderately successful. There wasn't
time or budget for any kind of tool-level support.
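
To make the INCLUDE/IGNORE step concrete, here is a rough Python sketch of
how such marked sections could be resolved. It is not the Perl program
described above and not a real SGML parser: the function name, the regular
expression and the ``include`` argument are all assumptions made for
illustration, and nested marked sections are not handled.

```python
import re

# Matches one marked section of the form "<![ %name; [ ...body... ]]>".
# Assumption: sections are not nested and the keyword is a parameter entity.
MARKED_SECTION = re.compile(r"<!\[\s*%(\w+);\s*\[(.*?)\]\]>", re.DOTALL)


def resolve_marked_sections(text, include):
    """Keep the body of sections whose entity name is in `include`.

    Every other section is dropped, mimicking a default of "IGNORE".
    """
    def replace(match):
        name, body = match.group(1), match.group(2)
        return body.strip() if name in include else ""

    return MARKED_SECTION.sub(replace, text)


sample = """<p>
<![ %english; [
This is a paragraph in <i>English</i>
]]>
<![ %chinese; [
<i></i>
]]>
</p>"""

# Equivalent to setting the "english" parameter entity to "INCLUDE".
english_only = resolve_marked_sections(sample, include={"english"})
```

Passing ``include={"chinese"}`` instead would keep only the (still empty)
Chinese section, which is what the translators would have filled in.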

I don't know what the state of the art in multilingual applications of
XML/SGML is now, but I could imagine a similar approach for a
reStructuredText document. A directive would do the trick::

    .. language:: en

    This section of the document is in English.

    .. language:: fr

    Ceci est en francais.

A bunch of pre-defined language directives would also do fine: ".. en::",
".. fr::", ".. de::", ".. ja::", etc. These directives would act on all
following text up to the next such directive: simply keep the text if it's
the desired language, or discard it if not.
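
The keep-or-discard rule can be sketched in a few lines of Python. This is
only an illustration of the idea, working on plain text lines; it is not an
existing reStructuredText or DPS feature, and the function name and the
choice to keep text that appears before the first directive are assumptions.

```python
import re

# A line such as ".. language:: en" switches the current language.
LANGUAGE_DIRECTIVE = re.compile(r"^\.\.\s+language::\s*(\S+)\s*$")


def select_language(source, wanted):
    """Return only the text that follows a matching language directive."""
    kept = []
    keep = True  # assumption: text before any directive is shared by all
    for line in source.splitlines():
        match = LANGUAGE_DIRECTIVE.match(line)
        if match:
            # Switch state; the directive line itself is never emitted.
            keep = (match.group(1) == wanted)
            continue
        if keep:
            kept.append(line)
    return "\n".join(kept).strip() + "\n"


document = """\
.. language:: en

This section of the document is in English.

.. language:: fr

Ceci est en francais.
"""

english = select_language(document, "en")
french = select_language(document, "fr")
```

Here ``english`` holds only the English paragraph and ``french`` only the
French one. A real implementation would work on the parsed document tree
rather than on raw lines, so that the directive could respect section and
paragraph boundaries.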

You would have to choose the granularity level to suit your data and
application. I think a paragraph-level translation unit is appropriate for
most cases. I wouldn't want to interleave sentences of different languages
in a single paragraph: too confusing. Section-level granularity might be
better for some documents, or document-level granularity (the entire English
version, followed by the entire German version, etc.).

Some kind of editor-level support could also be very useful. Perhaps a minor
mode for the hypothetical emacs reStructuredText mode?

There hasn't been any progress with reStructuredText/DPS/DocUtils of late,
due to a convergence of factors (busy at work; busy at home; holidays). But
I do intend to get back to them soon. The projects are hibernating
temporarily, and will be revived shortly.

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net