[Python-ideas] PEP: Python Documentation Translations

Julien Palard julien at palard.fr
Tue Mar 21 18:05:31 EDT 2017


Hi,

Here is the follow-up to the "Translated Python documentation" thread on python-dev [1]_, as a PEP describing the steps to make Python Documentation Translation official on accessible on docs.python.org.

.. [1] [Python-Dev] Translated Python documentation
(https://mail.python.org/pipermail/python-dev/2017-February/147416.html)

You can also read it online at https://github.com/JulienPalard/peps/blob/master/pep-0xxx.rst

==============================
PEP: XXX
Title: Python Documentation Translations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <vstinner at redhat.com>,
Inada Naoki <songofacandy at gmail.com>,
Julien Palard <julien at palard.fr>
Status: Draft
Type: Process
Created: 04-Mar-2017
Post-History:

Abstract
========

The intent of this PEP is to make existing translations of the Python
Documentation more accessible and discoverable. By doing so,
attracting and motivating new translators and new translations.

Translated documentation will be hosted on python.org. Examples of
two active translation teams:

* http://docs.python.org/fr/: French
* http://docs.python.org/jp/: Japanese

http://docs.python.org/en/ will redirect to http://docs.python.org/.

Sources of translated documentation will be hosted in the Python
Documentation organization on GitHub: https://github.com/python-docs/.
Contributors will have to sign the Python Contributor Agreement (CLA)
and the license will be the PSF License.

Motivation
==========

On the french ``#python-fr`` IRC channel on freenode, it's not rare to
meet people who don't speak english and so are unable to read the
Python official documentation. Python wants to be widely available,
to all users, in any language: that's also why Python 3 now allows
any non-ASCII identifiers:
https://www.python.org/dev/peps/pep-3131/#rationale

There are a least 3 groups of people who are translating the Python
documentation in their mother language (french [16]_ [17]_ [18]_,
japanese [19]_ [20]_, spanish [21]_), even though their translation
are not visible on d.p.o. Other less visible and less organized
groups are also translating in their mother language, we heard of
Russian, Chinese, Korean, maybe some others we didn't found yet. This
PEP defines rules to move translations on docs.python.org so they can
easily be found by developers, newcomers and potential translators.

The Japanese team currently (March 2017) translated ~80% of the
documentation, french team ~20%. French translation went from 6% to
23% in 2016 [13]_ with 7 contributors [14]_, proving a translation
team can be faster than documentation mutates.

Quoting Xiang Zhang about Chinese translations:

I have seen several groups trying to translate part of our official
doc. But their efforts are disperse and quickly become lost because
they are not organized to work towards a single common result and
their results are hold anywhere on the Web and hard to find. An
official one could help ease the pain.

Rationale
=========

Translation
-----------

Issue tracker
'''''''''''''

Considering that issues opened about translations may be written in
the translation language, which can be considered noise but at least
is inconsistent, issues should be placed outside `bugs.python.org
<https://bugs.python.org/>`_ (b.p.o).

As all translation must have their own github project (see `Repository
for Po Files`_), they must use the associated github issue tracker.

Considering the noise induced by translation issues redacted in any
languages which may beyond every warnings land in b.p.o, triage will
have to be done. Considering that translations already exist and are
not actually a source of noise in b.p.o, an unmanageable amount of
work is not to be expected. Considering that Xiang Zhang and Victor
Stinner are already triaging, and Julien Palard is willing to help on
this task, noise on b.p.o is not to be expected.

Also, language team coordinators (see `Language Team`_) should help
triaging b.p.o by properly indicating, in the issue author language if
needed, the right issue tracker.

Branches
''''''''

Translation teams should focus on last stable versions, and use tools
(scripts, translation memory, …) to automatically translate what is
done in one branch to other branches.

.. note::
Translation memories are a kind of database of previously translated
paragraphs, even removed ones. See also `Sphinx Internationalization
<http://www.sphinx-doc.org/en/stable/intl.html>`_.

The three stable branches will be translated [12]_: 2.7, 3.5, and 3.6.
The scripts to build the documentation of older branches have to be
modified to support translation [12]_, whereas these branches now only
accept security-only fixes.

The development branch (master) should have a lower translation priority
than stable branches. But docsbuild-scripts should build it anyway so
it is possible for a team to work on it to be ready for the next
release.

Hosting
-------

Domain Name, Content negociation and URL
''''''''''''''''''''''''''''''''''''''''

Different translation can be told appart by changing one of:
Country Code Top Level Domain (CCTLD),
path segment, subdomain, or by content negociation.

Buying a CCTLD for each translations is expensive, time-consuming, and
sometimes almost impossible when already registered, this solution
should be avoided.

Using subdomains like "es.docs.python.org" or "docs.es.python.org" is
possible but confusing ("is it `es.docs.python.org` or `docs.es.python.org`?").
Hyphens in subdomains like
`pt-br.doc.python.org` in uncommon and SEOMoz [23]_ correlated the presence of
hyphens as a negative factor. Usage of underscores in subdomain is
prohibited by the RFC1123 [24]_, section 2.1. Finally using subdomains
means creating TLS certificates for each languages, which is more
maintenance, and will probably causes us troubles in language pickers
if, like for version picker, we want a preflight to check if the
translation exists in the given version: preflight will probably be
blocked by same-origin-policy. Wildcard TLS certificates are very
expensive.

Using content negociation (HTTP headers ``Accept-Language`` in the
request and ``Vary: Accept-Language``) leads to a bad user experience
where they can't easily change the language. According to Mozilla:
"This header is a hint to be used when the server has no way of
determining the language via another way, like a specific URL, that is
controlled by an explicit user decision." [25]_. As we want to be
able to easily change the language, we should not use the content
negociation as a main language determination, so we need somthing
else.

Last solution is to use the URL path, which looks readable, allows
for an easy switch from a language to another, and nicely accepts
hyphens. Typically something like: "docs.python.org/de/". Example
with a hyphen: "docs.python.org/pt-BR/"

As for version, sphinx-doc does not support compiling for multiple
languages, so we'll have full builds rooted under a path, exactly like
we're already doing with versions.

So we can have "docs.python.org/de/3.6/" or
"docs.python.org/3.6/de/". Question is "Does the language contains
multiple version or does version contains multiple languages?" As
versions exists in any cases, and translations for a given version may
or may not exists, we may prefer "docs.python.org/3.6/de/", but doing
so scatter languages everywhere. Having "/de/3.6/" is clearer about
"everything under /de/ is written in deutch". Having the version at
the end is also an habit taken by readers of the documentation: they
like to easily change the version by changing the end of the path.

So we should use the following pattern:
"docs.python.org/LANGUAGE_TAG/VERSION/".

Current documentation is not moved to "/en/", but "docs.python.org/en/"
will redirect to "docs.python.org/en/".

Language Tag
''''''''''''

A common notation for language tags is the IETF Language Tag [3]_
[4]_ based on ISO 639, alghough gettext uses ISO 639 tags with
underscores (ex: ``pt_BR``) instead of dashes to join tags [5]_
(ex: ``pt-BR``). Examples of IETF Language Tags: ``fr`` (French),
``jp`` (Japanese), ``pt-BR`` (Orthographic formulation of 1943 -
Official in Brazil).

It is more common to see dashes instead of underscores in URLs [6]_,
so we should use IETF language tags, even if sphinx uses gettext
internally: URLs are not meant to leak the underlying implementation.

It's uncommon to see capitalized letters in URLs, and docs.python.org
don't use any, so it may hurt readability by attracting the eye on it,
like in: "https://docs.python.org/pt-BR/3.6/library/stdtypes.html".
RFC 5646 (Tags for Identifying Languages (IETF)) section-2.1 [7]_
tells the tags are not case sensitive. As the RFC allows lower case,
and it enhances readability, we should use lowercased tags like
``pt-br``.

It's redundant to display both language and country code if they're
the same, typically "de-DE", "fr-FR", although it make sense,
respectively "Deutch as spoken in Germany" and "French as spoken in
France", it's not a usefull information for the reader. So we may drop
those redundencies. We should obviously keep the country part when it
make sense like "pt-BR" for "Portuguese as spoken in Brazil".

So we should use IETF language tags, lowercased, like ``/fr/``,
``/pt-br/``, ``/de/`` and so on.

Fetching And Building Translations
''''''''''''''''''''''''''''''''''

Currently docsbuild-scripts are building the documentation [8]_. These scripts
should be modified to fetch and build translations.

Building new translations is like building new versions, so we're
adding complexity, but not that much.

Two steps should be configurable distinctively: Build a new language,
and add it to the language picker. This allows a transition step
between "we accepted the language" and "it is translated enough to be
made public", during this step, translators can review their
modifications on d.p.o without having to build the documentation
locally.

From the translations repositories, only the ``.po`` files should be
opened by the docsbuild-script to keep the attack surface and probable
bugs sources at a minimum. This mean no translation can patch sphinx
to advertise their translation tool. (This specific feature should be
handled by sphinx anyway [9]_).

Community
---------

Mailing List
''''''''''''

The `doc-sig`_ mailing list will be used to discuss cross-language
changes on translated documentations.

There is also the i18n-sig list but it's more oriented towards i18n APIs
[1]_, than translation the Python documentation.

.. _i18n-sig: https://mail.python.org/mailman/listinfo/i18n-sig
.. _doc-sig: https://mail.python.org/mailman/listinfo/doc-sig

Chat
''''

Python community being highly active on IRC, we should create a new
IRC channel on freenode, typically #python-doc for consistency with
the mailing list name.

Each language coordinator can organize its own team, even by choosing
another chat system if the local usage asks for it. As local teams
will write in their native languages, we don't want each team in a
single channel, and it's also natural for the local teams to reuse
their local channels like "#python-fr" for french translators.

Repository for PO Files
'''''''''''''''''''''''

Considering that each translation teams may want to use different
translation tools, and that those tools should easily be synchronized
with git, all translations should expose their ``.po`` files via a git
repository.

Considering that each translation will be exposed via git
repositories, and that Python has migrated to GitHub, translations
will be hosted on github.

For consistency and discoverability, all translations should be in the
same github organization and named according to a common pattern.

Considering that we want translations to be official, and that Python
already have a github organization, translations should be hosted as
projects of the `Python documentation GitHub organization`_.

For consistency, translations repositories should be called
``python-docs-LANGUAGE_TAG`` [22]_.

The docsbuild-scripts may enforce this rule by refusing to fetch
outside of the Python organization or a wrongly named repository.

The CLA bot may be used on the translation repositories, but with a
limited effect as local coordinators may synchronize themselves
translations from an external tool like transifex, loosing in the
process who translated what.

Version can be hosted on different repositories, different directories
or different branches. Storing them on different repositories will
probably pollute the Python documentation github organization. As it
is typical and natural to use branches to separate versions, branches
should be used to do so.

.. _Python documentation GitHub organization: https://github.com/python-docs/

Translation tools
'''''''''''''''''

Most of the translation work is actually done on Transifex [15]_.

Other tools may be used later https://pontoon.mozilla.org/
and http://zanata.org/

Contributor Agreement
'''''''''''''''''''''

Contributions to translated documentation will be requested to sign the
Python Contributor Agreement (CLA):

https://www.python.org/psf/contrib/contrib-form/

Language Team
'''''''''''''

Each language team should have one coordinator responsible to:

- Manage the team
- Choose and manage the tools its team will use (chat, mailing list, …)
- Ensure contributors understand and agree with the CLA
- Ensure quality (grammar, vocabulary, consistency, filtering spam, ads, …)
- Do redirect to GitHub issue tracker issues related to its
language on bugs.python.org

The license will be the `PSF License <https://docs.python.org/3/license.html>`_,
and copyright should be transferable to PSF later.

Alternatives
------------

Simplified English
''''''''''''''''''

It would be possible to introduce a "simplified english" version like
wikipedia did [10]_, as discussed on python-dev [11]_, targetting
english learners and childrens.

Pros: It yields a single other translation, theorically readable by
everyone, and reviewable by current maintainers.

Cons: Subtle details may be lost, and translators from english to english
may be hard to find as stated by Wikipedia:

> The main English Wikipedia has 5 million articles, written by nearly
140K active users; the Swedish Wikipedia is almost as big, 3M articles
from only 3K active users; but the Simple English Wikipedia has just
123K articles and 871 active users. That's fewer articles than
Esperanto!

Changes
=======

Migrate GitHub Repositories
---------------------------

We (authors of this PEP) already own french and japanese Git
repositories, so moving them to the Python documentation organization will not be a
problem. We'll however follow the `New Translation Procedure`_.

Patch docsbuild-scripts to Compile Translations
-----------------------------------------------

Docsbuild-script must be patched to:

- List the languages tags to build along with the branches to build.
- List the languages tags to display in the language picker.
- Find translation repositories by formatting
``github.com:python-docs/python-docs-{language_tag}.git`` (See
`Repository for Po Files`_)
- Build translations for each branches and each languages

Patched docsbuild-scripts must only open ``.po`` files from
translation repositories.

List coordinators in the devguide
---------------------------------

Add a page or a section with an empty list of coordinators to the
devguide, each new coordinators will be added to this list.

Create sphinx-doc Language Picker
---------------------------------

Highly similar to the version picker, a language picker must be
implemented. This language picker must be configurable to hide or
show a given language.

Enhance rendering of untranslated fuzzy translations
----------------------------------------------------

It's an opened sphinx issue [9]_, but we'll need it so we'll have to
work on it. Translated, fuzzy, and untranslated paragraphs should be
differentiated. (Fuzzy paragraphs have to warn the reader what he's
reading may be out of date.)

New Translation Procedure
=========================

Designate a Coordinator
-----------------------

The first step is to designate a coordinator, see `Language Team`_,
The coordinator must sign the CLA.

The coordinator should be added to a list of translation coordinator
on the devguide.

Create github repository
------------------------

Create a repository named "python-docs-{LANGUAGE_TAG}" on the Python
documentation github organization (See `Repository For Po Files`_.), and grant the
language coordinator push rights to this repository.

Add support for translations in docsbuild-scripts
-------------------------------------------------

As soon as the translation hits its firsts commits, update the
docsbuild-scripts configuration to build the translation (but not
displaying it in the language picker).

Add translation to the language picker
--------------------------------------

As soon as the translation hits:

- 100% of bugs.html with proper links to the language repository
issue tracker.
- 100% of tutorial
- 100% of library/functions (builtins)

the translation can be added to the language picker.

Previous discussions
====================

- `[Python-ideas] Cross link documentation translations (January, 2016)`_
- `[Python-ideas] Cross link documentation translations (January, 2016)`_
- `[Python-ideas] https://docs.python.org/fr/ ? (March 2016)`_

.. _[Python-ideas] Cross link documentation translations (January, 2016):
https://mail.python.org/pipermail/python-ideas/2016-January/038010.html

.. _[Python-Dev] Translated Python documentation (Febrary 2016):
https://mail.python.org/pipermail/python-dev/2017-February/147416.html

.. _[Python-ideas] https://docs.python.org/fr/ ? (March 2016):
https://mail.python.org/pipermail/python-ideas/2016-March/038879.html

References
==========

.. [1] [I18n-sig] Hello Python members, Do you have any idea about
Python documents?
(https://mail.python.org/pipermail/i18n-sig/2013-September/002130.html)

.. [2] [Doc-SIG] Localization of Python docs
(https://mail.python.org/pipermail/doc-sig/2013-September/003948.html)

.. [3] Tags for Identifying Languages
(http://tools.ietf.org/html/rfc5646)

.. [4] IETF language tag
(https://en.wikipedia.org/wiki/IETF_language_tag)

.. [5] GNU Gettext manual, section 2.3.1: Locale Names
(https://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html)

.. [6] Semantic URL: Slug
(https://en.wikipedia.org/wiki/Semantic_URL#Slug)

.. [7] Tags for Identifying Languages: Formatting of Language Tags
(https://tools.ietf.org/html/rfc5646#section-2.1.1)

.. [8] Docsbuild-scripts github repository
(https://github.com/python/docsbuild-scripts/)

.. [9] i18n: Highlight untranslated paragraphs
(https://github.com/sphinx-doc/sphinx/issues/1246)

.. [10] Wikipedia: Simple English
(https://simple.wikipedia.org/wiki/Main_Page)

.. [11] Python-dev discussion about simplified english
(https://mail.python.org/pipermail/python-dev/2017-February/147446.html)

.. [12] Passing options to sphinx from Doc/Makefile
(https://github.com/python/cpython/commit/57acb82d275ace9d9d854b156611e641f68e9e7c)

.. [13] French translation progression
(https://mdk.fr/pycon2016/#/11)

.. [14] French translation contributors
(https://github.com/AFPy/python_doc_fr/graphs/contributors?from=2016-01-01&to=2016-12-31&type=c)

.. [15] Python-doc on Transifex
(https://www.transifex.com/python-doc/)

.. [16] French translation
(https://www.afpy.org/doc/python/)

.. [17] French translation github
(https://github.com/AFPy/python_doc_fr)

.. [18] French mailing list
(http://lists.afpy.org/mailman/listinfo/traductions)

.. [19] Japanese translation
(http://docs.python.jp/3/)

.. [20] Japanese github
(https://github.com/python-doc-ja/python-doc-ja)

.. [21] Spanish translation
(http://docs.python.org.ar/tutorial/3/index.html)

.. [22] [Python-Dev] Translated Python documentation: doc vs docs
(https://mail.python.org/pipermail/python-dev/2017-February/147472.html)

.. [23] Domains - SEO Best Practices | Moz
(https://moz.com/learn/seo/domain)

.. [24] Requirements for Internet Hosts -- Application and Support
(https://www.ietf.org/rfc/rfc1123.txt)

.. [25] Accept-Language
(https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language)

Copyright
=========

This document has been placed in the public domain.

..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

--
Julien Palard
https://mdk.fr
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20170321/702b87b0/attachment-0001.html>


More information about the Python-ideas mailing list