Marking translatable strings

François Pinard pinard at iro.umontreal.ca
Sat Sep 18 20:31:08 EDT 1999


andy at robanal.demon.co.uk (Andy Robinson) wrote:

> >So, I want the big picture right now.  That is: a technique for
> >marking strings for automatic extraction and building of PO files,
> >and a technique for using PO files from within Python scripts.

> Big picture: are you looking for run-time translation, or 'build-time'?

Run time, of course.  It is only at run time that a program knows which
language the user wants.

However, before this happens, translatable strings must be extracted,
distributed to national teams which orchestrate the work with translators,
and the resulting translation files made available (through the usual
channels), automatically validated, canonicalised, cleared with the team,
and checked for legalese (some packages requiring some, especially when
the FSF is involved).  The big picture is to get all this into movement.
It goes beyond "just" adding internationalisation mechanics in programs.

> i.e. should a Python application be able to switch languages in mid-run
> when a user hits a menu item?

Yes.  This has almost never occurred in the cases I have observed so far,
but at least in theory, all the mechanics are already in place within
`gettext'.

> Also do you want an automatic and theoretically perfect tool to understand
> every context in which quotes occur and do the right thing, or a practical
> one to reduce the cost of internationalizing apps by 90%.

I want something not far from perfect, but it cannot be fully automatic.
Which strings are translatable or not is ultimately decided by a human,
helped by heuristics so that string marking goes faster.  Translations are
done entirely by humans (automatic translation is not quite successful).

> [...] so I could use the English messages as the key, but then hit a
> situation where a simple English message was used twice in two different
> contexts.

This is the usual problem with this approach, yet it occurs rarely in
practice.  It is often (but not always) resolved easily by using longer
strings instead of building strings out of translated inserts (such
inserts are also more painful for translators, since languages have widely
varying syntaxes).  When the problem does happen, the simplest fix is to
fake two different strings for the special cases, and to provide an
English translation file containing only the problematic strings, so as
to recover the ambiguous English wording.  Ulrich has other plans for
making this more transparent, but we have not fully decided yet, and
nothing is implemented.

The other approach is using numbers, or identifiers for each string.
It does not have the problem you mention, but it has other problems, and
in my own opinion, all things considered, it is worse overall.  But some
people are proponents of it, and religious fights quite easily pop up
between the two approaches.  People are extremely touchy when
their own language is involved.  Some English speaking people, strangely,
are prone to think they understand all about others' linguistic problems :-).

> In Python, I'd be tempted to have a special tag to use at the
> beginning of the string I wanted internationalized, say
> "@@@USER_FORGOT_THE_FLOPPY".

Having the tag within the string has the disadvantage that even English
requires processing, and that the overall program loses legibility, which
is a serious drawback.  That's why I'm looking for tagging methods leaving
the original string intact, and if possible, the language intact as well.
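
One way to leave the original string intact is to wrap it in a call to a
short marker function such as `_`, and to let a tool collect those calls.
The sketch below is a toy extractor written with Python's `ast` module,
not the real `xgettext'; the function names and the sample source are
assumptions made for illustration, and the output only approximates the
PO format.

```python
import ast


def extract_marked_strings(source, marker="_"):
    """Return sorted (lineno, text) pairs for calls like _("text")."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == marker
                and len(node.args) == 1
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)


def to_po(entries, filename="example.py"):
    """Render entries as PO-style text with empty translations."""
    blocks = []
    for lineno, text in entries:
        escaped = (text.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        blocks.append(
            f'#: {filename}:{lineno}\nmsgid "{escaped}"\nmsgstr ""\n')
    return "\n".join(blocks)


# Assumed sample program: only strings wrapped in _() are extracted.
SAMPLE = '''
print(_("Insert the floppy disk"))
debug_key = "not-translated"
print(_("Disk full"))
'''

entries = extract_marked_strings(SAMPLE)
po_text = to_po(entries)
```

The English text stays readable in the source, and unmarked strings such
as `debug_key` above are left alone, so a human still decides what gets
marked.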

> Then I'd write a script to walk through the project finding these,
> and on each occurrence (a) replace it with a function to do a run-time
> lookup into a constants database, or a constant in a module, and (b)
> add the string to my database or the source of my constants module.

It is better to aim at managing translations on a wider scale, for a lot
of packages and languages, with common tools and specialised teams.
This is the purpose of the PO file format, and the Translation Project.
If each language, or even each package, were seeking its own implementation,
it would make things more complex overall, in the big picture.  On the other
hand, many years in that area taught me that it is fairly difficult to
get everybody to agree and collaborate.  Nowadays, with most maintainers,
everything goes rather easily, while only a few are ready to argue endlessly.

At the beginning, it was really, really exhausting: nobody agreed on
anything, and we had strong interference with negative effects.  We had to
start somewhere; we just could not aim at everything at once.  This is easier
on me now, because the project is well started, and is slowly growing.
Some initial choices are set and most people collaborate in following them.
As things get more solid and universal, we will dare further steps.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard





More information about the Python-list mailing list