Marking translatable strings

François Pinard pinard at iro.umontreal.ca
Fri Oct 8 09:55:03 EDT 1999


"Fredrik Lundh" <fredrik at pythonware.com> writes:

> > What about a t'translated string literal', like r'raw string'?

> is it really that hard to write, say,
> T('translated string literal') ?

Nothing is really hard to write.  But since translatable strings are meant
to be rather ubiquitous in Python programs, the real point is making them as
easy to read as possible, that is, quite legible and very unobtrusive.

I know it looks very nit-picky, but since this is meant to become just
part of the life of those caring about internationalisation (quite a few
by now), any slight increase in legibility will be a big win over time.
For example, comparing:

  T('translated string literal')
  _('translated string literal')

it seems to me that the second form is less cluttered than the first.

But the main problem with the notation above is that it has to mean something
which gets executed: we consider that `T' or `_' is some translating
function.  In practice, we quite often need to mark strings as translatable
without translating them right away, sometimes in contexts where a function
call would not be syntactically correct.

So, even if we can use special function calls as extra hints to "mark"
their string argument, there are other contexts where the strings themselves
need to be "marked" more intimately within the Python language.  How the
interpreter might be involved, or not, in dynamically translating such
strings, without using explicit translating functions like `T' or `_',
might be the subject of thought and debate, as there are many approaches for
doing it.  Besides, none would work anyway if Guido does not get interested.


On a related, but different topic, there is something which is little known,
but that might be of interest to those who would like to read `.mo' files
(compiled PO files) directly in Python, without going through `gettext'.
When we designed the `.mo' file format for GNU, I insisted that `msgfmt'
sort the strings in lexicographical order of `msgid' (untranslated) strings,
even if `gettext' does not need it.  The goal was to help scripting languages
that could not easily use the precise double hashing algorithm of `gettext',
by making it possible to simply binary search in the compiled file.
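
The binary search could be sketched as follows.  This assumes the usual
`.mo' layout (a header of 32-bit words giving the string count and the
offsets of the original and translation tables, each table entry being a
length and an offset), UTF-8 strings, and no plural or context handling.
The names `build_mo' and `mo_lookup' are hypothetical; `build_mo' only
exists to produce a small in-memory image to search:

```python
import struct

MO_MAGIC = 0x950412DE  # magic number as read in the file's own byte order

def build_mo(catalog):
    """Build a tiny little-endian .mo image from a dict (illustration only)."""
    # Sort by msgid bytes, mirroring the ordering msgfmt writes.
    items = sorted((k.encode("utf-8"), v.encode("utf-8"))
                   for k, v in catalog.items())
    n = len(items)
    orig_table = 28                      # header is seven 32-bit words
    trans_table = orig_table + 8 * n
    data_start = trans_table + 8 * n
    header = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table,
                         0, data_start)  # hash table size 0: none stored
    blob, entries = b"", []
    for strings in ([k for k, _ in items], [v for _, v in items]):
        for s in strings:
            entries.append(struct.pack("<2I", len(s), data_start + len(blob)))
            blob += s + b"\0"
    return header + b"".join(entries) + blob

def mo_lookup(data, msgid):
    """Binary search the sorted msgid table; return msgid itself if absent."""
    magic = struct.unpack_from("<I", data, 0)[0]
    if magic == MO_MAGIC:
        order = "<"
    elif magic == 0xDE120495:            # same magic, opposite byte order
        order = ">"
    else:
        raise ValueError("not a .mo file")
    _revision, count, orig_off, trans_off = struct.unpack_from(
        order + "4I", data, 4)
    key = msgid.encode("utf-8")

    def entry(table, index):
        length, offset = struct.unpack_from(order + "2I", data,
                                            table + 8 * index)
        return data[offset:offset + length]

    low, high = 0, count
    while low < high:
        mid = (low + high) // 2
        candidate = entry(orig_off, mid)
        if candidate < key:
            low = mid + 1
        elif candidate > key:
            high = mid
        else:
            return entry(trans_off, mid).decode("utf-8")
    return msgid
```

For instance, `mo_lookup(build_mo({"hello": "bonjour"}), "hello")' finds
the translation without ever touching the hash table.  A real reader would
take the character set from the catalog's header entry rather than assume
UTF-8.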

Also worth noting and little known, Ulrich has released a snapshot of
`gettext' which is not encumbered by the GPL, for the Danish UUG or Danish
national standardisation groups, I do not remember exactly.  The precise
double hashing is part of this special release, and the algorithm has not
changed since, as far as I know.  This could be useful if Python proper
were to provide string translation services independently of `gettext'.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard




