Marking translatable strings

François Pinard pinard at iro.umontreal.ca
Fri Oct 8 21:09:28 EDT 1999


Bernhard Herzog <herzog at online.de> wrote:

> The only place I can think of where Python's syntax forbids a function
> call but allows a string literal is a doc-string.

My impression as well.

> The only reason not to use a function for marking strings as translatable
> without translating them is probably performance, but the overhead for
> such a solution would be negligible in most cases, I guess.

Oh, you have to be careful with that.  We discovered rather soon, in the
preliminary discussions, a while before `gettext' existed, that some people
(usually English speaking) are extremely sensitive to the loss of performance
that might be induced by internationalisation.  This is why there are so
many tricks, both within `gettext' and Makefile engineering, to either get
speed or just opt out.  Now, it seems that the idea of internationalisation
has made its way among programmers, enough that we meet much less reluctance
than we used to.  But still, we should probably stay careful about performance.

One important point in that matter is to lazy-translate as much as we can,
to have better program start time.  PO files usually are installed all
pre-hashed (compiled) so the load time could be reduced to about what
it takes to find the proper file, plus the proper `mmap' system call.
Files are not loaded until a translation is actually needed, since there
are many cases where programs do not output anything at all, or output
nothing needing a translation.  Under GNU C, each textual `gettext'
call ensures a particular string is hashed only once, the translation is
then immediately cached into a variable allocated by the macro, and reused
directly afterwards.
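
As a rough Python analogue of that caching, here is a minimal sketch.  The
names `slow_lookup', `cached_gettext' and `LOOKUPS' are hypothetical, and
the cache is keyed by string rather than by call site as in the C macro,
so this only illustrates the idea and does not show how `gettext' does it.

```python
from functools import lru_cache

LOOKUPS = []  # records each real catalogue search, for illustration only


def slow_lookup(message):
    # Hypothetical stand-in for hashing the string and searching the catalogue.
    LOOKUPS.append(message)
    return {"Yes": "Oui"}.get(message, message)


@lru_cache(maxsize=None)
def cached_gettext(message):
    # The first call for a given string does the search; later calls
    # reuse the stored result without searching again.
    return slow_lookup(message)
```

Calling `cached_gettext' twice with the same string performs only one search.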

More than once, I have seen applications with preset structures holding many
translatable strings, yet very few of these strings are actually used to
build output while the program runs.  If we use a Python function call in
those positions, the program will translate _all_ strings while initialising
the structure, and this would mean an appreciable slowdown, which I would be
tempted to consider unacceptable.
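
One way around this, sketched below under stated assumptions, is to mark
the strings with a function that does nothing, and to translate only at
the point of output.  The marker is called `N_' here after the convention
used in C programs with `gettext'.  The names `translate', `CATALOG',
`MESSAGES' and `report' are hypothetical and chosen only for this
illustration.

```python
CALLS = []  # records which strings were really looked up

# Stand-in for a real message catalogue (an assumption for this sketch).
CATALOG = {"File not found": "Fichier introuvable"}


def N_(message):
    """Mark a string for extraction without translating it."""
    return message


def translate(message):
    """Look up a translation at the moment the text is needed."""
    CALLS.append(message)
    return CATALOG.get(message, message)


# A preset structure: building it costs only trivial identity calls.
MESSAGES = {
    "enoent": N_("File not found"),
    "eacces": N_("Permission denied"),
    "eio": N_("Input/output error"),
}


def report(code):
    # Only the string that is actually output gets translated.
    return translate(MESSAGES[code])
```

Initialising `MESSAGES' does no lookups at all, and `report("enoent")'
translates a single string.  The marker calls still cost something, but
far less than a catalogue search for every entry.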

Loading all strings from a PO file (compiled or not) at once, and
building a Python dictionary from them at program startup, might also be
a costly operation, one that would be prohibitive in big applications.  This
is surely a simple way to prototype and experiment with internationalisation,
but it should not remain the long-term approach.  Lazy processing of PO files
would be a much more attractive avenue for Python, as it is for other languages.
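
A minimal sketch of deferring the load until the first lookup follows.
The class `LazyCatalog' and the loader `load_french' are hypothetical, and
the loader merely returns a small dictionary where a real one would open
and map a compiled catalogue file.  Note that once triggered, this sketch
still loads the whole table at once; it only moves that cost away from
program startup.

```python
class LazyCatalog:
    """Defer loading a message catalogue until the first lookup."""

    def __init__(self, loader):
        self._loader = loader  # callable returning a dict of translations
        self._table = None     # nothing is loaded at program start
        self.loads = 0         # number of times the loader has run

    def gettext(self, message):
        if self._table is None:
            # First translation request: load the catalogue now.
            self._table = self._loader()
            self.loads += 1
        # Untranslated strings fall back to the original text.
        return self._table.get(message, message)


def load_french():
    # Stand-in for opening and mapping a compiled catalogue file.
    return {"Hello": "Bonjour"}


catalog = LazyCatalog(load_french)
```

A program that never calls `catalog.gettext' never pays for the load.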

> Such existing code only makes a difference to the program which extracts
> translatable strings as it might end up putting strings into po-files
> that aren't actually meant to be translated and will never be
> translated.

Indeed.  We already have a few PO files with more than one thousand
entries.  Some experimental extractions I did showed me that Emacs would
require many tens of thousands of entries.  Since Python seems to scale very
well for big projects, we should seek solutions that scale equally well,
and are not going to slowly get in the way while a project grows.

> gnome-python has a pure-python module, gettext.py, that can read .mo
> files.  I haven't tried it yet, though.

I did not find the time either, yet I definitely should, and will.
You'll help me if, when ready, I cannot easily locate that module. :-)

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard




