Marking translatable strings

Bernhard Herzog herzog at online.de
Thu Sep 16 13:50:41 EDT 1999


François Pinard <pinard at iro.umontreal.ca> writes:

> 1) Marking strings
>    ---------------

[other languages do it with the preprocessor or other similar tricks] 

> Python has no preprocessing, no special string syntax for markability,
> and moreover, it has doc strings!  So, at first glance, it looks difficult.
> However, and this is where my strange idea comes into play, it has eight
> types of strings: ', ", ''', """, r', r", r''' and r"""; and I thought
> that maybe we could just discipline ourselves to give more meaning to all
> these differences, since after all, if we except some ending backslash
> considerations, all eight types are equally capable of representing
> any string.

Note that these different types differ only at the lexical level. The
tokenizer treats them all as STRING tokens, so you could end up with a
lot of changes to Python's internals to distinguish them at any higher
level.
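This is easy to check with the standard tokenize module of present-day Python: each of the eight spellings comes out as a single STRING token, and all of them evaluate to the same string value, so nothing after tokenization can tell them apart. A small sketch (the list of spellings is simply the one from the quoted message):

```python
import ast
import io
import token
import tokenize

# The eight spellings listed in the quoted message, all for the text TEXT.
LITERALS = ["'TEXT'", '"TEXT"', "'''TEXT'''", '"""TEXT"""',
            "r'TEXT'", 'r"TEXT"', "r'''TEXT'''", 'r"""TEXT"""']


def token_kinds(literal):
    """Return the names of the significant tokens produced for one literal."""
    readline = io.StringIO(literal + "\n").readline
    return [token.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in (token.NEWLINE, token.NL, token.ENDMARKER)]


for literal in LITERALS:
    # Every spelling yields ['STRING'] and the value 'TEXT'.
    print(f"{literal:12} {token_kinds(literal)} {ast.literal_eval(literal)!r}")
```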

[snip]
> So, the bizarre idea I got is that one could be to formalize this into
> a rule: strings of type ", """, r" and r""" could be all markable as
> translatable, while strings of type ', ''', r' and r''' would not be.
> On the other hand, this might be overkill, as maybe people are used to
> freely mix types ' and ", and this change could be seen as stressful.
> Could we choose better?

Hmm. Since raw strings are mainly used for regular expressions and
perhaps for WinDos filenames, you probably won't want to translate them,
so I'd say that raw strings should never be considered translatable.

> Surely, since doc strings use """ exclusively, there is no choice but to retain
> type """ for translatability, wherever it appears.  However, forcing the use
> of """ everywhere we want translatability is an overhead of four characters
> (just compare "TEXT" with """TEXT"""), while C uses three or four characters
> (compare "TEXT" with _("TEXT") or N_("TEXT")), and bash uses only one (compare
> "TEXT" with $"TEXT").  I would like Python to be as comfortable as possible.
> If I could plainly use "TEXT" instead of 'TEXT' to mark translatability,
> I would have an overhead of zero characters, which would be better than
> everything, but I'm not sure if this constraint would be acceptable to
> Python writers.

But triple quotes are easier to type than the C-versions because you
just have to hit the same key several times instead of different keys
scattered all over the keyboard.

> Another possibility is to use ''"TEXT" instead of "TEXT", making an overhead
> of two characters: that is the compile time concatenation of '' with "TEXT".
> This combination is quite unlikely to me, and a bit uglier.
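The two-character marker works because adjacent literals are concatenated at compile time: a token-based extraction tool still sees two separate STRING tokens (so it could spot the empty '' prefix), while the compiled program sees only the single string 'TEXT'. A quick check with the standard tokenize and ast modules:

```python
import ast
import io
import tokenize

MARKED = "''\"TEXT\""  # the source text ''"TEXT"

# An extraction tool working on tokens sees two adjacent STRING tokens.
readline = io.StringIO(MARKED + "\n").readline
strings = [tok.string for tok in tokenize.generate_tokens(readline)
           if tok.type == tokenize.STRING]
print(strings)  # ["''", '"TEXT"']

# The compiler joins them into one ordinary string.
print(repr(ast.literal_eval(MARKED)))  # 'TEXT'
```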

What I dislike most about your ideas so far is that they make some
apparently normal string literals behave in special ways. With your
suggestion about """ you'd expect 'print """red"""' to print rouge if
your program were properly localized for French, wouldn't you?

Now, building some mechanism into Python wouldn't be bad per se (it
would probably be more portable than a gettext solution), but it
should be done with either a new type of string literal (e.g. an
i-string: i"spam") or with a standard library module.

A special translated string literal means that the Python interpreter
would have to generate special bytecode to translate it every time it's
executed. One way to do that would be to treat it as equivalent to e.g.
__gettext__("lovely plumage") and lookup __gettext__ like a normal
global name, so you could have a program wide __gettext__ hook in
__builtin__ and still override it on a per module basis. (Hmm, would a
similar lookup scheme be interesting for __import__?)

 
> 2) Translating strings
>    -------------------
[snip] 
> What would be the most comfortable for me, short of having the Python
> interpreter modified, is to merely use a function to force the actual
> translation of a string.  The most comfortable (the less intrusive) way
> would be to call:
> 
>         _(TEXT)
> 
> to get the translation of text.  It resembles C, but it overloads `_',
> which already has a preset meaning, interactively.  If I could push the
> preset `_' somewhere else, maybe on `__', I would do it and reserve `_'
> for translation, which would be much, much more common in the long run.
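If you did want to move the interactive _ out of the way, no interpreter change is needed: the interactive interpreter stores the last result through sys.displayhook, which can be replaced. A minimal sketch, assuming Python 3, that keeps the last value in __ instead of _ (the choice of __ follows the quoted suggestion). It matters more today because the standard gettext.install() binds _ in builtins, which is exactly where the default hook writes.

```python
import builtins
import sys


def displayhook(value):
    """Echo an interactive result like the default hook, but store it in `__`.

    The default hook rebinds builtins._; this one leaves `_` alone, so a
    translation function bound to _ in builtins is not overwritten.
    """
    if value is None:
        return
    sys.stdout.write(repr(value) + "\n")
    builtins.__ = value


sys.displayhook = displayhook
```

For everyday use it could be installed from the file named by PYTHONSTARTUP.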

I don't think the potential collision with _ in an interactive Python
session is a problem. For one thing, _ for translations would be used in
modules that you might import in an interactive session, but you
wouldn't normally mark strings as translatable in Python code typed in
interactively. Even if you imported the module with 'from module import
*', the _ would be treated as private to the module (it starts with _,
after all) and would not be bound in the interactive namespace, where it
would shadow the builtin _.
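The point about 'from module import *' is easy to confirm: without an __all__ list, a star import skips every name that starts with an underscore. A small sketch that builds a throwaway module in memory; the module name messages and its contents are hypothetical, and upper() merely stands in for a real translation function:

```python
import sys
import types

# Build a hypothetical module "messages" that binds _ for its own use.
messages = types.ModuleType("messages")
exec(
    "def _(text):\n"
    "    return text.upper()  # stand-in for a real translation function\n"
    "\n"
    "GREETING = _('hello')\n",
    messages.__dict__,
)
sys.modules["messages"] = messages

# Simulate typing 'from messages import *' in an interactive session.
session = {}
exec("from messages import *", session)

print(sorted(name for name in session if not name.startswith("__")))  # ['GREETING']
print(session["GREETING"])  # HELLO
```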

> Using a function would allow us to build the whole translation chain
> (administrating the translations with teams, etc.), yet if the syntax
> could be relieved with the help of Guido, I guess this would be welcome.
> We might need to experiment first.

I've used Martin von Löwis' intl module (a wrapper for GNU gettext) in
Sketch for quite some time, and had no problems just binding _ to
intl.gettext and using it just like in C. IMO this is probably the best
approach to localized messages because it doesn't require any change to
Python whatsoever. It would be nice if it were part of the standard
distribution, or at least some equivalent module.

If you're careful to only use double-quoted strings inside of _(), you
can even use xgettext to build the initial po-files. xgettext will
likely complain about lots of invalid character constants and strings if
you have single- and triple-quoted strings in your source, but otherwise
it works quite well.
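The complaints about character constants suggest xgettext is reading the sources as C. A small extractor built on Python's own ast module sidesteps the quoting restriction, since it finds every literal passed to _() or N_() regardless of quote style. This is only a sketch (no plural forms, comments or duplicate merging); the CPython source tree ships a fuller tool, Tools/i18n/pygettext.py.

```python
import ast

SAMPLE = '''
label = _("Open file")
name = 'not marked'
title = _('Save drawing')
hint = N_("""Drag to rotate""")
'''


def extract_messages(source):
    """Return (line, text) for each literal passed directly to _() or N_()."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in ("_", "N_")
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)  # ast.walk is breadth-first, so restore source order


def to_pot(messages, filename):
    """Format the messages as minimal .po template entries."""
    entries = []
    for lineno, text in messages:
        escaped = (text.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        entries.append(f'#: {filename}:{lineno}\nmsgid "{escaped}"\nmsgstr ""\n')
    return "\n".join(entries)


print(to_pot(extract_messages(SAMPLE), "sample.py"))
```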

The only problem I can see with this is that you couldn't get xgettext
to recognize doc-strings, because you can't mark them with _() or rather
N_(). Actually translating the doc-strings isn't a problem, because
tools that access the doc-strings could just pass them through _().



> 3) Setting the textual domain
>    --------------------------
> 
> In a quick word, I guess that this problem could be fairly easily solved
> through the handy scope rules for resolution of names in Python.  Each module
> could have a standard global variable name setting which textual domain to
> use within it.  So, even with the control flying like hell between modules,
> it would not be a problem on average.

With intl, you could just define _ locally (i.e. at module level),
e.g.:

    def _(text):
        return dgettext("domain", text)


>  But there are problematic cases,
> like for when untranslated strings are transmitted to other modules, for
> being translated there, or even maybe for plain doc strings.  This requires
> good thought.  This problem is more difficult than many might think at first.

But this isn't a python-specific problem, is it?  You'd have similar
problems with C-libraries, or with passing such strings from say a
third-party plugin to the main program.

In any case, whether strings passed between different components of a
program are translated, and by whom, is part of the interface
specification.

If, for instance, the caller of the function belongs to a different
textual domain than the function and the caller is expected to translate
the string the function returns, then the interface has to provide a
method to find out which domain to use.

In Python you could perhaps just return a tuple with two strings, the
untranslated message and the domain. Or perhaps the message and a
callable object that can perform the translation, or, even simpler, both
the translated and the untranslated message.
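The three return conventions might look like this. A sketch only: PLUGIN_DOMAIN and the status functions are hypothetical, and gettext.dgettext stands in for whatever translation function the program actually uses.

```python
import gettext

PLUGIN_DOMAIN = "myplugin"  # hypothetical domain of a third-party plug-in


def status_with_domain():
    # Option 1: the untranslated message plus the domain it belongs to.
    return "Plugin loaded", PLUGIN_DOMAIN


def status_with_translator():
    # Option 2: the message plus a callable that knows the right domain.
    return "Plugin loaded", lambda text: gettext.dgettext(PLUGIN_DOMAIN, text)


def status_both():
    # Option 3: translated and untranslated text, already paired up.
    text = "Plugin loaded"
    return gettext.dgettext(PLUGIN_DOMAIN, text), text


# The caller, which lives in a different domain, handles each convention:
text, domain = status_with_domain()
shown_1 = gettext.dgettext(domain, text)

text, translate = status_with_translator()
shown_2 = translate(text)

shown_3, original = status_both()

print(shown_1, shown_2, shown_3, original, sep=" | ")
```

The callable variant keeps the caller ignorant of domains altogether, at the cost of a slightly heavier interface.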

Then again, I can't think of an example where you have to pass
untranslated messages around between different components in different
domains (other than perhaps doc-strings). It seems to me that the
interface between the components is badly designed if you have to do
this. For doc-strings you could probably get away with a global
variable.


-- 
Bernhard Herzog	  | Sketch, a drawing program for Unix
herzog at online.de  | http://www.online.de/home/sketch/



