[Python-Dev] gettext in the standard library

Fri, 18 Aug 2000 17:13:31 -0400 (EDT)

Apologies for duplicates to those of you already on python-dev...

I've been working on merging all the various implementations of Python
interfaces to the GNU gettext libraries.  I've worked from code
contributed by Martin, James, and Peter.  I now have something that
seems to work fairly well so I thought I'd update you all.

After looking at all the various whizzy and experimental stuff in these
implementations, I opted for simplicity, mostly just so I could get my
head around what was needed.  My goal was to build a fast C wrapper
module around the C library, and to provide a pure Python
implementation of an identical API for platforms without GNU gettext.

I started with Martin's libintlmodule, renamed it _gettext and cleaned
up the C code a bit.  This provides gettext(), dgettext(),
dcgettext(), textdomain(), and bindtextdomain() functions.  The
gettext.py module imports these, and if it succeeds, it's done.
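A rough sketch of that import-or-fall-back arrangement follows. It is not the code I plan to check in: the fallback body is a placeholder, and the USING_C_WRAPPER flag is a made-up name that exists only so the example can report which path was taken.

```python
# Sketch only: try the C wrapper first, fall back to pure Python otherwise.
try:
    from _gettext import (gettext, dgettext, dcgettext,
                          textdomain, bindtextdomain)
    USING_C_WRAPPER = True   # hypothetical flag, for illustration only
except ImportError:
    USING_C_WRAPPER = False

    def gettext(message):
        # Placeholder fallback: with no catalog loaded, hand the
        # message back untranslated.
        return message

result = gettext('this is a localized message')
```

On a machine without the _gettext extension the import fails and the placeholder is used, so the message comes back unchanged.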

If that fails, then there's a bunch of code, mostly derived from
Peter's fintl.py module, that reads the binary .mo files and does the
lookups itself.  Note that Peter's module only supported the GNU
gettext binary format, and that's all mine does too.  It should be
easy to support other binary formats (Solaris?) by overriding one
method in one class, and contributions are welcome.
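To make the "one method in one class" idea concrete, here is a minimal sketch of a GNU .mo reader. The names Catalog, GNUCatalog, _parse and build_mo are hypothetical and are not the ones in my module. Decoding strings as UTF-8 is likewise an assumption made for the demo. The header layout (magic number, revision, string count, then the offsets of the original and translation tables) follows the GNU .mo format.

```python
import struct

LE_MAGIC = 0x950412DE  # magic number of a little-endian .mo file
BE_MAGIC = 0xDE120495  # the same magic as seen in a big-endian file


class Catalog:
    """Hypothetical base class: lookups live here, parsing is overridable."""

    def __init__(self, data):
        self._messages = self._parse(data)

    def _parse(self, data):
        # The one method another binary format would override.
        raise NotImplementedError

    def gettext(self, message):
        # Messages without a translation fall through unchanged.
        return self._messages.get(message, message)


class GNUCatalog(Catalog):
    def _parse(self, data):
        magic = struct.unpack('<I', data[:4])[0]
        if magic == LE_MAGIC:
            order = '<'
        elif magic == BE_MAGIC:
            order = '>'
        else:
            raise ValueError('not a GNU .mo file')
        # After the magic: revision, string count, and the offsets of
        # the original-string and translation-string tables.
        _revision, count, orig_tab, trans_tab = struct.unpack(
            order + '4I', data[4:20])
        messages = {}
        for i in range(count):
            # Each table entry is a (length, offset) pair of 32-bit ints.
            olen, ooff = struct.unpack_from(order + '2I', data, orig_tab + 8 * i)
            tlen, toff = struct.unpack_from(order + '2I', data, trans_tab + 8 * i)
            original = data[ooff:ooff + olen].decode('utf-8')  # assumed encoding
            messages[original] = data[toff:toff + tlen].decode('utf-8')
        return messages


def build_mo(pairs):
    """Build a tiny little-endian .mo image in memory (demo helper only)."""
    items = sorted(pairs.items())
    count = len(items)
    orig_tab = 28                      # tables start right after the header
    trans_tab = orig_tab + 8 * count
    offset = trans_tab + 8 * count     # string data follows both tables
    tables = [b'', b'']
    blob = b''
    for column in (0, 1):              # all originals first, then translations
        for item in items:
            raw = item[column].encode('utf-8')
            tables[column] += struct.pack('<2I', len(raw), offset)
            blob += raw + b'\0'
            offset += len(raw) + 1
    # The last two header fields describe the hash table, left empty here.
    header = struct.pack('<7I', LE_MAGIC, 0, count, orig_tab, trans_tab, 0, offset)
    return header + tables[0] + tables[1] + blob


catalog = GNUCatalog(build_mo({'hello': 'bonjour', 'goodbye': 'au revoir'}))
```

Supporting another binary format would then mean subclassing Catalog and supplying a different _parse(); the lookup code stays put.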

James's stuff looked cool too, what I grokked of it :) but I think
those extras should be exported as higher-level features.  I didn't include
the ability to write .mo files or the exported Catalog objects.  I
haven't used the I18N services enough to know whether these are
useful.

I added one convenience function, gettext.install().  If you call
this, it inserts the gettext.gettext() function into the builtins
namespace as `_'.  You'll often want to do this, given the I18N convention
for marking translatable strings.  Note that importing gettext
does /not/ install by default!
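The mechanism is small enough to sketch. The helper name install_underscore and the stand-in shout() function below are made up for illustration; the real install() binds gettext.gettext. The sketch is written against a Python where the builtins module is spelled `builtins` rather than `__builtin__`, which is an assumption made so that it runs as-is.

```python
import builtins  # spelled __builtin__ in the Python this thread targets


def install_underscore(translate):
    """Bind `translate` to the name `_` in the builtins namespace."""
    builtins.__dict__['_'] = translate


def shout(message):
    # Stand-in for a real translation function such as gettext.gettext.
    return message.upper()


install_underscore(shout)

# No import or assignment of `_` is needed here: name lookup falls
# through the module globals to builtins.
greeting = _('hello')
```

Because the binding lives in builtins, every module in the process sees `_` afterwards, which is exactly why installing is opt-in rather than a side effect of importing.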

And since (I think) you'll almost always want to call bindtextdomain()
and textdomain(), you can pass the domain and localedir in as
arguments to install.  Thus, the simple and quick usage pattern is:

    import gettext
    gettext.install('mydomain', '/my/locale/dir')

    print _('this is a localized message')

I think it'll be easier to critique this stuff if I just check it in.
Before I do, I still need to write up a test module and hack together
docos.  In the meantime, here's the module docstring for gettext.py.
Talk amongst yourselves. :)

-Barry

"""Internationalization and localization support.

This module provides internationalization (I18N) and localization (L10N)
support for your Python programs by providing an interface to the GNU gettext
message catalog library.

I18N refers to the operation by which a program is made aware of multiple
languages.  L10N refers to the adaptation of your program, once
internationalized, to the local language and cultural habits.  In order to
provide multilingual messages for your Python programs, you need to take the
following steps:

    - prepare your program by specially marking translatable strings
    - run a suite of tools over your marked program files to generate raw
      message catalogs
    - create language-specific translations of the message catalogs
    - use this module so that message strings are properly translated

In order to prepare your program for I18N, you need to look at all the strings
in your program.  Any string that needs to be translated should be marked by
wrapping it in _('...') -- i.e. a call to the function `_'.  For example:

    filename = 'mylog.txt'
    message = _('writing a log message')
    fp = open(filename, 'w')
    fp.write(message)
    fp.close()

In this example, the string `writing a log message' is marked as a candidate
for translation, while the strings `mylog.txt' and `w' are not.

The GNU gettext package provides a tool called xgettext that scans C and C++
source code looking for these specially marked strings.  xgettext generates
what are called `.pot' files, essentially structured, human-readable files
which contain every marked string in the source code.  These .pot files are
copied and handed over to translators who write language-specific versions for
every supported language.

For I18N Python programs however, xgettext won't work; it doesn't understand
the myriad of string types supported by Python.  The standard Python
distribution provides a tool called pygettext that does though (usually in the
Tools/i18n directory).  This is a command-line script that supports an
interface similar to xgettext's; see its documentation for details.  Once you've used
pygettext to create your .pot files, you can use the standard GNU gettext
tools to generate your machine-readable .mo files, which are what's used by
this module and the GNU gettext libraries.

In the simple case, to use this module you need only add the following
bit of code to the main driver file of your application:

    import gettext
    gettext.install()

This sets everything up so that your _('...') function calls Just Work.  In
other words, it installs `_' in the builtins namespace for convenience.  You
can skip this step and do it manually with the equivalent code:

    import gettext
    import __builtin__
    __builtin__.__dict__['_'] = gettext.gettext

Once you've done this, you probably want to call bindtextdomain() and
textdomain() to get the domain set up properly.  Again, for convenience, you
can pass the domain and localedir to install() to set everything up in one fell
swoop:

    import gettext
    gettext.install('mydomain', '/my/locale/dir')

"""