From bwarsaw@python.org  Tue Dec 21 07:16:20 1999
From: bwarsaw@python.org (Barry A. Warsaw)
Date: Tue, 21 Dec 1999 02:16:20 -0500 (EST)
Subject: [Mailman-i18n] Python I18N tools and Mailman
References: <14413.19305.495080.332505@anthem.cnri.reston.va.us>
 <Pine.LNX.4.04.9912120825090.17522-100000@vgg.sci.uma.es>
 <14411.9005.689029.197841@anthem.cnri.reston.va.us>
 <Pine.LNX.4.10.9912071152590.1125-100000@localhost>
 <m11ykzM-000CxSC@artcom0.artcom-gmbh.de>
Message-ID: <199912210716.CAA15614@anthem.cnri.reston.va.us>

Hello all,

I have subscribed you all to a new mailing list I've set up, called
mailman-i18n@python.org.  I apologize for this breach of etiquette,
but it's late at night and I wanted to have a more formal way of
tracking this discussion.  If any of you want to opt out, or change
your subscription address, let me know directly or go through the
Mailman URL that you should have already received in your welcome
message.

I've included you on this list because you've all been involved in
helping to internationalize Mailman <www.list.org>, whether you're
aware of it or not. :) We have a lot of overlap in some of the Python I18N
tools, and I'd love to somehow converge on a single set of tools and
modules that we can recommend for Python hackers.  If we can do this,
I believe I can convince Guido to add them to the standard Python
distribution for 1.6.  Mailman will make a good first (afaik) test
case in using this suite of tools.

Some background to bring you all up to speed.  This is my current
understanding, so please correct any errors!

There's been quite a bit of work on internationalizing Mailman.  Juan
Carlos and Victoriano have done a significant amount of work on this,
with lots of input and feedback from Mads.  We've had back-and-forths
on the design, which I believe we now all agree on.  They have a
working prototype and it's possible that I'll be integrating the first
set of I18N patches into Mailman over the Xmas holiday.

I wrote a pygettext.py which creates .pot files via Python's tokenize
module (with patches by Bernhard and Mads).  From there you can use
standard tools (e.g. GNU gettext, emacs, etc.) to generate .po files.
pygettext.py is in the Python CVS tree.
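
To make the tokenize approach concrete, here is a minimal sketch of
the idea.  It is not the actual pygettext.py code: the names
extract_messages and pot_entry are hypothetical, it is written for
current Python 3, and it only handles plain string literals passed
directly to _().

```python
import ast
import io
import tokenize


def extract_messages(source, keyword="_"):
    """Return the string literals passed as the sole argument to keyword(...)."""
    messages = []
    state = "idle"          # idle -> saw_keyword -> in_call
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if state == "idle":
            if tok.type == tokenize.NAME and tok.string == keyword:
                state = "saw_keyword"
        elif state == "saw_keyword":
            if tok.type == tokenize.OP and tok.string == "(":
                state, parts = "in_call", []
            else:
                state = "idle"
        else:  # in_call: collect adjacent string literals until ")"
            if tok.type == tokenize.STRING:
                value = ast.literal_eval(tok.string)
                if isinstance(value, str):
                    parts.append(value)
                else:
                    state = "idle"      # bytes literal: not a message
            elif tok.type in (tokenize.NL, tokenize.COMMENT):
                continue
            else:
                if parts and tok.type == tokenize.OP and tok.string == ")":
                    messages.append("".join(parts))
                state = "idle"
    return messages


def pot_entry(msgid):
    """Format one untranslated entry in .pot syntax (hypothetical helper)."""
    escaped = msgid.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return 'msgid "%s"\nmsgstr ""\n' % escaped


if __name__ == "__main__":
    sample = 'print(_("Hello"))\nx = _("Quit " "now")\ny = f(1)\n'
    for message in extract_messages(sample):
        print(pot_entry(message))
```

Working from tokens rather than regular expressions means comments
and ordinary strings that merely look like calls are not picked up.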

Martin, I believe, wrote the original intlmodule.c as a wrapper around
the OS's intl library.  This library is available on Solaris and
Linux, and possibly other systems, but I think there were reports of
compilation problems on some systems.  Also, this is a C extension, so
it's harder to rely on for applications, until it becomes a standard
part of Python (which might be possible for 1.6).  Martin, your seminal
paper is available in the IPC6 proceedings[1].

Peter Funk wrote the module below, which is a pure-Python
implementation of the intl API.  Peter's module also uses Martin's
intl module if it's available.  James Henstridge wrote a version that
is in the RedHat distribution, and has some experimental features,
e.g. to write .po files.

So there is definitely overlap here.  It would be great if we could
agree on the One True Intl Module to use for Python applications.  If
we can do this, I will lobby Guido to make it a standard part of the
Python distribution.

Thanks,
-Barry

[1] http://www.python.org/workshops/1997-10/proceedings/

-------------------- snip snip --------------------
#!/usr/bin/env python
"""i18n (multiple language) support.  Reads .mo files from GNU gettext msgfmt

If you want to prepare your Python programs for i18n you should
add the following lines to the top of a BASIC_MAIN module of your py-program:
    try:
        import fintl
        gettext = fintl.gettext
        fintl.bindtextdomain(YOUR_PROGRAM, YOUR_LOCALEDIR)
        fintl.textdomain(YOUR_PROGRAM)
    except ImportError:
        def gettext(msg):
            return msg
    _ = gettext
and/or also add the following to the top of any module containing messages:
    import BASIC_MAIN
    _ = BASIC_MAIN.gettext

Now you should use _("....") everywhere instead of "...." for message texts.

Once you have written your internationalized program, you can use
the suite of utility programs contained in the GNU gettext package to aid
the translation into other languages.

You ARE NOT REQUIRED to release the sourcecode of your program, since
linking of your program against GPL code is avoided by this module.
(Although it is possible to use the GNU gettext library by using the
intl.so module written by Martin von Löwis if this is available, it is
not required to use it in the first place)
"""
# Copyright 1999 by pf@artcom-gmbh.de (Peter Funk)
#
#                         All Rights Reserved
#
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose and without fee is hereby granted,
# provided that the above copyright notice appear in all copies.

# ArtCom GmbH AND Peter Funk DISCLAIMS ALL WARRANTIES WITH REGARD TO
# THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
# AND FITNESS, IN NO EVENT SHALL ArtCom GmBH or Peter Funk BE LIABLE
# FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
# AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
# OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

_default_localedir = '/usr/share/locale'
_default_domain = 'python'

# Check whether Martin v. Löwis' 'intl' module interface to the GNU gettext
# library is available, and use it only if it is:
try:
    from intl import *
except ImportError:
    # now do what the gettext library provides in pure Python:
    error = 'fintl.error'
    # some globals preserving state:
    _languages = []
    _default_mo = None # This is the default message output file used by 'gettext'
    _loaded_mos = {}   # This is a dictionary of loaded message output files

    # some small helper routines:
    def _check_env():
        """examine language environment variables and return list of languages"""
        languages = []
        import os, string
        for envvar in ('LANGUAGE', 'LC_ALL', 'LC_MESSAGES', 'LANG'):
            if os.environ.has_key(envvar):
                languages = string.split(os.environ[envvar], ':')
                break
        # use locale 'C' as default fallback (check the local list, not
        # the module-level _languages):
        if 'C' not in languages:
            languages.append('C')
        return languages

    # Utility function used to decode binary .mo file header and seek tables:
    def _decode_Word(bin):
        # This assumes little endian (intel, vax) byte order.
        return  ord(bin[0])        + (ord(bin[1]) <<  8) + \
               (ord(bin[2]) << 16) + (ord(bin[3]) << 24)

    # Now the methods designed to be used from outside:

    def gettext(message):
        """return localized version of a 'message' string"""
        if _default_mo is None:
            textdomain()
        return _default_mo.gettext(message)

    _ = gettext

    def dgettext(domain, message):
        """like gettext but looks up 'message' in a special 'domain'"""
        # This may be useful for larger software systems
        if not _loaded_mos.has_key(domain):
            raise error, "No '" + domain + "' message domain"
        return _loaded_mos[domain].gettext(message)

    class _MoDict:
        """read a .mo file into a python dictionary"""
        MO_MAGIC = 0x950412de # Magic number of .mo files
        def __init__(self, domain=_default_domain, localedir=_default_localedir):
            global _languages
            self.catalog = {}
            self.domain = domain
            self.localedir = localedir
            # delayed access to environment variables:
            if not _languages:
                _languages = _check_env()
            for self.lang in _languages:
                if self.lang == 'C':
                    return
                mo_filename = "%s/%s/LC_MESSAGES/%s.mo" % (
                                                  localedir, self.lang, domain)
                try:
                    buffer = open(mo_filename, "rb").read()
                    break
                except IOError:
                    pass
            else:
                return # assume C locale
            # Decode the header of the .mo file (5 little endian 32 bit words):
            if _decode_Word(buffer[:4]) != self.MO_MAGIC:
                raise error, '%s does not seem to be a valid .mo file' % mo_filename
            self.mo_version = _decode_Word(buffer[4:8])
            num_messages    = _decode_Word(buffer[8:12])
            master_index    = _decode_Word(buffer[12:16])
            transl_index    = _decode_Word(buffer[16:20])
            buf_len = len(buffer)
            # now put all messages from the .mo file buffer in the catalog dict;
            # each seek table entry is a (length, offset) pair of 32 bit words:
            for i in xrange(0, num_messages):
                start_master = _decode_Word(buffer[master_index+4:master_index+8])
                end_master   = start_master + \
                               _decode_Word(buffer[master_index:master_index+4])
                start_transl = _decode_Word(buffer[transl_index+4:transl_index+8])
                end_transl   = start_transl + \
                               _decode_Word(buffer[transl_index:transl_index+4])
                if end_master <= buf_len and end_transl <= buf_len:
                    self.catalog[buffer[start_master:end_master]] = \
                                 buffer[start_transl:end_transl]
                else:
                    raise error, ".mo file '%s' is corrupt" % mo_filename
                # advance to the next entry in seek tables:
                master_index = master_index + 8
                transl_index = transl_index + 8

        def gettext(self, message):
            """return the translation of a given message"""
            try:
                return self.catalog[message]
            except KeyError:
                return message
        # _MoDict instances may also be accessed using mo[msg] or mo(msg):
        __getitem__ = gettext
        __call__ = gettext

    def textdomain(domain=_default_domain):
        """Sets the 'domain' to be used by this program. Defaults to 'python'"""
        global _default_mo
        if not _loaded_mos.has_key(domain):
            _loaded_mos[domain] = _MoDict(domain)
        _default_mo = _loaded_mos[domain]

    def bindtextdomain(domain, localedir=_default_localedir):
        global _default_mo
        if not _loaded_mos.has_key(domain):
            _loaded_mos[domain] = _MoDict(domain, localedir)
        if _default_mo is not None:
            _default_mo = _loaded_mos[domain]

def _testdriver(argv):
    message = ""
    domain = _default_domain
    localedir = _default_localedir
    if len(argv) > 1:
        message = argv[1]
        if len(argv) > 2:
            domain = argv[2]
            if len(argv) > 3:
                localedir = argv[3]
    # now perform some testing of this module:
    bindtextdomain(domain, localedir)
    textdomain(domain)
    info = gettext('')  # this is where special info is often stored
    if info:
        print ".mo file for domain %s in %s contains:" % (domain, localedir)
        print info
    else:
        print ".mo file contains no info"
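    # gettext() is called directly here because '_' is only defined by the
    # pure-Python fallback and is not imported by 'from intl import *':
    if message:
        print "Translation of '"+ message+ "' is '"+ gettext(message)+ "'"
    else:
        for msg in ("Cancel", "No", "OK", "Quit", "Yes"):
            print "Translation of '"+ msg + "' is '"+ gettext(msg)+ "'"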

if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1 and (sys.argv[1] == "-h" or sys.argv[1] == "-?"):
        print "Usage :", sys.argv[0], "[ MESSAGE [ DOMAIN [ LOCALEDIR ]]]"
    else:
        _testdriver(sys.argv)
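
-------------------- snip snip --------------------
For readers on a current Python 3, the .mo decoding done by hand in
_MoDict above can be sketched with the struct module.  This is only
an illustration under stated assumptions, not part of Peter's module:
parse_mo and build_mo are hypothetical names, only little-endian
files are handled (as in the original), and build_mo exists purely to
produce test data in memory.

```python
import struct

MO_MAGIC = 0x950412de  # same magic number as in the module above


def parse_mo(data):
    """Decode the bytes of a little-endian .mo file into {msgid: msgstr}."""
    # Header words: magic, version, message count, and the offsets of the
    # original-string and translation seek tables.
    magic, version, count, master_index, transl_index = \
        struct.unpack_from("<5I", data, 0)
    if magic != MO_MAGIC:
        raise ValueError("not a little-endian .mo file")
    catalog = {}
    for i in range(count):
        # Each seek table entry is a (length, offset) pair.
        m_len, m_off = struct.unpack_from("<2I", data, master_index + 8 * i)
        t_len, t_off = struct.unpack_from("<2I", data, transl_index + 8 * i)
        if m_off + m_len > len(data) or t_off + t_len > len(data):
            raise ValueError(".mo data is corrupt")
        catalog[data[m_off:m_off + m_len]] = data[t_off:t_off + t_len]
    return catalog


def build_mo(pairs):
    """Build minimal .mo bytes from (msgid, msgstr) byte pairs (test helper)."""
    pairs = sorted(pairs)
    count = len(pairs)
    master_index = 28                       # full header is 7 words long
    transl_index = master_index + 8 * count
    offset = transl_index + 8 * count       # string data starts here
    master_table, transl_table, strings = b"", b"", b""
    for msgid, _unused in pairs:
        master_table += struct.pack("<2I", len(msgid), offset + len(strings))
        strings += msgid + b"\0"
    for _unused, msgstr in pairs:
        transl_table += struct.pack("<2I", len(msgstr), offset + len(strings))
        strings += msgstr + b"\0"
    # Hash table size and offset are left at 0 (no hash table).
    header = struct.pack("<7I", MO_MAGIC, 0, count,
                         master_index, transl_index, 0, 0)
    return header + master_table + transl_table + strings


if __name__ == "__main__":
    data = build_mo([(b"Quit", b"Beenden"), (b"Yes", b"Ja")])
    print(parse_mo(data))
```

Using struct with an explicit "<" prefix makes the byte-order
assumption visible in one place instead of in hand-written shifts.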