[Python-Dev] Re: gettext in the standard library

Martin von Loewis loewis@informatik.hu-berlin.de
Sat, 19 Aug 2000 09:25:20 +0200 (MET DST)


> What I'm missing in your doc-string is a reference as to how
> well gettext works together with Unicode. After all, i18n is
> among other things about international character sets.
> Have you done any experiments in this area ?

I have, to some degree. As others pointed out, gettext maps byte
arrays to byte arrays. However, in the GNU internationalization
project, it is the convention to put an entry like

msgid ""
msgstr ""
"Project-Id-Version: GNU grep 2.4\n"
"POT-Creation-Date: 1999-11-13 11:33-0500\n"
"PO-Revision-Date: 1999-12-07 10:10+01:00\n"
"Last-Translator: Martin von Löwis <martin@mira.isdn.cs.tu-berlin.de>\n"
"Language-Team: German <de@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ISO-8859-1\n"
"Content-Transfer-Encoding: 8-bit\n"

into the catalog, which can be accessed as the translation of the empty
string. It typically has a charset= element, which makes it possible to
determine what character set is used in the catalog. Of course, this is a
convention only, so it may not be present. If it is absent, and
conversion to Unicode is requested, it is probably a good idea to
assume UTF-8 (as James indicated, that will be the GNOME coded
character set for catalogs, for example).
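
As a rough sketch of that lookup (not code from Barry's module): the
function names catalog_charset and to_unicode below are made up for
illustration, and the header is assumed to be available as the raw
bytes of the translation of the empty string.

```python
import re

def catalog_charset(header: bytes, default: str = "utf-8") -> str:
    """Return the charset named in the catalog header, or `default` if absent."""
    # The header consists of "Key: value" lines. Decode leniently, since
    # other fields (e.g. Last-Translator) may contain non-ASCII bytes.
    text = header.decode("ascii", errors="replace")
    match = re.search(
        r"^Content-Type:.*?charset=([^\s;]+)",
        text,
        re.IGNORECASE | re.MULTILINE,
    )
    # The charset= element is only a convention, so fall back when missing.
    return match.group(1) if match else default

def to_unicode(translated: bytes, header: bytes) -> str:
    """Decode a translated byte string using the catalog's declared charset."""
    return translated.decode(catalog_charset(header))

header = (
    b"Project-Id-Version: GNU grep 2.4\n"
    b"Last-Translator: Martin von L\xf6wis\n"
    b"Content-Type: text/plain; charset=ISO-8859-1\n"
    b"Content-Transfer-Encoding: 8-bit\n"
)
name = to_unicode(b"L\xf6wis", header)
```

The UTF-8 default is just the fallback suggested above; a caller could
pass a different default if it knows better.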

In any case, I think it is a good idea to support retrieval of
translated strings as Unicode objects. I can think of two alternative
interfaces:

gettext.gettext(msgid, unicode=1)
#or
gettext.unigettext(msgid)

Of course, if applications install _, they'd know whether they want
unicode or byte strings, so _ would still take a single argument.

However, I don't think that this feature must be there at the first
checkin; I'd volunteer to work on a patch after Barry has installed
his code, and once I have some indication of what the interface should
be.

Regards,
Martin