[Python-Dev] Re: gettext in the standard library

M.-A. Lemburg mal@lemburg.com
Sat, 19 Aug 2000 11:37:28 +0200


"Barry A. Warsaw" wrote:
> 
> >>>>> "M" == M  <mal@lemburg.com> writes:
> 
>     M> I know that gettext is a standard, but from a technology POV I
>     M> would have implemented this as a codec which can then be plugged
>     M> wherever l10n is needed, since strings have the new .encode()
>     M> method which could just as well be used to convert not only the
>     M> string into a different encoding, but also a different
>     M> language.  Anyway, just a thought...
> 
> That might be cool to play with, but I haven't done anything with
> Python's Unicode stuff (and painfully little with gettext too) so
> right now I don't see how they'd fit together.  My gut reaction is
> that gettext could be the lower level interface to
> string.encode(language).

Oh, codecs are not just about Unicode. Normal string objects
also have an .encode() method which can be used for these
purposes as well.
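Under today's Python 3, such a translating codec can be sketched with codecs.register(). Following Barry's layering, the encoder simply delegates to a gettext-style translations object. Everything named here is hypothetical: the codec name l10n_de, the DictTranslations class (an in-memory stand-in for a compiled catalog), and its German messages. Python 3's str.encode() insists that the codec return bytes, so a str-to-str codec like this one is called through codecs.encode() instead.

```python
import codecs
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory translations object standing in for a real catalog."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Fall back to the untranslated message, as gettext does.
        return self._messages.get(message, message)


_GERMAN = DictTranslations({"Hello": "Hallo", "Goodbye": "Auf Wiedersehen"})


def _search(name):
    # codecs.lookup() hands search functions a normalized, lower-case name.
    if name != "l10n_de":
        return None

    def encode(text, errors="strict"):
        # Codec encoders return (output, number of input items consumed);
        # the actual lookup is delegated to the gettext-style object.
        return _GERMAN.gettext(text), len(text)

    def decode(text, errors="strict"):
        # No reverse mapping in this sketch: decoding is the identity.
        return text, len(text)

    return codecs.CodecInfo(encode, decode, name="l10n_de")


codecs.register(_search)

print(codecs.encode("Hello", "l10n_de"))  # Hallo
```

Decoding is left as the identity because a message catalog maps source strings to translations in one direction only.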
 
>     M> What I'm missing in your doc-string is a reference as to how
>     M> well gettext works together with Unicode. After all, i18n is
>     M> among other things about international character sets.
>     M> Have you done any experiments in this area ?
> 
> No, but I've thought about it, and I don't think the answer is good.
> The GNU gettext functions take and return char*'s, which probably
> isn't very compatible with Unicode.  _gettext therefore takes and
> returns PyStringObjects.

Martin mentioned the possibility of using UTF-8 for the
catalogs and then decoding them into Unicode. That should be
a reasonable way of getting .gettext() to talk Unicode :-)
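Martin's suggestion can be sketched with the pure-Python gettext module that ships with today's Python 3: GNUTranslations parses .mo files itself and decodes every entry using the charset named in the catalog's Content-Type header. Because UTF-8 is just bytes, a byte-oriented .mo file can carry Unicode text unchanged. The build_mo helper is hypothetical, a bare-bones writer (no hash table, no plural forms) included only to keep the example self-contained; real catalogs would be compiled with msgfmt.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical minimal .mo writer: {msgid: msgstr} -> little-endian .mo bytes.

    Strings are stored UTF-8 encoded; no hash table or plural forms are written.
    """
    keys = sorted(messages)  # the "" metadata entry sorts first
    ids = [k.encode("utf-8") for k in keys]
    strs = [messages[k].encode("utf-8") for k in keys]
    n = len(keys)
    orig_table = 7 * 4                # header: seven 32-bit words
    trans_table = orig_table + 8 * n  # each table entry: (length, offset)
    data_start = trans_table + 8 * n
    data = b""
    entries = []
    for s in ids + strs:
        entries.append((len(s), data_start + len(data)))
        data += s + b"\0"             # strings are NUL-terminated
    header = struct.pack("<7I", 0x950412DE, 0, n, orig_table, trans_table, 0, 0)
    tables = b"".join(struct.pack("<2I", length, offset) for length, offset in entries)
    return header + tables + data


catalog = {
    "": "Content-Type: text/plain; charset=UTF-8\n",
    "Hello": "Grüß Gott",
}
translations = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
print(translations.gettext("Hello"))  # Grüß Gott
```

The charset declaration in the metadata entry is what tells the reader to decode the stored bytes as UTF-8; the .mo layout itself stays Unicode-unaware.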
 
> We could do better with the pure-Python implementation, and that might
> be a good reason to forgo any performance gains or platform-dependent
> benefits you'd get with _gettext.  Of course the trick is using the
> Unicode-unaware tools to build .mo files containing Unicode strings.
> I don't track GNU gettext development closely enough to know whether
> they are addressing Unicode issues or not.

Just dreaming a little here: I would prefer that we use some
form of XML to write the catalogs. XML comes with Unicode support,
and tools for writing XML are available too. We'd only need
a way to transform XML into catalog files in some Python-specific,
platform-independent format (it should also be possible to
create .mo files from XML).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/