[I18n-sig] Re: gettext in the standard library

François Pinard pinard@iro.umontreal.ca
04 Sep 2000 13:29:32 -0400


> [François Pinard]

> > So, not only would I like that Python does it better, but I would
> > welcome if Python was allowing the original language to be based on
> > either ASCII or Unicode, the most transparently as possible, of
> > course.

[Martin von Loewis]

> Isn't that limited by the structure of mo files? You'd somehow have to
> know what encoding to use when looking into the catalog - the content
> type only talks about the encoding of the translations.

It is surely a bit sad that the PO file header (the translation of the empty
string) currently has no provision to describe `msgid' language and encoding.

Yet, in practice, as long as the POT file is automatically derived from the
sources, each `msgid' is identical to how it appears in the sources and,
consequently, uses the same encoding in the POT file as in the source.
So, it is likely that retrieving the matching `msgstr' at run-time will work.
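
A minimal sketch of why the lookup works when nothing is recoded.  The
dictionary below is only a hypothetical stand-in for a compiled catalog,
and `extract', `lookup' and the Latin-1 source encoding are assumptions
made for illustration, not part of any gettext tool:

```python
# Assumed encoding of the program source file (illustration only).
SOURCE_ENCODING = "latin-1"

# The string as the programmer wrote it in the source.
source_string = "Fichier non trouvé"


def extract(text, encoding):
    """Mimic an extractor that copies the string verbatim from the source."""
    return text.encode(encoding)


def lookup(catalog, raw_msgid):
    """Return the translation, or the msgid itself when nothing matches."""
    return catalog.get(raw_msgid, raw_msgid)


# Hypothetical catalog: keys are msgid bytes exactly as extracted.
catalog = {extract(source_string, SOURCE_ENCODING): b"File not found"}

# At run-time the program presents the same bytes, so the key matches.
runtime_bytes = source_string.encode(SOURCE_ENCODING)
result = lookup(catalog, runtime_bytes)
```

The match here depends only on both sides holding the same bytes; the
catalog never needs to know which encoding those bytes are in.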

Problems would arise if the source strings were recoded between string
extraction by the POT tools and string usage for translation at run-time.
Python will likely "internalise" or convert Unicode strings from UTF-8,
and this is a change of representation.  Maybe we could make similar changes
in the POT extractors, so that the match still occurs.  This might become
difficult if the Python sources are encoded in something other than UTF-8.
But whatever means exist for Python to do the conversion, the POT extractors
might have to be modified to use the same means.  The matches must occur.
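
To illustrate the mismatch and one possible remedy, here is a small
sketch.  It assumes a Latin-1 source file and a run-time that converts
strings to UTF-8; `to_internal' is a hypothetical helper standing for
"whatever means Python uses to do the conversion", shared by both sides:

```python
def to_internal(raw, encoding):
    """Hypothetical shared conversion: decode raw bytes to a Unicode string."""
    return raw.decode(encoding)


# Bytes of the string as they sit in an (assumed) Latin-1 source file.
source_bytes = "Fichier non trouvé".encode("latin-1")

# Case 1: the extractor copied the bytes verbatim, but the run-time
# recoded the string to UTF-8 before looking it up.
catalog_verbatim = {source_bytes: "File not found"}
runtime_utf8 = source_bytes.decode("latin-1").encode("utf-8")
missed = runtime_utf8 not in catalog_verbatim  # the keys differ

# Case 2: extractor and run-time both apply the same conversion,
# so the keys agree again.
catalog_normalized = {to_internal(source_bytes, "latin-1"): "File not found"}
runtime_key = to_internal(source_bytes, "latin-1")
found = catalog_normalized.get(runtime_key)
```

In the first case the lookup fails because the accented letter is one
byte in Latin-1 and two bytes in UTF-8; in the second, both sides go
through the same conversion and the translation is found.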

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard