LANG, locale, unicode, setup.py and Debian packaging

Mon Jan 14 18:23:23 EST 2008

> Ah. Can one call it after the full call has been done:
> locale.setlocale(locale.LC_ALL,'')
> locale.setlocale(locale.LC_ALL)
> Without any issues?

If you pass LC_ALL, then some systems will give you funny results
(semicolon-separated enumerations of all the categories). Instead,
pick a specific category, e.g. LC_CTYPE.
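A minimal sketch of that advice: initialise all categories from the
environment once, then query a single category (LC_CTYPE) by calling
setlocale with no second argument. The try/except guard is an added
assumption for robustness: setlocale raises locale.Error when the
environment names a locale that is not installed. The printed value
depends on the environment (typically "C" when LANG is unset).

```python
import locale


def current_ctype_locale():
    """Return the LC_CTYPE locale name after initialising from the environment."""
    try:
        # Adopt the user's settings (LANG, LC_ALL, LC_* variables) for all categories.
        locale.setlocale(locale.LC_ALL, '')
    except locale.Error:
        # The environment names an unavailable locale; keep the current settings.
        pass
    # With only a category argument, setlocale queries without changing anything,
    # and a single category avoids the semicolon-separated LC_ALL form.
    return locale.setlocale(locale.LC_CTYPE)


if __name__ == '__main__':
    print(current_ctype_locale())
```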

>>> I need that two-letter code that's hidden in a
>>> typical locale like en_ZA.utf8 -- I want that 'en' part.
> Okay, I need it because I have a tree of dirs: en, it, fr and so on for the 
> help files -- it's to help build a path to the right html file for the 
> language being supported.

Ok - taking the first two letters should then be fine, assuming all your
directories have two-letter codes.
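For the help tree, a sketch that takes the part of the locale name
before the first "_", "." or "@" (instead of exactly two characters, so
three-letter codes such as "gez" survive) and falls back to a default
directory when no matching file exists. The help_root/page layout and
the "en" default are hypothetical names for illustration.

```python
import os
import re


def language_part(locale_name):
    """Return the language portion of a name like 'en_ZA.utf8' (-> 'en')."""
    # Everything up to the first territory, codeset or modifier separator.
    return re.split(r'[_.@]', locale_name, maxsplit=1)[0]


def help_file(help_root, locale_name, page, default_lang='en'):
    """Build help_root/<lang>/<page>, falling back to default_lang.

    help_root, page and default_lang are hypothetical parameters.
    """
    lang = language_part(locale_name).lower()
    candidate = os.path.join(help_root, lang, page)
    if os.path.isfile(candidate):
        return candidate
    # No help file for this language: use the default language directory.
    return os.path.join(help_root, default_lang, page)
```

A Windows-style name such as English_SouthAfrica yields "english" here,
which would simply fall back to the default directory.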

>> Not sure why you want that. Notice that the locale name is fairly system
>> specific, in particular on non-POSIX systems. It may be
>> "English_SouthAfrica" on some systems.
> Wow, another thing I had no idea about. So far all I've seen are the 
> xx_yy.utf8 shaped ones.
> 
> I will have some trouble then, with the help system.

If you have "unknown" systems, you can try to use locale.normalize.
This has a hard-coded database which tries to deal with some different
spellings. For "English", it will give you en_EN.ISO8859-1.
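A short sketch combining locale.normalize with the language extraction.
normalize() consults the module's built-in alias table
(locale.locale_alias); names it does not know are returned unchanged,
so a fallback is still needed. The helper name normalized_language is
hypothetical.

```python
import locale


def normalized_language(locale_name):
    """Map a possibly platform-specific locale name to a lowercase language code."""
    # Look the name up in the hard-coded alias table; unknown names come back as-is.
    normalized = locale.normalize(locale_name)
    # Keep only the part before the territory, codeset and modifier separators.
    for sep in ('_', '.', '@'):
        normalized = normalized.split(sep, 1)[0]
    return normalized.lower()


print(locale.normalize('English'))
print(normalized_language('English'))
```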

OTOH, if your software only works on POSIX systems anyway, I think
it is a fair assumption that they use two-letter codes for the
languages (the full language name is only used on Windows, AFAIK).

Notice that xx_yy.utf8 definitely is *not* the only syntactical form.
utf8 is spelled in various ways (lower and upper case, with and without
dash), and there may be other encodings (see the en_EN example above),
or no encoding at all in the locale name, and there may be "modifiers":

aa_ER@saaho          (saaho dialect in Eritrea)
be_BY@latin          (as opposed to the Cyrillic be_BY locale);
                     likewise for sr_RS
de_DE@euro           (as opposed to the D-Mark locale); likewise for other
                     members of the Euro zone
ca_ES.UTF-8@valencia (Valencian - Southern Catalan)
                     (no real difference to ca_ES@euro, but differences in
                     message translations)
gez_ER@abegede       (Ge'ez language in Eritrea with Abegede collation)
tt_RU@iqtelif.UTF-8  (Tatar language written in IQTElif alphabet)
uz_UZ@cyrillic       (as opposed to latin uz_UZ)
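The names above roughly follow the X/Open pattern
language[_territory][.codeset][@modifier]. A sketch of a parser for that
pattern, plus a spelling-insensitive UTF-8 check for the codeset variants
mentioned earlier (utf8, UTF-8, ...). It is not a validator: for a name
like tt_RU@iqtelif.UTF-8, where the codeset follows the modifier, the
codeset stays inside the modifier string. LocaleName, parse_locale_name
and is_utf8 are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# language[_territory][.codeset][@modifier]; the codeset stops at '@'.
_LOCALE_RE = re.compile(
    r'^(?P<language>[A-Za-z]+)'
    r'(?:_(?P<territory>[A-Za-z]+))?'
    r'(?:\.(?P<codeset>[^@]+))?'
    r'(?:@(?P<modifier>.+))?$'
)


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


def parse_locale_name(name):
    """Split a POSIX-style locale name into parts; raise ValueError otherwise."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f'unrecognised locale name: {name!r}')
    # Groups that did not participate in the match come back as None.
    return LocaleName(**match.groupdict())


def is_utf8(codeset):
    """True for UTF-8 spelled in any case, with or without the dash."""
    return codeset is not None and codeset.replace('-', '').lower() == 'utf8'


for example in ('de_DE@euro', 'ca_ES.UTF-8@valencia', 'gez_ER@abegede', 'en_ZA.utf8'):
    parts = parse_locale_name(example)
    print(example, parts, is_utf8(parts.codeset))
```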

There used to be a @bokmal modifier for Norwegian (as opposed to
the Nynorsk grammar), but they have separate language codes now
(nb vs. nn).

Regards,
Martin
