[I18n-sig] Determine encoding from $LANG

Markus Kuhn mkuhn@suse.de
Thu, 28 Jun 2001 10:03:59 +0200 (CEST)


On Tue, 26 Jun 2001, Bruno Haible wrote:
>
>      A program cannot be considered properly internationalized
>      until it obeys the current locale (LC_ALL || LC_CTYPE || LANG).
>
> The programs we are waiting for are:
> [...]

Add to that list many of the programming languages that use Unicode
internally but do not yet automatically set the default I/O encoding
from LC_ALL || LC_CTYPE || LANG.
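The precedence rule above can be sketched in C. This is a minimal
illustration, not code from any of the languages mentioned. The function
name effective_ctype_locale is hypothetical, and returning "C" when none
of the variables is set is an assumption, because POSIX leaves that
default implementation-defined. As in POSIX, a variable that is set but
empty is skipped.

```c
#include <stdlib.h>

/* Hypothetical helper: return the locale name that governs LC_CTYPE,
 * using the precedence LC_ALL || LC_CTYPE || LANG. Unset or empty
 * variables are skipped. Falls back to "C" (an assumption; POSIX
 * makes this default implementation-defined). */
const char *effective_ctype_locale(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *s = getenv(vars[i]);
        if (s != NULL && *s != '\0')
            return s;   /* first non-empty variable wins */
    }
    return "C";
}
```

In practice setlocale(LC_CTYPE, "") already applies this precedence, so a
program rarely needs to do it by hand. The locale name alone still does
not tell you the encoding reliably, which is why nl_langinfo(CODESET) is
the better source.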

For example, TCL currently relies on primitive substring matching on LANG,
which gets only a few Japanese and Russian encodings right. The TCL
function unix/tclUnixInit.c:TclpSetInitialEncodings really should call
libcharset or nl_langinfo(CODESET) instead:

  https://sourceforge.net/tracker/?func=detail&aid=418645&group_id=10894&atid=110894
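A minimal sketch of the nl_langinfo(CODESET) approach in C, assuming a
POSIX system that provides <langinfo.h>. The helper names locale_codeset
and locale_is_utf8 are hypothetical; this is not the actual fix for
TclpSetInitialEncodings. The codeset names returned are not uniform
across platforms (glibc reports "ANSI_X3.4-1968" for the C locale, for
example), and smoothing over that variation is what the libcharset
wrapper is for.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: let the C library evaluate
 * LC_ALL || LC_CTYPE || LANG via setlocale(), then ask for the
 * codeset with nl_langinfo(CODESET). The returned string is owned
 * by the C library and may be overwritten by later calls. */
const char *locale_codeset(void)
{
    /* An empty string means "take the locale from the environment". */
    if (setlocale(LC_CTYPE, "") == NULL) {
        /* Requested locale is not available: use the "C" locale. */
        setlocale(LC_CTYPE, "C");
    }
    return nl_langinfo(CODESET);
}

/* Hypothetical convenience check for programs that only need to know
 * whether the external encoding is UTF-8. */
int locale_is_utf8(void)
{
    return strcmp(locale_codeset(), "UTF-8") == 0;
}
```

A real program would normally call setlocale(LC_ALL, "") once at the
start of main() rather than inside a helper, since setlocale() changes
process-wide state.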

I suspect that Perl and Python are not much better, and that neither
calls nl_langinfo(CODESET), or the portable libcharset wrapper around it,
to determine the locale-dependent external encoding.

References on how to determine the character encoding from the locale in a
safe and portable manner:

http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate
http://clisp.cons.org/~haible/packages-libcharset.html
http://www.opengroup.org/onlinepubs/7908799/xsh/langinfo.h.html

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>