[I18n-sig] Re: Determine encoding from $LANG

Jarkko Hietaniemi jhi@iki.fi
Thu, 28 Jun 2001 08:51:01 -0500


On Thu, Jun 28, 2001 at 10:03:59AM +0200, Markus Kuhn wrote:
> On Tue, 26 Jun 2001, Bruno Haible wrote:
> >
> >      A program cannot be considered properly internationalized
> >      until it obeys the current locale (LC_ALL || LC_CTYPE || LANG).
> >
> > The programs we are waiting for are:
> > [...]
> 
> Add to that list many of the programming languages that use Unicode
> internally but that do not yet set the default i/o encoding correctly
> automatically based on LC_ALL || LC_CTYPE || LANG.

Until very recently the term "default I/O encoding" didn't mean
anything to Perl (it was native bytes, period).  Now we do have a new
I/O subsystem (with which we can do things like "this I/O stream is in
UTF-8") but the new I/O subsystem is not yet available in any public
release of Perl, only in one developer release so far (5.7.1).

> I suspect that Perl and Python are not much better and don't call
> nl_langinfo(CODESET) or the portable libcharset wrapper around it either

No, we don't call nl_langinfo(CODESET).  We still need to figure out
the correct policy and place for doing that.  Sorry if "the correct
policy" has already been extensively discussed and answered in this
thread; this is the first message that was CCed (well, the first I
saw, anyway) to perl-unicode.  But as a general rule, Perl doesn't do
much in the way of locales unless the user explicitly asks for locale
behaviour by using setlocale().  Changing that now to be more
'automatic' would break backward compatibility.
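
For illustration only, here is a minimal sketch (in Python rather than
Perl, and not a proposal for how Perl should do it) of the lookup under
discussion: pick the locale name by the LC_ALL || LC_CTYPE || LANG
precedence, and ask the C library for the codeset only when the caller
explicitly opts in.  The function names effective_ctype_locale and
locale_codeset are hypothetical, and the sketch assumes a Unix-like
platform where nl_langinfo() is available.

```python
import locale
import os


def effective_ctype_locale(env=os.environ):
    """Return the locale name governing LC_CTYPE (hypothetical helper).

    Precedence is LC_ALL, then LC_CTYPE, then LANG; a variable that is
    unset or empty is skipped.  Falls back to "C" if none is set.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def locale_codeset():
    """Return the codeset name the C library reports (hypothetical helper).

    The LC_CTYPE locale is switched to the environment's choice only for
    the duration of the call, then restored, so the caller's locale state
    is left as it was found.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query, does not change it
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, "")  # explicit opt-in
        except locale.Error:
            pass  # unsupported locale name: keep the current locale
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print("locale name:", effective_ctype_locale())
    print("codeset:    ", locale_codeset())
```

Restoring the previous LC_CTYPE setting afterwards is one way to keep
the lookup from changing behaviour elsewhere in the program, which is
the backward compatibility worry mentioned above.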

> to properly determine the locale-dependent external encoding.
> 
> References on how to determine the character encoding from the locale in a
> safe and portable manner:
> 
> http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate
> http://clisp.cons.org/~haible/packages-libcharset.html

Alas, IIUC, the LGPL is currently slightly incompatible with inclusion
in Perl, at least for a piece of code as central as locale handling.
(Note: this is just a statement of the facts as far as I understand
them; I do not intend or want to start a discussion about software
licensing politics.)

> http://www.opengroup.org/onlinepubs/7908799/xsh/langinfo.h.html

But thanks for the pointers.  I don't know whether I will be able to
squeeze in the use of nl_langinfo() for the upcoming public release of
Perl, Perl 5.8.0, but I will certainly give some thought to the matter.

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen