[I18n-sig] Perhaps the locale should matter?

Bruno Haible haible@ilog.fr
Fri, 5 May 2000 22:52:39 +0200 (MET DST)


Florian Weimer quotes Guido van Rossum <guido@python.org>:

> > Problem: I have no idea how to go from the locale setting (a
> > two-character language abbreviation) to a specific character encoding
> > -- but that might conceivably be a fixed table.

The recommended POSIX way is to call nl_langinfo(CODESET).

But you have to hack around two system dependencies:

1. Some systems don't support it correctly:
  - FreeBSD 3.3 and SunOS 4 always return a NULL pointer.
  - Solaris 2.4 always returns an empty string.
  - Solaris 2.6 sometimes returns an empty string.
  - Linux libc5 and glibc 2.0.x don't have it at all.
  - glibc 2.1.x has it but only if you use -D_XOPEN_SOURCE.

2. Some systems return non-canonical names for encodings, e.g. Solaris
   returns "PCK" when it means Shift_JIS.
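
To make the two workarounds concrete, here is a minimal sketch in Python,
assuming the interpreter exposes locale.nl_langinfo and locale.CODESET (it
does not on every platform). The names CODESET_ALIASES, canonical_codeset
and locale_codeset, and the "US-ASCII" default, are made up for
illustration; only the "PCK" entry is taken from the list above.

```python
import locale

# Hypothetical alias table mapping vendor-specific codeset names to
# canonical ones. Only the "PCK" entry comes from the message above;
# a real table would need many more entries.
CODESET_ALIASES = {
    "PCK": "Shift_JIS",
}


def canonical_codeset(raw, default="US-ASCII"):
    """Normalize a name as returned by nl_langinfo(CODESET).

    A missing or empty result (the NULL / empty-string cases) falls
    back to `default`, which is an assumption, not a POSIX rule.
    """
    name = (raw or "").strip()
    if not name:
        return default
    return CODESET_ALIASES.get(name.upper(), name)


def locale_codeset(default="US-ASCII"):
    """Return the current locale's codeset, working around broken systems."""
    try:
        raw = locale.nl_langinfo(locale.CODESET)
    except AttributeError:
        # nl_langinfo or CODESET is not available on this platform.
        raw = None
    return canonical_codeset(raw, default)
```

Call locale.setlocale(locale.LC_CTYPE, "") first if you want the user's
locale rather than the default "C" locale.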

Markus Kuhn writes:
> But are you really interested in the name of the encoding or not more in
> the already Unicode-converted string? In this case, simply use the C
> library's wide character I/O functions getwc(), fwscanf(), etc.

This will work with glibc 2.2, but it is not portable. The wchar_t type
is not guaranteed to hold Unicode. On FreeBSD, indeed, it does not; its
encoding is locale dependent.
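
If wchar_t cannot be relied on, one portable alternative is to read bytes
and convert them explicitly, using the encoding name. A rough sketch in
Python, assuming the codeset name is already known; the function name
decode_external and the "ascii" fallback are hypothetical choices for this
example:

```python
import codecs


def decode_external(data, codeset, fallback="ascii"):
    """Decode bytes into a Unicode string using a named codeset.

    If Python has no codec registered under `codeset`, fall back to
    `fallback`. Undecodable bytes are replaced rather than raising.
    """
    try:
        codecs.lookup(codeset)
    except LookupError:
        codeset = fallback
    return data.decode(codeset, errors="replace")
```

This does not depend on what the C library stores in a wchar_t, only on
having a converter for the named encoding.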

Bruno