[I18n-sig] Perhaps the locale should matter?

Markus Kuhn Markus.Kuhn@cl.cam.ac.uk
Fri, 05 May 2000 20:59:02 +0100


Guido van Rossum <guido@python.org> writes:
> Problem: I have no idea how to go from the locale setting (a
> two-character language abbreviation) to a specific character encoding
> -- but that might conceivably be a fixed table.

Starting with glibc 2.2, you can ask for the encoding name with

  #include <langinfo.h>

  encoding_string = nl_langinfo(CODESET);

as described on

  http://www.opengroup.org/onlinepubs/7908799/xsh/langinfo.h.html

But are you really interested in the name of the encoding, or rather in
the string already converted to Unicode? In that case, simply use the C
library's wide-character I/O functions getwc(), fwscanf(), etc., as
described in

  http://www.unix-systems.org/version2/whatsnew/login_mse.html

or

  http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-C-FDIS.1999-04.pdf
  http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-C-FDIS.1999-04.txt
  (section 7.24)

and the locale-dependent conversion to Unicode will be done for you by
the C library. Under glibc 2.2, wchar_t always contains UCS-4 values.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>