[I18n-sig] Perhaps the locale should matter?
Bruno Haible
haible@ilog.fr
Fri, 5 May 2000 22:52:39 +0200 (MET DST)
Florian Weimer quotes Guido van Rossum <guido@python.org>:
> > Problem: I have no idea how to go from the locale setting (a
> > two-character language abbreviation) to a specific character encoding
> > -- but that might conceivably be a fixed table.
The recommended POSIX way is nl_langinfo(CODESET).
But you have to hack around two system dependencies:
1. Some systems don't support it correctly:
- FreeBSD 3.3 and SunOS 4 always return a NULL pointer.
- Solaris 2.4 always returns an empty string.
- Solaris 2.6 sometimes returns an empty string.
- Linux libc5 and glibc 2.0.x don't have it at all.
- glibc 2.1.x has it but only if you use -D_XOPEN_SOURCE.
2. Some systems return non-canonical names for encodings, e.g. Solaris
returns "PCK" when it means Shift_JIS.
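A minimal sketch of such a workaround in Python, assuming the interpreter exposes locale.nl_langinfo and locale.CODESET (it does not on every platform). The helper names, the "ASCII" fallback and the alias table are hypothetical choices for illustration; only the PCK entry comes from the list above.

```python
import locale

# Hypothetical alias table: only the PCK entry is taken from the message.
# A real table would need one entry per non-canonical name seen in the wild.
_ALIASES = {"PCK": "Shift_JIS"}


def canonical_codeset(raw, default="ASCII"):
    """Map a raw nl_langinfo(CODESET) result to a usable encoding name."""
    # A NULL pointer (None here) or an empty string means the system gave
    # no usable answer, so fall back to the caller's default (an assumption).
    if not raw:
        return default
    # Translate known non-canonical names; pass everything else through.
    return _ALIASES.get(raw, raw)


def locale_codeset(default="ASCII"):
    """Best-effort codeset lookup for the current locale."""
    # nl_langinfo or CODESET may be missing entirely on some platforms.
    query = getattr(locale, "nl_langinfo", None)
    codeset = getattr(locale, "CODESET", None)
    if query is None or codeset is None:
        return default
    return canonical_codeset(query(codeset), default)
```

A caller would normally run locale.setlocale(locale.LC_CTYPE, "") first, so that the lookup reflects the user's environment rather than the default "C" locale.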
Markus Kuhn writes:
> But are you really interested in the name of the encoding or not more in
> the already Unicode-converted string? In this case, simply use the C
> library's wide character I/O functions getwc(), fwscanf(), etc.
This will work with glibc 2.2, but it is not portable. The wchar_t type
is not guaranteed to be Unicode. On FreeBSD, for example, it is not; its
encoding is locale dependent.
Bruno