Locale confusion

Jorgen Grahn jgrahn-nntq at algonet.se
Fri Jan 7 05:24:49 EST 2005


[Long posting due to the examples, but pretty simple question.]

I'm sitting here with a Debian Linux 'Woody' system with the default Python
2.2 installation, and I want the re module to understand that
re.compile(r'\W+', re.LOCALE) shouldn't match my national, accented
characters.
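
To make the goal concrete, here is a minimal sketch of the kind of split I mean. It is written for a current Python 3, where re.LOCALE is accepted only for bytes patterns (in Python 2.2 a plain str was already a byte string). The names NON_WORD and split_words are made up for illustration, and what happens to bytes above 127 depends entirely on the LC_CTYPE in effect when the match runs.

```python
import locale
import re

# Bytes pattern: Python 3 allows re.LOCALE only with bytes, not str.
NON_WORD = re.compile(br"\W+", re.LOCALE)


def split_words(data):
    """Split a byte string on runs of bytes that the current LC_CTYPE
    classifies as non-word characters, dropping empty pieces."""
    return [piece for piece in NON_WORD.split(data) if piece]


if __name__ == "__main__":
    try:
        # Adopt whatever locale the environment (LC_ALL, LC_CTYPE, LANG) names.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # the environment names a locale that is not installed
    # "smörgås, tack" encoded as ISO 8859-1; the result depends on the locale.
    print(split_words(b"sm\xf6rg\xe5s, tack"))
```

In an ISO 8859-1 Swedish locale the accented bytes should count as word characters and stay inside the first piece; in the C locale they are treated as separators.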

I don't quite understand how the locale module reasons about these things,
and Python doesn't seem to behave like other programs on my system. Bug or my
mistake?  Here's my environment:

frailea> env |grep -e LC -e LANG
LC_MESSAGES=C
LC_TIME=C
LANG=sv_SE
LC_NUMERIC=C
LC_MONETARY=C
frailea> locale
LANG=sv_SE
LC_CTYPE="sv_SE"
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE="sv_SE"
LC_MONETARY=C
LC_MESSAGES=C
LC_PAPER="sv_SE"
LC_NAME="sv_SE"
LC_ADDRESS="sv_SE"
LC_TELEPHONE="sv_SE"
LC_MEASUREMENT="sv_SE"
LC_IDENTIFICATION="sv_SE"
LC_ALL=

This seems to indicate that $LANG acts as a fallback when other variables
(e.g. LC_CTYPE) aren't defined, and that's also what the glibc setlocale(3)
man page says. Works well for me in general, too.  However, consider this tiny
Python program:

frailea> cat foo
import locale
print locale.getlocale()
locale.setlocale(locale.LC_CTYPE)
print locale.getlocale()

When I paste it into an interactive Python session, the locale is already
set up correctly (which is what I suppose interactive mode /should/ do):

>>> import locale
>>> print locale.getlocale()
['sv_SE', 'ISO8859-1']
>>> locale.setlocale(locale.LC_CTYPE)
'sv_SE'
>>> print locale.getlocale()
['sv_SE', 'ISO8859-1']
>>> 

When I run it as a script it isn't though, and the setlocale() call does not
appear to fall back to looking at $LANG as it's supposed to(?), so my
LC_CTYPE remains in the POSIX locale:

frailea> python foo
(None, None)
(None, None)

The corresponding program written in C works as expected:

frailea> cat foot.c
#include <stdio.h>
#include <locale.h>
int main(void) {
    printf("%s\n", setlocale(LC_CTYPE, 0));
    printf("%s\n", setlocale(LC_CTYPE, ""));
    printf("%s\n", setlocale(LC_CTYPE, 0));
    return 0;
}
frailea> ./foot
C
sv_SE
sv_SE
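
One difference between the two programs is worth noting: the second call in the C program passes "" as the locale argument, which asks the C library to pick the locale from the environment, while the Python script passes only the category. With the locale argument omitted, Python's locale.setlocale() only reports the current setting, much like setlocale(LC_CTYPE, 0) in C. A closer Python counterpart might look like the sketch below. It is written for a current Python 3 rather than 2.2, the function name ctype_before_and_after is made up for illustration, and Python 3 may already have set LC_CTYPE at startup, so the first value need not be C.

```python
import locale


def ctype_before_and_after():
    """Mirror the three calls of the C program: query, set from the
    environment, then query again. Returns the three results."""
    # Query only, like setlocale(LC_CTYPE, 0) in C.
    before = locale.setlocale(locale.LC_CTYPE)
    try:
        # The empty string means "use LC_ALL, LC_CTYPE or LANG".
        chosen = locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        chosen = None  # the environment names a locale that is not installed
    # Query again to see what is now in effect.
    after = locale.setlocale(locale.LC_CTYPE)
    return before, chosen, after


if __name__ == "__main__":
    for value in ctype_before_and_after():
        print(value)
```

If this prints the LANG value on the second and third lines, the fallback works in Python as well and the original script simply never asked for it.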

So, is this my fault or Python's?  I realize I could just adapt and set
$LC_CTYPE explicitly in my environment, but I don't want to capitulate to a
Python bug, if that's what this is.

BR,
Jorgen

-- 
  // Jorgen Grahn <jgrahn@       Ph'nglui mglw'nafh Cthulhu
\X/                algonet.se>   R'lyeh wgah'nagl fhtagn!


