Is unicode.lower() locale-independent?

Torsten Bronger bronger at physik.rwth-aachen.de
Sat Jan 12 18:56:41 EST 2008


Hi there!

John Machin writes:

> On Jan 12, 11:26 pm, Torsten Bronger <bron... at physik.rwth-aachen.de>
> wrote:
>
>> [...]
>>
>> Slightly off-topic because it's not part of the Unicode
>> subsystem, but I was once irritated that the non-breaking space
>> (code point U+00A0, I think) was included in string.whitespace.  I
>> cannot reproduce it on my current system anymore, but I was
>> pretty sure it occurred with a fr_FR.UTF-8 locale.  Is this
>> possible?  And who is to blame, or must my program cope with such
>> things?
>
> The NO-BREAK SPACE is treated as whitespace in the Python unicode
> subsystem. As for str objects, the default "C" locale doesn't know
> it exists; otherwise AFAIK if the character set for the locale has
> it, it will be treated as whitespace.
>
> [...]
>
> What was the *real* cause of your irritation?
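
A minimal check of the distinction John describes, written for Python 3 (which postdates this thread, so it illustrates the principle rather than the 2008 behaviour exactly): there `str` is the Unicode type and its methods consult the Unicode character database, while `bytes` methods recognise only ASCII whitespace and do not consult the locale.

```python
# Python 3 sketch: str methods use Unicode properties, bytes methods are ASCII-only.
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

print(NBSP.isspace())         # True: str treats NBSP as whitespace
print("a\u00a0b".split())     # ['a', 'b']: split() with no argument splits on it
print(b"\xa0".isspace())      # False: bytes only know ASCII whitespace
print(b"a\xa0b".split())      # [b'a\xa0b']: so byte 0xA0 is not a separator
```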

I was missing something like string.ascii_whitespace in the string
module.  There is string.ascii_lowercase after all, and the
documentation doesn't clearly say that string.whitespace is
locale-dependent.
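
For what it's worth, Python 3 (released after this thread) defines `string.whitespace` as a fixed ASCII constant, like `string.ascii_lowercase`, so it no longer varies with the locale. The sketch below checks that and collects the code points that `str.isspace()` accepts beyond it; the variable name `extra` is only for illustration.

```python
import string
import sys

# In Python 3, string.whitespace is a fixed ASCII constant, not locale-dependent.
print(repr(string.whitespace))   # ' \t\n\r\x0b\x0c'

# Code points that str.isspace() treats as whitespace but string.whitespace omits.
extra = [f"U+{cp:04X}" for cp in range(sys.maxunicode + 1)
         if chr(cp).isspace() and chr(cp) not in string.whitespace]

print("U+00A0" in extra)         # True: NO-BREAK SPACE is among them
```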

In contrast to lower/uppercase conversions, which often transform
human-language text, whitespace handling mostly serves syntactic
purposes.  And parsing something with locale-dependent whitespace
definitions is broken.

So I had two choices: define my own whitespace constant, or force
the 'C' locale.  I chose the latter because I'm not a big fan of
locales anyway.
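
The first option (defining one's own constant) might look like this in Python 3. `ASCII_WHITESPACE`, `split_ascii` and `strip_ascii` are hypothetical names; the six characters are the ones the 'C' locale classifies as whitespace, and the `re.ASCII` flag restricts `\s` to that same set. Note that in Python 3 `str` methods ignore the locale, so forcing the 'C' locale would not change their behaviour there.

```python
import re

# Hypothetical constant: the six ASCII whitespace characters of the 'C' locale.
ASCII_WHITESPACE = " \t\n\r\x0b\x0c"

# With re.ASCII, \s matches only [ \t\n\r\f\v], regardless of locale or input.
_ASCII_WS_RUN = re.compile(r"\s+", re.ASCII)

def split_ascii(text):
    """Split on runs of ASCII whitespace; NBSP and other Unicode spaces are kept."""
    return [field for field in _ASCII_WS_RUN.split(text) if field]

def strip_ascii(text):
    """Strip only ASCII whitespace from both ends of text."""
    return text.strip(ASCII_WHITESPACE)

print(split_ascii("  key\u00a0name = value\n"))  # ['key\xa0name', '=', 'value']
print(repr(strip_ascii("\u00a0x \t")))          # '\xa0x'
```

Because the separator set is spelled out, a parser built on these helpers treats input the same way on every system.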

On my current computer(s), all locales seem to have the same
definition of whitespace as the 'C' locale.  I've only seen that one
(broken, in my opinion) French locale which included the NBSP.  In
my view, this is a trap rather than anything useful.  That is, if I
remember it correctly; this is why I asked above, "Is this
possible?".

Bye,
Torsten.

-- 
Torsten Bronger, aquisgrana, europa vetus
                                      Jabber ID: bronger at jabber.org
               (See http://ime.webhop.org for further contact info.)



More information about the Python-list mailing list