Is unicode.lower() locale-independent?

Carl Banks pavlovevidence at gmail.com
Sat Jan 12 17:49:48 EST 2008


On Sat, 12 Jan 2008 13:51:18 -0800, John Machin wrote:

> On Jan 12, 11:26 pm, Torsten Bronger <bron... at physik.rwth-aachen.de>
> wrote:
>> Hello!
>>
>>
>>
>> Fredrik Lundh writes:
>> > Robert Kern wrote:
>>
>> >>> However it appears from your bug ticket that you have a much
>> >>> narrower problem (case-shifting a small known list of English words
>> >>> like VOID) and can work around it by writing your own
>> >>> locale-independent casing functions. Do you still need to find out
>> >>> whether Python unicode casings are locale-dependent?
>>
>> >> I would still like to know. There are other places where .lower() is
>> >> used in numpy, not to mention the rest of my code.
>>
>> > "lower" uses the informative case mappings provided by the Unicode
>> > character database; see
>>
>> >    http://www.unicode.org/Public/4.1.0/ucd/UCD.html
>>
>> > afaik, changing the locale has no influence whatsoever on Python's
>> > Unicode subsystem.
>>
>> Slightly off-topic because it's not part of the Unicode subsystem, but
>> I was once irritated that the non-breaking space (codepoint xa0 I
>> think) was included in string.whitespace.  I cannot reproduce it on
>> my current system anymore, but I am pretty sure it occurred with a
>> fr_FR.UTF-8 locale.  Is this possible?  And who is to blame, or must my
>> program cope with such things?
> 
> The NO-BREAK SPACE is treated as whitespace in the Python unicode
> subsystem. As for str objects, the default "C" locale doesn't know it
> exists; otherwise AFAIK if the character set for the locale has it, it
> will be treated as whitespace.
> 
> You were irritated because non-break SPACE was included in
> string.whiteSPACE? Surely not! It seems eminently logical to me.

To me it seems the point of a non-breaking space is to have something
that's printed as whitespace but not treated as whitespace.
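
For anyone who wants to check how their own interpreter classifies the
character, here is a minimal sketch. It assumes Python 3, where str is
always Unicode (the thread itself is about the Python 2 str/unicode
split, so results there can differ by locale). The helper name
describe() is made up for illustration.

```python
import string
import unicodedata

NBSP = "\u00a0"  # NO-BREAK SPACE, the "xa0" codepoint mentioned above


def describe(ch):
    """Report how Python classifies a single character (hypothetical helper)."""
    return {
        "name": unicodedata.name(ch),
        # str.isspace() follows the Unicode character database
        "isspace": ch.isspace(),
        # string.whitespace is a fixed ASCII-only constant in Python 3
        "in_string_whitespace": ch in string.whitespace,
    }


info = describe(NBSP)
print(info)
```

So in Python 3 the two notions disagree: the Unicode-aware method counts
the character as whitespace, while the string.whitespace constant does
not list it.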

> Perhaps
> you were irritated because str.split() ignored the "no-break"? If like
> me you had been faced with removing trailing spaces from text columns in
> databases, you surely would have been delighted that str.rstrip()
> removed the trailing-padding-for-nicer-layout no-break spaces that the
> users had copy/pasted from some clown's website :-)
> 
> What was the *real* cause of your irritation?

If you want to use str.split() to split words, you will foil the user who
used a non-breaking space precisely so that the text would not break at
that point.

Your use of rstrip() is a lot more specialized, if you ask me.
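
The two use cases can be put side by side in a rough sketch. Again this
assumes Python 3 str semantics, and split_keeping_nbsp() is a
hypothetical helper written for this post, not something from the
standard library. It splits on ASCII whitespace only, so a no-break
space keeps its neighbours together.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

# A user who does not want "10" and "km" separated.
text = "10" + NBSP + "km from here"


def split_keeping_nbsp(s):
    """Split on ASCII whitespace only (hypothetical helper)."""
    return [w for w in re.split(r"[ \t\n\r\f\v]+", s) if w]


# Argument-less split() treats the no-break space as a separator.
plain_words = text.split()
# The helper leaves "10<NBSP>km" as one word.
kept_words = split_keeping_nbsp(text)

# The database clean-up case: trailing no-break padding.
padded = "value" + NBSP + NBSP
stripped_all = padded.rstrip()       # removes any Unicode whitespace
stripped_ascii = padded.rstrip(" ")  # removes only U+0020

print(plain_words, kept_words, repr(stripped_all), repr(stripped_ascii))
```

Which behaviour is right depends on the job: word splitting for layout
arguably wants the helper, while the column clean-up wants the plain
rstrip().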


Carl Banks


