Is unicode.lower() locale-independent?

John Machin sjmachin at lexicon.net
Sat Jan 12 16:51:18 EST 2008


On Jan 12, 11:26 pm, Torsten Bronger <bron... at physik.rwth-aachen.de>
wrote:
> Hi there!
>
>
>
> Fredrik Lundh writes:
> > Robert Kern wrote:
>
> >>> However it appears from your bug ticket that you have a much
> >>> narrower problem (case-shifting a small known list of English
> >>> words like VOID) and can work around it by writing your own
> >>> locale-independent casing functions. Do you still need to find
> >>> out whether Python unicode casings are locale-dependent?
>
> >> I would still like to know. There are other places where .lower()
> >> is used in numpy, not to mention the rest of my code.
>
> > "lower" uses the informative case mappings provided by the Unicode
> > character database; see
>
> >    http://www.unicode.org/Public/4.1.0/ucd/UCD.html
>
> > afaik, changing the locale has no influence whatsoever on Python's
> > Unicode subsystem.
>
> Slightly off-topic because it's not part of the Unicode subsystem,
> but I was once irritated that the non-breaking space (codepoint xa0,
> I think) was included in string.whitespace.  I cannot reproduce it
> on my current system anymore, but I am pretty sure it occurred with
> a fr_FR.UTF-8 locale.  Is this possible?  And who is to blame, or
> must my program cope with such things?

The NO-BREAK SPACE (U+00A0) is treated as whitespace in the Python
unicode subsystem. As for str objects, the default "C" locale doesn't
know it exists; in any other locale, AFAIK, it will be treated as
whitespace if the character set for the locale has it.
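
The Unicode half of that is easy to check. A minimal sketch, assuming
Python 3, where str is the Unicode type and bytes stands in for the old
8-bit str (that mapping is an assumption made for illustration; it is
not the interpreter discussed in this thread, and Python 3 bytes
methods ignore the locale entirely):

```python
import string

NBSP = "\u00a0"  # NO-BREAK SPACE

# Unicode strings: NBSP counts as whitespace.
unicode_says_space = NBSP.isspace()

# Byte strings in Python 3: only ASCII whitespace is recognised.
bytes_says_space = b"\xa0".isspace()

# In Python 3, string.whitespace is a fixed ASCII-only constant.
in_string_whitespace = NBSP in string.whitespace

print(unicode_says_space, bytes_says_space, in_string_whitespace)
```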

You were irritated because non-break SPACE was included in
string.whiteSPACE? Surely not! It seems eminently logical to me.
Perhaps you were irritated because str.split() ignored the "no-break"?
If like me you had been faced with removing trailing spaces from text
columns in databases, you surely would have been delighted that
str.rstrip() removed the trailing-padding-for-nicer-layout no-break
spaces that the users had copy/pasted from some clown's website :-)
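
Both behaviours can be seen side by side. A small sketch, again
assuming Python 3 Unicode strings; the sample values are made up for
illustration:

```python
NBSP = "\u00a0"  # NO-BREAK SPACE

# A hypothetical database column padded with no-break spaces and a space.
padded = "Smith" + NBSP * 3 + " "
stripped = padded.rstrip()  # rstrip() with no argument drops NBSP too

# split() with no argument treats NBSP as a separator ...
parts = ("10" + NBSP + "km").split()

# ... whereas splitting on an explicit ordinary space leaves it alone.
kept = ("10" + NBSP + "km").split(" ")

print(repr(stripped), parts, kept)
```

If the "no-break" matters to the program, splitting on an explicit
separator as in the last line is one way to preserve it.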

What was the *real* cause of your irritation?


