[Tutor] ignoring diacritical signs
Steven D'Aprano
steve at pearwood.info
Mon Dec 2 17:20:42 CET 2013
Oh, I forgot...
On Mon, Dec 02, 2013 at 06:11:04AM -0800, Albert-Jan Roskam wrote:
>     if self.ignorecase:
>         value = value.lower()
The right way to do case-insensitive comparisons is to use casefold, not
lower. Unfortunately, casefold is only available in Python 3.3 and on,
so for older versions you're stuck with lower (or maybe upper, if you
prefer). I usually put this at the top of my module:
    try:
        ''.casefold
    except AttributeError:
        def casefold(s):
            return s.lower()
    else:
        def casefold(s):
            return s.casefold()
then just use the custom casefold function.
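To see why casefold beats lower, here is a small sketch, assuming Python 3.3 or later. The German sharp s is the usual illustration; the variable names are mine and purely for demonstration:

```python
# Assumes Python 3.3+, where str.casefold exists.
a = "Straße"
b = "STRASSE"

# lower() leaves the sharp s alone, so the two spellings differ.
print(a.lower() == b.lower())        # False: "straße" vs "strasse"

# casefold() maps ß to "ss", so the two spellings compare equal.
print(a.casefold() == b.casefold())  # True: both are "strasse"
```

On older versions, where the fallback is just lower(), the two strings would still compare unequal.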
Case-folding isn't entirely right either. It will give the wrong results
in Turkish and Azerbaijani and one or two other languages, due to the
presence of both dotted and dotless I, but it's as close as you're going
to get without full locale awareness.
http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
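A rough illustration of the dotted and dotless I problem, again assuming Python 3.3 or later (the variable names are only for the example). Python's case operations are locale-independent, so they follow the default Unicode rules rather than the Turkish ones:

```python
# Assumes Python 3.3+. Escapes are used so the characters are unambiguous.
dotless = "\u0131"     # ı  LATIN SMALL LETTER DOTLESS I
dotted_cap = "\u0130"  # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE

# Uppercasing dotless ı gives plain ASCII "I"...
print(dotless.upper() == "I")                            # True

# ...but folding "I" gives dotted "i", not ı, so the round trip fails.
print(dotless.casefold() == dotless.upper().casefold())  # False

# A Turkish reader expects "I" to lowercase to ı; Python gives "i".
print("I".lower() == dotless)                            # False

# Capital dotted İ folds to "i" plus a combining dot above (U+0307).
print(dotted_cap.casefold() == "i\u0307")                # True
```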
By the way, that dot on the lowercase i and j, and on the uppercase dotted
İ in Turkish, is called a tittle, and is technically a diacritic too.
Next time you come across somebody bitching about how all those weird
Unicode accents are a waste of time, you can reply "Is that rıght?"
--
Steven