Share Code Tips

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Jul 19 23:44:36 EDT 2013


On Fri, 19 Jul 2013 21:04:55 -0400, Devyn Collier Johnson wrote:

> In the future, I want to
> make the perfect international-case-insensitive if-statement. For now,
> my code only supports a limited range of characters. Even with casefold,
> I will have some issues as Chris Angelico mentioned.

There are hundreds of written languages in the world, with thousands of 
characters, and most of them have rules about case-sensitivity and 
character normalization. For example, in Greek, lowercase Σ is σ except 
at the end of a word, when it is ς.

>>> 'Σσς'.upper()
'ΣΣΣ'
>>> 'Σσς'.lower()
'σσς'
>>> 'Σσς'.casefold()
'σσσ'


So in this case, casefold() correctly solves the problem, provided you 
are comparing modern Greek text. But if you're comparing text in some 
other language which merely happens to use Greek letters, but doesn't 
have the same rules about the letter sigma, then it will be 
inappropriate. So you cannot write a single "perfect" case-insensitive 
comparison. The best you can hope for is to write dozens or hundreds of 
separate case-insensitive comparisons, one for each language or family 
of languages.
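
To make that concrete, here is a rough sketch of the language-neutral 
baseline: normalize, casefold, normalize again, then compare. The helper 
names caseless_key and caseless_equal are made up for this example, and 
the choice of NFD normalization is an assumption, not the only reasonable 
one. It gets the Greek case right, but it knows nothing about Turkish, 
where the capital of dotless ı is I.

```python
import unicodedata

def caseless_key(text):
    """Hypothetical helper: build a language-neutral caseless key."""
    # Decompose first so precomposed and decomposed accents compare equal.
    decomposed = unicodedata.normalize("NFD", text)
    # Casefolding can produce new composable sequences, so normalize again.
    return unicodedata.normalize("NFD", decomposed.casefold())

def caseless_equal(a, b):
    """Hypothetical helper: compare two strings ignoring case."""
    return caseless_key(a) == caseless_key(b)

# Greek: final sigma and capital sigma both fold to σ.
print(caseless_equal("ΌΣΟΣ", "όσος"))

# Turkish: a Turkish reader would call these the same letter,
# but the default (language-neutral) folding does not.
print(caseless_equal("I", "ı"))
```

The first comparison should come out equal and the second unequal, which 
is exactly the kind of per-language rule a single function cannot know 
about without being told the language.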

For an introduction to the problem:

http://www.w3.org/International/wiki/Case_folding

http://www.unicode.org/faq/casemap_charprop.html




> Also, "ß" is not really the same as "ss".

Sometimes it is. Sometimes it isn't.
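
A small sketch of what Python itself does with "ß", for illustration 
only. The German words are my own example, not from the earlier posts: 
"Maße" (measurements) and "Masse" (mass) are different words, yet the 
default casefolding cannot tell them apart.

```python
# casefold() treats ß as "ss"; lower() leaves it alone.
print("ß".casefold())   # ss
print("ß".lower())      # ß
print("ß".upper())      # SS

# Useful when "STRASSE" and "straße" should match...
print("STRASSE".casefold() == "straße".casefold())   # True

# ...but it also conflates two distinct words.
print("Maße".casefold() == "Masse".casefold())       # True
```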



-- 
Steven


