Case-insensitive string equality

Pavol Lisy pavol.lisy at gmail.com
Thu Aug 31 14:36:46 EDT 2017


On 8/31/17, Steve D'Aprano <steve+python at pearwood.info> wrote:


>> Additionally: a proper "case insensitive comparison" should almost
>> certainly start with a Unicode normalization. But should it be NFC/NFD
>> or NFKC/NFKD? IMO that's a good reason to leave it in the hands of the
>> application.
>
> Normalisation is orthogonal to comparisons and searches. Python doesn't
> automatically normalise strings, as people have pointed out a bazillion
> times in the past, and it happily compares
>
> 'ö' LATIN SMALL LETTER O WITH DIAERESIS
>
> 'ö' LATIN SMALL LETTER O + COMBINING DIAERESIS
>
>
> as unequal. I don't propose to change that just so that we can get 'a'
> equals 'A' :-)
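
For reference, here is a minimal sketch of the two points above: the
precomposed and decomposed forms compare unequal as plain str objects, and
an application that wants them equal has to normalise explicitly. The
helper names nfd() and compare_caseless() are only illustrative (they
follow the pattern shown in the Python Unicode HOWTO), and picking NFD
rather than NFKD is an assumption; that choice is exactly what is left to
the application.

```python
import unicodedata

precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS


def nfd(s):
    """Return s in canonical decomposed form (NFD)."""
    return unicodedata.normalize("NFD", s)


def compare_caseless(a, b):
    """Caseless comparison that also ignores canonical-equivalence differences.

    Normalise, casefold, then normalise again, because case folding
    can produce a string that is no longer normalised.
    """
    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())


# Plain comparison does not normalise, so these differ.
print(precomposed == decomposed)

# With explicit normalisation they match, also across case.
print(compare_caseless(precomposed, decomposed))
print(compare_caseless("\u00d6", decomposed))
```

Swapping "NFD" for "NFKD" in nfd() would additionally treat compatibility
characters (ligatures, full-width forms and so on) as equal, which may or
may not be what a given application wants.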

Locale-dependent Case Mappings. The principal example of a case mapping
that depends on the locale is Turkish, where U+0131 “ı” LATIN SMALL LETTER
DOTLESS I maps to U+0049 “I” LATIN CAPITAL LETTER I and U+0069 “i” LATIN
SMALL LETTER I maps to U+0130 “İ” LATIN CAPITAL LETTER I WITH DOT ABOVE.
(source: http://www.unicode.org/versions/Unicode10.0.0/ch05.pdf)

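So 'SIKISIN'.casefold() could be dangerous: casefold() ignores the locale
and gives 'sikisin', while the Turkish lowercase form is 'sıkısın', which
is a different word ->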
https://translate.google.com/#tr/en/sikisin%0As%C4%B1k%C4%B1s%C4%B1n
(although I am not sure if this story is true ->
https://www.theinquirer.net/inquirer/news/1017243/cellphone-localisation-glitch
)
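
A small sketch of the mismatch, assuming Python 3 str semantics. The
helper turkish_lower() is hypothetical and written only for this example;
it is not part of the standard library, which has no locale-aware case
mapping for str.

```python
upper_word = "SIKISIN"
turkish_word = "s\u0131k\u0131s\u0131n"   # 'sıkısın', written with U+0131 dotless i

# casefold() is locale-independent: 'I' always folds to dotted 'i'.
folded = upper_word.casefold()
print(folded)

# The folded form is not the Turkish lowercase spelling ...
print(folded == turkish_word.casefold())

# ... although uppercasing the Turkish spelling does give the original.
print(turkish_word.upper() == upper_word)


def turkish_lower(s):
    """Hypothetical Turkish-aware lowercasing (illustration only).

    Maps 'I' to dotless U+0131 and U+0130 to 'i' before falling back
    to the default, locale-independent str.lower().
    """
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


print(turkish_lower(upper_word) == turkish_word)
```

An application that really needs Turkish (or Azerbaijani) rules would
probably be better served by a library with locale-aware case mapping
than by a hand-written replace like the one above.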


