Case-insensitive string equality

Chris Angelico rosuav at gmail.com
Thu Aug 31 10:03:01 EDT 2017


On Thu, Aug 31, 2017 at 11:53 PM, Stefan Ram <ram at zedat.fu-berlin.de> wrote:
> Chris Angelico <rosuav at gmail.com> writes:
>>On Thu, Aug 31, 2017 at 10:49 PM, Steve D'Aprano
>><steve+python at pearwood.info> wrote:
>>> On Thu, 31 Aug 2017 05:51 pm, Serhiy Storchaka wrote:
>>>> 31.08.17 10:10, Steven D'Aprano wrote:
>>>>> def equal(s, t):
>>>>>      return s.casefold() == t.casefold()
>>The method you proposed seems a little odd - it steps through the
>>strings character by character and casefolds them separately. How is
>>it superior to the two-line function?
>
>   When the strings are long, casefolding both strings
>   just to be able to tell that the first character of
>   the left string is »;« while the first character of
>   the right string is »'« and so the result is »False«
>   might be slower than necessary.
> [chomp]
>   However, premature optimization is the root of all evil!

Fair enough.

However, I'm more concerned about the possibility of a semantic
difference between the two. Is it at all possible for the case folding
of an entire string to differ from the concatenation of the case
foldings of its individual characters?
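
One way to probe that question empirically is a quick comparison like the
sketch below. The helper names and the sample strings are just illustrative
picks of mine, and a handful of samples passing is evidence, not a proof. As
far as I know, str.casefold() applies a context-free mapping per code point,
whereas str.lower() does look at context (the Greek final sigma rule), so the
last line shows the kind of difference I'd be worried about.

```python
def fold_whole(s):
    # Casefold the entire string in one call.
    return s.casefold()


def fold_per_char(s):
    # Casefold each character on its own, then concatenate the results.
    return "".join(c.casefold() for c in s)


# Hypothetical sample strings chosen to hit multi-character foldings
# (sharp s, dotted capital I, the fi ligature) and Greek sigma.
samples = ["Straße", "ΑΣ", "İstanbul", "ﬁne", "ǅ"]

# Collect any sample where the two approaches disagree.
mismatches = [s for s in samples if fold_whole(s) != fold_per_char(s)]
print("casefold mismatches:", mismatches)

# For contrast: lower() is context-sensitive, so whole-string and
# per-character lowercasing can differ (final sigma at the end of a word).
word = "ΑΣ"
lower_differs = word.lower() != "".join(c.lower() for c in word)
print("lower() differs per character:", lower_differs)
```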

Additionally: a proper "case insensitive comparison" should almost
certainly start with a Unicode normalization. But should it be NFC/NFD
or NFKC/NFKD? IMO that's a good reason to leave it in the hands of the
application.
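
To make the choice concrete, here is a minimal sketch that takes the
normalization form as a parameter. The name equal_normalized and its default
of NFC are my own assumptions, not anything from the stdlib. It normalizes,
casefolds, and normalizes again, since casefolding can in principle produce
text that is no longer in normalized form.

```python
import unicodedata


def equal_normalized(s, t, form="NFC"):
    # Hypothetical helper: the caller picks the normalization form
    # ("NFC", "NFD", "NFKC" or "NFKD").
    def key(x):
        folded = unicodedata.normalize(form, x).casefold()
        # Normalize once more in case folding broke the normal form.
        return unicodedata.normalize(form, folded)

    return key(s) == key(t)


# Precomposed E-acute versus "e" plus a combining acute accent:
# canonically equivalent, so they match under NFC.
print(equal_normalized("\u00c9", "e\u0301"))

# Fullwidth "A" (U+FF21) versus ASCII "a": only a compatibility
# equivalence, so the answer depends on the form chosen.
print(equal_normalized("\uff21", "a", form="NFC"))
print(equal_normalized("\uff21", "a", form="NFKC"))
```

Whether that last pair "should" compare equal depends on what the application
means by equality.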

ChrisA


