Case-insensitive string equality

Rhodri James rhodri at kynesim.co.uk
Thu Aug 31 10:23:19 EDT 2017


On 31/08/17 15:03, Chris Angelico wrote:
> On Thu, Aug 31, 2017 at 11:53 PM, Stefan Ram <ram at zedat.fu-berlin.de> wrote:
>> Chris Angelico <rosuav at gmail.com> writes:
>>> On Thu, Aug 31, 2017 at 10:49 PM, Steve D'Aprano
>>> <steve+python at pearwood.info> wrote:
>>>> On Thu, 31 Aug 2017 05:51 pm, Serhiy Storchaka wrote:
>>>>> 31.08.17 10:10, Steven D'Aprano wrote:
>>>>>> def equal(s, t):
>>>>>>     return s.casefold() == t.casefold()
>>> The method you proposed seems a little odd - it steps through the
>>> strings character by character and casefolds them separately. How is
>>> it superior to the two-line function?
>>
>>    When the strings are long, casefolding both strings
>>    just to be able to tell that the first character of
>>    the left string is »;« while the first character of
>>    the right string is »'« and so the result is »False«
>>    might be slower than necessary.
>> [chomp]
>>    However, premature optimization is the root of all evil!
> 
> Fair enough.
> 
> However, I'm more concerned about the possibility of a semantic
> difference between the two. Is it at all possible for the case folding
> of an entire string to differ from the concatenation of the case
> foldings of its individual characters?
> 
> Additionally: a proper "case insensitive comparison" should almost
> certainly start with a Unicode normalization. But should it be NFC/NFD
> or NFKC/NFKD? IMO that's a good reason to leave it in the hands of the
> application.

There's also the example in the documentation of str.casefold to
consider: "ß" casefolds to "ss".  We would rather like a hypothetical
str.equal("ß", "ss") to be true.
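
For concreteness, here is a minimal sketch of the cases discussed in
this thread. It reuses the two-line equal() quoted above. The helper
names equal_normalized() and per_char_fold() are made up for
illustration, and defaulting to NFD is only an assumption, since which
normalization form is appropriate is exactly the open question:

```python
import unicodedata


def equal(s, t):
    # The two-line comparison quoted earlier in the thread.
    return s.casefold() == t.casefold()


def equal_normalized(s, t, form="NFD"):
    # Hypothetical variant: normalize, casefold, then normalize again,
    # because casefolding can produce a non-normalized string.
    # The choice of form is left to the caller (assumption: NFD default).
    def key(x):
        folded = unicodedata.normalize(form, x).casefold()
        return unicodedata.normalize(form, folded)
    return key(s) == key(t)


def per_char_fold(s):
    # Casefold each character separately and concatenate the results.
    return "".join(c.casefold() for c in s)


print(equal("ß", "ss"))                        # True
print("ß".lower() == "ss".lower())             # False: lower() keeps "ß"
print(equal("\u00e9", "e\u0301"))              # False: composed vs decomposed
print(equal_normalized("\u00e9", "e\u0301"))   # True after normalization

# For these sample strings, whole-string and per-character folding agree.
for sample in ("Straße", "ǅ", "ﬁnal"):
    print(per_char_fold(sample) == sample.casefold())
```

The last loop only checks a few samples; it is not a proof that the two
approaches agree for every string.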

-- 
Rhodri James *-* Kynesim Ltd