Case-insensitive string equality

Antoon Pardon antoon.pardon at vub.be
Thu Aug 31 03:44:24 EDT 2017


IMO this should be solved by a company-wide library, and I would
go in the direction of a Normalized_String class.

This has the following advantages:
(1) the company can choose whatever normalization suits them;
    not every case is served by comparing case-insensitively,
(2) individual devs in the company don't have to write their own,
(3) Normalized_Strings can be keys in dictionaries and members
    of a set.
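
A minimal sketch of what such a class might look like. This is only
an illustration, not an existing library: the attribute names and the
overridable normalize() hook are hypothetical, and casefold() is just
one possible default normalization.

```python
class Normalized_String:
    """Hypothetical wrapper: compares and hashes by a normalized key."""

    __slots__ = ("original", "_key")

    def __init__(self, text):
        if not isinstance(text, str):
            raise TypeError("Normalized_String requires a str")
        self.original = text              # keep the text as it was given
        self._key = self.normalize(text)  # normalize exactly once

    @staticmethod
    def normalize(text):
        # Assumed default policy; a company would override this in a
        # subclass to apply whatever normalization suits them.
        return text.casefold()

    def __eq__(self, other):
        # Only compare with other Normalized_Strings, so a normalized
        # value cannot be silently compared with a raw str.
        if isinstance(other, Normalized_String):
            return self._key == other._key
        return NotImplemented

    def __hash__(self):
        # Hash must agree with __eq__ for use in dicts and sets.
        return hash(self._key)

    def __repr__(self):
        return f"Normalized_String({self.original!r})"


maildir = Normalized_String("MailDir")
print(maildir == Normalized_String("maildir"))
print(len({Normalized_String("FOO"), Normalized_String("foo")}))
```

Because comparison with a plain str is refused rather than guessed at,
mistakes like comparing an upper-cased value with 'maildir' come out as
unequal instead of depending on which side happened to be normalized.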

On 31-08-17 at 09:10, Steven D'Aprano wrote:
> Three times in the last week the devs where I work accidentally 
> introduced bugs into our code because of a mistake with case-insensitive 
> string comparisons. They managed to demonstrate three different failures:
>
> # 1
> a = something().upper()  # normalise string
> # ... much later on
> if a == b.lower(): ...
>
>
> # 2
> a = something().upper()
> # ... much later on
> if a == 'maildir': ...
>
>
> # 3
> a = something()  # unnormalised
> assert 'foo' in a
> # ... much later on
> pos = a.find('FOO')
>
>
>
> Not every two line function needs to be in the standard library, but I've 
> come to the conclusion that case-insensitive testing and searches should 
> be. I've made these mistakes myself at times, as I'm sure most people 
> have, and I'm tired of writing my own case-insensitive function over and 
> over again.
>
>
> So I'd like to propose some additions to 3.7 or 3.8. If the feedback here 
> is positive, I'll take it to Python-Ideas for the negative feedback :-)
>
>
> (1) Add a new string method, which performs a case-insensitive equality 
> test. Here is a potential implementation, written in pure Python:
>
>
> def equal(self, other):
>     if self is other:
>         return True
>     if not isinstance(other, str):
>         raise TypeError
>     # Casefold the whole strings rather than comparing lengths and
>     # single characters: folding can change the length ('ß' -> 'ss').
>     return str.casefold(self) == str.casefold(other)
>
> Alternatively: how about a === triple-equals operator to do the same 
> thing?
>
>
>
> (2) Add keyword-only arguments to str.find and str.index:
>
>     casefold=False
>
>     which does nothing if false (the default), and switches to a case-
>     insensitive search if true.
>
>
>
>
> Alternatives:
>
> (i) Do nothing. The status quo wins a stalemate.
>
> (ii) Instead of str.find or index, use a regular expression.
>
> This is less discoverable (you need to know regular expressions) and 
> harder to get right than to just call a string method. Also, I expect 
> that invoking the re engine just for case insensitivity will be a lot 
> more expensive than a simple search need be.
>
> (iii) Not every two line function needs to be in the standard library. 
> Just add this to the top of every module:
>
> def equal(s, t):
>     return s.casefold() == t.casefold()
>
>
> That's the status quo winning again. It's an annoyance. A small annoyance, 
> but multiplied by the sheer number of times it happens, it becomes a 
> large annoyance. I believe the annoyance factor of case-insensitive 
> comparisons outweighs the "two line function" objection.
>
> And the two-line "equal" function doesn't solve the problem for find and 
> index, or for sets, dicts, list.index and the `in` operator either.
>
>
> Unsolved problems:
>
> This proposal doesn't help with sets and dicts, list.index and the `in` 
> operator either.
>
>
>
> Thoughts?
>
>
>



