Case-insensitive string equality

Pete Forman petef4+usenet at gmail.com
Fri Sep 1 02:18:32 EDT 2017


Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:

> Three times in the last week the devs where I work accidentally
> introduced bugs into our code because of a mistake with case-insensitive
> string comparisons. They managed to demonstrate three different failures:
>
> # 1
> a = something().upper()  # normalise string
> # ... much later on
> if a == b.lower(): ...
>
>
> # 2
> a = something().upper()
> # ... much later on
> if a == 'maildir': ...
>
>
> # 3
> a = something()  # unnormalised
> assert 'foo' in a
> # ... much later on
> pos = a.find('FOO')
>
>
>
> Not every two line function needs to be in the standard library, but I've
> come to the conclusion that case-insensitive testing and searches should
> be. I've made these mistakes myself at times, as I'm sure most people
> have, and I'm tired of writing my own case-insensitive function over and
> over again.
>
>
> So I'd like to propose some additions to 3.7 or 3.8. If the feedback here
> is positive, I'll take it to Python-Ideas for the negative feedback :-)
>
>
> (1) Add a new string method, which performs a case-insensitive equality
> test. Here is a potential implementation, written in pure Python:
>
>
> def equal(self, other):
>     if self is other:
>         return True
>     if not isinstance(other, str):
>         raise TypeError
>     if len(self) != len(other):
>         return False
>     casefold = str.casefold
>     for a, b in zip(self, other):
>         if casefold(a) != casefold(b):
>             return False
>     return True
>
> Alternatively: how about a === triple-equals operator to do the same
> thing?
>
>
>
> (2) Add keyword-only arguments to str.find and str.index:
>
>     casefold=False
>
>     which does nothing if false (the default), and switches to a case-
>     insensitive search if true.
>
>
>
>
> Alternatives:
>
> (i) Do nothing. The status quo wins a stalemate.
>
> (ii) Instead of str.find or index, use a regular expression.
>
> This is less discoverable (you need to know regular expressions) and
> harder to get right than to just call a string method. Also, I expect
> that invoking the re engine just for case insensitivity will be a lot
> more expensive than a simple search need be.
>
> (iii) Not every two line function needs to be in the standard library.
> Just add this to the top of every module:
>
> def equal(s, t):
>     return s.casefold() == t.casefold()
>
>
> That's the status quo winning again. It's an annoyance. A small
> annoyance, but multiplied by the sheer number of times it happens, it
> becomes a large annoyance. I believe the annoyance factor of
> case-insensitive comparisons outweighs the "two line function"
> objection.
>
> And the two-line "equal" function doesn't solve the problem for find
> and index, or for sets, dicts, list.index and the `in` operator either.
>
>
> Unsolved problems:
>
> This proposal doesn't help with sets and dicts, list.index and the `in`
> operator either.
>
>
>
> Thoughts?

This seems to me to be rather similar to sort() and sorted(). How about
giving equal() an optional key parameter, and perhaps the older cmp as
well? Using casefold, upper or lower as the key would satisfy many use
cases, while also allowing Unicode or more locale-specific normalization
to be applied.
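
A minimal sketch of that idea, assuming a free function named equal (as in
the quoted proposal) with a keyword-only key parameter that defaults to
str.casefold. The signature, the default and the nfkc_casefold helper are
illustrative assumptions, not part of the proposal:

```python
import unicodedata


def equal(s, t, *, key=str.casefold):
    """Return True if key(s) == key(t); key is applied to each whole string."""
    if not isinstance(s, str) or not isinstance(t, str):
        raise TypeError("equal() expects two str arguments")
    return key(s) == key(t)


def nfkc_casefold(s):
    # Hypothetical key: Unicode compatibility normalization, then case folding.
    return unicodedata.normalize("NFKC", s).casefold()


print(equal("MailDir", "maildir"))                  # default key: str.casefold
print(equal("MAILDIR", "maildir", key=str.upper))   # caller-chosen key
print(equal("\ufb01le", "FILE", key=nfkc_casefold)) # U+FB01 is the "fi" ligature
```

As with sorted(), the caller chooses the normalization, so a single
function covers casefold, upper, lower and heavier Unicode handling.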

The short-circuiting in a character-based comparison holds little appeal
for me. I generally find that a string is a more useful concept than a
collection of characters.
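
One concrete case where the two views differ, sketched here with
hypothetical helper names: case folding can change the length of a string
(the German sharp s folds to "ss"), so a length check followed by a
character-by-character comparison rejects pairs that compare equal once
each string is folded as a whole.

```python
def equal_charwise(s, t):
    # Same approach as the quoted implementation: length check first,
    # then fold and compare one character at a time.
    if len(s) != len(t):
        return False
    return all(a.casefold() == b.casefold() for a, b in zip(s, t))


def equal_whole(s, t):
    # Fold each string as a unit, then compare the results.
    return s.casefold() == t.casefold()


german = "stra\u00dfe"  # "strasse" spelled with U+00DF (sharp s), 6 characters

print(equal_charwise(german, "STRASSE"))  # False: lengths 6 and 7 differ
print(equal_whole(german, "STRASSE"))     # True: both fold to "strasse"
```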

+1 for using an affix in the name to represent a normalized version of
the input.

-- 
Pete Forman


