Case-insensitive string equality

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Aug 31 03:10:10 EDT 2017


Three times in the last week the devs where I work accidentally 
introduced bugs into our code because of a mistake with case-insensitive 
string comparisons. They managed to demonstrate three different failures:

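# 1: upper-cased value compared with a lower-cased one
a = something().upper()  # normalise string
# ... much later on
if a == b.lower(): ...


# 2: upper-cased value compared with a lower-case literal
a = something().upper()
# ... much later on
if a == 'maildir': ...


# 3: same substring searched for with two different casings
a = something()  # unnormalised
assert 'foo' in a
# ... much later on
pos = a.find('FOO')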



Not every two line function needs to be in the standard library, but I've 
come to the conclusion that case-insensitive testing and searches should 
be. I've made these mistakes myself at times, as I'm sure most people 
have, and I'm tired of writing my own case-insensitive function over and 
over again.


So I'd like to propose some additions to 3.7 or 3.8. If the feedback here 
is positive, I'll take it to Python-Ideas for the negative feedback :-)


(1) Add a new string method, which performs a case-insensitive equality 
test. Here is a potential implementation, written in pure Python:


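def equal(self, other):
    # Identical objects are trivially equal.
    if self is other:
        return True
    if not isinstance(other, str):
        raise TypeError('expected str, got %s' % type(other).__name__)
    # Casefold the whole strings rather than comparing character by
    # character: casefolding can change a string's length ('ß' folds
    # to 'ss'), so a length check up front would give wrong answers.
    return self.casefold() == other.casefold()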

Alternatively: how about a === triple-equals operator to do the same 
thing?
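
One wrinkle worth keeping in mind for any implementation of (1): casefolding is not the same as lower-casing, and it is not always one character to one character. A small illustration follows; the name casefold_equal is made up for this example and is not an existing str method.

```python
def casefold_equal(s, t):
    """Hypothetical helper: case-insensitive equality via str.casefold."""
    return s.casefold() == t.casefold()


# lower() leaves the German sharp s alone; casefold() expands it to 'ss'.
print('Straße'.lower() == 'STRASSE'.lower())   # False
print(casefold_equal('Straße', 'STRASSE'))     # True

# Casefolding can change the length of a string.
print(len('ß'), len('ß'.casefold()))           # 1 2
```

So comparing lengths, or comparing one character at a time, before casefolding can report two strings as different when their casefolded forms are equal.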



(2) Add keyword-only arguments to str.find and str.index:

    casefold=False

    which does nothing if false (the default), and switches to a case-
    insensitive search if true.
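
A rough pure-Python sketch of what (2) might do for str.find. The function name find_casefold is hypothetical, and the casefold=True keyword does not exist today. This is one possible reading of the proposal, not a reference implementation.

```python
def find_casefold(haystack, needle):
    """Hypothetical stand-in for haystack.find(needle, casefold=True).

    Returns the lowest index in the original haystack at which the
    casefolded needle matches, or -1 if there is no match.
    """
    folded_needle = needle.casefold()
    # Try each start position so the result indexes the original string,
    # not its casefolded copy (the two can differ in length).
    for start in range(len(haystack) + 1):
        if haystack[start:].casefold().startswith(folded_needle):
            return start
    return -1


print(find_casefold('say FOO twice', 'foo'))   # 4
print(find_casefold('abc', 'X'))               # -1
```

This version is quadratic in the length of the haystack, so it only pins down the behaviour; a real implementation would want something faster. It also lets a needle match part of an expanded character (searching for 's' finds 'ß'), and whether that is desirable is an open design question.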




Alternatives:

(i) Do nothing. The status quo wins a stalemate.

(ii) Instead of str.find or index, use a regular expression.

This is less discoverable (you need to know regular expressions) and 
harder to get right than to just call a string method. Also, I expect 
that invoking the re engine just for case insensitivity will be a lot 
more expensive than a simple search need be.
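
For comparison, a minimal sketch of the regular expression route. The wrapper name find_ignorecase is invented for this example; re.search, re.escape and re.IGNORECASE are the real pieces.

```python
import re


def find_ignorecase(haystack, needle):
    """Hypothetical wrapper: case-insensitive find using the re module."""
    # re.escape stops characters such as '.' in the needle from being
    # treated as regex metacharacters.
    match = re.search(re.escape(needle), haystack, re.IGNORECASE)
    return match.start() if match else -1


print(find_ignorecase('say FOO twice', 'foo'))   # 4
print(find_ignorecase('axb', 'a.b'))             # -1
```

Forgetting re.escape is an easy mistake to make. Note too that re.IGNORECASE does simple case matching rather than full casefolding, so searching for 'ss' will not find 'ß'.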

(iii) Not every two line function needs to be in the standard library. 
Just add this to the top of every module:

def equal(s, t):
    return s.casefold() == t.casefold()


That's the status quo winning again. It's an annoyance. A small annoyance, 
but multiplied by the sheer number of times it happens, it becomes a 
large annoyance. I believe the annoyance factor of case-insensitive 
comparisons outweighs the "two line function" objection.

And the two-line "equal" function doesn't solve the problem for find and 
index, or for sets, dicts, list.index and the `in` operator either.


Unsolved problems:

This proposal doesn't help with sets and dicts, list.index and the `in` 
operator either.
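
For containers, the usual workaround today is to casefold by hand at the boundary. A small sketch, with the helper names folded_set and folded_index invented for illustration:

```python
def folded_set(strings):
    """Build a set of casefolded strings for membership tests."""
    return {s.casefold() for s in strings}


def folded_index(items, target):
    """Like list.index, but compares casefolded strings."""
    key = target.casefold()
    for i, item in enumerate(items):
        if item.casefold() == key:
            return i
    raise ValueError(f'{target!r} is not in list')


mailboxes = ['Inbox', 'MAILDIR', 'Sent']
known = folded_set(mailboxes)
print('maildir'.casefold() in known)        # True
print(folded_index(mailboxes, 'maildir'))   # 1
```

The catch is the same as in the bugs above: every lookup has to remember to casefold the key as well.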



Thoughts?



-- 
Steven D'Aprano
“You are deluded if you think software engineers who can't write 
operating systems or applications without security holes, can write 
virtualization layers without security holes.” —Theo de Raadt


