Case-insensitive string equality

Tim Chase python.list at tim.thechases.com
Thu Aug 31 11:29:29 EDT 2017


On 2017-08-31 07:10, Steven D'Aprano wrote:
> So I'd like to propose some additions to 3.7 or 3.8.

Adding my "yes, a case-insensitive equality check would be useful"
vote, with the following concerns:

I'd want to have an optional parameter to take locale into
consideration.  E.g.

  "i".case_insensitive_equals("I") # depends on Locale
  "i".case_insensitive_equals("I", Locale("TR")) == False
  "i".case_insensitive_equals("I", Locale("US")) == True

and other oddities like

  "ß".case_insensitive_equals("SS") == True

(though casefold() takes care of that latter one).  Then you get
things like

  "III".case_insensitive_equals("\N{ROMAN NUMERAL THREE}")
  "iii".case_insensitive_equals("\N{ROMAN NUMERAL THREE}")
  "FI".case_insensitive_equals("\N{LATIN SMALL LIGATURE FI}")

where Unicode compatibility decomposition (NFKD/NFKC) might need to be
considered.  There are just a lot of odd edge cases to consider when
discussing fuzzy equality.
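A rough sketch of the check as a plain function, since the method itself doesn't exist: `case_insensitive_equals()` and its `turkic` flag are hypothetical stand-ins for the proposed method and its `Locale(...)` argument. For the decomposition cases it uses what the Unicode Standard calls a "compatibility caseless match" (casefold and NFKD, applied twice after an initial NFD). Python's `str.casefold()` ignores locale, so the only "locale" support here is a hand-written dotted/dotless I mapping, assumed to approximate the Turkic entries in Unicode's CaseFolding.txt; proper tailoring would need a real locale library.

```python
import unicodedata

# Hypothetical Turkic tailoring: "I" folds to dotless "ı" and dotted "İ"
# folds to "i". Python's own str.casefold() is locale-independent.
_TURKIC = str.maketrans({"I": "\u0131", "\u0130": "i"})


def _fold_key(s: str, turkic: bool = False) -> str:
    """Compatibility caseless match key:
    NFKD(casefold(NFKD(casefold(NFD(s)))))."""
    if turkic:
        # Must run before NFD, which would split U+0130 into "I" + U+0307.
        s = s.translate(_TURKIC)
    s = unicodedata.normalize("NFD", s)
    s = unicodedata.normalize("NFKD", s.casefold())
    return unicodedata.normalize("NFKD", s.casefold())


def case_insensitive_equals(a: str, b: str, turkic: bool = False) -> bool:
    # Hypothetical free-function version of the proposed str method.
    return _fold_key(a, turkic) == _fold_key(b, turkic)


print(case_insensitive_equals("i", "I"))                             # True
print(case_insensitive_equals("i", "I", turkic=True))                # False
print(case_insensitive_equals("\u00df", "SS"))                       # True
print(case_insensitive_equals("III", "\N{ROMAN NUMERAL THREE}"))     # True
print(case_insensitive_equals("FI", "\N{LATIN SMALL LIGATURE FI}"))  # True
```

Comparing plain `casefold()` results already covers "ß"/"SS" and the "fi" ligature, but not the Roman numeral, which only matches after compatibility decomposition. The trade-off is that NFKD also equates other look-alikes (a superscript "²" with a plain "2", for instance), which may be fuzzier than some callers want.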

> (1) Add a new string method,

This is my preferred avenue.

> Alternatively: how about a === triple-equals operator to do the
> same thing?

No.  A strong -1 for new operators.  This peeves me in other
languages (looking at you, PHP & JavaScript).

> (2) Add keyword-only arguments to str.find and str.index:
> 
>     casefold=False
> 
>     which does nothing if false (the default), and switches to a
> case-insensitive search if true.

I'm okay with some means of conveying case-insensitivity to
str.find/str.index, but have no interest in list.index growing
similar functionality.  I'm meh on the "casefold=False" syntax,
especially in light of my hope that it would take a locale for the
comparisons.
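To make the keyword-only flag concrete, here is a hypothetical `proposed_find()`: a free function, since `str` can't be extended, and the name is illustrative only. The subtlety it shows is that the result should be an index into the original string. `haystack.casefold().find(needle.casefold())` would instead return a position in the folded copy, and folding can change length ("ß" becomes "ss").

```python
def proposed_find(haystack: str, needle: str, *, casefold: bool = False) -> int:
    """Hypothetical stand-in for str.find(needle, casefold=True).

    Returns the lowest index into the *original* haystack at which a run
    of characters casefolds to needle.casefold(), or -1 like str.find."""
    if not casefold:
        return haystack.find(needle)
    target = needle.casefold()
    if not target:
        return 0  # str.find also reports an empty needle at index 0
    for start in range(len(haystack)):
        folded = ""
        for ch in haystack[start:]:
            # str.casefold() has no context-sensitive rules, so folding
            # one character at a time equals folding the whole slice.
            folded += ch.casefold()
            if folded == target:
                return start
            if not target.startswith(folded):
                break  # this start position can no longer match
    return -1


print(proposed_find("Straße", "SS", casefold=True))          # 4
print(proposed_find("Hello World", "WORLD", casefold=True))  # 6
print(proposed_find("Hello World", "WORLD"))                 # -1
```

An `index()` counterpart would raise `ValueError` where this returns -1. The brute-force scan is roughly quadratic, which is fine for a sketch; swapping the boolean for a locale argument would only change which fold function gets applied.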

> Unsolved problems:
> 
> This proposal doesn't help with sets and dicts, list.index and the
> `in` operator either.

I'd be less concerned about these.  If you plan to index a set/dict
by the key, normalize it before you put it in.  Or perhaps create a
CaseInsensitiveDict/CaseInsensitiveSet class.  For lists and 'in'
operator usage, it's not too hard to make up a helper function based
on the newly-grown method:

  def case_insensitive_in(itr, target, locale=None):
    return any(
      target.case_insensitive_equals(x, locale)
      for x in itr
      )

  def case_insensitive_index(itr, target, locale=None):
    for i, x in enumerate(itr):
      if target.case_insensitive_equals(x, locale):
        return i
    raise ValueError("Could not find %s" % target)
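For the set/dict case, a minimal sketch of the CaseInsensitiveDict idea, built on `collections.abc.MutableMapping`. The class and the `fold()` helper are hypothetical; `fold()` simply calls `str.casefold()` in place of the proposed locale-aware method. Keys are normalized on the way in, but the original spelling is stored alongside the value so iteration still shows it.

```python
from collections.abc import MutableMapping


def fold(s: str) -> str:
    # Hypothetical stand-in for the proposed method: plain,
    # locale-independent casefold().
    return s.casefold()


class CaseInsensitiveDict(MutableMapping):
    """Hypothetical mapping whose str keys match case-insensitively.

    Stores folded key -> (original key, value), so iteration yields the
    most recently assigned spelling of each key."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[fold(key)][1]

    def __setitem__(self, key, value):
        self._data[fold(key)] = (key, value)

    def __delitem__(self, key):
        del self._data[fold(key)]

    def __iter__(self):
        return (original for original, _ in self._data.values())

    def __len__(self):
        return len(self._data)


headers = CaseInsensitiveDict({"Content-Type": "text/plain"})
print(headers["content-type"])   # text/plain
print("CONTENT-TYPE" in headers)  # True
print(list(headers))              # ['Content-Type']
```

A CaseInsensitiveSet would follow the same pattern on top of `collections.abc.MutableSet`, storing folded key -> original key. If the original spelling doesn't matter, a plain dict keyed on `fold(key)` is simpler.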

-tkc

More information about the Python-list mailing list