Case-insensitive string equality

Steve D'Aprano steve+python at pearwood.info
Fri Sep 1 20:31:28 EDT 2017


On Sat, 2 Sep 2017 01:41 am, Chris Angelico wrote:

> Aside from lower(), which returns the string unchanged, the case
> conversion rules say that this contains two letters.

Do you have a reference to that?

I mean, where in the Unicode case conversion rules is that stated? You cannot
take the behaviour of Python as necessarily correct here -- it may be that the
behaviour of Python is erroneous.


For what it's worth, even under Unicode's own rules, there are always going to be
odd corner cases that surprise people. The most obvious cases are:


- dotted and dotless i

- the German eszett, ß, which has two official[1] uppercase forms: 'SS' and an
uppercase eszett

- long s, ſ, which may or may not be treated as distinct from s

- likewise for ligatures -- is æ a ligature, or is it Old English ash?
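
To make those corner cases concrete, here is a small sketch of how Python 3's
built-in str methods treat them. It only shows what Python does, not what
Unicode requires, and the helper name same_caseless is made up for this example.

```python
# Illustration only: observed behaviour of Python 3's str methods.
# Characters are written as escapes so the snippet survives any encoding.
DOTTED_CAPITAL_I = '\u0130'   # İ
DOTLESS_I = '\u0131'          # ı
ESZETT = '\u00df'             # ß
CAPITAL_ESZETT = '\u1e9e'     # ẞ
LONG_S = '\u017f'             # ſ
ASH = '\u00e6'                # æ


def same_caseless(a, b):
    """Hypothetical helper: compare two strings using full case folding."""
    return a.casefold() == b.casefold()


# Dotted and dotless i: neither folds to a plain ASCII 'i'.
print(ascii(DOTTED_CAPITAL_I.lower()))        # 'i\u0307' (i + combining dot)
print(same_caseless(DOTTED_CAPITAL_I, 'i'))   # False
print(same_caseless(DOTLESS_I, 'i'))          # False

# Eszett: upper() gives 'SS', and both forms fold to 'ss'.
print(ESZETT.upper())                         # SS
print(same_caseless(ESZETT, 'SS'))            # True
print(same_caseless(CAPITAL_ESZETT, ESZETT))  # True

# Long s folds to an ordinary s.
print(same_caseless(LONG_S, 's'))             # True

# Ash is treated as a letter in its own right, not as 'a' + 'e'.
print(same_caseless(ASH, 'ae'))               # False
```

Note that casefold() is locale-independent, so it cannot apply the Turkish
rules for dotted and dotless i.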


You can't keep everybody happy. Doesn't mean we can't meet 99% of the use cases.

After all, what do you think the regex case insensitive matching does?
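
As a rough data point, the sketch below compares the re module's IGNORECASE
flag with str.casefold() on a word containing ß. It shows observed behaviour on
one example only; the helper name matches_ignoring_case is invented for this
illustration.

```python
import re

ESZETT = '\u00df'                   # ß
STRASSE_LOWER = 'stra' + ESZETT + 'e'   # straße


def matches_ignoring_case(pattern, text):
    """Hypothetical helper: full-string match with re.IGNORECASE."""
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


# Ordinary ASCII letters match regardless of case.
print(matches_ignoring_case('python', 'PyThOn'))          # True

# A one-to-many mapping (ß -> SS) is not matched by the regex engine,
# because the pattern has six characters and the text has seven.
print(matches_ignoring_case(STRASSE_LOWER, 'STRASSE'))    # False

# Full case folding does treat the two spellings as equal.
print(STRASSE_LOWER.casefold() == 'STRASSE'.casefold())   # True
```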




[1] I believe that the German government has now officially recognised the
uppercase form of ß.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



