Case-insensitive string equality

Stephan Houben stephanh42 at gmail.com.invalid
Sun Sep 3 03:17:44 EDT 2017


Op 2017-09-02, Pavol Lisy schreef <pavol.lisy at gmail.com>:
> But problem is that if somebody like to have stable API it has to be
> changed to "do what the Unicode consortium said (at X.Y. ZZZZ)" :/

It is even more exciting. Presumably a reason to have case-insensitivity
is to be compatible with existing popular case-insensitive systems.

So here is, for your entertainment, how some of these systems work.

* Windows NTFS case-insensitive file system

  An NTFS file system contains a hidden table $UpCase which maps
  characters to their upcase variant. Note:

  1. This maps characters in the BMP *only*, so NTFS treats
     characters outside the BMP as case-sensitive.
  2. Every character can in turn only be mapped into a single
     BMP character, so ß -> SS is not possible.
  3. The table is in practice dependent on the language of the
     Windows system which created it (so a Turkish NTFS partition
     would contain i -> İ), but in theory can contain any allowed
     mapping: I can create an NTFS filesystem which maps a -> b.
  4. Older Windows versions generated tables that
     were broken for certain languages (NT 3.51/Georgian).
     You may still have an NTFS partition with such a table lying
     around.
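
To make the four points above concrete, here is a rough Python sketch of
a table-driven comparison in the same spirit. It is not the real NTFS
code: build_upcase_table, utf16_units and table_equal are hypothetical
names, and the table is derived from Python's own str.upper() rather
than read from a volume. The only properties it tries to mimic are "one
UTF-16 code unit in, one code unit out" and "whatever the table says
goes".

```python
def build_upcase_table():
    """Hypothetical stand-in for $UpCase: one BMP code unit -> one BMP code unit."""
    table = list(range(0x10000))          # start with the identity mapping
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue                      # surrogates stay mapped to themselves
        up = chr(cp).upper()
        # Keep only mappings that fit in a single BMP character,
        # so a mapping like ß -> SS is silently dropped.
        if len(up) == 1 and ord(up) < 0x10000:
            table[cp] = ord(up)
    return table


def utf16_units(s):
    """Split a string into UTF-16 code units (assumed on-disk name encoding)."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def table_equal(a, b, table):
    """Compare two names code unit by code unit through the table."""
    ua, ub = utf16_units(a), utf16_units(b)
    return len(ua) == len(ub) and all(table[x] == table[y] for x, y in zip(ua, ub))
```

With a table built this way, "readme.TXT" and "README.txt" compare
equal, "straße" and "STRASSE" do not (different lengths, no multi-
character mapping), and the Deseret pair "\U00010428" / "\U00010400"
does not either, because each is a surrogate pair and the table leaves
surrogates alone. Replacing a single entry, for example
table[ord("a")] = ord("b"), changes which names collide, which is point
3 in miniature.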

* macOS case-insensitive file system

  1. HFS+ is based on Unicode 3.2; this is set in stone because of
     backward compatibility.
  2. APFS is based on Unicode 9.0 and does normalization as well.
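
For a feel of what "normalization as well" and "based on Unicode X.Y"
mean in practice, here is a small Python sketch. It does not reproduce
the HFS+ or APFS algorithms; normalized_caseless_equal and
assigned_in_unicode_3_2 are hypothetical helpers, and the choice of NFD
plus str.casefold() is my assumption for illustration. Python's
unicodedata module uses whatever Unicode version ships with the
interpreter (see unicodedata.unidata_version), and also carries a frozen
3.2 database as unicodedata.ucd_3_2_0.

```python
import unicodedata


def normalized_caseless_equal(a, b):
    """Compare after canonical decomposition and case folding (illustrative only)."""
    def key(s):
        # Decompose, fold case, then decompose again because folding
        # can produce text that is no longer in normalized form.
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)
    return key(a) == key(b)


def assigned_in_unicode_3_2(ch):
    """True if the frozen Unicode 3.2 database knows this character at all."""
    return unicodedata.ucd_3_2_0.category(ch) != "Cn"
```

Here normalized_caseless_equal("caf\u00e9", "CAFE\u0301") is true, while
a plain casefold() comparison of the same two strings is false because
one is precomposed and the other is not. The second helper shows the
price of pinning a version: a character such as "\u1e9e" (capital sharp
s, added after 3.2) is simply unassigned as far as a 3.2-based system is
concerned.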

Generally speaking, the more you learn about case normalization,
the more attractive case sensitivity looks ;-)
There is also slim hope of ever having a single caseInsensitiveCompare
function which "Does The Right Thing"™.
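
A quick Python illustration of why a single function is hard to agree
on: the three obvious candidates in the standard library already give
different answers. The helper names below are made up for this sketch;
only str.lower(), str.upper() and str.casefold() are real.

```python
def equal_by_lower(a, b):
    """Case-insensitive equality via str.lower()."""
    return a.lower() == b.lower()


def equal_by_upper(a, b):
    """Case-insensitive equality via str.upper()."""
    return a.upper() == b.upper()


def equal_by_casefold(a, b):
    """Case-insensitive equality via str.casefold() (full Unicode case folding)."""
    return a.casefold() == b.casefold()


def verdicts(a, b):
    """Collect the verdict of each candidate so disagreements are visible."""
    return {
        "lower": equal_by_lower(a, b),
        "upper": equal_by_upper(a, b),
        "casefold": equal_by_casefold(a, b),
    }
```

For "straße" against "STRASSE", the lower() variant says "different"
while upper() and casefold() say "equal". For "İstanbul" against
"istanbul" all three say "different", since Python's mappings are
locale-independent and know nothing about the Turkish i -> İ rule that a
Turkish NTFS table would apply.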

Stephan



More information about the Python-list mailing list