Case-insensitive string equality
Stephan Houben
stephanh42 at gmail.com.invalid
Sun Sep 3 03:17:44 EDT 2017
On 2017-09-02, Pavol Lisy <pavol.lisy at gmail.com> wrote:
> But problem is that if somebody like to have stable API it has to be
> changed to "do what the Unicode consortium said (at X.Y. ZZZZ)" :/
It is even more exciting. Presumably a reason to have case-insensitivity
is to be compatible with existing popular case-insensitive systems.
So here is, for your entertainment, how some of these systems work.
* Windows NTFS case-insensitive file system
An NTFS file system contains a hidden table $UpCase which maps
characters to their upcase variant. Note:
1. This maps characters in the BMP *only*, so NTFS treats
characters outside the BMP as case-sensitive.
2. Every character can in turn only be mapped into a single
BMP character, so ß -> SS is not possible.
3. The table is in practice dependent on the language of the
Windows system which created it (so a Turkish NTFS partition
would contain i -> İ), but in theory can contain any allowed
mapping: I can create an NTFS filesystem which maps a -> b.
4. Older Windows versions generated tables which
were broken for certain languages (NT 3.51/Georgian).
You may still have some NTFS partition with such a table lying
around.
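
To make the NTFS points concrete, here is a rough Python sketch of
comparing names through a per-volume upcase table. This is not the real
NTFS code: build_upcase_table, utf16_units and ntfs_like_equal are names
invented for illustration, and the table is generated from Python's own
Unicode data rather than read from an actual $UpCase file.

```python
def build_upcase_table(overrides=None):
    """Hypothetical stand-in for $UpCase: one 16-bit entry per BMP code unit."""
    table = list(range(0x10000))  # start as the identity mapping
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code units always map to themselves
        up = chr(cp).upper()
        # Only a single BMP character fits in an entry, so multi-character
        # results such as 'ß' -> 'SS' are dropped.
        if len(up) == 1 and ord(up) < 0x10000:
            table[cp] = ord(up)
    # Per-volume quirks (assumption: modelled as plain overrides).
    for src, dst in (overrides or {}).items():
        table[ord(src)] = ord(dst)
    return table


def utf16_units(s):
    """Split a string into UTF-16 code units (non-BMP becomes two units)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def ntfs_like_equal(a, b, table):
    """Compare two names after mapping every code unit through the table."""
    return ([table[u] for u in utf16_units(a)]
            == [table[u] for u in utf16_units(b)])


default_table = build_upcase_table()
turkish_table = build_upcase_table({"i": "\u0130"})  # i -> İ

print(ntfs_like_equal("readme.txt", "README.TXT", default_table))  # True
print(ntfs_like_equal("straße", "STRASSE", default_table))         # False
# Deseret capital/small long I live outside the BMP, so case matters:
print(ntfs_like_equal("\U00010400", "\U00010428", default_table))  # False
print(ntfs_like_equal("file", "FILE", turkish_table))              # False
```

The same pair of names can compare equal on one volume and different on
another, depending only on the table that volume happens to carry.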
* macOS case-insensitive file system
1. HFS+ is based on Unicode 3.2; this is fixed in stone because of
backward compatibility.
2. APFS is based on Unicode 9.0 and does normalization as well.
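
For comparison, a minimal Python sketch of "normalize, then case-fold"
matching. This is the canonical caseless match described in the Unicode
standard, not the actual HFS+ or APFS algorithm; caseless_key and
caseless_equal are made-up names. The result depends on whichever
Unicode version your Python ships with, which is exactly the stability
problem mentioned above.

```python
import unicodedata


def caseless_key(s):
    """NFD(casefold(NFD(s))): canonical caseless key per the Unicode standard."""
    nfd = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", nfd.casefold())


def caseless_equal(a, b):
    """True if both strings have the same canonical caseless key."""
    return caseless_key(a) == caseless_key(b)


# The Unicode version behind these answers varies with the Python release.
print(unicodedata.unidata_version)
# Python also carries a frozen Unicode 3.2 database (kept for IDNA).
print(unicodedata.ucd_3_2_0.unidata_version)  # 3.2.0

print(caseless_equal("straße", "STRASSE"))       # True
print(caseless_equal("\u00e9", "E\u0301"))       # True: é vs E + combining acute
```

Note that "straße" and "STRASSE" match here, whereas a one-to-one table
in the style of $UpCase cannot express that mapping at all.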
Generally speaking, the more you learn about case normalization,
the more attractive case sensitivity looks ;-)
There is also slim hope of ever having a single caseInsensitiveCompare
function which "Does The Right Thing"™.
Stephan