Case-insensitive sorting of strings (Python newbie)

Marko Rauhamaa marko at pacujo.net
Fri Jan 23 14:14:15 EST 2015


Peter Otten <__peter__ at web.de>:

> The standard recommendation is to convert bytes to unicode as early as
> possible and only manipulate unicode.

Unicode doesn't get you off the hook (as you explain later in your
post). Upper/lowercase conversion and collation order are both
ambiguous. Python, even with decent locale support, can't be expected to
do it all for you.

Well, if Python can't, then who can? Probably nobody in the world, not
generically, anyway.

Example:

    >>> print("re\u0301sume\u0301")
    résumé
    >>> print("r\u00e9sum\u00e9")
    résumé
    >>> print("re\u0301sume\u0301" == "r\u00e9sum\u00e9")
    False
    >>> print("\ufb01nd")
    find
    >>> print("find")
    find
    >>> print("\ufb01nd" == "find")
    False

If equality can't be determined, words really can't be sorted.
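
For the two specific pairs above, a minimal sketch (assuming Python 3
and the standard unicodedata module) shows that normalizing to NFKC and
case-folding before comparing makes them equal. The helper name
sort_key is hypothetical, chosen for illustration. This does nothing for
locale-specific collation rules, so it is a partial workaround and not a
generic answer:

```python
import unicodedata


def sort_key(s):
    """Hypothetical helper: a comparison key that ignores case and
    Unicode composition/compatibility differences."""
    # NFKC composes combining sequences (e + U+0301 becomes U+00E9) and
    # replaces compatibility characters such as the U+FB01 ligature
    # with their plain equivalents ("fi").
    normalized = unicodedata.normalize("NFKC", s)
    # casefold() is a more aggressive lower() meant for caseless matching.
    return normalized.casefold()


# The pairs from the example now compare equal through the key.
print(sort_key("re\u0301sume\u0301") == sort_key("r\u00e9sum\u00e9"))
print(sort_key("\ufb01nd") == sort_key("find"))

# The same key can be handed to sorted() for a case-insensitive sort.
words = ["Zebra", "apple", "\ufb01nd"]
print(sorted(words, key=sort_key))
```

Whether folding the ligature or the accent representation is the right
thing to do still depends on the application, which is the point.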


Marko


