[issue4610] Unicode case mappings are incorrect

Marc-Andre Lemburg report at bugs.python.org
Wed Dec 10 10:44:11 CET 2008


Marc-Andre Lemburg <mal at egenix.com> added the comment:

Python uses the Unicode database for the mapping and this only contains
1-1 mappings. The special cases (mostly 1-2 mappings) are not included.

It would be nice to have them available as well, but I guess we'd have
to write them in code rather than invent a new mapping table for them.
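
A minimal sketch of what handling the special cases in code might look like.
The table SPECIAL_UPPER and the helper upper_with_special_cases() are
hypothetical names used only for illustration; the three entries are a small
sample, not the full list of special cases:

```python
# Hypothetical table of one-to-many uppercase mappings. A real version
# would need every relevant entry from the Unicode special-casing data.
SPECIAL_UPPER = {
    "\u00df": "SS",       # LATIN SMALL LETTER SHARP S
    "\ufb01": "FI",       # LATIN SMALL LIGATURE FI
    "\u0149": "\u02bcN",  # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
}


def upper_with_special_cases(text):
    """Uppercase text, consulting the special cases before str.upper()."""
    return "".join(SPECIAL_UPPER.get(ch, ch.upper()) for ch in text)


print(upper_with_special_cases("stra\u00dfe"))  # prints STRASSE
```

Note that the result can be longer than the input, so any code assuming
that case conversion preserves string length would need to be reviewed.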

Furthermore, there are a few cases, e.g. the Turkish i, where case
mappings depend on external context such as the language the code point
is used in. Those cases are difficult to get right.

We may need to extend the .lower()/.upper()/.title() methods with an
optional parameter that allows callers to pass this extra context
information to the methods.
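
As a rough sketch of that idea, written as free functions rather than as
changes to the str methods: upper_ctx(), lower_ctx() and the lang parameter
are assumptions made for illustration, not an existing or proposed API, and
only the dotted/dotless i rule for Turkish and Azerbaijani is covered:

```python
# Languages assumed here to use the dotted/dotless i distinction.
_DOTTED_I_LANGS = {"tr", "az"}

# i -> U+0130 (capital I with dot above), U+0131 (dotless i) -> I
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
# I -> U+0131 (dotless i), U+0130 (capital I with dot above) -> i
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def upper_ctx(text, lang=None):
    """Uppercase text; lang is an optional, hypothetical language hint."""
    if lang in _DOTTED_I_LANGS:
        text = text.translate(_TR_UPPER)
    return text.upper()


def lower_ctx(text, lang=None):
    """Lowercase text; lang is an optional, hypothetical language hint."""
    if lang in _DOTTED_I_LANGS:
        text = text.translate(_TR_LOWER)
    return text.lower()


print(upper_ctx("istanbul"))             # prints ISTANBUL
print(upper_ctx("istanbul", lang="tr"))  # first letter is U+0130
```

Leaving lang unset keeps the current language-neutral behaviour, so
existing callers would not be affected.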

BTW: 'ß' is being phased out in German. The new writing rules encourage
using 'ss' or 'SS' instead (which is not entirely correct, since 'ß'
originated from 'sz' used some hundred or so years ago, but those are
just details ;-).

----------
nosy: +lemburg

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4610>
_______________________________________

