[issue4610] Unicode case mappings are incorrect
Marc-Andre Lemburg
report at bugs.python.org
Wed Oct 14 22:16:29 CEST 2009
Marc-Andre Lemburg <mal at egenix.com> added the comment:
Jeff Senn wrote:
>
> Jeff Senn <senn at users.sourceforge.net> added the comment:
>
> Yikes! I just noticed that u''.title() is really broken!
>
> It doesn't really pay attention to word breaks --
> only characters that "have case".
> Therefore when there are (caseless)
> combining characters in a word it's really broken e.g.
>
> >>> u'n\u0303on\u0303e'.title()
> u'N\u0303On\u0303E'
>
> That is (where '~' is combining-tilde-over)
> n~on~e -title-cases-to-> N~On~E
Please have a look at http://bugs.python.org/issue6412 - that patch
addresses many casing issues, at least to the extent that we can
actually fix them without breaking code relying on:

    len(s.upper()) == len(s)

for upper/lower/title.
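As an illustration of why that invariant matters, the sketch below checks it for a few strings. It assumes an interpreter that applies the full Unicode case mappings (Python 3.3 or later, as far as I know); the helper name length_preserved is made up for this example.

```python
def length_preserved(s):
    """Return True if upper-casing s keeps the number of code points."""
    return len(s.upper()) == len(s)

# Assumption: str.upper() applies 1-n mappings (Python 3.3 or later).
# U+00DF (sharp s) upper-cases to "SS", and U+FB01 (fi ligature) to "FI".
for sample in ["abc", "stra\u00dfe", "\ufb01sh"]:
    print(ascii(sample), len(sample), ascii(sample.upper()), length_preserved(sample))
```

Code that preallocates a buffer of len(s) for the result, or that indexes into s and s.upper() in parallel, would break on the last two samples.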
If we add support for 1-n code point mappings, then we can only
enable this support by using an option to the casing methods (perhaps
not a bad idea: the parameter could also be used to signal the locale
to assume).
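A minimal sketch of what such an option could look like, written as a standalone function. Everything here is hypothetical: the name upper_with_option, the full and locale parameters, and the tiny override table are placeholders, not a proposed API. It also assumes the underlying str.upper() already knows the 1-n mappings.

```python
# Hypothetical per-locale overrides; only the Turkish dotted i is listed.
_LOCALE_UPPER = {"tr": {"i": "\u0130"}}

def upper_with_option(s, full=False, locale=None):
    """Upper-case s; 1-n mappings are applied only when full=True."""
    overrides = _LOCALE_UPPER.get(locale, {})
    out = []
    for ch in s:
        mapped = overrides.get(ch) or ch.upper()
        if not full and len(mapped) != 1:
            # Default mode: keep the character so len(result) == len(s).
            mapped = ch
        out.append(mapped)
    return "".join(out)
```

With the default full=False, existing callers keep the length guarantee; callers that want the complete mappings have to opt in explicitly.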
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4610>
_______________________________________