[issue4610] Unicode case mappings are incorrect

Marc-Andre Lemburg report at bugs.python.org
Wed Oct 14 22:16:29 CEST 2009


Marc-Andre Lemburg <mal at egenix.com> added the comment:

Jeff Senn wrote:
> 
> Jeff Senn <senn at users.sourceforge.net> added the comment:
> 
> Yikes! I just noticed that u''.title() is really broken! 
> 
> It doesn't really pay attention to word breaks -- 
> only characters that "have case".  
> Therefore when there are (caseless)
> combining characters in a word it's really broken e.g.
> 
>>>> u'n\u0303on\u0303e'.title()
> u'N\u0303On\u0303E'
> 
> That is (where '~' is combining-tilde-over)
> n~on~e -title-cases-to-> N~On~E
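
To make the quoted report concrete, here is a minimal sketch (Python 3
syntax) of a title-casing helper that treats combining marks as part of
the word they attach to. The name title_skip_marks is hypothetical and
is not part of any patch; the sketch only relies on
unicodedata.combining() and is not a full implementation of Unicode
word-break rules.

```python
import unicodedata

def title_skip_marks(s):
    """Title-case s, ignoring combining marks when tracking word state.

    Hypothetical helper for illustration only.
    """
    out = []
    in_word = False
    for ch in s:
        if unicodedata.combining(ch):
            # Combining mark: copy it through without ending the word.
            out.append(ch)
            continue
        if ch.isalpha():
            # First letter of a word is title-cased, the rest lower-cased.
            out.append(ch.lower() if in_word else ch.title())
            in_word = True
        else:
            out.append(ch)
            in_word = False
    return ''.join(out)
```

With this helper, 'n\u0303on\u0303e' comes out as 'N\u0303on\u0303e',
i.e. only the first base letter is capitalized.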

Please have a look at http://bugs.python.org/issue6412 - that patch
addresses many casing issues, at least up to the extent that we can
actually fix them without breaking code relying on:

len(s.upper()) == len(s)

for upper/lower/title.
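
As an illustration of why that invariant matters, the small check below
flags strings whose upper-cased form has a different length. It assumes
an interpreter that applies full (1-n) case mappings, such as Python 3.3
or later; that is an assumption about the interpreter running the
sketch, not a statement about the versions discussed in this thread.
The function name is made up for the example.

```python
def breaks_length_invariant(s):
    """Return True if upper-casing s changes its length.

    Hypothetical helper; only meaningful on interpreters that
    implement 1-n case mappings (assumed: Python 3.3+).
    """
    return len(s.upper()) != len(s)

# U+00DF (sharp s) upper-cases to two code points, 'SS'.
sample = 'stra\u00dfe'
print(breaks_length_invariant(sample))
print(breaks_length_invariant('none'))
```

Code that indexes into s.upper() using offsets computed on s would
silently go wrong for inputs like the first one.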

If we add support for 1-n code point mappings, then we can only
enable this support by using an option to the casing methods (perhaps
not a bad idea: the parameter could be used to signal the locale
to assume).
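
A rough sketch of what such an opt-in could look like, written as a
standalone function rather than a change to the string methods. The
name upper_compat and the full flag are hypothetical, nothing like this
exists in the patch, and locale handling is left out entirely. It again
assumes an interpreter whose upper() performs 1-n mappings (Python 3.3
or later).

```python
def upper_compat(s, full=False):
    """Upper-case s; allow 1-n mappings only when full=True.

    Hypothetical helper. With full=False the result always has the
    same length as the input.
    """
    if full:
        return s.upper()
    out = []
    for ch in s:
        mapped = ch.upper()
        # Keep the original character if its mapping would change the length.
        out.append(mapped if len(mapped) == 1 else ch)
    return ''.join(out)
```

By default the length-preserving behaviour is kept, so existing code
relying on the invariant is unaffected; callers who want the full
mapping have to ask for it explicitly.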

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4610>
_______________________________________

