[issue12737] str.title() is overzealous by upcasing combining marks inappropriately
Martin v. Löwis
report at bugs.python.org
Fri Sep 30 12:36:39 CEST 2011
Martin v. Löwis <martin at v.loewis.de> added the comment:
> Martin, do you think that str.title() should follow the Unicode standard?
I don't think that "follow the Unicode standard" has any meaning in this
context: the Unicode standard doesn't specify (AFAIK) what a .title()
method in a programming language should do.
> Should string methods work with all the normalizations or just with NFC?
When we know what .title() should do, it should do so correctly for all
strings. Let me try to propose a definition for .title():
"Split S into words. Change the first letter in a word to upper-case,
and all subsequent letters to lower case. A word is a sequence that
starts with a letter, followed by letter-related characters."
Letters are all characters with the "Alphabetic" property, i.e.
Lu+Ll+Lt+Lm+Lo+Nl + Other_Alphabetic.
"letter-related" characters are letters + marks (Mn, Mc, Me).
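
A minimal sketch of that definition, to make the proposal concrete. It is
not the actual str.title() implementation. The name proposed_title and the
two helper functions are made up for illustration. One known gap: the
unicodedata module does not expose Other_Alphabetic, so "letter" is
approximated here by the general categories alone. The first letter is
converted with upper(), following the wording "upper-case" above.

```python
import unicodedata

# Approximation of "Alphabetic": general categories only.
# Assumption: Other_Alphabetic is left out because unicodedata
# does not provide it.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def is_letter(ch):
    """True if ch is a letter under the approximation above."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def is_letter_related(ch):
    """True if ch is a letter or a combining mark."""
    return is_letter(ch) or unicodedata.category(ch) in MARK_CATEGORIES


def proposed_title(s):
    """Hypothetical helper: title-case s per the proposed definition."""
    out = []
    in_word = False
    for ch in s:
        if in_word and is_letter_related(ch):
            # Inside a word: letters are lowered, marks pass through
            # unchanged (lower() leaves them as they are).
            out.append(ch.lower())
        elif is_letter(ch):
            # A letter outside a word starts a new word.
            out.append(ch.upper())
            in_word = True
        else:
            # Anything else (space, punctuation, a stray mark) ends the word.
            out.append(ch)
            in_word = False
    return "".join(out)


# "e" + COMBINING ACUTE ACCENT stays inside the word "héllo".
print(ascii(proposed_title("he\u0301llo wo\u0308rld")))
# → 'He\u0301llo Wo\u0308rld'
```

Under this definition a combining mark never ends a word, so the letter
after it is lowered rather than upcased, in NFD input as well as in NFC.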
----------
title: str.title() is overzealous by upcasing combining marks inappropriately -> str.title() is overzealous by upcasing combining marks inappropriately
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12737>
_______________________________________