[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

Martin v. Löwis report at bugs.python.org
Fri Sep 30 12:36:39 CEST 2011


Martin v. Löwis <martin at v.loewis.de> added the comment:

> Martin, do you think that str.title() should follow the Unicode standard?

I don't think that "follow the Unicode standard" has any meaning in this
context: the Unicode standard doesn't specify (AFAIK) what a .title()
method in a programming language should do.

> Should string methods work with all the normalizations or just with NFC?

Once we know what .title() should do, it should do so correctly for all
strings. Let me try to propose a definition for .title():

"Split S into words. Change the first letter in a word to upper-case,
and all subsequent letters to lower case. A word is a sequence that
starts with a letter, followed by letter-related characters."

Letters are all characters with the (derived) "Alphabetic" property, i.e.
Lu+Ll+Lt+Lm+Lo+Nl + Other_Alphabetic.

"Letter-related" characters are letters plus marks (Mn, Mc, Me).
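A minimal Python sketch of the proposed definition, for illustration only. The names proposed_title, is_letter and is_letter_related are hypothetical and not part of CPython. One assumption to note: the unicodedata module exposes general categories but not the Other_Alphabetic property, so the letter test below only approximates "Alphabetic" with Lu+Ll+Lt+Lm+Lo+Nl.

```python
import unicodedata

# Approximation of "Alphabetic": unicodedata has no Other_Alphabetic lookup,
# so only these general categories are checked (an assumption of this sketch).
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def is_letter(ch: str) -> bool:
    """True if ch may start a word."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def is_letter_related(ch: str) -> bool:
    """True if ch may continue a word: letters plus combining marks."""
    return is_letter(ch) or unicodedata.category(ch) in MARK_CATEGORIES


def proposed_title(s: str) -> str:
    """Hypothetical title() following the definition quoted above."""
    out = []
    i, n = 0, len(s)
    while i < n:
        if not is_letter(s[i]):
            # Not inside a word: copy the character unchanged.
            out.append(s[i])
            i += 1
            continue
        # s[i] starts a word; extend it over letter-related characters,
        # so combining marks do not end the word.
        j = i + 1
        while j < n and is_letter_related(s[j]):
            j += 1
        # Titlecase the first letter and lowercase the rest of the word.
        # Lowercasing the tail as one string keeps context-sensitive rules
        # such as the Greek final sigma working inside the word.
        out.append(s[i].title())
        out.append(s[i + 1:j].lower())
        i = j
    return "".join(out)
```

With this definition, the NFC and NFD forms of a string get canonically equivalent results, e.g. unicodedata.normalize("NFC", proposed_title(unicodedata.normalize("NFD", "déjà vu"))) gives "Déjà Vu", the same as proposed_title("déjà vu").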

----------
title: str.title() is overzealous by upcasing combining marks inappropriately -> str.title() is overzealous by upcasing combining marks	inappropriately

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12737>
_______________________________________