Incorrect title case?

Terry Reedy tjreedy at udel.edu
Sat Jan 17 17:14:38 EST 2009


John Machin wrote:
> On Jan 17, 9:07 am, MRAB <goo... at mrabarnett.plus.com> wrote:
>> Python 2.6.1
>>
>> I've just found that the following 4 Unicode characters/codepoints don't
>> behave as I'd expect: Dž (U+01C5), Lj (U+01C8), Nj (U+01CB), Dz (U+01F2).
>>
>> For example, u"\u01C5".istitle() returns True and
>> unicodedata.category(u"\u01C5") returns "Lt", but u"\u01C5".title()
>> returns u'\u01C4', which is the uppercase equivalent. Are these mistakes
>> in the Unicode database?
> 
> Doesn't look like it. AFAICT it's a mistake in Objects/unicodectype.c,
> function _PyUnicode_ToTitlecase.
> 
> See http://svn.python.org/view/python/trunk/Objects/unicodectype.c?rev=66362&view=markup
> 
> The code that says:
>     if (ctype->title)
>         delta = ctype->title;
>     else
>         delta = ctype->upper;
> should IMHO merely be:
>     delta = ctype->title;
> 
> A value of zero for ctype->title should be interpreted simply as the
> offset to add to the ordinal, as it is in the sibling
> _PyUnicode_To(Upper|Lower)case functions. See also
> Tools/unicode/makeunicodedata.py, which treats upper, lower and title
> identically when preparing the tables used by those 3 functions.
> 
> AFAICT making that change will fix the problem for those four
> characters and not ruin any others.
> 
> The error that you noticed occurs as far back as I've looked (2.1) and
> also occurs in 3.0.

Please post a report to the tracker at bugs.python.org.
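
For anyone writing up that report, here is a rough Python 3 sketch of the
logic John describes. It only models the quoted C branch and is not the
actual CPython code. The helper names (upper_delta, to_title_old,
to_title_proposed) are made up for illustration, and the title delta of 0
is an assumption based on these four characters being their own titlecase
form (category "Lt").

```python
import unicodedata

# The four titlecase digraphs from the original report.
DIGRAPHS = ["\u01C5", "\u01C8", "\u01CB", "\u01F2"]


def upper_delta(ch):
    # Offset from ch to its uppercase form. Assumes the uppercase
    # mapping is a single code point, which holds for these four.
    return ord(ch.upper()) - ord(ch)


def to_title_old(ch, title_delta=0):
    # Models the quoted branch: a zero title delta is treated as
    # "no titlecase mapping" and falls back to the uppercase delta.
    delta = title_delta if title_delta else upper_delta(ch)
    return chr(ord(ch) + delta)


def to_title_proposed(ch, title_delta=0):
    # Models the suggested change: always add the title delta,
    # so a delta of zero leaves the character unchanged.
    return chr(ord(ch) + title_delta)


if __name__ == "__main__":
    for ch in DIGRAPHS:
        print(
            "U+%04X" % ord(ch),
            unicodedata.category(ch),
            "old -> U+%04X" % ord(to_title_old(ch)),
            "proposed -> U+%04X" % ord(to_title_proposed(ch)),
        )
```

With the old branch the model returns the uppercase digraph, which matches
what MRAB saw; with the proposed change each character maps to itself.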



