Incorrect title case?

MRAB google at mrabarnett.plus.com
Sat Jan 17 17:42:00 EST 2009


Terry Reedy wrote:
> John Machin wrote:
>> On Jan 17, 9:07 am, MRAB <goo... at mrabarnett.plus.com> wrote:
>>> Python 2.6.1
>>>
>>> I've just found that the following 4 Unicode characters/codepoints don't
>>> behave as I'd expect: ǅ (U+01C5), ǈ (U+01C8), ǋ (U+01CB), ǲ (U+01F2).
>>>
>>> For example, u"\u01C5".istitle() returns True and
>>> unicodedata.category(u"\u01C5") returns "Lt", but u"\u01C5".title()
>>> returns u'\u01C4', which is the uppercase equivalent. Are these mistakes
>>> in the Unicode database?
>>
>> Doesn't look like it. AFAICT it's a mistake in Objects/unicodectype.c,
>> function _PyUnicode_ToTitlecase.
>>
>> See
>> http://svn.python.org/view/python/trunk/Objects/unicodectype.c?rev=66362&view=markup
>>
>> The code that says:
>>     if (ctype->title)
>>         delta = ctype->title;
>>     else
>>         delta = ctype->upper;
>> should IMHO merely be:
>>     delta = ctype->title;
>>
>> A value of zero for ctype->title should be interpreted simply as the
>> offset to add to the ordinal, as it is in the sibling
>> _PyUnicode_To(Upper|Lower)case functions. See also
>> Tools/unicode/makeunicodedata.py, which treats upper, lower and title
>> identically when preparing the tables used by those 3 functions.
>>
>> AFAICT making that change will fix the problem for those four
>> characters and not ruin any others.
>>
>> The error that you noticed occurs as far back as I've looked (2.1) and
>> also occurs in 3.0.
> 
> Please post a report to the tracker at bugs.python.org.
> 
Already done: http://bugs.python.org/issue4971
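
For anyone following along, here is a rough Python model of the delta lookup John describes. It is only a sketch, not the C code: TypeRecord, RECORDS, totitle_reported and totitle_proposed are names made up for illustration, the sketch is written for Python 3 (hence chr rather than unichr), and the deltas are filled in by hand on the assumption that each of the four digraphs sits one codepoint above its uppercase form, one below its lowercase form, and is its own titlecase form.

```python
from collections import namedtuple
import unicodedata

# Hypothetical stand-in for the C type record: each field is an offset
# that gets added to the character's ordinal.
TypeRecord = namedtuple("TypeRecord", "upper lower title")

# Hand-written deltas (an assumption of this sketch): uppercase is the
# previous codepoint, lowercase is the next one, titlecase is the
# character itself, so the title delta is zero.
RECORDS = dict(
    (cp, TypeRecord(upper=-1, lower=+1, title=0))
    for cp in (0x01C5, 0x01C8, 0x01CB, 0x01F2)
)


def totitle_reported(cp):
    """Model of the quoted logic: a zero title delta falls back to upper."""
    rec = RECORDS[cp]
    if rec.title:
        delta = rec.title
    else:
        delta = rec.upper
    return cp + delta


def totitle_proposed(cp):
    """Model of the suggested change: always add the title delta."""
    return cp + RECORDS[cp].title


def report():
    """Return (codepoint, category, reported result, proposed result) rows."""
    rows = []
    for cp in sorted(RECORDS):
        category = unicodedata.category(chr(cp))
        rows.append((cp, category, totitle_reported(cp), totitle_proposed(cp)))
    return rows


if __name__ == "__main__":
    for cp, category, old, new in report():
        print("U+%04X %s  reported -> U+%04X  proposed -> U+%04X"
              % (cp, category, old, new))
```

In this model the fallback branch is the only place the two functions differ, and it is taken exactly when the title delta is zero, which is the case for a character that is already its own titlecase form.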


