Can upper() or lower() ever change the length of a string?

Terry Reedy tjreedy at udel.edu
Mon May 24 14:01:59 EDT 2010


On 5/24/2010 10:42 AM, MRAB wrote:
> Mark Dickinson wrote:

>> Digging a bit deeper, it looks like these methods are using the
>> Simple_{Upper,Lower,Title}case_Mapping functions described at
>> http://www.unicode.org/Public/5.1.0/ucd/UCD.html fields 12, 13 and 14
>> of the unicode data; you can see this in the source in
>> Tools/unicode/makeunicodedata.py, which is the Python code that generates the
>> database of unicode properties. It contains code like:
>>
>> if record[12]:
>>     upper = int(record[12], 16)
>> else:
>>     upper = char
>> if record[13]:
>>     lower = int(record[13], 16)
>> else:
>>     lower = char
>> if record[14]:
>>     title = int(record[14], 16)
>>
>> ... and so on.
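The fallback logic in the quoted fragment can be sketched as a small self-contained Python function. This is an illustration, not the actual makeunicodedata.py code. SAMPLE_RECORDS, parse_record and simple_upper are hypothetical names, and the three records only follow the UnicodeData.txt field layout. The title-case fallback is an assumption, because the quoted fragment is cut off at that point. Every simple mapping is a single code point, so applying it character by character cannot change the length of a string.

```python
# Records in the UnicodeData.txt layout: 15 semicolon-separated fields.
# Fields 12, 13 and 14 hold the simple upper, lower and title case
# mappings as hex code points, or are empty when there is no mapping.
SAMPLE_RECORDS = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;",
]


def parse_record(line):
    """Return (char, upper, lower, title) code points for one record."""
    record = line.split(";")
    char = int(record[0], 16)
    # Same fallback as the quoted fragment: an empty field means the
    # character maps to itself.
    upper = int(record[12], 16) if record[12] else char
    lower = int(record[13], 16) if record[13] else char
    # Assumption: the quoted fragment is truncated here, so this sketch
    # also falls back to the character itself for title case.
    title = int(record[14], 16) if record[14] else char
    return char, upper, lower, title


# Build a tiny simple-uppercase table from the sample records only.
UPPER = {}
for _line in SAMPLE_RECORDS:
    _char, _upper, _lower, _title = parse_record(_line)
    UPPER[_char] = _upper


def simple_upper(s):
    """Map each code point to exactly one code point, so len() never changes."""
    return "".join(chr(UPPER.get(ord(c), ord(c))) for c in s)
```

With only these three records, simple_upper("aßa") gives "AßA": the sharp s has no simple uppercase mapping and is left alone.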
>>
>> I agree that it might be desirable for these operations to produce the
>> multicharacter equivalents. That idea looks like a tough sell,
>> though: apart from backwards compatibility concerns (which could
>> probably be worked around somehow), it looks as though it would
>> require significant effort to implement.
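For comparison with the 2010 behaviour described here: CPython 3.3 and later switched str.upper(), str.lower() and str.title() to the full Unicode case mappings, which include the multicharacter ones from SpecialCasing.txt. On a current interpreter the length of a string can therefore change. The sketch below assumes Python 3.3+. It scans every code point and collects those whose mapping is not exactly one character long. length_changing is a hypothetical helper name.

```python
import sys


def length_changing(method_name):
    """Return every character whose str.<method_name>() result is not one code point long."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if len(getattr(ch, method_name)()) != 1:
            found.append(ch)
    return found


if __name__ == "__main__":
    # On Python 3.3+ these methods use the full (possibly multicharacter) mappings.
    print(repr("ß".upper()))   # 'SS'
    print(repr("ﬁ".upper()))   # 'FI'
    print(len("İ".lower()))    # 2: 'i' followed by U+0307 COMBINING DOT ABOVE
    for name in ("upper", "lower", "title"):
        print(name, len(length_changing(name)))
```

The scan covers the whole code space, so it takes a moment. It is meant as a one-off check, not something to run per string.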
>>
> If we were to make such a change, I think we should also cater for
> locale-specific case changes (passing the locale to 'upper', 'lower' and
> 'title').
>
> For example, normally "i".upper() returns "I", but in Turkish
> "i".upper() should return "İ" (the uppercase version of lowercase dotted
> i is uppercase dotted I).
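Python 3's str case methods take no locale argument, so the Turkish rule has to be applied by hand. A minimal sketch follows, using hypothetical helper names turkish_upper and turkish_lower. It handles only the dotted/dotless I pair (i to İ, ı to I, and the reverse) and then defers to the locale-independent str.upper()/str.lower(). It deliberately ignores the other locale-conditional rules in SpecialCasing.txt, such as "I" followed by U+0307 COMBINING DOT ABOVE.

```python
# Hypothetical helpers: only the Turkish/Azeri dotted and dotless I
# rules are applied here; everything else uses the ordinary mappings.
_TR_UPPER = str.maketrans({"i": "\u0130"})                  # i -> İ
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})   # I -> ı, İ -> i


def turkish_upper(s):
    # Pre-map dotted i, then let str.upper() handle the rest
    # (dotless ı already uppercases to plain I).
    return s.translate(_TR_UPPER).upper()


def turkish_lower(s):
    # Pre-map I and İ, then let str.lower() handle the rest.
    return s.translate(_TR_LOWER).lower()
```

For example, "istanbul".upper() gives "ISTANBUL", while turkish_upper("istanbul") gives "İSTANBUL".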

Given that the current (simple) functions implement standard-defined 
functions, I think any change should be to *add* new 
'complex-case-change' functions.

Terry Jan Reedy


More information about the Python-list mailing list