unicode bug in turkish characters?

Martin v. Löwis martin at v.loewis.de
Wed Apr 2 16:40:16 EST 2003


"Oktay Safak" <oktaysafak at ixir.com> writes:

> When I try to convert the character "i" to uppercase what comes
> out is "I" where it should have a dot on top of it instead. 

How do you convert to upper-case? If you use

>>> u"i".upper()
u'I'

then this uses the Unicode character database, which is
language-independent, and the upper-case variant of "i" is "I".
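For comparison, here is a sketch of the same language-independent behaviour in Python 3, where the `u` prefix is no longer needed (an assumption: the session above is Python 2). Both the ASCII "i" and the Turkish dotless "ı" (U+0131) map to the same capital "I". Once you have an "I", the default mapping therefore cannot tell which lowercase letter it came from.

```python
import unicodedata

# Python 3: str.upper()/str.lower() use the Unicode character database
# and do not consult the current locale.
pairs = {
    "i": "i".upper(),            # LATIN SMALL LETTER I
    "\u0131": "\u0131".upper(),  # LATIN SMALL LETTER DOTLESS I
}

for lower, upper in pairs.items():
    print(unicodedata.name(lower), "->", unicodedata.name(upper))

# Going back down, "I" always becomes the ASCII "i", never the dotless one.
print(repr("I".lower()))
```

Both lines of the loop end in LATIN CAPITAL LETTER I. Under the default mapping, "i" and "ı" are simply not distinguishable after upper-casing.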

However, if you do

>>> import locale, unicodedata
>>> locale.setlocale(locale.LC_ALL,"tr_TR")
'tr_TR'
>>> "i".upper()
'\xdd'
>>> "i".upper().decode("iso-8859-9")
u'\u0130'
>>> unicodedata.name(_)
'LATIN CAPITAL LETTER I WITH DOT ABOVE'

then this generates the upper-case version according to the
locale. Note that this works for byte strings only; Unicode strings
always use the language-independent mapping.
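In Python 3, str.upper() does not consult the locale at all, and bytes.upper() only changes ASCII letters, so the tr_TR trick does not carry over. One possible workaround is to special-case the two dotted/dotless pairs and then fall back to the default Unicode mapping. This is a minimal sketch that assumes the text is already decoded to str. `turkish_upper` and `turkish_lower` are hypothetical helper names, not standard library functions.

```python
import unicodedata

# Hypothetical helpers: apply the Turkish rules for the i/ı and İ/I pairs,
# then let the Unicode database handle every other character.
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})  # i -> İ, ı -> I
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})  # I -> ı, İ -> i


def turkish_upper(text: str) -> str:
    """Upper-case text using Turkish rules for the dotted/dotless i."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text: str) -> str:
    """Lower-case text using Turkish rules for the dotted/dotless I."""
    return text.translate(_TR_LOWER).lower()


# Byte 0xDD in ISO-8859-9 is the dotted capital I from the session above.
dotted_I = b"\xdd".decode("iso-8859-9")
print(unicodedata.name(dotted_I))
print(turkish_upper("istanbul"))
print(turkish_lower("\u0130STANBUL"))
```

The translation happens before upper() or lower() is called. That way only the four i-variants are overridden, and letters such as "ş" and "ğ" still go through the normal Unicode mapping.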

> Also, when I try to convert the uppercase i with dot to lowercase,
> it comes out as itself where "i" should be the character produced.

I can't reproduce this:

>>> u'\u0130'.lower()
u'i'

seems to work fine.
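Later Python versions behave slightly differently here. Since Python 3.3, str.lower() applies the full case mappings from Unicode's SpecialCasing data. As a result, U+0130 lowercases to two code points, "i" followed by COMBINING DOT ABOVE, rather than the single u'i' shown above. This is still the language-independent default rather than the Turkish rule, and still not a bug. A short sketch to inspect the result:

```python
import unicodedata

lowered = "\u0130".lower()  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Full case mapping turns this one code point into two.
print(len(lowered))
for ch in lowered:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```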

> I might be missing something but I have a feeling that this is
> a bug since the case toggle works perfectly with turkish
> characters that do not exist in ascii. With i and I though, which
> do exist in ascii, it's all messed up.

I'm sure you are missing something. Since you did not say what you did
*exactly* (i.e. which functions you've used with what parameters), it
is hard to tell what your problem is.

However, I'm almost certain that Python has no bug here.

Regards,
Martin




More information about the Python-list mailing list