unicode bug in turkish characters?

Wed Apr 2 16:40:16 EST 2003

"Oktay Safak" <oktaysafak at ixir.com> writes:

> When I try to convert the character "i" to uppercase what comes
> out is "I" where it should have a dot on top of it instead. 

How do you convert to upper case? If you use

>>> u"i".upper()
u'I'

then this uses the Unicode character database, which is
language-independent, and the upper-case variant of "i" is "I".
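As a small sketch of what "language-independent" means here, the snippet below looks up the default upper-case form of both Turkish lower-case i letters. The helper name describe_upper is only for illustration, and the code is written so that it should run on Python 2 as well as Python 3.

```python
import unicodedata

def describe_upper(ch):
    """Return (name of ch, name of its default upper-case form)."""
    return unicodedata.name(ch), unicodedata.name(ch.upper())

# Dotted and dotless lower-case i, as used in Turkish.
for ch in (u"i", u"\u0131"):
    print("%s -> %s" % describe_upper(ch))
```

Under the default mapping both letters end up as plain LATIN CAPITAL LETTER I, which is right for most languages but not for Turkish.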

However, if you do

>>> import locale, unicodedata
>>> locale.setlocale(locale.LC_ALL, "tr_TR")
'tr_TR'
>>> "i".upper()
'\xdd'
>>> "i".upper().decode("iso-8859-9")
u'\u0130'
>>> unicodedata.name(_)
'LATIN CAPITAL LETTER I WITH DOT ABOVE'

then this generates the upper-case version according to the
locale. Notice that this works for byte strings only; Unicode
strings always use the language-independent mapping.
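The byte '\xdd' in that session is simply how ISO-8859-9, the encoding I assumed for the tr_TR locale, represents the dotted capital I. A minimal sketch of the round trip, which does not need the locale to be installed (the variable names are only for illustration):

```python
import unicodedata

dotted_capital_i = u"\u0130"

# ISO-8859-9 (Latin-5) stores this character as the single byte 0xDD.
encoded = dotted_capital_i.encode("iso-8859-9")

# Decoding the byte with the same codec gives the Unicode character back.
decoded = encoded.decode("iso-8859-9")
name = unicodedata.name(decoded)
print(name)
```

If the byte string is in a different encoding than the locale expects, the locale-based conversion will not give a meaningful result.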

> Also, when I try to convert the uppercase i with dot to lowercase,
> it comes out as itself where "i" should be the character produced.

I can't reproduce this:

>>> u'\u0130'.lower()
u'i'

This seems to work fine.
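If Turkish rules are wanted for Unicode strings in both directions, where the locale has no effect, one option is to handle the two i/I pairs explicitly before applying the default mapping. This is only a sketch: turkish_upper and turkish_lower are hypothetical helpers, not functions that Python provides, and they deal with nothing beyond these two pairs.

```python
# Hypothetical helpers, not part of Python's standard library.

def turkish_upper(text):
    """Upper-case a Unicode string using Turkish rules for i."""
    # i -> U+0130 (dotted capital I), U+0131 (dotless i) -> I,
    # then the default mapping for everything else.
    return text.replace(u"i", u"\u0130").replace(u"\u0131", u"I").upper()

def turkish_lower(text):
    """Lower-case a Unicode string using Turkish rules for I."""
    # U+0130 -> i, I -> U+0131 (dotless i),
    # then the default mapping for everything else.
    return text.replace(u"\u0130", u"i").replace(u"I", u"\u0131").lower()

print(repr(turkish_upper(u"istanbul")))
print(repr(turkish_lower(u"ISPARTA")))
```

The special pairs are replaced first so that the default mapping never sees them; the remaining letters go through the ordinary upper() and lower() calls.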

> I might be missing something but I have a feeling that this is
> a bug since the case toggle works perfectly with turkish
> characters that do not exist in ascii. With i and I though, which
> do exist in ascii, it's all messed up.

I'm sure you are missing something. Since you did not say what you did
*exactly* (i.e. which functions you've used with what parameters), it
is hard to tell what your problem is.

However, I'm almost certain that Python has no bug here.

Regards,
Martin