unicode bug in turkish characters?

oktaysafak at ixir.com
Thu Apr 3 15:22:57 EST 2003


Martin, thanks for your attention.

Here is the problematic output on my machine (Turkish Windows 98, IDLE):

Python 2.3a2 (#39, Feb 19 2003, 17:58:58) [MSC v.1200 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help

>>> import sys
>>> sys.getdefaultencoding()
'iso-8859-9'
>>> import locale
>>> locale.getdefaultlocale()
('tr_TR', 'cp1254')
>>> u"i".upper()
u'I'
>>> "i".upper()
'I'
>>> "i".encode("windows-1254").upper()
'I'
>>> "i".encode("iso-8859-9").upper()
'I'
>>> u"i".encode("iso-8859-9").upper()
'I'
>>> "i".upper().decode("iso-8859-9")
u'I'
>>> import unicodedata
>>> unicodedata.name(_)
'LATIN CAPITAL LETTER I'
>>> hex(ord("i".upper()))
'0x49'    # necessary to see the hex value in my IDLE

You see, the values we get are different. However, if I do

>>> hex(ord("*"))  # * stands for capital I with dot
'0xdd'



I get the correct value. To see whether things would improve if the locale encoding and sys.getdefaultencoding() are both windows-1254, since I'm on a Windows machine, here is another run:



Python 2.3a2 (#39, Feb 19 2003, 17:58:58) [MSC v.1200 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> import sys, unicodedata, locale
>>> sys.getdefaultencoding()
'windows-1254'
>>> locale.getdefaultlocale()
('tr_TR', 'cp1254')
>>> locale.setlocale(locale.LC_ALL, "tr_tr")
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in ?
    locale.setlocale(locale.LC_ALL, "tr_tr")
  File "C:\PYTHON23\lib\locale.py", line 381, in setlocale
    return _setlocale(category, locale)
Error: locale setting not supported
>>> locale.setlocale(locale.LC_ALL, "turkish")
'Turkish_Turkey.1254'
>>> locale.getdefaultlocale()
('tr_TR', 'cp1254')
>>> "i".upper()
'I'
>>> u"i".upper()
u'I'
>>> locale._print_locale()
Locale defaults as determined by getdefaultlocale():
------------------------------------------------------------------------
Language:  tr_TR
Encoding:  cp1254

Locale settings on startup:
------------------------------------------------------------------------
LC_NUMERIC ...
   Language:  Turkish_Turkey
   Encoding:  1254

LC_MONETARY ...
   Language:  Turkish_Turkey
   Encoding:  1254

LC_TIME ...
   Language:  Turkish_Turkey
   Encoding:  1254

LC_COLLATE ...
   Language:  Turkish_Turkey
   Encoding:  1254

LC_CTYPE ...
   Language:  Turkish_Turkey
   Encoding:  1254


Locale settings after calling resetlocale():
------------------------------------------------------------------------
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in ?
    locale._print_locale()
  File "C:\PYTHON23\lib\locale.py", line 737, in _print_locale
    resetlocale()
  File "C:\PYTHON23\lib\locale.py", line 391, in resetlocale
    _setlocale(category, _build_localename(getdefaultlocale()))
Error: locale setting not supported
>>> hex(ord("i".upper()))
'0x49'


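No change: "i".upper() still gives 0x49 (plain capital I) rather than 0xdd (capital I with dot). Also, when I try to set the locale to "tr_TR", which works painlessly for you, I get an error instead; "turkish" is accepted. The error raised by resetlocale() inside _print_locale() is also interesting.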
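
The setlocale() calls above suggest that the accepted locale name depends on the platform: "tr_tr" is rejected on this machine while "turkish" works. Below is a minimal sketch, written in present-day Python 3 syntax rather than the 2.3a2 used above, of trying several candidate names in turn. The helper name set_first_available_locale and the candidate list are assumptions for illustration, not part of the locale module.

```python
import locale


def set_first_available_locale(candidates, category=locale.LC_ALL):
    """Try each locale name in order; return the name the C library
    reports for the first one accepted, or None if all are rejected.

    Hypothetical helper for illustration only.
    """
    for name in candidates:
        try:
            # setlocale() raises locale.Error for names the platform rejects.
            return locale.setlocale(category, name)
        except locale.Error:
            continue
    return None


if __name__ == "__main__":
    # Candidate spellings are assumptions; which one works is platform-specific.
    chosen = set_first_available_locale(["tr_TR", "tr_TR.UTF-8", "turkish"])
    print("locale in effect:", chosen)
```

Which spelling succeeds, if any, depends on the operating system and the locales installed on it, so the result should be checked rather than assumed.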

I hope these outputs make it clear what's going on.
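
To make the expected behaviour concrete, here is a minimal sketch of Turkish-aware case conversion, again in present-day Python 3 syntax rather than 2.3a2. The function names turkish_upper and turkish_lower are hypothetical; they are not something Python provides, and the sketch only handles the dotted/dotless i pairs discussed above.

```python
# Hypothetical helpers for illustration; not part of the standard library.
DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I


def turkish_upper(text):
    # Handle the two Turkish-specific letters first, then let the
    # locale-independent str.upper() deal with everything else.
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def turkish_lower(text):
    # Same idea in the other direction.
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


if __name__ == "__main__":
    # The built-in conversion ignores the locale and yields plain 'I' (0x49).
    print(hex(ord("i".upper())))
    # The Turkish-aware version yields U+0130, which is byte 0xdd in cp1254.
    print(hex(ord(turkish_upper("i"))))
    print(turkish_upper("i").encode("cp1254"))  # b'\xdd'
```

The replacements run before upper()/lower() so that the generic conversion never sees the two letters whose Turkish mapping differs from the default one.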

Thanks again.

Regards,
Oktay





