iso_8859_1 mystery/tkinter

Jeff Epler jepler at unpythonic.net
Wed May 18 19:02:00 EDT 2005


This isn't about the "sign bit"; it's about the encoding that gets assumed
for byte strings.

In iso_8859_1 and Unicode, the character with value 0xb0 is DEGREE SIGN.
In other character sets, that may not be true. For instance, in the
Windows "code page 437", it is u'\u2591', aka LIGHT SHADE (a half-tone pattern).
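
A minimal sketch of that difference, written in Python 3 syntax (an
assumption; there, byte strings and text strings are separate types). The
helper name_under is hypothetical and only exists for this illustration:

```python
import unicodedata

raw = bytes([0xB0])  # a one-byte string whose only byte has value 0xb0

def name_under(encoding):
    """Decode the byte with the given codec and return the Unicode name."""
    return unicodedata.name(raw.decode(encoding))

print(name_under("iso8859-1"))  # DEGREE SIGN
print(name_under("cp437"))      # LIGHT SHADE
```

The byte never changes; only the codec used to interpret it does.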

When you write code like
    x = '%c' % (0xb0)
and then pass x to a Tkinter call, Tkinter treats it as a byte string
encoded in some system-default encoding, which could give DEGREE SIGN,
LIGHT SHADE, or some other character. A Thai user of Windows might see
THAI CHARACTER THO THAN, for instance, and I would see a question mark
because I use utf-8, where a lone 0xb0 byte is an invalid sequence.
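
The other two cases can be sketched the same way, again in Python 3 syntax.
This only shows what Python's own codecs do with the byte; it is an
assumption that "cp874" stands in for the Thai Windows code page, and Tk's
handling of invalid input may differ in detail:

```python
import unicodedata

raw = b"\xb0"  # the same single byte, value 0xb0

# Thai code page: the byte decodes to a Thai letter.
thai_name = unicodedata.name(raw.decode("cp874"))
print(thai_name)  # THAI CHARACTER THO THAN

# UTF-8: 0xb0 is a continuation byte with no lead byte before it,
# so strict decoding raises an error.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)  # False

# A lenient decoder substitutes a placeholder character instead.
replaced = raw.decode("utf-8", errors="replace")
print(unicodedata.name(replaced))  # REPLACEMENT CHARACTER
```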

By using
    x = u'%c' % (0xb0)
you get a unicode string, and there is no confusion about the meaning of
the symbol: you always get DEGREE SIGN.
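
As a rough check, here is the same line run under Python 3 (an assumption;
there every str is already unicode and the u prefix is accepted but
optional). The character is fixed, and only its byte representation depends
on the encoding chosen when it is finally written out:

```python
import unicodedata

x = u'%c' % (0xb0)  # a text string holding the single code point U+00B0

print(unicodedata.name(x))  # DEGREE SIGN

# The same character maps to different bytes in different encodings.
encoded = {enc: x.encode(enc) for enc in ("iso8859-1", "cp437", "utf-8")}
for enc, data in encoded.items():
    print(enc, data)
```

Note that in code page 437 the degree sign lives at byte 0xf8, not 0xb0,
which is why guessing the encoding of a raw byte goes wrong.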

Jeff