Windows NT shell + extended characters. Bug or what?
Jason Orendorff
jason at jorendorff.com
Fri Mar 15 13:15:06 EST 2002
Noel Smith wrote:
> >>> print ord('£')
> 156
> >>> print ord('¥')
> 157
>
> Which is clearly wrong although strangely enough when I type the same
> thing into IDLE on Windows NT, it produces the correct results.
Of course, you can use a Latin-1 chart
http://czyborra.com/charsets/iso8859.html#ISO-8859-1
but that doesn't really answer your question.
When you type the pound sign on your keyboard, the console
interprets it as some byte. If the console were Latin-1
friendly, it would generate the byte A3, which is the Latin-1
code for the pound sign.
Alas, the PC console uses a different character set called CP437,
which is not Latin-1 compatible. In CP437, the pound sign
is 156 and the yen sign is 157.
So when you type a pound sign, your computer is putting the
byte 9C (=156) into memory somewhere, and that is what Python
receives when you hit Enter.
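
Here is a rough sketch of that byte-level picture, using the codecs that ship with Python. It assumes a Python 3 interpreter (newer than anything discussed in this thread), and the variable names are only illustrative:

```python
# The byte the CP437 console produces when you type a pound sign.
console_byte = b'\x9c'

# Read naively as a number it is 156, which is what ord('£') reported.
raw_value = ord(console_byte)

# Decoded with the console's own code page, it is the real pound sign.
pound = console_byte.decode('cp437')
code_point = ord(pound)  # U+00A3

# Encoded to Latin-1, the same character is the single byte A3.
latin1_byte = pound.encode('latin-1')

# The yen sign works the same way: CP437 byte 9D maps to U+00A5.
yen_code_point = ord(b'\x9d'.decode('cp437'))

print(raw_value, code_point, latin1_byte, yen_code_point)
```

The point is only that the number you see depends on which character set the byte is read in; nothing here changes how the console itself behaves.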
Anyway, to cut a long story short:
(1) Whenever you're using non-ASCII characters, you want to use
Unicode strings. A bare string like '£' may cause a warning
in a future version of Python, in part because of this mess.
(2) Alas, Python doesn't currently interpret Unicode literal
strings all that intelligently, in the Windows console
world anyway:
[in Win2K console]
>>> ord(u'£')
156 # uhhh, no
>>> ord(u'\N{POUND SIGN}')
163 # correct answer
I think it's fair to say this is a deficiency in current
versions of Python. I believe it will be fixed, so eventually
ord(u'£') will consistently return 163. But for now, a
Latin-1 chart is your best bet.
(Note that the above are technically Unicode code points, not
Latin-1 character codes. But Latin-1 is the first 256 code
points of Unicode, so it works.)
(3) Tk is Unicode-friendly, so when you type or paste a £
character into it, the correct Unicode character is stored.
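
To illustrate points (2) and (3), here is a small sketch of why a Latin-1 chart gives the right Unicode numbers, and of how to get at a character by name without depending on what the console sends. It assumes Python 3 and the standard unicodedata module; the variable names are made up for the example:

```python
import unicodedata

# Every Latin-1 byte value decodes to the Unicode code point
# with the same number, so a Latin-1 chart doubles as a chart
# of the first 256 Unicode code points.
for value in range(256):
    assert ord(bytes([value]).decode('latin-1')) == value

# Looking a character up by name sidesteps the console's
# character set entirely, much like the \N{...} escape does.
pound = unicodedata.lookup('POUND SIGN')
yen = unicodedata.lookup('YEN SIGN')

pound_code = ord(pound)
yen_code = ord(yen)

print(pound_code, yen_code)
```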
Hope this helps allay the confusion, at least. :-/
## Jason Orendorff http://www.jorendorff.com/