Windows NT shell + extended characters. Bug or what?

Jason Orendorff jason at jorendorff.com
Fri Mar 15 13:15:06 EST 2002


Noel Smith wrote:
> >>> print ord('£')
> 156
> >>> print ord('¥')
> 157
>
> Which is clearly wrong although strangely enough when I type the same
> thing into IDLE on Windows NT, it produces the correct results.

Of course, you can use a Latin-1 chart
  http://czyborra.com/charsets/iso8859.html#ISO-8859-1
but that doesn't really answer your question.

When you type the pound sign on your keyboard, the console
turns it into some byte.  If the console were Latin-1
friendly, it would generate the byte A3 (=163), which is the
Latin-1 code for the pound sign.

Alas, the PC console uses a different character set called CP437,
which is not Latin-1 compatible.  In CP437, the pound sign
is 156 and the yen sign is 157.

So when you type a pound sign, your computer is putting the
byte 9C (=156) into memory somewhere, and that is what Python
receives when you hit Enter.
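
If you want to see the mapping for yourself, here is a minimal
sketch.  It assumes a Python whose standard codecs include "cp437"
and "latin-1", and it hard-codes the bytes 9C and 9D rather than
reading them from a real console, so treat it as an illustration
only:

```python
# The bytes a CP437 console stores for the pound and yen signs.
pound_byte = b"\x9c"   # 156
yen_byte = b"\x9d"     # 157

# Decoding them as CP437 recovers the characters that were typed.
pound = pound_byte.decode("cp437")
yen = yen_byte.decode("cp437")

print(ord(pound))                # 163, the code a Latin-1 chart shows
print(ord(yen))                  # 165
print(pound.encode("latin-1"))   # b'\xa3', the byte a Latin-1 console would use
```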

Anyway, to cut a long story short:

  (1) Whenever you're using non-ASCII characters, you want to use
      Unicode strings.  A bare string like '£' may cause a warning
      in a future version of Python, in part because of this mess.

  (2) Alas, Python doesn't currently interpret Unicode literal
      strings all that intelligently, in the Windows console
      world anyway:

        [in Win2K console]
        >>> ord(u'£')
        156    # uhhh, no
        >>> ord(u'\N{POUND SIGN}')
        163    # correct answer

      I think it's fair to say this is a deficiency in current
      versions of Python.  I believe it will be fixed, so eventually
      ord(u'£') will consistently return 163.  But for now, a
      Latin-1 chart is your best bet.

      (Note that the above are technically Unicode codepoints, not
      Latin-1 character codes.  But Latin-1 is the first 256
      codepoints of Unicode, numbered identically, so it works.)

  (3) Tk is Unicode-friendly, so when you type or paste a £
      character into it, the correct Unicode character is stored.
      IDLE is built on Tk, which is why it gives you the right
      answer.
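
A rough sketch of the points above, for anyone who wants to check
them without a chart.  It is written for a Python in which plain
string literals are already Unicode (an assumption; in the console
session above you would need the u'' prefix), and it names the
character by escape so nothing depends on what the console does
with a typed £:

```python
# Name the character explicitly instead of typing it at the console.
pound = "\N{POUND SIGN}"
print(ord(pound))          # 163
print(pound == "\xa3")     # True: same character, written as a hex escape

# Latin-1 maps byte n to Unicode codepoint n for every n in 0..255,
# which is why a Latin-1 chart doubles as a chart of Unicode codepoints.
all_bytes = bytes(range(256))
latin1_matches = [ord(c) for c in all_bytes.decode("latin-1")] == list(range(256))
print(latin1_matches)      # True
```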

Hope this helps allay the confusion, at least.  :-/

## Jason Orendorff    http://www.jorendorff.com/




