raw_input() and utf-8 formatted chars

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Sat Oct 13 04:38:19 EDT 2007


On Fri, 12 Oct 2007 19:09:46 -0700, 7stud wrote:

> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
>> You mean literally!?  Then of course I get A\xcc\x88 because that's what I
>> entered.  In string literals in source code the backslash has a special
>> meaning but `raw_input()` does not "interpret" the input in any way.
>>
> 
> Then why don't I end up with the same situation as this:
> 
> >>> s = 'A\xcc\x88'   # capital A with umlaut
> >>> print s           # displays capital A with umlaut

I don't get the question!?  In string literals in source code the
backslash has a special meaning, like I wrote above.  When Python compiles
the snippet above you end up with a string of three bytes: one with the
ASCII value of an 'A', and two whose values you typed in as hexadecimal
escapes:

In [191]: s = 'A\xcc\x88'

In [192]: len(s)
Out[192]: 3

In [193]: map(ord, s)
Out[193]: [65, 204, 136]

In [194]: print s
Ä

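The last step displays an Ä only if the receiving/displaying program
expects UTF-8 as encoding (the two bytes \xcc\x88 are the UTF-8 encoding
of U+0308, a combining diaeresis, which gets drawn on top of the preceding
A).  With any other encoding something other than an Ä would be shown.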
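
To make the dependence on the encoding concrete, here is a small sketch.
It assumes Python 3, where bytes and text are separate types, so it is not
the same interpreter as in the session above; the variable names are just
for illustration.  It decodes the same three bytes once as UTF-8 and once
as Latin-1 and shows that only the first gives an A with a diaeresis:

```python
import unicodedata

raw = b'A\xcc\x88'   # the same three bytes as in the session above

as_utf8 = raw.decode('utf-8')       # 'A' followed by U+0308 COMBINING DIAERESIS
as_latin1 = raw.decode('latin-1')   # every byte becomes its own character

print(len(raw))                           # number of bytes
print([hex(ord(c)) for c in as_utf8])     # code points when read as UTF-8
print([hex(ord(c)) for c in as_latin1])   # code points when read as Latin-1

# NFC normalization folds 'A' + combining diaeresis into the single
# precomposed character U+00C4.
composed = unicodedata.normalize('NFC', as_utf8)
print(hex(ord(composed)))
```

Read as UTF-8 the bytes are two code points (0x41, 0x308); read as Latin-1
they are three (0x41, 0xcc, 0x88), which is why a terminal that does not
expect UTF-8 shows something else.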

If you type that same text at a `raw_input()` prompt you get back exactly
the characters you typed, backslashes included, because no Python source
code is being compiled:

In [195]: s = raw_input()
A\xcc\x88

In [196]: len(s)
Out[196]: 9

In [197]: map(ord, s)
Out[197]: [65, 92, 120, 99, 99, 92, 120, 56, 56]

In [198]: print s
A\xcc\x88
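
If you really want typed-in backslash sequences to be interpreted the way
the compiler interprets a string literal, you have to ask for it
explicitly.  A minimal sketch, again assuming Python 3: `typed` stands in
for what the input function would return, and `interpret_escapes` is a
hypothetical helper name, not something from the standard library.  It
only handles ASCII input with escapes up to \xff.  (In Python 2 the
'string_escape' codec does roughly the same job on byte strings.)

```python
# Stand-in for the nine characters returned by raw_input()/input().
typed = r'A\xcc\x88'

def interpret_escapes(text):
    """Turn literal backslash escapes in ASCII text into the bytes they name.

    Hypothetical helper for illustration; assumes ASCII-only input and
    escapes no larger than \\xff.
    """
    # 'unicode_escape' interprets \xNN as code point NN; encoding the
    # result as Latin-1 maps each code point 0..255 back to one byte.
    return text.encode('ascii').decode('unicode_escape').encode('latin-1')

raw = interpret_escapes(typed)

print(len(typed), [ord(c) for c in typed])   # what was typed
print(len(raw), list(raw))                   # after interpreting the escapes
print(raw.decode('utf-8'))                   # shown correctly only on a UTF-8 terminal
```

After the conversion the nine typed characters have become the same three
bytes (65, 204, 136) as in the string-literal case.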

>> > And what is it that your keyboard enters to produce an 'a' with an
>> > umlaut?
>>
>> *I* just hit the ä key.  The one right next to the ö key.  ;-)
>>
> ...and what if you don't have an a-with-umlaut key?

I find other means to enter it.  <Alt> + some magic number on the numeric
keypad on Windows, or <Compose>, <a>, <"> on Unix/Linux.  Some text editors
offer special sequences too.  If all else fails there are character map
programs that show all Unicode characters, so you can pick one and
copy'n'paste it.

Ciao,
	Marc 'BlackJack' Rintsch


