raw_input() and utf-8 formatted chars

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Oct 12 16:43:43 EDT 2007


On Fri, 12 Oct 2007 13:18:35 -0700, 7stud wrote:

> On Oct 12, 1:18 pm, kyoso... at gmail.com wrote:
>> On Oct 12, 1:53 pm, 7stud <bbxx789_0... at yahoo.com> wrote:
>>
>> > s = 'A\xcc\x88'   #capital A with umlaut
>> > print s           #displays capital A with umlaut
>>
>> > s = raw_input('Enter: ')   #A\xcc\x88
>> > print s                    #displays A\xcc\x88
>>
>> > print len(s)               #9
>>
>> > It looks like every character of the string I enter in utf-8 is being
>> > interpreted literally as 9 separate characters rather than one
>> > character.  How do I enter a capital A with an umlaut so that python
>> > treats it as one character?
>>
>> I don't know. This works for me:
>>
>>
>>
>> >>> x = raw_input('Enter: ')
>> Enter: Ä
>> >>> len(x)
>> 1
>>
>> I'm using Python 2.4 with Default Source Encoding set to None on
>> Windows XP SP2.
>>
>> Mike
> 
> Yeah, but what happens when you enter A\xcc\x88?

You mean literally!?  Then of course I get A\xcc\x88, because that's what I
entered.  In string literals in source code the backslash has a special
meaning, but `raw_input()` does not "interpret" the input in any way: it
returns the nine characters exactly as they were typed.
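
To make the difference concrete, here is a minimal sketch.  It is only an
illustration, not the one right way to do it: `typed` is a stand-in for the
value `raw_input()` would hand back, `interpret_escapes` is a helper name
made up for this example, and the bytes literals are there only so the same
lines should run on Python 2 and 3.

```python
# -*- coding: utf-8 -*-
import unicodedata

# A string literal: the compiler turns \xcc and \x88 into single bytes.
literal = b'A\xcc\x88'

# Stand-in (assumption) for what raw_input() returns when those nine keys
# are typed; the doubled backslashes denote real backslash characters.
typed = b'A\\xcc\\x88'


def interpret_escapes(raw):
    """Hypothetical helper: apply literal-style escape processing to bytes."""
    # unicode_escape maps \xNN to code point NN; latin-1 maps it back to byte NN.
    return raw.decode('unicode_escape').encode('latin-1')


converted = interpret_escapes(typed)
text = converted.decode('utf-8')               # 'A' + U+0308 COMBINING DIAERESIS
composed = unicodedata.normalize('NFC', text)  # single code point U+00C4

print(len(literal))          # 3
print(len(typed))            # 9
print(converted == literal)  # True
print(len(text))             # 2
print(len(composed))         # 1
```

Note that even after decoding, 'A' followed by the combining diaeresis is two
code points; only the NFC-normalized form is a single character.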

> And what is it that your keyboard enters to produce an 'a' with an umlaut?

*I* just hit the ä key.  The one right next to the ö key.  ;-)

Ciao,
	Marc 'BlackJack' Rintsch


