raw_input() and utf-8 formatted chars

MRAB google at mrabarnett.plus.com
Sat Oct 13 14:42:41 EDT 2007


On Oct 13, 3:09 am, 7stud <bbxx789_0... at yahoo.com> wrote:
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
>
> > You mean literally!?  Then of course I get A\xcc\x88 because that's what I
> > entered.  In string literals in source code the backslash has a special
> > meaning but `raw_input()` does not "interpret" the input in any way.
>
> Then why don't I end up with the same situation as this:
>
> >>> s = 'A\xcc\x88'   #capital A with umlaut
> >>> print s           #displays capital A with umlaut
> > > And what is it that your keyboard enters to produce an 'a' with an umlaut?
>
> > *I* just hit the ä key.  The one right next to the ö key.  ;-)
>
> ...and what if you don't have an a-with-umlaut key?

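raw_input() returns the characters exactly as you typed them, so the
backslash escapes arrive as literal text (a backslash, an 'x', two hex
digits) rather than as single bytes. You can convert them into the actual
UTF-8 byte string with decode("string_escape"):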

s = raw_input('Enter: ')        # user types A\xcc\x88 (9 characters)
s = s.decode("string_escape")   # now 3 bytes: 'A', 0xCC, 0x88

It looks like your terminal already understands UTF-8, so when you print
the UTF-8 byte string it displays those bytes as the Unicode character.
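
For anyone trying this on Python 3, where raw_input() and the
"string_escape" codec no longer exist, here is a rough sketch of the same
two steps: turn the typed escapes into bytes, then interpret those bytes
as UTF-8. The function name unescape_utf8 is made up for illustration, and
the sketch assumes the typed text is plain ASCII with \xNN escapes only.

```python
import unicodedata


def unescape_utf8(typed):
    """Turn text typed with literal \\xNN escapes into the string it encodes.

    Hypothetical helper; assumes ASCII input whose escapes spell out
    UTF-8 bytes.
    """
    # Step 1: interpret the backslash escapes. Each \xNN becomes one
    # code point in the range 0-255.
    raw = typed.encode("ascii").decode("unicode_escape")
    # Step 2: map those code points one-to-one onto bytes.
    data = raw.encode("latin-1")
    # Step 3: interpret the bytes as UTF-8.
    return data.decode("utf-8")


typed = r"A\xcc\x88"          # what input() would return for that keyboard entry
text = unescape_utf8(typed)

print(len(typed), len(text))  # 9 typed characters become 2 code points
print([unicodedata.name(c) for c in text])
```

The bytes 0xCC 0x88 decode to U+0308 (COMBINING DIAERESIS), so the result
is an A followed by a combining mark, which a UTF-8 terminal draws as a
single A with umlaut.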



