raw_input() and utf-8 formatted chars
MRAB
google at mrabarnett.plus.com
Sat Oct 13 14:42:41 EDT 2007
On Oct 13, 3:09 am, 7stud <bbxx789_0... at yahoo.com> wrote:
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
>
> > You mean literally!? Then of course I get A\xcc\x88 because that's what I
> > entered. In string literals in source code the backslash has a special
> > meaning but `raw_input()` does not "interpret" the input in any way.
>
> Then why don't I end up with the same situation as this:
>
> > >> > s = 'A\xcc\x88' #capital A with umlaut
> > >> > print s #displays capital A with umlaut
> > > And what is it that your keyboard enters to produce an 'a' with an umlaut?
>
> > *I* just hit the ä key. The one right next to the ö key. ;-)
>
> ...and what if you don't have an a-with-umlaut key?
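raw_input() returns the string exactly as you typed it, so the
backslash sequences arrive as literal characters (a backslash, an
'x', two hex digits), not as bytes. You can turn that into the actual
UTF-8 byte string with decode("string_escape"):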
s = raw_input('Enter: ') #A\xcc\x88
s = s.decode("string_escape")
It looks like your terminal already understands UTF-8, so when you
print the decoded byte string it displays those bytes as the Unicode
character.
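
For anyone reading this on Python 3, where raw_input() is input() and
the "string_escape" codec no longer exists, a rough equivalent might
look like the sketch below. The helper name unescape_typed_bytes is
made up for illustration, and the sketch assumes the typed text
contains only Latin-1 characters besides the escapes.

```python
# Python 3 sketch (assumption: the thread itself is about Python 2).
def unescape_typed_bytes(typed):
    # Hypothetical helper: interpret typed escapes as UTF-8 bytes.
    # Step 1: process the backslash escapes; each \xNN becomes code point U+00NN.
    escaped = typed.encode("latin-1").decode("unicode_escape")
    # Step 2: map those code points back to raw bytes, then decode as UTF-8.
    return escaped.encode("latin-1").decode("utf-8")

typed = r"A\xcc\x88"  # the literal characters the user typed at the prompt
result = unescape_typed_bytes(typed)
print(len(typed), len(result))
```

The result is 'A' followed by U+0308 (combining diaeresis), which a
UTF-8 terminal draws as a single A with umlaut.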