Python beginner, unicode encode/decode Q

MRAB google at mrabarnett.plus.com
Mon Jul 14 11:44:55 EDT 2008


On Jul 14, 1:51 pm, anonymous <anonym... at anonymous.com> wrote:
> 1 Objective: to write little programs to help me learn German.  See the
> code after the numbered comments.  Thanks in advance for any direction or
> suggestions.
>
> tk
>
> 2  Want keyboard answer input, for example:  
>
> answer_str  = raw_input(' Enter answer > ') Herr  Üü
>
> [ I keyboard in the following characters Herr Üü ]
> print answer_str
> Output on screen is > Herr Üü
>
> 3   history 1 and 2  code run interactively under Debian Linux Python
> 2.4 and interactively under windows98, first edition IDLE, Python 2.3.5
> and it works.
>
> 4  history 3 and 4 code run from within a .py file produce different
> output from example in book.
>
> 5 I want to work under Debian Linux, but because the program failed
> there when I ran the code from a file, I thought I should fire up the
> win98 IDLE/Python program and see if it ran there.  It failed there too
> when run from within a file.
>
> 6 The sample code is from pages 108-109 of "Python for Dummies".
>       It says in the book:  "Python's file objects and StringIO objects
> don't support raw Unicode; the usual workaround is to encode Unicode as
> UTF-8 before saving it to a file or StringIO object."
> The sample code from the book is French, as indicated here, but trying
> German produces the same result.
>
> 7 I have searched the net under all the keywords but this is as close as
> I get to accomplishing my task.  I suspect I may not be understanding:
> StringIO objects don't support raw Unicode, but I don't know.
>
> #_*_ coding: utf-8 _*_
>
> # code run under linux debian  interactively from a terminal and works
>
> print " u'Libert\u00e9' "
>
> # y = raw_input('Enter >')  commented out
>
> y = u'Lbert\u00e9'
> y.encode('utf-8')
> q = y.encode('utf-8')
> q.decode('utf-8')
> print q.decode('utf-8')
>
> history 1 works and here is the screen copy of interactive
>
>  >>> y = raw_input ('>')
>  >Libert\xc3\xa9
>  >>> q = 'Libert\xc3\xa9'
>  >>> q.decode('utf-8')
> u'Libert\xe9'
>  >>> print q
> Liberté
>  >>>
>
> [  screen output is next line ]
>
> Lberté
>
> history 2
> # code run under win98, first edition, within IDLE interactively; it
> # succeeded in producing correct results.
>
> # y = raw_input('Enter >')  commented out
>
> y = u'Lbert\u00e9'
> y.encode('utf-8')
> q = y.encode('utf-8')
> q.decode('utf-8')
> print q.decode('utf-8')
>
> history 1 works and here is the screen copy of interactive
>
>  >>> y = raw_input ('>')
>  >Libert\xc3\xa9
>  >>> q = 'Libert\xc3\xa9'
>  >>> q.decode('utf-8')
> u'Libert\xe9'
>  >>> print q
> Liberté
>  >>>
>
> [  screen output is next line ]
>
> Lberté
>
> # history 3
>
> # this code is run from within IDLE on win98 and inside a python file.
> #  The code DOES NOT produce the proper output.
>
> #_*_ coding: utf-8 _*_
>
> # print "u'Libert\u00e9'"  printed to screen
>
> y = raw_input('Enter >')
>
> # y = u'Lbert\u00e9' commented out
>
> y.encode('utf-8')
> q = y.encode('utf-8')
> q.decode('utf-8')
> print q.decode('utf-8')
>
> # the output on the lines below was produced on the screen after the run
>
> enter u'Libert\u00e9' on screen to copy into the y string
> Enter >u'Libert\u00e9'
>
> u'Libert\u00e9'
>
> The code DOES NOT produce Liberté but instead produces u'Libert\u00e9'
>
> # history 4
>
> # this code is run from a terminal on Debian Linux, inside a python file.
> # The code does not produce the proper output but produces the same
> # output as when run on Windows.
>
> #_*_ coding: utf-8 _*_
>
> print "u'Libert\u00e9'"  printed to screen
>
> y = raw_input('Enter >')
>
> # y = u'Lbert\u00e9' commented out
>
> y.encode('utf-8')
> q = y.encode('utf-8')
> q.decode('utf-8')
> print q.decode('utf-8')
>
> # the output on the lines below was produced on the screen after the run
>
> enter u'Libert\u00e9' on screen to copy into the y string
> Enter >u'Libert\u00e9'
> u'Libert\u00e9'
>
> The code DID NOT produce Liberté but instead produced u'Libert\u00e9'

raw_input returns exactly the characters you typed; it does not
interpret them as Python source. You typed u'Libert\u00e9', quotes,
backslash and all, so that's what was printed out.
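
To make the difference concrete, here is a small sketch (not from the
original post) comparing the text a program receives when the characters
Libert\u00e9 are typed at a prompt with what the same characters mean
inside a string literal in source code. The names typed and literal are
made up for the illustration.

```python
# What the program receives when the user types  Libert\u00e9  at a prompt:
# twelve separate characters, one of which is a literal backslash.
typed = "Libert\\u00e9"

# What a unicode string literal in source code produces: the parser has
# already turned the \u00e9 escape into one character (e with acute accent).
literal = u"Libert\u00e9"

print(len(typed))        # 12
print(len(literal))      # 7
print(typed == literal)  # False
```

The escape is only processed when Python reads it as part of a literal in
source code, which is why the interactive sessions worked and the
raw_input version did not.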

If you want to be able to enter escape sequences like \u00e9 and have
them decoded to the appropriate character, then you must decode them
explicitly, something like this:

# The code
text = raw_input('Enter >')
decoded_text = text.decode("unicode-escape")
print decoded_text


# The output
Enter >Libert\u00e9
Liberté
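
A minimal sketch of the same idea wrapped in a function, written so that
it also runs on Python 3, where raw_input and str.decode no longer exist.
The helper name decode_escapes is hypothetical, and the sketch assumes the
typed text is plain ASCII; accented characters typed directly at the
keyboard would need to be decoded from the terminal's encoding instead.

```python
import codecs


def decode_escapes(typed):
    # Hypothetical helper: turn typed escape sequences such as \u00e9
    # into the characters they name.
    # Assumption: the input is ASCII-only, so it can be converted to
    # bytes safely before the unicode_escape codec processes it.
    raw = typed.encode("ascii")
    return codecs.decode(raw, "unicode_escape")


# The backslash is doubled here so the string holds the same twelve
# characters a user would type at the prompt.
result = decode_escapes("Libert\\u00e9")
print(result == u"Libert\u00e9")  # True
```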

HTH


