eval and unicode

Laszlo Nagy gandalf at shopzeus.com
Fri Mar 21 04:54:34 EDT 2008


 
  Hi Jonathan,


I think I made it too complicated and lost sight of the actual
question. Rather than answer your post point by point, I'm going to
restate the problem formally:

 >>> s = '\xdb'  # This is a byte, without encoding specified.
 >>> s.decode('latin1')
u'\xdb' # The above byte decoded in latin1 encoding
 >>> s.decode('latin2')
u'\u0170' # The same byte decoded in latin2 encoding
 >>> expr = 'u"' + s + '"' # Create an expression for eval
 >>> expr
'u"\xdb"' # expr is not a unicode string - it is a binary string and it
          # has no encoding assigned.
 >>> print repr(eval(expr)) # Eval it
u'\xdb'  # What? Why was it decoded as 'latin1'? Why not 'latin2'? Why
         # not 'ascii'?
 >>> eval( "# -*- coding: latin2 -*-\n" + expr)
u'\u0170' # You can specify the encoding for eval, that is cool.

Inside eval, a unicode object was created from a binary string. I just
discovered that PEP 0263 can be used to specify the source encoding for
eval. But there is still a problem: eval should not assume that the
expression is in any particular encoding. When it sees something like
'\xdb', it should raise a SyntaxError - the same error that you get
when running a .py file containing the same expression:

 >>> file('test.py','wb+').write(expr + "\n")
 >>> ^D
gandalf@saturnus:~$ python test.py
  File "test.py", line 1
SyntaxError: Non-ASCII character '\xdb' in file test.py on line 1, but 
no encoding declared; see http://www.python.org/peps/pep-0263.html for 
details

Otherwise the interpretation of the expression is ambiguous. Is there
any good reason why eval assumed a particular encoding in the above
example?
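
One way to sidestep the ambiguity entirely is to decode the bytes
yourself and only then build the expression, so that eval never has to
guess. This is a minimal sketch and not something taken from the thread:
the helper name eval_unicode_literal is hypothetical, and it assumes the
payload contains no quotes or backslashes.

```python
# Sketch only: eval_unicode_literal is a hypothetical helper name.
def eval_unicode_literal(raw, encoding):
    # Decode the bytes explicitly, so nothing is left for eval to guess.
    text = raw.decode(encoding)
    # Build the expression as unicode text rather than as a byte string.
    expr = u'u"' + text + u'"'
    return eval(expr)

s = b'\xdb'  # the same single byte as in the session above
as_latin1 = eval_unicode_literal(s, 'latin1')
as_latin2 = eval_unicode_literal(s, 'latin2')
```

Here the caller states the encoding in the decode step, so the result
depends only on that choice.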

Sorry for the misunderstanding - my English is not perfect. I hope it is
clear now.

My problem is solved anyway. Anytime I need to eval an expression, I'm 
going to specify the encoding manually with # -*- coding: XXX -*-. It is 
good to know that it works for eval and its counterparts. And it is 
unambiguous.  :-)
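
The workaround above can be wrapped in a small helper so the coding line
is never forgotten. This is only a sketch under stated assumptions: the
name eval_with_encoding is hypothetical, the source is assumed to be a
byte string holding a single expression, and the encoding name is assumed
to be plain ASCII.

```python
# Sketch only: eval_with_encoding is a hypothetical helper name.
def eval_with_encoding(source, encoding):
    # source: byte string holding the expression to evaluate.
    # encoding: name of the codec its non-ASCII bytes are written in.
    header = ('# -*- coding: %s -*-\n' % encoding).encode('ascii')
    # PEP 263 looks for the declaration on the first or second line,
    # so the header is placed in front of the expression.
    return eval(header + source)

expr = b'u"' + b'\xdb' + b'"'  # same expression as in the session above
result_latin1 = eval_with_encoding(expr, 'latin1')
result_latin2 = eval_with_encoding(expr, 'latin2')
```

With the encoding passed in explicitly, the same bytes give u'\xdb' for
latin1 and u'\u0170' for latin2, matching the session above.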

Best,

   Laszlo



