eval and unicode

Laszlo Nagy gandalf at shopzeus.com
Thu Mar 20 17:20:46 EDT 2008


>> I tried to use eval with/without unicode strings and it worked. Example:
>>
>>  >>> eval( u'"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"' ) == eval( '"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"' )
>> True
>>     
> When you feed your unicode data into eval(), it doesn't have any
> encoding or decoding work to do.
>   

Yes, but what about

eval( 'u' + '"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"' )

The passed expression is not unicode. It is a "normal" string: a 
sequence of bytes. It will be evaluated by eval, and eval should know 
how to decode that byte sequence. In the same way, the interpreter needs 
to know the encoding of the file when it sees the 
u"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова" byte sequence in a Python source 
file: before creating the unicode instance, the bytes need to be decoded 
(or not, depending on the encoding of the source).

A string passed to eval IS Python source, and it SHOULD have an encoding 
specified (well, unless it is already a unicode string; in that case 
this magic is not needed).
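
A minimal sketch of the unicode case, assuming the byte string is known 
to be UTF-8 (that encoding, and the names raw_expr, expr and value, are 
only illustrative): decode the bytes explicitly and hand eval a unicode 
string, so that eval has no encoding to guess.

```python
# The expression  u"<U+0170>"  as raw bytes, the way it would arrive
# from a UTF-8 encoded file or socket (UTF-8 is an assumption here).
raw_expr = u'u"\u0170"'.encode("utf-8")

# Decode with the encoding we know the bytes are in ...
expr = raw_expr.decode("utf-8")

# ... and evaluate the resulting unicode string.
value = eval(expr)
```

With this approach the caller, not eval, is responsible for knowing the 
encoding of the bytes.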

Consider this:

exec("""
import codecs
s = u'Ű'
codecs.open("test.txt","w+",encoding="UTF8").write(s)
""")

Facts:

- the source passed to exec is a normal string, not unicode
- the variable "s", created inside the exec() call, will be a unicode 
string. However, it may be Û or something else, depending on the 
source encoding. E.g. with ASCII encoding the source is invalid, and 
exec() should raise a SyntaxError like:

SyntaxError: Non-ASCII character '\xc5' in file c:\temp\aaa\test.py on 
line 1, but no encoding declared; see 
http://www.python.org/peps/pep-0263.html for details
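
The ambiguity in the second point can be sketched with plain encode and 
decode calls, without exec. The codecs chosen below (UTF-8, ISO-8859-2, 
Latin-1, ASCII) are assumptions picked for illustration; the variable 
names are hypothetical.

```python
char = u"\u0170"  # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE

# The same character is stored as different bytes in different encodings.
utf8_bytes = char.encode("utf-8")         # two bytes, starting with 0xC5
latin2_bytes = char.encode("iso-8859-2")  # a single byte, 0xDB

# Reading the ISO-8859-2 byte as Latin-1 silently yields another letter:
# U+00DB, LATIN CAPITAL LETTER U WITH CIRCUMFLEX.
misread = latin2_bytes.decode("latin-1")

# Under ASCII the bytes are simply not decodable.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

The first UTF-8 byte, 0xC5, is the same '\xc5' that appears in the 
SyntaxError message above.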

Well at least this is what I think. If I'm not right then please explain 
why.

Thanks

Laszlo



