eval and unicode
Laszlo Nagy
gandalf at shopzeus.com
Thu Mar 20 17:20:46 EDT 2008
>> I tried to use eval with/without unicode strings and it worked. Example:
>>
>> >>> eval( u'"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"' ) == eval( '"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"' )
>> True
>>
> When you feed your unicode data into eval(), it doesn't have any
> encoding or decoding work to do.
>
Yes, but what about

eval( 'u' + '"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"' )

The expression passed here is not unicode. It is a "normal" string: a
sequence of bytes. eval has to evaluate it, so eval has to know how to
decode that byte sequence. It is the same situation as when the
interpreter sees the byte sequence u"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"
in a python source file: it needs to know the encoding of the file,
because the bytes must be decoded (or not, depending on the encoding of
the source) before the unicode instance is created.

A string passed to eval IS python source, and it SHOULD have an encoding
specified (unless it is already a unicode string, in which case this
magic is not needed).
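
To make the distinction concrete, here is a minimal sketch. It assumes Python 3, where the split is explicit: str is already-decoded text and bytes is the "normal" byte string I am talking about. It is not the Python 2 interpreter this thread is about, and the variable names are only for illustration.

```python
# Assumption: Python 3, where str is decoded text and bytes is a raw
# byte sequence (the "normal" string of this thread).
text = '"ÁÍŰŐÜÖÚÓÉ"'                 # source text, already decoded
utf8_source = text.encode("utf-8")   # the same source as raw bytes

# eval() on text has no decoding work to do.
from_text = eval(text)

# eval() on bytes must pick an encoding; with no declaration in the
# source, Python 3 assumes UTF-8.
from_bytes = eval(utf8_source)

# The same characters in another encoding are a different byte sequence,
# so the result depends on which encoding eval() assumes.
latin2_source = text.encode("iso-8859-2")

print(from_text == from_bytes)
print(utf8_source == latin2_source)
```

The two evaluations agree only because the bytes happen to be in the encoding that eval assumes. That is the whole point: for a byte string, some encoding always has to be chosen.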
Consider this:
exec("""
import codecs
s = u'Ű'
codecs.open("test.txt","w+",encoding="UTF8").write(s)
""")
Facts:
- the source passed to exec is a normal string, not unicode
- the variable "s" created inside the exec() call will be a unicode
  string. However, it may be Û or something else, depending on the
  source encoding. E.g. with ASCII encoding the source is invalid, and
  exec() should raise a SyntaxError like:

SyntaxError: Non-ASCII character '\xc5' in file c:\temp\aaa\test.py on
line 1, but no encoding declared; see
http://www.python.org/peps/pep-0263.html for details
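
The facts above can be sketched as follows. Again this assumes Python 3 and byte-string source with a PEP 263 declaration on the first line. The helper run() is hypothetical, and I only record that the undeclared source is rejected, without relying on the exact exception or message.

```python
# Hypothetical helper: execute byte-string source and return what it
# bound to the name s.
def run(source_bytes):
    namespace = {}
    exec(source_bytes, namespace)
    return namespace["s"]

# The letter Ű stored as a single ISO-8859-2 byte.
raw = "Ű".encode("iso-8859-2")
body = b"s = '" + raw + b"'\n"

# Same bytes, two different PEP 263 declarations, two different values.
as_latin2 = run(b"# -*- coding: iso-8859-2 -*-\n" + body)
as_latin1 = run(b"# -*- coding: latin-1 -*-\n" + body)

# Without a declaration Python 3 assumes UTF-8, and this lone byte is
# not valid UTF-8, so the source should be rejected.
try:
    run(body)
    undeclared = "accepted"
except (SyntaxError, UnicodeError) as exc:
    undeclared = type(exc).__name__

print(repr(as_latin2), repr(as_latin1), undeclared)
```

With the ISO-8859-2 declaration the byte comes back as Ű, and with the Latin-1 declaration the very same byte comes back as Û. So the value of s is decided by the declared source encoding, not by the bytes alone.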
Well, at least this is what I think. If I'm wrong, please explain why.
Thanks
Laszlo