eval and unicode
Jonathan Gardner
jgardner at jonathangardner.net
Fri Mar 21 10:16:14 EDT 2008
On Mar 21, 1:54 am, Laszlo Nagy <gand... at shopzeus.com> wrote:
> >>> eval( "# -*- coding: latin2 -*-\n" + expr)
> u'\u0170' # You can specify the encoding for eval, that is cool.
>
I didn't think of that. That's pretty cool.
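
For anyone who wants to try it, here is a minimal sketch of the same trick. It assumes Python 3, where eval honors a PEP 263 coding line only when the source is passed as bytes. The helper name eval_with_declared_encoding and the sample value of expr are my own, not from the original post.

```python
def eval_with_declared_encoding(raw_expr, encoding):
    """Evaluate a bytes expression after prepending a PEP 263 coding line.

    Hypothetical helper: the name and signature are illustrative only.
    """
    header = ("# -*- coding: %s -*-\n" % encoding).encode("ascii")
    return eval(header + raw_expr)


# Assumed sample input: a string literal holding the single byte 0xDB.
expr = b"u'\xdb'"

# Under latin2 (ISO 8859-2), byte 0xDB is U+0170.
result = eval_with_declared_encoding(expr, "latin2")
assert result == "\u0170"
```

Changing the declared encoding changes the result for the same bytes, which is exactly the ambiguity discussed in the rest of this thread.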
> I hope it is clear now. Inside eval, a unicode object was created from
> a binary string. I just discovered that PEP 0263 can be used to specify
> source encoding for eval. But still there is a problem: eval should not
> assume that the expression is in any particular encoding. When it sees
> something like '\xdb' then it should raise a SyntaxError - same error
> that you should get when running a .py file containing the same expression:
>
> >>> file('test.py','wb+').write(expr + "\n")
> >>> ^D
> gandalf@saturnus:~$ python test.py
> File "test.py", line 1
> SyntaxError: Non-ASCII character '\xdb' in file test.py on line 1, but
> no encoding declared; see http://www.python.org/peps/pep-0263.html for
> details
>
> Otherwise the interpretation of the expression will be ambiguous. Is
> there any good reason why eval assumed a particular encoding in the
> above example?
>
I'm not sure, but being in a terminal session means a lot can be
inferred about what encoding a stream of bytes is in. I don't know off
the top of my head where this would be stored or how Python tries to
figure it out.
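
One way to look at what the interpreter believes about the session is sketched below. These are standard-library attributes and functions, but whether eval consulted any of them in the example above is an assumption this thread does not settle. The function name describe_encodings is made up for illustration.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for the current session.

    Hypothetical helper: it only reports values and does not show which
    of them (if any) eval relied on.
    """
    return {
        # Streams may lack an encoding attribute when redirected or replaced.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "default": sys.getdefaultencoding(),
        "locale": locale.getpreferredencoding(False),
    }


for name, value in sorted(describe_encodings().items()):
    print(name, value)
```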
>
> My problem is solved anyway. Anytime I need to eval an expression, I'm
> going to specify the encoding manually with # -*- coding: XXX -*-. It is
> good to know that it works for eval and its counterparts. And it is
> unambiguous. :-)
>
I would personally adopt the Py3k convention and work with text as
unicode and bytes as byte strings. That is, always pass a unicode
string to eval, never a byte string.
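
A rough sketch of that convention, assuming Python 3 and assuming you know the encoding of the incoming bytes. The helper name eval_text and the sample bytes are illustrative, not from the original post.

```python
def eval_text(raw_expr, encoding):
    """Decode bytes to text first, then hand only text to eval.

    Hypothetical helper: the caller states the encoding explicitly, so
    eval never has to guess how to interpret the bytes.
    """
    text = raw_expr.decode(encoding)
    return eval(text)


# Assumed sample input: a string literal holding the single byte 0xDB.
raw = b"u'\xdb'"

# Decoded as latin2 (ISO 8859-2), byte 0xDB becomes U+0170.
value = eval_text(raw, "latin2")
assert value == "\u0170"
```

The decoding step happens at the boundary where the bytes come in, so the encoding decision is made once, in plain sight, rather than inside eval.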