eval and unicode

Fri Mar 21 10:16:14 EDT 2008

On Mar 21, 1:54 am, Laszlo Nagy <gand... at shopzeus.com> wrote:
>  >>> eval( "# -*- coding: latin2 -*-\n" + expr)
> u'\u0170' # You can specify the encoding for eval, that is cool.
>

I didn't think of that. That's pretty cool.
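
For anyone following along on Python 3, here is a minimal sketch of the same trick. It assumes eval is handed a bytes object (a str is already decoded, so a coding line in it is ignored), and the variable names are my own, not taken from the session quoted above.

```python
# Python 3 sketch: the expression is raw bytes holding a quoted 0xDB byte.
expr = b"'\xdb'"

# Prepend a PEP 263 coding line so the parser decodes the bytes as Latin-2.
source = b"# -*- coding: latin2 -*-\n" + expr

result = eval(source)

# 0xDB in ISO-8859-2 (Latin-2) is U+0170.
print(ascii(result))
```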

> I hope it is clear now.  Inside eval, a unicode object was created from
> a binary string. I just discovered that PEP 0263 can be used to specify
> source encoding for eval. But still there is a problem: eval should not
> assume that the expression is in any particular encoding. When it sees
> something like '\xdb' then it should raise a SyntaxError - same error
> that you should get when running a .py file containing the same expression:
>
>  >>> file('test.py','wb+').write(expr + "\n")
>  >>> ^D
> gandalf at saturnus:~$ python test.py
>   File "test.py", line 1
> SyntaxError: Non-ASCII character '\xdb' in file test.py on line 1, but
> no encoding declared; see http://www.python.org/peps/pep-0263.html for
> details
>
> Otherwise the interpretation of the expression will be ambiguous. Is
> there any good reason why eval assumed a particular encoding in the
> above example?
>

I'm not sure, but in a terminal session a lot can be inferred about
the encoding of a stream of bytes. Off the top of my head, I don't
know where that setting is stored or how Python tries to figure it
out.
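
For what it's worth, a few places where Python exposes its guess about the surrounding encoding can be inspected directly. This is only a sketch of where to look; whether eval consults any of these values is not something I have verified, and the helper name stream_encoding is made up for the example.

```python
import locale
import sys


def stream_encoding(stream):
    """Return the stream's encoding attribute, or None if it has none."""
    return getattr(stream, "encoding", None)


encodings = {
    # Encodings attached to the standard streams (may be None when redirected).
    "stdin": stream_encoding(sys.stdin),
    "stdout": stream_encoding(sys.stdout),
    # The locale's preferred encoding, without changing the locale settings.
    "preferred": locale.getpreferredencoding(False),
}

for name, value in encodings.items():
    print(name, value)
```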

>
> My problem is solved anyway. Anytime I need to eval an expression, I'm
> going to specify the encoding manually with # -*- coding: XXX -*-. It is
> good to know that it works for eval and its counterparts. And it is
> unambiguous.  :-)
>

I would personally adopt the Py3k convention and work with text as
unicode and bytes as byte strings. That is, always pass eval a
unicode string, never a byte string.
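
A minimal sketch of that convention, written for Python 3: decode the bytes yourself with an explicit encoding, then hand eval the resulting text. The helper name eval_bytes_expr is hypothetical, and the usual caveat applies that eval should only ever see trusted input.

```python
def eval_bytes_expr(raw, encoding):
    """Decode raw bytes with an explicit encoding, then eval the text."""
    text = raw.decode(encoding)
    return eval(text)


# The same bytes give different characters depending on the encoding chosen,
# which is why the choice should be made explicitly by the caller.
raw = b"'\xdb'"
as_latin2 = eval_bytes_expr(raw, "iso-8859-2")
as_latin1 = eval_bytes_expr(raw, "iso-8859-1")

print(ascii(as_latin2), ascii(as_latin1))
```

The decoding step is where the ambiguity gets resolved, so by the time eval runs there is nothing left for it to guess.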