eval and unicode

Thu Mar 20 10:31:43 EDT 2008

On Mar 20, 5:20 am, Laszlo Nagy <gand... at shopzeus.com> wrote:
> How can I specify encoding for the built-in eval function? Here is the
> documentation:
>
> http://docs.python.org/lib/built-in-funcs.html
>
> It says that the "expression" parameter is a string, but it says nothing
> about the encoding. The same is true for execfile, eval and compile.
>
> The basic problem:
>
> - expressions need to be evaluated by a program
> - expressions are managed through a web based interface. The browser
> supports UTF-8, the database also supports UTF-8. The user needs to be
> able to enter string expressions in different languages, and store them
> in the database
> - expressions are for filtering emails, and the emails can contain any
> character in any encoding
>
> I tried to use eval with/without unicode strings and it worked. Example:
>
>  >>> eval(u'"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"') == eval('"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"')
> True
>
> The above test was made on Ubuntu Linux and gnome-terminal.
> gnome-terminal does support unicode. What would happen under Windows?
>
> I'm also confused how it is related to PEP 0263. I always get a warning
> when I try to enter '"徹底したコスト削減 ÁÍŰŐÜÖÚÓÉ трирова"' in a source
> file without "# -*- coding: " specified. Why is it not the same for
> eval? Why is it not raising an exception (or why does the encoding not
> need to be specified)?
>

Encoding information is only useful when you are converting between
bytes and unicode data. If you already have unicode data, you don't
need to do any more work to get unicode data.
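A minimal sketch of where an encoding actually enters the picture, written in Python 3 (where `str` is already unicode text) rather than the Python 2 of this thread. The variable names are illustrative only; the point is that encodings appear only at the bytes/text boundary.

```python
# Text (str) carries no encoding; an encoding is needed only to turn
# text into bytes (encode) or bytes back into text (decode).
text = "ÁÍŰŐÜÖÚÓÉ трирова"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")

# Different encodings produce different byte sequences...
print(utf8_bytes == utf16_bytes)  # False

# ...but decoding each with its matching codec recovers the same text.
print(utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16") == text)  # True

# Decoding with the wrong codec silently yields different text (mojibake).
print(utf8_bytes.decode("latin-1") == text)  # False
```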

Since a file can be in any encoding, it isn't apparent how to decode
the bytes seen in that file and turn them into unicode data. That's
why you need the "# -*- coding: ... -*-" magic comment (the PEP 263
declaration) to tell the Python interpreter which encoding the bytes
in the file use.
Until we have a universal way to accurately find the encoding of every
file in an OS, we will need that magic. Who knows? Maybe one day there
will be a common file attribute system and one of the universal
attributes will be the encoding of the file. But for now, we are stuck
with ancient Unix and DOS conventions.
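To make the role of the declaration concrete, here is a small Python 3 sketch (again, not the Python 2 of this thread) that feeds the same one-line program to exec() as raw bytes, once with a PEP 263 declaration and once without. ISO-8859-2 is an arbitrary choice that happens to cover the Hungarian letters from the question; the names `declared`, `undeclared` and `rejected` are illustrative.

```python
# Hungarian text from the question; every character exists in ISO-8859-2.
text = "ÁÍŰŐÜÖÚÓÉ"

# A one-line program stored as ISO-8859-2 bytes, with a PEP 263
# coding declaration on its first line.
declared = ("# -*- coding: iso-8859-2 -*-\n" + "s = " + repr(text) + "\n").encode("iso-8859-2")

namespace = {}
exec(declared, namespace)       # the declaration tells the tokenizer how to decode
print(namespace["s"] == text)   # True

# The same program without a declaration: byte input is assumed to be
# UTF-8, and these ISO-8859-2 bytes are not valid UTF-8, so it is rejected.
undeclared = ("s = " + repr(text) + "\n").encode("iso-8859-2")
try:
    exec(undeclared, {})
    rejected = False
except (SyntaxError, UnicodeDecodeError):
    rejected = True
print(rejected)                 # True

# Bytes that really are UTF-8 need no declaration in Python 3.
print(eval(('"' + text + '"').encode("utf-8")) == text)  # True
```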

When you feed your unicode data into eval(), it doesn't have any
encoding or decoding work to do.
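Applied to the filtering scenario in the question, a rough Python 3 sketch might decode once at the storage boundary and hand eval() only text. It assumes, hypothetically, that the stored expression comes back as UTF-8 bytes and that filters refer to a variable named `subject`; both are illustrative, not part of the original setup.

```python
# Hypothetical filter expression as it might come back from a UTF-8
# database column or web form: raw bytes.
raw = '"徹底したコスト削減" in subject'.encode("utf-8")

# Decode once, at the boundary, with the encoding that layer is known to use.
expression = raw.decode("utf-8")

# eval() now receives text, so there is no source encoding left to declare.
subject = "徹底したコスト削減のお知らせ"
matched = eval(expression, {"subject": subject})
print(matched)  # True
```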