Why does my list go Unicode by itself?

Ian Kelly ian.g.kelly at gmail.com
Mon Dec 20 16:41:36 EST 2010


On Mon, Dec 20, 2010 at 2:08 PM, Martin Hvidberg <Martin at hvidberg.net> wrote:
> Question:
> In the last printout, tagged >InReturLst>, all entries turn into Unicode.
> What happens here?

Actually, they were all unicode to begin with.  You're using
codecs.open to read the file, which transparently decodes the data
with the supplied encoding (in this case, UTF-8).  If you want to
keep the original bytes, open the file with the built-in open()
function instead.
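A minimal sketch of the difference, written for Python 3 rather than the Python 2 used in this thread. In Python 2, plain open() returned undecoded bytes; in Python 3 that corresponds to opening the file in binary mode ('rb'). The temporary file and its contents are illustrative assumptions, not the original poster's data. (In Python 3 the built-in open(path, encoding="utf-8") performs the same decoding as codecs.open and is the usual choice.)

```python
import codecs
import os
import tempfile

# Write a small UTF-8 file containing the word from the question.
# The file and its contents are illustrative assumptions.
raw = "FANØ\n".encode("utf-8")  # b'FAN\xc3\x98\n'
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# codecs.open decodes the data with the given encoding, yielding text.
with codecs.open(path, "r", encoding="utf-8") as f:
    decoded = f.readline()

# Binary mode (Python 3's analogue of Python 2's plain open())
# leaves the original bytes untouched.
with open(path, "rb") as f:
    undecoded = f.readline()

os.remove(path)

print(type(decoded), repr(decoded))
print(type(undecoded), repr(undecoded))
```

Both reads see the same file; only the codecs.open path turns the two bytes 0xC3 0x98 into the single character Ø.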

> Look for the word 'FANØ'. This word changes from 'FANØ' to u'FAN\xd8'.
> That's a problem for me, and I don't want it to change like this.

This happens because you're printing a list rather than a unicode
string.  When you print a unicode string, print writes out its actual
characters.  When you print a list, print builds the repr of the list,
which uses the repr of each item in it, and the repr of the unicode
string 'FANØ' is u'FAN\xd8'.  If you don't want this, you will need to
format the list as a string yourself (for example, by joining its
items) instead of relying on print to do what it thinks you might want.
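The sketch below, again in Python 3, contrasts printing a list with formatting it yourself. One caveat: Python 3's repr keeps printable non-ASCII characters, so the escaped form from the question only shows up here through ascii(), which escapes them the way Python 2's repr of a unicode string did (minus the u prefix). The list contents are hypothetical stand-ins for InReturLst.

```python
# Hypothetical list standing in for InReturLst.
items = ["FANØ", "RØMØ"]

# Printing a list shows the repr of each element.
shown = repr(items)
print(items)         # ['FANØ', 'RØMØ']

# ascii() escapes non-ASCII characters, much like Python 2's repr did.
escaped = ascii(items)
print(escaped)       # ['FAN\xd8', 'R\xd8M\xd8']

# Formatting the list as a string yourself prints the actual characters.
as_text = ", ".join(items)
print(as_text)       # FANØ, RØMØ

# Alternatively, print each element on its own line.
for item in items:
    print(item)
```

Joining or looping hands print plain strings, so it never needs a repr and the characters come out as written.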

Cheers,
Ian



More information about the Python-list mailing list