unicode mystery/problem

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Sep 22 09:45:00 EDT 2006


In <mailman.453.1158927918.10491.python-list at python.org>, Petr Jakeš
wrote:

> I have try to experiment with the code a bit.
> the simplest code where I can demonstrate my problems:
> #!/usr/bin python
> import sys
> print "default", sys.getdefaultencoding()
> print "stdout", sys.stdout.encoding
>    
> a=['P\xc5\x99\xc3\xad','Petr Jake\xc5\xa1']
> b="my nice try %s" % ''.join(a).encode("utf-8")

You have two byte strings in the list `a` and try to *encode* them as
utf-8.  That does not work.  You can make the example even a bit simpler::

 'P\xc5\x99\xc3\xadPetr Jake\xc5\xa1'.encode('utf-8')

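You can't *encode* byte strings, only *decode* them.  What happens is
that Python first tries to make a unicode string from the byte string, so
that it can then encode that as utf-8.  But this implicit step decodes as
ASCII, because that is the default encoding, and it fails with a
`UnicodeDecodeError` on the first non-ASCII byte (``\xc5``).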
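
To see the hidden step, you can do the decoding by hand.  This is only a
sketch: the names `raw`, `error` and `text` are mine, and I use ``b''``
literals and ``except ... as`` so that it runs on newer Pythons (2.6 and
later, including 3.x), which is an assumption about your setup.

```python
# The byte string from the simplified example (UTF-8 encoded text).
raw = b'P\xc5\x99\xc3\xadPetr Jake\xc5\xa1'

# What the implicit conversion effectively attempts: decode as ASCII.
error = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    # Fails at the first byte outside the ASCII range (index 1, 0xc5).
    error = exc

print(repr(error))

# Decoding with the encoding the bytes are really in works, and the
# resulting unicode string can then be encoded again.
text = raw.decode('utf-8')
print(repr(text.encode('utf-8')))
```

Encoding `text` gives back exactly the bytes you started with, so the
`encode()` call in your code was never needed.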

Don't mix byte strings and unicode strings.  Put an encoding declaration
at the top of your file and convert everything to unicode on the "way in"
and to the proper encoding on the "way out" of your program.
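
A minimal sketch of that "decode in, encode out" pattern, applied to your
example.  I'm assuming the input bytes are utf-8 and that utf-8 is also
what you want to write out; `raw_names`, `names`, `message` and `output`
are just names I picked, and the syntax again targets Python 2.6+ or 3.x.

```python
# -*- coding: utf-8 -*-

# "Way in": byte strings as they arrive from a file, socket, etc.
raw_names = [b'P\xc5\x99\xc3\xad', b'Petr Jake\xc5\xa1']

# Decode once, at the boundary, with the encoding the data really uses.
names = [raw.decode('utf-8') for raw in raw_names]

# Inside the program work with unicode strings only.
message = u"my nice try %s" % u''.join(names)

# "Way out": encode once, just before the data leaves the program.
output = message.encode('utf-8')

print(repr(output))
```

If the terminal or file expects something other than utf-8, only the
last `encode()` call has to change.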

Ciao,
	Marc 'BlackJack' Rintsch


