unicode string problems

Brian Quinlan brian at sweetapp.com
Mon Apr 1 17:57:19 EST 2002


Gonçalo Rodrigues wrote:

> f.write("Março 2002" + march.Name())
> 
> where march.Name() returns a unicode string I get a unicode error. I
> tried converting both unicodes to strings via str but obviously I got
> an error (in the first string the culprit is the "ç" character).
> 
> Can someone help me out here and show me the way to write these
> strings to the file?

Maybe there should be a FAQ for this...

The problem that you are having is due to the fact that there are many
possible byte-string encodings for the same Unicode string (and vice
versa). For example:

>>> u'ç'.encode('iso-8859-1')
'\xe7'
>>> u'ç'.encode('utf-8')
'\xc3\xa7'
>>> u'ç'.encode('utf-16-le')
'\xe7\x00'

So, when you add a string and a Unicode object together, Python attempts
to convert the string into a Unicode object. But it refuses to guess
what encoding you mean: it assumes ASCII, so any non-ASCII character in
the string (here the "ç") raises a Unicode error.
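
To make that concrete, here is a small sketch that spells out the
conversion by hand. It assumes the literal was saved as Latin-1 bytes
(so "ç" is the single byte 0xE7); the variable names are made up for
illustration, and the b'' prefix is only there so the snippet also runs
on later Python versions.

```python
# Hypothetical stand-in for the literal "Março 2002", as Latin-1 bytes.
raw = b'Mar\xe7o 2002'

# Decoding as ASCII (the conversion described above) fails on 0xE7.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the encoding explicitly removes the guesswork.
text = raw.decode('latin-1')
```

Once you name the encoding yourself, the conversion succeeds and text
is an ordinary Unicode object containing "ç".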

Here is a simple solution:

f.write("Março 2002" + march.Name().encode('latin-1'))

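This will convert the Unicode name into a string object using the
Latin-1 encoding. Note that this only gives a consistent file if the
"Março 2002" literal is itself stored as Latin-1 in your source file.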
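
Another option, if you would rather not sprinkle .encode() calls around,
is to keep everything as Unicode and let the file object do the encoding
once, on write. The following is only a sketch under some assumptions:
month_name is a made-up stand-in for march.Name(), the file is a
temporary one, and Latin-1 is assumed to be the encoding you want on
disk.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for the Unicode value returned by march.Name().
month_name = u'Mar\xe7o'

# Temporary file used only for this illustration.
fd, path = tempfile.mkstemp()
os.close(fd)

# codecs.open returns a file object that encodes Unicode on write.
f = codecs.open(path, 'w', encoding='latin-1')
f.write(u'Mar\xe7o 2002 ' + month_name)  # both operands are Unicode
f.close()

# Read the raw bytes back to see what actually landed in the file.
with open(path, 'rb') as g:
    raw = g.read()
os.remove(path)
```

Here both pieces are Unicode before they are joined, so no implicit
ASCII conversion takes place, and each "ç" ends up as the single byte
0xE7 in the file.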

Cheers,
Brian




