unicode to string conversion

Jeff Epler jepler at unpythonic.net
Thu May 8 15:29:30 EDT 2003


On Thu, May 08, 2003 at 01:24:33PM -0500, Skip Montanaro wrote:
> 
>     Luca> I would like to translate
> 
>     Luca> 	u'questa \xe8 bella'
>     Luca> into
>     Luca> 	'questa è bella'
> 
>     Luca> and put the result into a new variable
> 
> I love easy questions!
> 
>     >>> u = u'questa \xe8 bella'
>     >>> s = u.encode("iso-8859-1")
>     >>> print s
>     questa è bella

Huh, that doesn't work here:
>>> u = u'questa \xe8 bella'
>>> s = u.encode("iso-8859-1")
>>> print s
questa [] bella

where [] is a box-shaped character displayed for an invalid byte
sequence.

On my system, I must write
>>> print u.encode("utf")
questa è bella
to get the proper result.

On some Windows systems you would probably write
>>> print u.encode("cp850")
to do the deed.
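
To see why the choice matters, here is a small sketch (not from the
original exchange) that encodes the same unicode string with the three
encodings mentioned above and keeps the resulting bytes. The last step
assumes a terminal that expects UTF-8, which is a guess about my setup
rather than something established above: it decodes the latin-1 bytes as
UTF-8 and gets a replacement character, the likely source of the box.

```python
# Encode one unicode string with each of the encodings discussed above.
u = u'questa \xe8 bella'

encoded = {}
for name in ("iso-8859-1", "utf-8", "cp850"):
    encoded[name] = u.encode(name)
    print("%-10s %r" % (name, encoded[name]))

# Assumption: the terminal interprets incoming bytes as UTF-8.
# The lone latin-1 byte 0xE8 is not valid UTF-8, so a decoder that
# replaces errors yields U+FFFD, which many terminals draw as a box.
seen_by_utf8_terminal = encoded["iso-8859-1"].decode("utf-8", "replace")
print(repr(seen_by_utf8_terminal))
```

The same character comes out as one byte under iso-8859-1 and cp850
(with different values) and as two bytes under utf-8, so the bytes only
display correctly when they match what the terminal expects.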

It *may* be that the encoding returned by
    locale.getdefaultlocale()[1]
is the one that should be used (and it is on my system), or it may be
that the OP only needs the value to work on a single computer and can
determine the right encoding through educated guessing.
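
A minimal sketch of the first option, assuming the locale really does
report the encoding the terminal uses (which, as said, may not hold
everywhere). The helper names guess_encoding and encode_for_terminal and
the "utf-8" fallback are made up for illustration; only
locale.getdefaultlocale() comes from the suggestion above.

```python
import codecs
import locale


def guess_encoding(fallback="utf-8"):
    """Return the locale's encoding name, or `fallback` if unusable."""
    try:
        name = locale.getdefaultlocale()[1]
    except (AttributeError, ValueError):
        # The call is missing or the locale settings cannot be parsed.
        name = None
    try:
        # Make sure Python actually has a codec by that name.
        return codecs.lookup(name or fallback).name
    except LookupError:
        return fallback


def encode_for_terminal(text, encoding=None):
    """Encode `text`, substituting '?' for unrepresentable characters."""
    return text.encode(encoding or guess_encoding(), "replace")


u = u'questa \xe8 bella'
s = encode_for_terminal(u)
print(repr(s))
```

Passing an explicit encoding, as in encode_for_terminal(u, "cp850"),
covers the second option, where the right value is found by educated
guessing on a single machine.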

Jeff
PS: It looks like my mailer will re-encode this message as latin-1 when it
sends it, so who knows whether the u'\xe8' characters will continue to
display correctly.




