[Tutor] i18n Encoding/Decoding issues

Michael Lange klappnase at freenet.de
Thu Aug 10 21:27:30 CEST 2006


Hi Jorge,


On Thu, 10 Aug 2006 13:32:10 +0100
"Jorge De Castro" <jorge.castro at msn.com> wrote:

(...)
> 
> Using unicode(body, 'latin-1').encode('utf-8') doesn't work either. Besides, 
> am I the only one to feel that if I want to encode something in UTF-8 it 
> doesn't feel intuitive to have to convert to latin-1 first and then encode?
> 

If the above does not work, it is because the original message is not
latin-1 encoded. unicode(body, 'latin-1') does not convert *to* latin-1; it
decodes a latin-1 encoded byte string into a unicode object. This will only work as
expected if the original string actually is latin-1.
To safely convert the message body to utf-8, you have to find out
which encoding the message uses and then do
    unicode(body, original_encoding).encode('utf-8')
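
As a rough sketch of the "find out which encoding is used" step: if the text
comes from a mail message, the charset is usually declared in the Content-Type
header, and the standard email module can read it. The example below is written
for current Python, where unicode(body, enc) is spelled body.decode(enc). The
function name body_to_utf8 and the fallback parameter are made up for
illustration, and it assumes a single-part message.

```python
from email import message_from_bytes


def body_to_utf8(raw_message, fallback="latin-1"):
    """Return the body of a single-part mail message re-encoded as UTF-8 bytes.

    body_to_utf8 and fallback are hypothetical names used for illustration.
    """
    msg = message_from_bytes(raw_message)
    # Charset declared in the Content-Type header, or None if there is none.
    charset = msg.get_content_charset() or fallback
    # decode=True undoes base64 / quoted-printable and returns raw bytes.
    body = msg.get_payload(decode=True)
    # Decode with the *original* encoding first, then encode to UTF-8.
    return body.decode(charset).encode("utf-8")


raw = (
    b"Content-Type: text/plain; charset=iso-8859-15\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Pre\xe7o: 5 \xa4\r\n"
)
utf8_body = body_to_utf8(raw)
```

If the header is missing, the fallback is only a guess, and a wrong guess gives
wrong characters rather than an error. For a multipart message
get_payload(decode=True) returns None, so each part would have to be handled
separately.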

Michael

