Internationalised email subjects

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Thu Jun 21 06:27:56 EDT 2007


On Thu, 21 Jun 2007 06:23:43 -0300, <bugmagnet at gmail.com> wrote:

> That's really strange.  The Chinese characters I am inputting into the
> post are not being displayed.  Basically, what I am doing is this:
>
> h = Header('(Some Chinese characters inserted here)', 'GB2312')
>
> And when I run this code, I receive the following error message:
>
> UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3:
> illegal multibyte sequence

If you run print "some Chinese characters" at your console, are they
displayed correctly?
Are you sure your system is using gb2312? If you don't know, and don't
trust autodetection, try something like this:

py> from unicodedata import name
py> name("á".decode("latin-1"))
'NO-BREAK SPACE'
py> name("á".decode("cp850"))
'LATIN SMALL LETTER A WITH ACUTE'

The first attempt gives the wrong name, so my console *cannot* be using
latin-1. With cp850 I get the right name, so it *might* be cp850 (it
could also be another encoding that happens to agree on this single
character). Testing more characters would confirm whether it really is
cp850.
You should try the same thing with some of your Chinese characters and see
whether your encoding really is gb2312.
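
The session above is Python 2, where "á" is a byte string. On Python 3,
string literals are already Unicode, so the same check has to start from
raw bytes. Here is a minimal sketch of it, assuming a hypothetical helper
called name_candidates() (my own name, not part of any library) and
assuming the console sent the single byte 0xA0 for "á":

```python
import unicodedata


def name_candidates(raw, encodings):
    """Decode the same raw bytes with each candidate encoding.

    Returns a dict mapping each encoding to the list of Unicode character
    names it yields, or to an error string if the bytes cannot be decoded.
    (Hypothetical helper, for illustration only.)
    """
    results = {}
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError as exc:
            # These bytes are not valid in this encoding: rule it out.
            results[enc] = "cannot decode: %s" % exc.reason
            continue
        # Characters without a Unicode name get a placeholder instead of
        # raising ValueError.
        results[enc] = [unicodedata.name(ch, "<unnamed>") for ch in text]
    return results


# Assumption: 0xA0 is the byte the console produced for "á".
raw = b"\xa0"
for enc, names in name_candidates(raw, ["latin-1", "cp850", "gb2312"]).items():
    print(enc, names)
```

An encoding that fails to decode, or that yields an unexpected name, can
be ruled out. One that yields the expected name is only a candidate, for
the same reason given above.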

-- 
Gabriel Genellina



