Encode exception for chinese text

Serge Orlov Serge.Orlov at gmail.com
Fri May 19 07:58:49 EDT 2006


Vinayakc wrote:
> Hi all,
>
> I am new to python.
>
> I have written a small application which reads data from an XML file and
> tries to encode the data using the appropriate charset.
> I am facing a problem while encoding one Chinese paragraph with charset
> "gb2312".
>
> code is:
>
> encoded_str = str_data.encode("gb2312")
>
> The type of str_data is <type 'unicode'>
>
> The exception is:
>
> "UnicodeEncodeError: 'gb2312' codec can't encode character u'\xa0' in
> position 0: illegal multibyte sequence"

Hmm, this is a 'no-break space' (U+00A0) at the very beginning of the text. It
looks suspiciously like a plain-text UTF-8 signature, which is a 'zero
width no-break space' (U+FEFF). If you strip the first character, do you still
have encoding errors?
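
A minimal sketch of that test, under some assumptions: it uses Python 3
syntax (where str is the unicode type), the helper name
strip_leading_signature and the sample text are made up for illustration,
and it strips both U+FEFF and U+00A0 since either could be the stray
leading character.

```python
# Hypothetical helper, not part of the original poster's application.
def strip_leading_signature(text):
    """Drop any leading U+FEFF (signature/BOM) or U+00A0 (no-break space)."""
    return text.lstrip("\ufeff\u00a0")


# Made-up sample: a no-break space followed by two Chinese characters.
str_data = "\u00a0\u4e2d\u6587"

try:
    str_data.encode("gb2312")
    failed_before = False
except UnicodeEncodeError:
    # Same kind of failure as in the report: GB2312 has no U+00A0.
    failed_before = True

cleaned = strip_leading_signature(str_data)
encoded_str = cleaned.encode("gb2312")
```

If the encode still fails after stripping, the exception's position should
point at some other character that GB2312 cannot represent.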



