Encode exception for Chinese text

Serge Orlov Serge.Orlov at gmail.com
Fri May 19 09:02:14 EDT 2006


Vinayakc wrote:
> Yes serge, I have removed the first character but it is still giving
> encoding exception.

Then I guess this character, a no-break space (U+00A0), was used as a
poor man's indentation tool, at least at the beginning of your text.
It's up to you to decide what to do with that character. You have
several choices:

* edit the source XML file to get rid of it
* remove it while you process your data
* replace it with an ordinary space
* consider encoding to UTF-8 instead, which can represent every Unicode
  character
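Here is a minimal Python 3 sketch of these choices. It assumes the failing encode targets the "gbk" codec and uses a made-up sample string. Neither detail comes from this thread, so substitute your own codec and data.

```python
# Sketch only: "gbk" as the target codec and the sample text are assumptions.
NBSP = "\u00a0"  # the no-break space character

# Hypothetical input: Chinese text indented with two no-break spaces.
text = NBSP * 2 + "\u4e2d\u6587"

try:
    text.encode("gbk")
except UnicodeEncodeError as exc:
    # exc.object is the string being encoded; start/end delimit the bad part.
    print("gbk cannot encode:", repr(exc.object[exc.start:exc.end]))

# Choice: remove the no-break space while processing the data.
removed = text.replace(NBSP, "")

# Choice: replace it with an ordinary space.
replaced = text.replace(NBSP, " ")

# Choice: encode to UTF-8, which can represent every Unicode character.
utf8_bytes = text.encode("utf-8")

print(removed.encode("gbk"))
print(replaced.encode("gbk"))
print(utf8_bytes)
```

Removing or replacing the character lets the text encode to gbk. Switching to UTF-8 avoids the problem entirely, but whatever reads your output must then expect UTF-8 bytes.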

Note that there are legitimate uses for the no-break space. For example,
one million can be written as 1 000 000, where the spaces are
non-breaking. This prevents the number from being broken at the right
margin like this: 1 000
000

Keep that in mind when you remove or replace no-break spaces.
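If you do strip or replace the character, one way to respect that caveat is to leave any no-break space that sits between two digits untouched. The sketch below does this with a regular expression. Treating a digit-flanked no-break space as a thousands separator is a heuristic assumption, and replace_stray_nbsp is a hypothetical helper name.

```python
import re

# Matches a no-break space that is NOT both preceded and followed by a digit.
# Heuristic assumption: digit-flanked ones are thousands separators to keep.
_STRAY_NBSP = re.compile(r"(?<!\d)\u00a0|\u00a0(?!\d)")


def replace_stray_nbsp(text: str, replacement: str = " ") -> str:
    """Replace no-break spaces, except those between two digits."""
    return _STRAY_NBSP.sub(replacement, text)


print(repr(replace_stray_nbsp("\u00a0\u00a0indented line")))
print(repr(replace_stray_nbsp("1\u00a0000\u00a0000")))
```

The kept no-break spaces still cannot be encoded by a codec that lacks the character. For such numbers, UTF-8 remains the simpler option.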




More information about the Python-list mailing list