Converting html character codes to utf-8 text

Peter Otten __peter__ at web.de
Tue Jun 19 07:14:36 EDT 2012


Johann Spies wrote:

> I am trying the following:
> 
> Change data like this:
> 
> Bien Donné : agri tourism
> 
> to this:
> 
> Bien Donné agri tourism
> 
> I am using the 'unescape' function published on
> http://effbot.org/zone/re-sub.htm#unescape-html but working through a file
> I get the following error:
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 519:
> ordinal not in range(128)
> 
> and I do not know how to solve this problem.
> 
> Any solution will be very much appreciated.

The information you give is not sufficient to suggest a definite fix, but my
crystal ball says that the string you pass to unescape() contains an e with
acute encoded as UTF-8 bytes and not as an HTML escape. Instead of

unescape(mydata)

try

unescape(mydata.decode("utf-8"))

If that doesn't fix the problem, come back with a self-contained example.
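
For readers on Python 3, the decode-first idea can be sketched roughly as
follows. This is only an illustration under stated assumptions: it uses the
standard library's html.unescape() in place of the effbot unescape() function
from the thread, and the helper name unescape_utf8 and the sample data are
hypothetical. The point is the order of operations: turn the UTF-8 bytes into
text first, then resolve the HTML escapes, so no implicit ASCII decoding of
the 0xc3 byte is ever attempted.

```python
import html


def unescape_utf8(raw_bytes):
    """Decode UTF-8 bytes to text, then resolve HTML character references.

    Hypothetical helper for illustration; not the effbot unescape() function.
    """
    # Step 1: bytes -> text. Raw non-ASCII bytes such as 0xc3 0xa9 become "é".
    text = raw_bytes.decode("utf-8")
    # Step 2: resolve escapes such as "&#233;" or "&eacute;" in the text.
    return html.unescape(text)


# Assumed sample: one "é" stored as raw UTF-8 bytes, one as an HTML escape.
sample = "Bien Donné and Bien Donn&#233;".encode("utf-8")
result = unescape_utf8(sample)
print(result)
```

If the input file mixes encodings, the decode step will raise a
UnicodeDecodeError at the offending byte, which at least pinpoints where the
data is not valid UTF-8.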





