[Tutor] trying to convert pycurl/html to ascii

bruce badouglas at gmail.com
Mon Mar 30 03:49:23 CEST 2015


Hi.

I'm doing a quick, basic pycurl test on a site and trying to convert the
returned page to pure ASCII.

The page has the encoding line

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

The test uses pycurl and StringIO to fetch the page into a str.

pycurl stuff
.
.
.
foo=gg.getBuffer()

At this point, foo has the page in a str buffer.


What's happening is that the test is getting the following kind of error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 20:
invalid start byte
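
For reference, here is a minimal sketch of what seems to be going on. It
assumes foo holds raw ISO-8859-1 bytes, as the meta tag declares. The sample
bytes and the names raw, text and ascii_page are made up for illustration and
are not taken from the actual test. In ISO-8859-1, byte 0xa0 is a no-break
space, and it is not a valid first byte of a UTF-8 sequence, which would
explain the message above.

```python
# Hypothetical stand-in for the fetched page; \xa0 is a no-break space
# in ISO-8859-1.
raw = b"<p>price:\xa0100</p>"

# Decoding as UTF-8 fails, because 0xa0 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the charset the page declares succeeds, since every
# byte value maps to a character in ISO-8859-1.
text = raw.decode("iso-8859-1")

# Reduce to pure ASCII: turn the no-break space into a plain space,
# then drop any other non-ASCII character.
ascii_page = text.replace(u"\xa0", u" ").encode("ascii", "ignore")
```

With "ignore", any remaining non-ASCII characters are dropped silently;
passing "replace" instead would substitute a "?" for each one.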

The test is using Python 2.6 on Red Hat.

I've tried different decode functions based on various sites, articles, and
Stack Overflow answers, but can't quite seem to resolve the issue.

Any thoughts/pointers would be useful!

Thanks

