[BangPypers] UnicodeDecodeError: 'utf8' codec can't decode byte xxx

JAGANADH G jaganadhg at gmail.com
Sun Apr 17 16:41:47 CEST 2011


On Sun, Apr 17, 2011 at 8:01 PM, Nikunj Badjatya
<nikunjbadjatya at gmail.com> wrote:

> Hi All,
>
> I am working on a personal project that grabs certain URLs from the web,
> does some processing, and stores the final contents in a text/PDF file.
>
> I am also using html2text (
> https://github.com/aaronsw/html2text/archives/master ) for converting the
> fetched page into text format.
> As a first step, I tried fetching a page and converting it to text using
> the following code.
>
> Code:
> {{{
> #!/bin/python
>
> import os
> import urllib
>
> fetch = urllib.urlopen("some-web-link.htm")
>
> mainfile = open ('main.html', 'w' )
>
> mainfile.write(fetch.read())
> # Close the file so its contents are flushed before html2text.py reads it.
> mainfile.close()
>
> os.system('python2.6 html2text.py main.html > main.txt')
>
> }}}
>
> It flags an error:
> {{{
> Traceback (most recent call last):
>  File "html2text.py", line 447, in <module>
>    data = open(arg, 'r').read().decode(encoding)
>  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
>    return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x88 in position 11366:
> invalid start byte
>
> }}}
>
> I also tried:
> {{{
> + import codecs
>
> ...
> ...
> - mainfile = open ('main.html', 'w' )
> + mainfile = codecs.open('xyz.htm', 'w', None, 'ignore')
>
> ...
> ...
> }}}
>
> The result is the same.
>
> Please tell me what can be done to avoid this error.
>
>


Try this (it requires Django):

from django.utils.encoding import smart_str

myunistr = smart_str(YOUR_STRING)

This should solve the issue.
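
If you would rather not pull in Django, here is a minimal sketch of the same idea in plain Python. It is written for Python 3 (the script above is Python 2.6), and it assumes the page might be Windows-1252 rather than UTF-8; that is only a guess, since the real encoding of the page is not known from the traceback. The helper name decode_with_fallback and the sample bytes are made up for illustration.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Return text decoded from raw bytes, trying each encoding in turn.

    The candidate list is an assumption; put the encoding declared by the
    page (HTTP header or <meta> tag) first if you know it.
    """
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode anyway, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# 0x88 is a continuation byte in UTF-8, so it cannot start a character.
sample = b"abc\x88"
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict UTF-8 failed:", exc.reason)

# The fallback decodes the same bytes without raising.
print(decode_with_fallback(sample))
```

Note that the failure in the traceback happens when html2text.py reads main.html, so the decoding has to be handled on the reading side (or the file rewritten as UTF-8 first); changing how the file is written without re-encoding it leaves the same bytes on disk.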



-- 
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog
*ILUGCBE*
http://ilugcbe.techstud.org

