UnicodeDecodeError when fetching a web page

Peter Otten __peter__ at web.de
Tue May 25 16:10:38 EDT 2010


Barry wrote:

> On 25 May, 21:39, Philip Semanchuk <phi... at semanchuk.com> wrote:
>> On May 25, 2010, at 3:13 PM, Barry wrote:
>>
>>
>>
>> > Hi,
>>
>> > The code below is giving me the error:
>>
>> > Traceback (most recent call last):
>> > File "C:\Users\Administratör\Desktop\test.py", line 4, in <module>
>> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
>> > unexpected code byte
>>
>> > What am I doing wrong?
>>
>> > Thanks,
>>
>> > Barry
>>
>> > import urllib.request
>> >
>> > request = urllib.request.Request(
>> >     url='http://en.wiktionary.org/wiki/baby',
>> >     headers={'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) '
>> >                            'Gecko/20071127 Firefox/2.0.0.11'})
>>
>> > response = urllib.request.urlopen(request)
>> > html = response.read().decode('utf-8')
>>
>> Well, for starters you're assuming that the response content is in
>> UTF-8. You need to examine the Content-Type header to see what the
>> encoding is. If it's not UTF-8, there's your problem.
>>
>> HTH
>> P
> 
> The content type is utf-8:
> 
> Date: Wed, 19 May 2010 19:17:39 GMT
> Server: Apache
> Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
> Content-Language: en
> Vary: Accept-Encoding,Cookie
> Last-Modified: Wed, 19 May 2010 10:10:34 GMT
> Content-Encoding: gzip

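But the data is gzipped, as the "Content-Encoding: gzip" header shows. A gzip
stream starts with the bytes 0x1f 0x8b, which is why the utf-8 codec chokes on
byte 0x8b in position 1. You have to decompress the body before decoding it.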
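
A minimal sketch of one way to do that, assuming Python 3.2 or later (where
gzip.decompress() is available). The helper names decode_body and fetch_text
are made up for this illustration, and falling back to utf-8 when the server
sends no charset is an assumption, not something the headers above guarantee:

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return text from a raw HTTP body, decompressing it first if gzipped."""
    # Go by the Content-Encoding header, but also recognise the gzip
    # magic number (0x1f 0x8b) in case the header was not passed along.
    if (content_encoding or "").lower() == "gzip" or raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_text(url, user_agent="Mozilla/5.0"):
    """Hypothetical helper: fetch a URL and return its decoded text."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        content_encoding = response.headers.get("Content-Encoding")
        # Assumption: use utf-8 if Content-Type carries no charset.
        charset = response.headers.get_content_charset() or "utf-8"
    return decode_body(raw, content_encoding, charset)
```

With this, html = fetch_text('http://en.wiktionary.org/wiki/baby') would take
the place of the read().decode('utf-8') line in the original script.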

Peter


