Unable to retrieve compressed HTML URL

rushik rushik.upadhyay at gmail.com
Sun Feb 15 14:56:55 EST 2009


Hi,
I am trying to build a Python script which retrieves and analyzes
various URLs and generates reports.

Some of the URLs are like "http://xyz.com/test.html.gz". I am trying
to retrieve them using the urllib2 library and then decompress them
using the gzip library.

For example, say server_url is http://xyz.com/test.html.gz:

                logpage = urllib2.urlopen(server_url)
                html_content = logpage.read()
                logpage.close()

                gz_tmp = open("gzip.txt.gz", "w")
                gz_tmp.write(html_content)
                gz_tmp.close()
                f = gzip.open("gzip.txt.gz", "rb")
                file_content = f.read()
                f.close()

                #return the decompressed html content.
                return file_content

On executing the code, it gives:

zlib.error - Error -3 while decompressing: invalid distance too far
back

I am able to retrieve the same URL as a properly formatted HTML page
from a browser.

Please let me know if I am doing something wrong here, or if there is a
better way to do this.
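
For reference, here is a minimal sketch of one possible alternative:
decompressing the downloaded bytes in memory instead of going through a
temporary file. It is only an illustration under a few assumptions. The
helper names decompress_gzip_bytes and make_sample_payload are
hypothetical, and the sample payload is built locally as a stand-in for
what logpage.read() would return, so no network access is needed. One
possible cause of the zlib error, assuming the script runs on Windows,
is that the temporary file is opened in text mode ("w"), where newline
translation can alter binary data; opening it with "wb" or skipping the
file entirely, as below, would avoid that.

```python
import gzip
import io


def decompress_gzip_bytes(raw_bytes):
    """Decompress a gzip payload held in memory and return the raw bytes."""
    # Wrap the bytes in a file-like object so GzipFile can read from it.
    buffer = io.BytesIO(raw_bytes)
    gz = gzip.GzipFile(fileobj=buffer, mode="rb")
    try:
        return gz.read()
    finally:
        gz.close()


def make_sample_payload(text):
    """Build a gzip payload in memory (stand-in for a downloaded .gz body)."""
    buffer = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buffer, mode="wb")
    try:
        gz.write(text.encode("utf-8"))
    finally:
        gz.close()
    # Closing the GzipFile does not close the underlying buffer.
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data; in the real script this would be the
    # bytes read from the URL.
    sample = make_sample_payload("<html><body>hello</body></html>")
    print(decompress_gzip_bytes(sample).decode("utf-8"))
```

In the script above, html_content would be passed to
decompress_gzip_bytes in place of the write-then-reopen steps. This
assumes the server really returns a gzip file and not an already
decoded page.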

Thanks,
R


