Deflate with urllib2...

Sam samslists at gmail.com
Thu Sep 18 22:29:30 EDT 2008


On Sep 18, 2:10 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
wrote:
> On Tue, 16 Sep 2008 21:58:31 -0300, Sam <samsli... at gmail.com> wrote:
> The code is correct - try with another server. I tested it with a
> LightHTTPd server and it worked fine.

Gabriel...

I found a bunch of servers to test it on.  It fails on every server I
could find except one.

Here are the ones it fails on:
slashdot.org
hotmail.com
godaddy.com
linux.com
lighttpd.net

I did manage to find one web server it succeeded on: kenrockwel.com,
a domain squatter on a typo of the domain of one of my favorite
photographers' websites (the actual website is kenrockwell.com).

This squatter's site is indeed running lighttpd, but it appears to be
an earlier version, because the official lighttpd site fails this
test.

We have all the major web servers failing the test:
* Apache 1.3
* Apache 2.2
* Microsoft-IIS/6.0
* lighttpd/1.5.0

So I think it's the Python side that is wrong, regardless of what the
standard says.

What should I do next?

I've rewritten the code to make it easier to test.  Just run it as is
and it will try all my test cases; or pass in a site on the command
line, and it will try just that.

Thanks!

#!/usr/bin/env python
"""Put the site you want to test as a command line parameter.
Otherwise tests the list of defaults."""

import urllib2
import zlib
import sys

opener = urllib2.build_opener()
opener.addheaders = [('Accept-encoding', 'deflate')]

try:
    sites = [sys.argv[1]]
except IndexError:
    sites = ['http://slashdot.org', 'http://www.hotmail.com',
             'http://www.godaddy.com', 'http://www.linux.com',
             'http://www.lighttpd.net', 'http://www.kenrockwel.com']

for site in sites:
    print "Trying: ", site
    stream = opener.open(site)
    data = stream.read()
    encoded = stream.headers.get('Content-Encoding')
    server = stream.headers.get('Server')

    print "  %s - %s (%s)" % (site, server, encoded)

    if encoded == 'deflate':
        before = len(data)
        try:
            data = zlib.decompress(data)
            after = len(data)
            print "  Able to decompress...went from %i to %i." % (
                before, after)
        except zlib.error:
            print "  Errored out on this site."
    else:
        print "  Data is not deflated."
    print
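
One possible explanation, which is an assumption and not something
confirmed in this thread: zlib.decompress() by default expects the
zlib-wrapped format (RFC 1950), while a server may send a raw deflate
stream (RFC 1951) with no header or checksum. A minimal sketch of a
fallback is below. The helper name inflate and the sample payload are
hypothetical, and it builds both variants locally instead of fetching
from the servers listed above. It should run on Python 2.6+ or 3.

```python
import zlib


def inflate(data):
    """Decompress a 'deflate' body, accepting either wrapping.

    Tries the zlib-wrapped format (RFC 1950) first, then falls back
    to a raw deflate stream (RFC 1951).
    """
    try:
        return zlib.decompress(data)
    except zlib.error:
        # A negative wbits value tells zlib not to expect a header
        # or trailing checksum.
        return zlib.decompress(data, -zlib.MAX_WBITS)


# Hypothetical sample data, compressed both ways for comparison.
payload = b"hello deflate " * 50

# zlib-wrapped: what zlib.decompress() accepts by default.
wrapped = zlib.compress(payload)

# Raw deflate: same algorithm, no zlib header or checksum.
compressor = zlib.compressobj(
    zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = compressor.compress(payload) + compressor.flush()
```

If this guess is right, swapping the zlib.decompress(data) call in the
script above for inflate(data) would be a quick way to check it against
the failing sites.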


