Deflate with urllib2... (solved)
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Fri Sep 19 11:25:09 EDT 2008
On Thu, 18 Sep 2008 23:29:30 -0300, Sam <samslists at gmail.com> wrote:
> On Sep 18, 2:10 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> wrote:
>> On Tue, 16 Sep 2008 21:58:31 -0300, Sam <samsli... at gmail.com> wrote:
>> The code is correct - try with another server. I tested it with a
>> LightHTTPd server and it worked fine.
>
> Gabriel...
>
> I found a bunch of servers to test it on. It fails on every server I
> could find (sans one).
>
> Here's the ones it fails on:
> slashdot.org
> hotmail.com
> godaddy.com
> linux.com
> lighttpd.net
>
> I did manage to find one webserver it succeeded on: kenrockwel.com, a
> domain squatter sitting on a misspelling of the domain of one of my
> favorite photographers (the actual website is kenrockwell.com).
>
> This squatter's site is indeed running lighttpd, but it appears to be
> an earlier version, because the official lighttpd site fails this
> test.
>
> We have all the major web servers failing the test:
> * Apache 1.3
> * Apache 2.2
> * Microsoft-IIS/6.0
> * lighttpd/1.5.0
>
> So I think it's the python side that is wrong, regardless of what the
> standard is.
I've found the problem. The zlib header (2 bytes) is missing; the data
begins directly with the compressed stream. You can decode it by passing
a negative value for wbits:
    import zlib

    try:
        data = zlib.decompress(data)
    except zlib.error:
        # No zlib header: treat the body as a raw deflate stream
        data = zlib.decompress(data, -zlib.MAX_WBITS)
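To see the difference concretely, here is a small self-contained sketch (Python 3, standard zlib module only). It builds a proper zlib stream with zlib.compress, then simulates a bare RFC 1951 deflate body by stripping the wrapper RFC 1950 puts around it (the 2-byte header and the 4-byte Adler-32 trailer). The helper name inflate and the sample payload are illustrative only.

```python
import zlib

def inflate(data):
    """Decode an HTTP 'deflate' body, tolerating servers that omit the zlib wrapper."""
    try:
        # RFC 1950 zlib format: 2-byte header + deflate data + 4-byte Adler-32
        return zlib.decompress(data)
    except zlib.error:
        # A negative wbits tells zlib to expect a raw RFC 1951 stream, no header
        return zlib.decompress(data, -zlib.MAX_WBITS)

payload = b"<html>" + b"hello, deflate " * 50 + b"</html>"

# What a compliant server would send for Content-Encoding: deflate
wrapped = zlib.compress(payload)

# Simulated non-compliant body: the same deflate data without the zlib wrapper
raw = wrapped[2:-4]

print(inflate(wrapped) == payload)  # True
print(inflate(raw) == payload)      # True
```

The first zlib.decompress call fails on raw with an "incorrect header check" error, which is exactly the failure the test script was hitting; the fallback then succeeds.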
Note that this clearly violates the spec: RFC 2616 defines the "deflate"
content-coding as the zlib format of RFC 1950, and in RFC 1950 the header
is *not* optional.
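Because the header has a fixed structure, a client could also inspect the first two bytes instead of waiting for zlib.error. The sketch below applies the RFC 1950 checks: the low nibble of CMF (the compression method, CM) must be 8, CINFO must be at most 7, and CMF*256 + FLG must be a multiple of 31. The function names are hypothetical. A raw stream could in principle pass these checks by chance, so treat this as a heuristic; the try/except fallback above stays the more forgiving option.

```python
import zlib

def has_zlib_header(data):
    """Return True if data starts with a plausible RFC 1950 header (CMF, FLG)."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # CM (low 4 bits of CMF) must be 8 = deflate; CINFO (high 4 bits) must be <= 7
    if cmf & 0x0F != 8 or cmf >> 4 > 7:
        return False
    # FCHECK is chosen so that the 16-bit value CMF*256 + FLG is a multiple of 31
    return (cmf * 256 + flg) % 31 == 0

def inflate_sniffed(data):
    # Pick wbits up front: positive for a zlib wrapper, negative for raw deflate
    wbits = zlib.MAX_WBITS if has_zlib_header(data) else -zlib.MAX_WBITS
    return zlib.decompress(data, wbits)
```

For example, inflate_sniffed(zlib.compress(b"abc" * 100)) and inflate_sniffed(zlib.compress(b"abc" * 100)[2:-4]) should both return the original bytes.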
BTW, the curl developers had this same problem some time ago
<http://curl.haxx.se/mail/lib-2005-12/0130.html> and the proposed solution
is the same as above.
This is the output from your test script, modified as above (note that in
some cases the compressed stream is larger than the uncompressed data):
Trying: http://slashdot.org
http://slashdot.org - Apache/1.3.41 (Unix) mod_perl/1.31-rc4 (deflate) len(deflate)=73174 len(gzip)=73208
Able to decompress...went from 73174 to 73073.
Trying: http://www.hotmail.com
http://www.hotmail.com - Microsoft-IIS/6.0 (deflate) len(deflate)=1609 len(gzip)=1635
Able to decompress...went from 1609 to 3969.
Trying: http://www.godaddy.com
http://www.godaddy.com - Microsoft-IIS/6.0 (deflate) len(deflate)=40646 len(gzip)=157141
Able to decompress...went from 40646 to 157141.
Trying: http://www.linux.com
http://www.linux.com - Apache/2.2.8 (Unix) PHP/5.2.5 (deflate) len(deflate)=52862 len(gzip)=52880
Able to decompress...went from 52862 to 52786.
Trying: http://www.lighttpd.net
http://www.lighttpd.net - lighttpd/1.5.0 (deflate) len(deflate)=5669 len(gzip)=5687
Able to decompress...went from 5669 to 15746.
Trying: http://www.kenrockwel.com
http://www.kenrockwel.com - lighttpd (deflate) len(deflate)=414 len(gzip)=426
Able to decompress...went from 414 to 744.
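Sam's original test script is not shown here, so the following is only a hypothetical Python 3 sketch of the same kind of check, with urllib.request standing in for Python 2's urllib2. It requests a URL advertising a single content-coding and then undoes that coding, applying the deflate fallback discussed above. The names decode_body and fetch are made up for illustration.

```python
import gzip
import urllib.request
import zlib

def decode_body(body, content_encoding):
    """Undo an HTTP content-coding ('gzip', 'deflate', or none)."""
    coding = (content_encoding or "").strip().lower()
    if coding == "gzip":
        return gzip.decompress(body)
    if coding == "deflate":
        try:
            return zlib.decompress(body)                   # RFC 1950 zlib wrapper
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # bare RFC 1951 stream
    return body  # identity / unknown coding: leave untouched

def fetch(url, coding):
    """Request url with one Accept-Encoding; return (server, raw length, decoded body)."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": coding})
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = resp.read()  # urllib does not decompress the body for us
        server = resp.headers.get("Server", "?")
        encoding = resp.headers.get("Content-Encoding")
    return server, len(body), decode_body(body, encoding)
```

A run would then look like fetch("http://www.lighttpd.net", "deflate") versus fetch("http://www.lighttpd.net", "gzip"), comparing the raw lengths as in the output above. Which coding a server actually returns depends on the server, so the Content-Encoding header is checked rather than assumed.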
--
Gabriel Genellina
More information about the Python-list mailing list