Python list archives double-gzipped?

Mon Aug 27 09:52:12 EDT 2012

On 27.08.2012 03:40, Tim Chase wrote:
> So it looks like some python-list@ archiving process is double
> gzip'ing the archives.  Can anybody else confirm this and get the
> info the right people?

In January, "random joe" noticed the same problem[1].
I think Anssi Saari[2] was right in saying that there is something 
wrong in the browser or server setup, because I see the same 
behaviour with Firefox, Chromium, wget and curl.

$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug  2 03:27 wget_2012-July.txt.gz

The browsers get a double gzipped file (size 747850) whereas the 
download utilities get a normal gzipped file (size 748041).

After looking at the HTTP request and response headers I've noticed that 
the browsers accept compressed data ("Accept-Encoding: gzip, deflate") 
whereas wget/curl by default don't. After adding that header to 
wget/curl they get the same double gzipped file as the browsers do:

$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:40 curl_encoding_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug  2 03:27 wget_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug  2 03:27 wget_encoding_2012-July.txt.gz
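
To check how many gzip layers a downloaded archive really has, a small 
Python sketch like the one below should do. The helper name unwrap_gzip 
and the sample data are made up for illustration; it assumes the innermost 
payload (plain mbox text) does not itself start with the gzip magic bytes.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def unwrap_gzip(data, max_layers=5):
    """Strip nested gzip layers; return (payload, number_of_layers)."""
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers


# Stand-in for an archive file; the real one would be read from disk
# with open(path, "rb").read().
text = b"From: someone\nSubject: test\n" * 100
single = gzip.compress(text)
double = gzip.compress(single)

print(unwrap_gzip(single)[1])  # 1
print(unwrap_gzip(double)[1])  # 2
```

A file saved by one of the browsers should report two layers if the 
explanation in this mail is right, and the plain wget/curl download one.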

I think the following is happening:
If you send the "Accept-Encoding: gzip, deflate" header, the server 
gzips the file a second time (which is arguably unnecessary) and responds 
with "Content-Encoding: gzip" and "Content-Type: application/x-gzip" 
(which is IMHO correct according to RFC 2616, sections 14.11 and 14.17[3]).
But because many servers apparently don't set correct headers, the 
default behaviour of most browsers nowadays is to ignore the 
Content-Encoding for gzip files (application/x-gzip, see the bug reports 
for Firefox[4] and Chromium[5]) and not to uncompress the outer layer, 
which leads to a double gzipped file in this case.
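
The interaction described above can be simulated without any network 
access. This is only a sketch of my reading of the situation, not the 
actual server or browser code: serve, save_strict and save_browser_like 
are hypothetical names, and the header handling is reduced to the two 
headers mentioned in this mail.

```python
import gzip


def serve(body, accept_encoding=""):
    """Hypothetical server: gzip the body again if the client accepts gzip."""
    headers = {"Content-Type": "application/x-gzip"}
    if "gzip" in accept_encoding:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body


def save_strict(headers, body):
    """Client that always undoes the Content-Encoding before saving."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return body


def save_browser_like(headers, body):
    """Client that ignores Content-Encoding for application/x-gzip."""
    is_gzip_file = headers.get("Content-Type") == "application/x-gzip"
    if headers.get("Content-Encoding") == "gzip" and not is_gzip_file:
        body = gzip.decompress(body)
    return body


archive = gzip.compress(b"archive text\n" * 50)  # the .txt.gz on the server

headers, body = serve(archive, "gzip, deflate")
print(save_strict(headers, body) == archive)        # True: single gzip
print(save_browser_like(headers, body) == archive)  # False: still wrapped
```

Without the Accept-Encoding header, serve returns the archive unchanged, 
which matches what wget and curl get by default.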

Bye, Andreas

[1] http://mail.python.org/pipermail/python-list/2012-January/617983.html

[2] http://mail.python.org/pipermail/python-list/2012-January/618211.html

[3] http://www.ietf.org/rfc/rfc2616

[4] https://bugzilla.mozilla.org/show_bug.cgi?id=610679#c5

[5] http://code.google.com/p/chromium/issues/detail?id=47951#c9