Python doc problems example: gzip module

Peter Wang misterwang at gmail.com
Thu Sep 1 10:24:26 EDT 2005


>> Constructor for the GzipFile class, which simulates most of the methods
>> of a file object, with the exception of the readinto() and truncate()
>
> yeah, blab blab blab. what the fuck are you talking about? So, how to
> use it?

um... presumably you type "zippedfile = GzipFile(...)", with arguments
depending on whether you are creating a new gzip file or extracting an
existing one.  the documentation says:

> The new class instance is based on fileobj, which can be a regular file, a
> StringIO object, or any other object which simulates a file. It defaults to
> None, in which case filename is opened to provide a file object.

so i guess in your case you would want to do "zippedfile =
GzipFile("myfile.gz")".
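
for what it's worth, here is a minimal sketch of both cases (creating a
new gzip file and extracting an existing one), opening the file by name
so that fileobj stays None.  It is written for current Python 3; the
temporary directory and the name "myfile.gz" are only placeholders for
the demo.

```python
import gzip
import os
import tempfile

# Placeholder path inside a throwaway directory (an assumption for the demo).
path = os.path.join(tempfile.mkdtemp(), "myfile.gz")

# Creating a new gzip file: open by name in write mode.
zippedfile = gzip.GzipFile(path, "wb")
zippedfile.write(b"hello, gzip\n")
zippedfile.close()

# Extracting an existing gzip file: open by name; the mode defaults to 'rb'.
zippedfile = gzip.GzipFile(path)
contents = zippedfile.read()
zippedfile.close()
```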

>> When fileobj is not None, the filename argument is only used to be
>> included in the gzip file header, which may includes the original
>> filename of the uncompressed file. It defaults to the filename of
>> fileobj, if discernible; otherwise, it defaults to the empty string,
>> and in this case the original filename is not included in the header.
>
> what the fuck??

when you "gzip -d myfile.gz", the resultant output name might not be
"myfile".  The uncompressed name can be stored in the gzip header, and
so if you provide both a fileobj argument and a filename argument to
the GzipFile constructor, it will use fileobj for the data stream and
just place filename into the header (as opposed to opening the file
"filename").
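
a rough sketch of that behaviour, assuming current Python 3 and using
io.BytesIO as the in-memory file object: the data goes into the buffer,
and "original.txt" (a made-up name) only ends up in the gzip header.
The header offsets below follow the gzip file format (flags in byte 3,
name starting at byte 10 when no extra field is present), which is how
the gzip module writes it as far as I know.

```python
import gzip
import io

buf = io.BytesIO()

# fileobj carries the data stream; filename is only recorded in the header.
gz = gzip.GzipFile(filename="original.txt", mode="wb", fileobj=buf)
gz.write(b"some data")
gz.close()

raw = buf.getvalue()

FNAME = 0x08                       # header flag bit: original name present
has_name = bool(raw[3] & FNAME)    # byte 3 of the header holds the flags

# With no extra field, the zero-terminated name starts at offset 10.
stored_name = raw[10:raw.index(b"\x00", 10)]
```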

>> The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', or 'wb',
>> depending on whether the file will be read or written. The default is
>> the mode of fileobj if discernible; otherwise, the default is 'rb'. If
>> not given, the 'b' flag will be added to the mode to ensure the file is
>> opened in binary mode for cross-platform portability.
>
> discernible? so, what the fuck are exactly these modes? can't you
> describe them concretely?

these are the same modes that are used in just about every single
programming language when it comes to opening files.  these modes are
described in the Python tutorial and in the core Python documentation
about files and file I/O.  It should not be surprising, therefore, that
GzipFile, which "simulates most of the methods of a file object",
should have the same semantics when it comes to file modes.

it is actually quite shocking to me that someone with 10 years of
computing experience would not know what "rb" and "wb" mean in the
context of opening files in a programming language.
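
to make the modes concrete anyway, here is a small sketch (Python 3; the
path is a placeholder in a temporary directory) that writes, appends to,
and then reads back one gzip file.

```python
import gzip
import os
import tempfile

# Placeholder path for the demo.
path = os.path.join(tempfile.mkdtemp(), "modes.gz")

# 'w' / 'wb': create the file (or truncate it) and write compressed data.
with gzip.GzipFile(path, "wb") as f:
    f.write(b"first\n")

# 'a' / 'ab': append more compressed data to the end of the existing file.
with gzip.GzipFile(path, "ab") as f:
    f.write(b"second\n")

# 'r' / 'rb': read and decompress; appended data is returned in order.
with gzip.GzipFile(path, "rb") as f:
    data = f.read()
```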

>> Calling a GzipFile object's close() method does not close fileobj,
>> since you might wish to append more material after the compressed data.
>> This also allows you to pass a StringIO object opened for writing as
>> fileobj, and retrieve the resulting memory buffer using the StringIO
>> object's getvalue() method.
>
> huh? append more material? pass a StringIO? and memory buffer?

you see, not everyone who uses GzipFile will be decompressing files.
sometimes they will be *compressing* file data.  in this case, it's
very possible that they want to compress data going over a network
stream, or embed some gzipped data into the middle of their own file
format.  GzipFile doesn't make any assumptions about what the user is
going to do with the gzipped data or the file object that the gzip
module is writing into/reading from.
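
a sketch of the "embed it in your own format" case, under a few
assumptions: Python 3, io.BytesIO standing in for the StringIO object
the docs mention, and a made-up container layout (b"MYFMT" before the
gzipped part, b"TRAILER" after it).  The point is only that closing the
GzipFile leaves the underlying buffer open.

```python
import gzip
import io

buf = io.BytesIO()
buf.write(b"MYFMT")            # hypothetical header of our own file format
start = buf.tell()             # offset where the gzipped section begins

gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(b"payload")
gz.close()                     # finishes the gzip stream, does NOT close buf

still_open = not buf.closed
end = buf.tell()               # offset just past the gzipped section

buf.write(b"TRAILER")          # append more material after the compressed data
data = buf.getvalue()          # retrieve the whole memory buffer

# Pull the gzipped section back out and decompress it.
recovered = gzip.decompress(data[start:end])
```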

> Motherfucking 90% of programers using this module really just want to
> compress or decompress a file.

I disagree.  I think a whopping (non-motherfucking) 100% of programmers
using this module want to compress or decompress file data.  If someone
just wants to decompress a file, wouldn't they just do:

import os
os.system("gzip -d filename.gz")

The gzip module is meant to be used by folks who want to gzip or
gunzip file data from within a program.  It's not meant to be a
drop-in, shell-scripting replacement for the gzip command.
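
That said, doing the whole-file case from inside Python is short enough.
The sketch below assumes Python 3; gzip_file and gunzip_file are names I
made up for illustration, not functions in the standard library, and the
paths are placeholders in a temporary directory.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(src, dst):
    """Compress the file at src into a gzip file at dst."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def gunzip_file(src, dst):
    """Decompress the gzip file at src into a plain file at dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


# Round trip on placeholder files.
workdir = tempfile.mkdtemp()
plain = os.path.join(workdir, "filename")
packed = os.path.join(workdir, "filename.gz")
restored = os.path.join(workdir, "filename.out")

with open(plain, "wb") as f:
    f.write(b"file data\n" * 100)

gzip_file(plain, packed)
gunzip_file(packed, restored)

with open(restored, "rb") as f:
    round_trip = f.read()
```

Unlike "gzip -d", this leaves the .gz file in place; removing it
afterwards is up to the caller.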

-peter



