Python doc problem example: gzip module (reprise)

Gerard Flanagan grflanagan at yahoo.co.uk
Sat Nov 5 07:15:36 EST 2005


Xah Lee wrote:
> Python Doc Problem Example: gzip
>
> Xah Lee, 20050831
>
> Today i need to use Python to compress/decompress gzip files. Since
> i've read the official Python tutorial 8 months ago, have spent 30
> minutes with Python 3 times a week since, have 14 years of computing
> experience, 8 years in mathematical computing and 4 years in unix admin
> and perl, i have quickly found the official doc:
> http://python.org/doc/2.4.1/lib/module-gzip.html
>
> I'd imagine it being a function something like:
>
> fileContent = GzipFile(filePath, compress/decompress)
>
> However, after scanning the doc for 20 seconds, there's not a single
> example showing how it is used.
>
> Instead, the doc starts with some arcane info about compatibility with
> some other compression module and other software. Then it talks in a
> very haphazard way with confused writing about the main function
> GzipFile. No perspective whatsoever on using it to solve a problem,
> nor a concrete description of how to use it. Instead, jargon such as
> Class, Constructor, Object etc. is thrown together, presuming the
> reader's expertise in IO programming in Python and in gzip compression
> arcana.
>
> Understanding nothing, and not being a Python expert, i wanted to read
> about file objects, but there's no link.
>
> I then located the file object's doc page:
> http://python.org/doc/2.4.1/lib/bltin-file-objects.html, but it too is
> written and organized in a very unhelpful way.
>
> Here's the detail of the problems of its documentation. It starts with:
>
>     «The data compression provided by the zlib module is compatible
> with that used by the GNU compression program gzip. Accordingly, the
> gzip module provides the GzipFile class to read and write gzip-format
> files, automatically compressing or decompressing the data so it looks
> like an ordinary file object. Note that additional file formats which
> can be decompressed by the gzip and gunzip programs, such as those
> produced by compress and pack, are not supported by this module.»
>
> This intro paragraph is about 3 things: (1) the purpose of this gzip
> module. (2) its relation with the zlib module. (3) A gratuitous piece of
> arcana: that formats such as those produced by “compress” and “pack”,
> which the gzip program can decompress, are not supported by Python's
> gzip module. It needs mentioning only because of how the paragraph is
> phrased. The writing itself is a jumble.
>
> Of the people using the gzip module, the vast majority really just need
> to decompress a gzip file. They don't need to know (2) and (3) in a
> preamble. The worst aspect here is the jumbled writing.
>
>     «class GzipFile( [filename[, mode[, compresslevel[, fileobj]]]])
> Constructor for the GzipFile class, which simulates most of the methods
> of a file object, with the exception of the readinto() and truncate()
> methods. At least one of fileobj and filename must be given a
> non-trivial value. The new class instance is based on fileobj, which
> can be a regular file, a StringIO object, or any other object which
> simulates a file. It defaults to None, in which case filename is opened
> to provide a file object.»
>
> This paragraph assumes that readers are thoroughly familiar with
> Python's File Objects and their methods. The writing is haphazard and
> extremely confusing. Instead of explicitness and clarity, it tries to
> convey its meanings by side effects.
>
> • The word “simulates” is used twice, inanely. The sentence
> “...Gzipfile class, which simulates...” is better said by
> “Gzipfile is modeled after Python's File Objects class.”
>
> • The intention, to state that it has all of Python's File Object
> methods except two of them, is ambiguously phrased. It is as if to say
> all methods exist, except that two of them work differently.
>
> • The use of the phrase “non-trivial value” is inane. What does a
> non-trivial value mean here? Does “non-trivial value” have specific
> meaning in Python? Or, is it meant with generic English interpretation?
> If the latter, then what does it mean to say: “At least one of
> fileobj and filename must be given a non-trivial value”? Does it
> simply mean one of these parameters must be given?
>
> • The rest of the paragraph is just incomprehensible.
>
>     «When fileobj is not None, the filename argument is only used to
> be included in the gzip file header, which may includes the original
> filename of the uncompressed file. It defaults to the filename of
> fileobj, if discernible; otherwise, it defaults to the empty string,
> and in this case the original filename is not included in the header.»
>
> “discernible”? This writing is very confused, and it assumes the
> reader is familiar with the technical specification of Gzip.
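
For what it's worth, the filename behaviour quoted above can be checked directly. A minimal sketch, assuming a modern Python 3 (io.BytesIO stands in for the StringIO object of the 2.4 docs; the helper gzip_header_name and the name "hamlet.txt" are only illustrations, not part of the module):

```python
import gzip
import io


def gzip_header_name(filename):
    """Compress a few bytes into memory and return the original
    filename stored in the gzip header, or None if none was stored."""
    buf = io.BytesIO()
    # fileobj is given, so filename is only used for the header.
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf) as gz:
        gz.write(b"To be, or not to be")
    raw = buf.getvalue()
    # Byte 3 of a gzip stream is the flags byte; bit 0x08 (FNAME)
    # says a zero-terminated filename follows the 10-byte fixed header.
    if not raw[3] & 0x08:
        return None
    end = raw.index(b"\x00", 10)
    return raw[10:end].decode("latin-1")
```

Calling gzip_header_name("hamlet.txt") gives back the name, while an empty string leaves the header without one, which is what the doc means by "not included in the header".
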
>
>     «The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', or
> 'wb', depending on whether the file will be read or written. The
> default is the mode of fileobj if discernible; otherwise, the default
> is 'rb'. If not given, the 'b' flag will be added to the mode to ensure
> the file is opened in binary mode for cross-platform portability.»
>
> “discernible”? Again, familiarity with the working of Python's file
> object is implicitly assumed. For people who do not have expertise with
> working with files using Python, it necessitates the reading of
> Python's file objects documentation.
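
The modes do behave like those of an ordinary file, which a short round trip may make clearer. A sketch assuming Python 3 and a throwaway temporary directory (the function name write_append_read and the file name "demo.txt.gz" are made up for the example):

```python
import gzip
import os
import tempfile


def write_append_read():
    """Write, append to, then read back a gzip file using the
    'wb', 'ab' and 'rb' modes."""
    path = os.path.join(tempfile.mkdtemp(), "demo.txt.gz")
    with gzip.GzipFile(path, "wb") as gz:   # 'wb': create or overwrite
        gz.write(b"first line\n")
    with gzip.GzipFile(path, "ab") as gz:   # 'ab': add to the end
        gz.write(b"second line\n")
    with gzip.GzipFile(path, "rb") as gz:   # 'rb': read, decompressing
        return gz.read()
```

Reading returns both pieces of data in order, as it would for a plain file opened with the same modes.
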
>
>     «The compresslevel argument is an integer from 1 to 9 controlling
> the level of compression; 1 is fastest and produces the least
> compression, and 9 is slowest and produces the most compression. The
> default is 9.»
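
The effect of compresslevel is easy to measure. A rough sketch, assuming Python 3, where the module-level gzip.compress() accepts the same compresslevel argument (the helper compressed_sizes and the sample text are illustrative only):

```python
import gzip


def compressed_sizes(data, levels=(1, 9)):
    """Return a dict mapping each compression level to the size in
    bytes of the gzip output for the given data."""
    return {
        level: len(gzip.compress(data, compresslevel=level))
        for level in levels
    }


# Repetitive text compresses well at any level.
sample = b"It was the best of times, it was the worst of times. " * 2000
sizes = compressed_sizes(sample)
```

Comparing sizes[1] with sizes[9] against len(sample) shows the trade-off the doc describes; the exact numbers depend on the data and the zlib build.
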
>
>     «Calling a GzipFile object's close() method does not close
> fileobj, since you might wish to append more material after the
> compressed data. This also allows you to pass a StringIO object opened
> for writing as fileobj, and retrieve the resulting memory buffer using
> the StringIO object's getvalue() method.»
>
> huh? append more material? pass a StringIO? and memory buffer?
>
> Here, expertise in programming with IO is assumed of the reader.
> Meanwhile, the writing is not clear about exactly what it is trying
> to say about the close() method.
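
What the close() paragraph describes is compressing into memory rather than onto disk. A minimal sketch, assuming Python 3, where io.BytesIO plays the role of the StringIO object in the 2.4 docs (compress_to_memory is a hypothetical helper name):

```python
import gzip
import io


def compress_to_memory(payload):
    """Compress payload into an in-memory buffer and return the
    compressed bytes plus whether the buffer was closed."""
    buf = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buf, mode="wb")
    gz.write(payload)
    gz.close()  # finishes the gzip stream but leaves buf open
    # Because buf is still open, its contents can be retrieved.
    return buf.getvalue(), buf.closed
```

The second return value is False, which is the whole point of the paragraph: closing the GzipFile does not close the underlying object, so getvalue() still works afterwards.
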
>
> Suggestions
> --------------------------
> Quality documentation should be clear, succinct, and precise. And the
> less expertise it assumes of the reader to achieve these qualities, the
> better it is.
>
> The vast majority of programmers using this module really just want to
> compress or decompress a file. They do not need to know any more
> details about the technicalities of this module nor about the Gzip
> compression specification. Here's what Python documentation writers
> should do to improve it:
>
> • Rewrite the intro paragraph. Example: “This module provides a
> simple interface to compress and decompress files using the GNU
> compression format gzip. For detailed work with the gzip format, use
> the zlib module.” The “zlib module” phrase should be linked to its
> documentation.
>
> • Near the top of the documentation, add an example of usage. An
> example is worth a thousand words:
>
> # decompressing a file
> import gzip
> fileObj = gzip.GzipFile("/Users/joe/war_and_peace.txt.gz", 'rb')
> fileContent = fileObj.read()
> fileObj.close()
>
> # compressing a file (fileContent holds the data to compress)
> import gzip
> fileObj = gzip.GzipFile("/Users/mary/hamlet.txt.gz", 'wb')
> fileObj.write(fileContent)
> fileObj.close()
>
> • Add at the beginning of the documentation an explicit statement,
> that GzipFile() is modeled after Python's File Objects, and provide a
> link to it.
>
> • Rephrase the writing so as to not assume that the reader is
> thoroughly familiar with Python's IO. For example, when speaking of the
> modes 'r', 'rb', ... add a brief statement on what they mean. This way,
> readers may not have to take an extra step to read the page on File
> Objects.
>
> • Move arcane technical details about gzip compression to the
> bottom as footnotes.
>
> • General advice on the writing: The goal of writing on this module
> is to document its behavior, and effectively indicate how to use it.
> Keep this in mind when writing the documentation. Make clear what you
> are trying to say in each itemized paragraph. Make it precise, but
> without overdoing it. Assume your readers are familiar with the Python
> language and with gzip compression in general: for example, what
> classes and objects are in Python, what compression is, compression
> levels, and the file name suffix convention. However, do not assume
> that the readers are experts in Python IO, the gzip specification, or
> compression technology and software in the industry. If exact technical
> details or warnings are necessary, move them to footnotes.
> ---------------
>
>  Xah
>  xah at xahlee.org
>  http://xahlee.org/

"""
You want to create the world before which you can kneel: this is your
ultimate hope and intoxication.

Also sprach Zarathustra.

"""

--Friedrich Nietzsche, Thus Spoke Zarathustra, 1885


Gerard



