zlibmodule threadsafety changes: RFC

Titus Brown t at chabry.caltech.edu
Fri Aug 17 03:05:44 EDT 2001


Hi all,

during the development of a Web site that serves compressed XML data,
I found that the zlib module did not allow thread swapping during long
compressions:  that is, while Python was compressing a large string, all
other Python execution would halt (within PyWX, a threaded embedding of
Python). Upon further investigation, I discovered that:

* zlib itself (not the Python interface) IS threadsafe;

* the zlibmodule (the Python interface) is NOT threadsafe for two reasons:

	1) there's no lock to prevent one thread from modifying the
	   internals of a de/compression object while another thread is
	   working on it;

	2) the input strings to the de/compress functions can be modified
	   outside of the zlibmodule while the zlibmodule functions are
	   working on them.

I gamely went ahead and I have some fixes for these problems.  However,
there are some tough choices to be made (hence this request for comments ;).

First off, my fix for the second problem involves making an internal copy of
all input strings before allowing thread swapping to occur.  Is this likely
to cause unacceptable memory usage?  If so, should I add a fair bit more
complexity to the code by putting in #ifdefs to handle both threaded installs
(sequestering of input) and unthreaded installs (no sequestering of input)?
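
To make the "sequestering" idea concrete, here is a minimal Python-level
sketch. It is not the C code from the patch; the helper names are
hypothetical, and a bytearray stands in for an input buffer that something
else could modify. It only shows the principle: take a private copy first,
then do the long-running work on the copy.

```python
import zlib


def compress_with_private_copy(buf, level=6):
    """Compress a private snapshot of buf (hypothetical helper)."""
    # The copy is made before any long-running work starts, so later
    # changes to buf cannot affect what gets compressed.
    private = bytes(buf)
    return zlib.compress(private, level)


def snapshot_then_mutate_demo():
    shared = bytearray(b"x" * 1000)
    private = bytes(shared)      # sequester the input
    shared[:] = b"y" * 1000      # stands in for another thread modifying it
    # The compressed data still reflects the snapshot, not the mutation.
    return zlib.decompress(zlib.compress(private))
```

The cost is the one asked about above: for the duration of the call, the
copy takes additional memory equal to the size of the input.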

Second of all, my fix to the first problem involves a *global* zlib lock.
Now, if I instead put a lock in each de/compression object, then
on a multiprocessor machine, multiple de/compression objects could run
on multiple processors.  However, this adds some thorny (to me) issues
of differing object sizes on threaded and unthreaded installs --
I'm not clear how bad this is ;) -- as well as the evil bugaboo of code
complexity.

I would appreciate comments on these tradeoffs.  You can see a working
but untested thread-safe thread-swapping zlibmodule.c at

	http://chabry.caltech.edu/~t/transfer/zlibmodule.c

It is based off the current CVS tree, but the zlibmodule does not appear to
have changed at all within the last several releases, so it should work
just fine with e.g. a 2.1 distribution.

--titus

P.S. Thanks to Martin Loewis for taking a look at my first patch and pointing
out that zlib itself might not be threadsafe! ;).


