PyApache memory leak?

Thu Apr 20 14:26:24 EDT 2000

[[I tried sending the following to PyApache's author; the email bounced
with "too many hops." I tried subscribing to py-apache-devel, which has
not seen a post in four months, and never got a reply. I am posting here
in the hope that someone will get it into the right hands.]]

We just went live with a large, Python-driven web site (Red Hat 6.1,
Apache 1.3.9 Red-Hat-Secure/3.1 mod_ssl/2.4.10 OpenSSL/0.9.4). We don't
have our log analysis software up and running yet, but we are getting
thousands of CGI page views a day (maybe low tens of thousands, not
hundreds of thousands). Each page view in turn involves a suite of
Python modules, both standard and written by us. All of ours are plain
Python (no C or Java).

We rolled in PyApache; the guy who did it is gone, but the tarball is
labelled 4.19, so I assume that's the version. Our beta testing was very
light, orders of magnitude lighter than the traffic the site has gotten
since going live.

About two days and six hours after going live, the server froze.
Behavior was consistent with memory filling up--it was fine, then
suddenly it was slow, then within seconds you couldn't even ping it. We
restarted the machine, apologized to our clients, and tried to figure
out what was happening. There was some garbage written to one of the
Apache custom logs (we're running three virtual domains that share a
suite of Python CGIs), but we couldn't tell what the problem was. A
little more than two days later, it crashed again with the same
behavior: everything fine--hey, my term window is sluggish--frozen.
Still no clue. Two days and roughly six hours after that, it froze a
third time. Then we started looking more carefully at the logs. They
read:

[many normal requests]
Normal request
@@@@line of garbage@@@@
Normal request
Normal request
Normal request
Normal request
[blank line]
[restart]

Our theory is this: since only Apache writes to the Apache log, one
Apache process was generating garbage seconds before the crash. A few
other processes had time to make an entry, then the machine froze. Ergo,
it was probably the garbage-writing process that took the machine out.
Apache 1.3.9 is used by millions of computers and Red Hat Secure Server
is used by tens of thousands (?), so it probably wasn't a bug in either.
The only other thing we've done to the standard distribution is add
PyApache.
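
For anyone who wants to look for the same signature in their own logs,
here is a minimal sketch of a garbage-line scan. It is written in
present-day Python 3 and is not something we ran on the server; the
function names and the 20% threshold are assumptions made up for
illustration, and "garbage" is taken to mean a line with a high share of
non-printable characters.

```python
import string

# Characters we consider normal in a log line (printable ASCII plus whitespace).
PRINTABLE = set(string.printable)


def garbage_ratio(line):
    """Return the fraction of characters in `line` that are not printable ASCII."""
    if not line:
        return 0.0
    bad = sum(1 for ch in line if ch not in PRINTABLE)
    return bad / len(line)


def find_garbage_lines(lines, threshold=0.2):
    """Return (line_number, text) pairs whose non-printable share exceeds `threshold`.

    `lines` is any iterable of strings, such as an open log file.
    The threshold is a hypothetical default, not a tuned value.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if garbage_ratio(text) > threshold:
            hits.append((number, text))
    return hits
```

To run it against a real log, open the file with encoding="latin-1" so
that arbitrary bytes decode without errors, and pass the file object to
find_garbage_lines.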

We commented out PyApache in httpd.conf. Two days and six hours
later...the server did not freeze. We've been up for 11 days and
counting and that was the only change we made. I freely acknowledge that
this is not definitive proof of a showstopper in PyApache, but I am not
up to reading thousands of lines of C to pursue other theories.

I apologize for not giving more details, but we didn't know what else to
look at, and as this is a production server I can't exactly turn
PyApache back on to gather better bug-tracking info. I can be reached at
sam at ddmweb dot com if anyone has questions.
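
If someone can reproduce this on a test box, one thing worth recording is
the resident memory of each Apache child over time. The following is a
rough sketch in present-day Python 3, assuming a Linux /proc filesystem
and that the server processes are named "httpd"; both are assumptions,
the function names are hypothetical, and none of this was run on our
machine.

```python
import os


def parse_status(text):
    """Extract (process name, resident set size in kB) from /proc/<pid>/status text."""
    name, rss_kb = None, None
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      2048 kB".
            rss_kb = int(line.split()[1])
    return name, rss_kb


def sample_rss(process_name="httpd", proc_root="/proc"):
    """Return {pid: rss_kb} for every running process called `process_name`."""
    sizes = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # the process exited while we were looking
        if name == process_name and rss_kb is not None:
            sizes[int(entry)] = rss_kb
    return sizes
```

Calling sample_rss() once a minute from a cron job and appending the
result to a file would show whether one child grows steadily or whether
memory use jumps all at once.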

Cheers,
Sam
