Mem "leak" w/ long-running network apps?

Fri Apr 18 00:37:44 EDT 2003

I've spent the last few weeks trying to track down a memory leak in a
long-running network app (an HTTP proxy for large-ish objects) and am
trying to find out if anyone else has encountered something similar.

I narrowed the problem down to the receiving side of the proxy, and today
in socketmodule.c I noticed that when you call sock.recv(n) a string of
size n is created and then resized after the recv to the actual size of
the data received. There's nothing wrong with that in and of itself, but
when I replaced it with the function below to do receives the ever-growing
memory problem went away (so now instead of calling sock.recv(amnt) I do a
specialmodule.recv(sock, amnt)). The function merely uses a static buffer
and then creates a Python string of just the size needed - the resize is
what got eliminated. I initially encountered the problem on Python 2.1.3
but then moved to 2.2.2 to use the gc module enhancements.
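
To make the two receive patterns concrete, here is a minimal sketch in plain C. It is not the socketmodule.c code: malloc/realloc stand in for creating and resizing the Python string, and the names recv_alloc_then_resize and recv_static_then_copy are made up for illustration. Whether the shrink in the first pattern actually fragments the heap depends on the allocator, so this only shows the difference in allocation behavior, not a confirmed cause.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define MAX_RECV_SIZE 1048576
static char recv_buff[MAX_RECV_SIZE];

/* Pattern A: allocate the full requested size up front, receive into it,
   then shrink the block to the number of bytes that actually arrived. */
char *recv_alloc_then_resize(int fd, size_t want, ssize_t *got)
{
    char *buf, *shrunk;

    assert(got != NULL);
    buf = malloc(want ? want : 1);
    if (buf == NULL)
        return NULL;

    *got = recv(fd, buf, want, 0);
    if (*got < 0) {
        free(buf);
        return NULL;
    }

    /* Shrink in place if the allocator allows it; keep the original
       block if realloc fails. */
    shrunk = realloc(buf, *got ? (size_t)*got : 1);
    return shrunk ? shrunk : buf;
}

/* Pattern B: receive into one static buffer, then allocate exactly the
   number of bytes received and copy them out. Not thread-safe. */
char *recv_static_then_copy(int fd, size_t want, ssize_t *got)
{
    char *out;

    assert(got != NULL);
    if (want > MAX_RECV_SIZE)
        want = MAX_RECV_SIZE;

    *got = recv(fd, recv_buff, want, 0);
    if (*got < 0)
        return NULL;

    out = malloc(*got ? (size_t)*got : 1);
    if (out == NULL)
        return NULL;
    memcpy(out, recv_buff, (size_t)*got);
    return out;
}
```

In both cases the caller owns the returned block and must free() it. The only difference is that pattern A briefly holds a block of the requested size for every receive, while pattern B only ever allocates the size actually received.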

My questions are:

(1) Has anybody else run into a similar problem before? This may not be
very common because the app is a little unusual: it's long-running,
handles hundreds of concurrent connections, each connection is usually for
a large (tens to hundreds of megabytes) object, the data is all proxied
rather than being served off disk or generated by the app, and both the
upstream and downstream connections are generally fast (aggregate
throughput for the server is usually in the 100-200 Mbps range on a P3
900).

(2) Any ideas on why using the normal socket.recv resulted in ever-growing
memory use? I've spent ages using the gc module and other tools to track
down objects that should have been freed, cyclic references, etc., and
don't see any problems there and I don't think the memory is really being
leaked (in the C sense). Could it be a heavily fragmented heap or
something like that? When I stopped my tests after a long run, memory
usage would drop by a few megabytes after a while. Upon starting the tests
again (without restarting my app), memory usage would drop some more, but
not all the way down, and then gradually grow to a new high-water mark, so
that overnight my process was using hundreds of MB of RAM. With my
recv-replacement I'm holding steady at about 30 MB, which is normal.

(3) Does anyone see any glaring errors in my function below? Seems to work
well enough. :)

Anyway, I don't think there's a bug in Python, and my function is
certainly not patch-worthy because it really only works for my use case.
Now that my problem is gone, it's mostly out of curiosity that I'm trying
to understand why it went away, but I'd appreciate any insight or hints.

Thanks,
Dave

Here's the function. I can get away with using a single large buffer for
all my receives because while the server is a mixture of threading and
poll-based, all the I/O happens sequentially in one thread against sockets
that are known to be ready for I/O.

#define MAX_RECV_SIZE 1048576
static char RECV_BUFF[MAX_RECV_SIZE];

static PyObject *
recv_wrapper(PyObject *self, PyObject *args)
{
    PyObject *py_sock;
    PyObject *py_str;
    int sock;              /* File descriptor of the socket to receive from */
    int len, n;

    if (!PyArg_ParseTuple(args, "Oi:recv", &py_sock, &len))
      return NULL;

    if (len < 0)
    {
      PyErr_SetString(PyExc_ValueError, "negative buffersize in recv");
      return NULL;
    }

    if (len > MAX_RECV_SIZE-1)
    {
      PyErr_SetString(PyExc_ValueError, "buffersize too large in recv");
      return NULL;
    }

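    /* Accepts an int or any object with a fileno() method; returns -1
       with an exception set on failure. */
    sock = PyObject_AsFileDescriptor(py_sock);
    if (sock == -1)
      return NULL;

    /* Receiving into a shared static buffer without the GIL is safe here
       only because all I/O happens sequentially in one thread. */
    Py_BEGIN_ALLOW_THREADS;
    n = recv(sock, RECV_BUFF, len, 0);
    Py_END_ALLOW_THREADS;
    if (n == -1)
      return PyErr_SetFromErrno(RecvError);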

    py_str = PyString_FromStringAndSize(RECV_BUFF, n);
    if (py_str == NULL)
      return NULL;

    return py_str;
}