High performance IO on non-blocking sockets

Troels Walsted Hansen troels at thule.no
Sat Mar 15 08:15:55 EST 2003


Dave Brueck wrote:
> Yes - to achieve truly high performance (BTW - how high do you need?) you
> need to pay attention to what sort of writing you're doing. Is it
> HTTP-like traffic or a custom protocol? How many simultaneous connections
> do you need to support? Are they likely to be LAN-speed connections,
> DSL-speed, modem, or some mix?

XML-RPC over HTTP. You don't need to tell me that XML-RPC is unsuited
for large payloads; I'm painfully aware of that fact. :)

> At the company I work for we have several different custom HTTP servers,
> and we saw huge performance gains when we started grouping the types of
> I/O according to size and acting on them differently. For example, in the
> hundreds of megabytes (or even half a megabyte) cases, it's likely that
> the data you're writing is coming off the disk. Our servers primarily run
> on Linux, so we created a tiny C extension module that calls the sendfile
> API and in cases where there's a large chunk of data coming off disk we
> call sendfile so that the data never even makes it to Python (or our
> process memory space, for that matter). On platforms without a sendfile C
> API the call gets routed to a simulated sendfile (all Python) instead.
> 
> Anyway, with sendfile we hit some crazy performance levels - a PII (<500
> MHz) easily sustains 300 Mbps throughput, for example, for hundreds of
> simultaneous DSL-like connections, and a PIII (~900 MHz) has passed 1.5
> Gbps over the loopback adapter.

sendfile() is indeed great if your data is coming off a disk.
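
For readers on later Python versions than the ones discussed in this thread: os.sendfile() is available in the standard library on Unix, so the C extension described above is not needed there. Below is a minimal sketch of the "real sendfile, else simulated sendfile" routing, assuming a connected stream socket and a file opened in binary mode. The name send_file_region is hypothetical, and the fallback is only a plain read/send loop, not the implementation from Dave's servers.

```python
import os
import socket

def send_file_region(sock, fileobj, offset, count):
    """Try to send up to `count` bytes of `fileobj`, starting at `offset`.

    Returns the number of bytes handed to the kernel, which may be less
    than `count` (0 at end of file). On a non-blocking socket a full send
    buffer surfaces as BlockingIOError; the caller should wait for
    writability and retry with the same offset.
    """
    if hasattr(os, "sendfile"):
        try:
            # Zero-copy path: the file data never enters this process.
            return os.sendfile(sock.fileno(), fileobj.fileno(), offset, count)
        except BlockingIOError:
            raise
        except OSError:
            pass  # fd combination not supported here: use the copy path
    # Simulated sendfile: one read into process memory, then one send().
    fileobj.seek(offset)
    data = fileobj.read(count)
    return sock.send(data) if data else 0
```

The caller advances offset by the return value and calls again until the whole region has gone out, which fits naturally into a select/poll loop.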

> For our work, it's quite unlikely that we _ever_ send out 1 byte of
> anything, but we do see lots of cases (like building HTTP response
> headers) where there's lots of little chunks. In those situations we build
> up a list of little strings, ''.join() them, and send them out as one
> chunk.
> 
> One idea we've considered but not pursued is using the buffer() objects to
> avoid the send-a-piece-then-copy-the-substring problem you identified.
> We haven't gone down that path too far yet because sendfile has helped
> immensely and we maintain our outgoing queues as lists of strings that we
> keep as a list until right before sending, at which time we combine enough
> of them to create a string large enough to fill the output buffer of the
> socket, but (hopefully) not too much more.

Another thing to watch out for is having too many small substrings on the
list. It is quite easy to fragment the process heap, which prevents the
malloc library from returning freed memory to the OS. Some OSs seem more
sensitive to this than others...
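
The outgoing-queue idea quoted above (keep a list of small strings, join just enough of them to fill the socket buffer, and avoid copying the unsent tail) could be sketched roughly as follows. This is an illustration only: OutgoingQueue and target_size are hypothetical names, the 64 KB default is an assumption, and memoryview stands in for the buffer() objects mentioned in the quote.

```python
import collections
import socket

class OutgoingQueue:
    """Queue of byte chunks that are coalesced just before sending."""

    def __init__(self, target_size=64 * 1024):
        self.chunks = collections.deque()
        # Assumed value; tune it to the socket's actual send buffer size.
        self.target_size = target_size

    def append(self, data):
        if data:
            self.chunks.append(data)

    def send_some(self, sock):
        """Coalesce queued chunks and make a single send() call.

        Returns the number of bytes sent. If send() raises (for example
        BlockingIOError on a non-blocking socket), the data is put back
        on the queue before the exception propagates.
        """
        if not self.chunks:
            return 0
        # Pull chunks off the queue until we have about target_size bytes.
        batch, size = [], 0
        while self.chunks and size < self.target_size:
            chunk = self.chunks.popleft()
            batch.append(chunk)
            size += len(chunk)
        data = batch[0] if len(batch) == 1 else b"".join(batch)
        try:
            sent = sock.send(data)
        except OSError:
            self.chunks.appendleft(data)
            raise
        if sent < len(data):
            # Keep the unsent tail as a memoryview slice: no substring copy.
            self.chunks.appendleft(memoryview(data)[sent:])
        return sent
```

Coalescing also keeps the number of live small objects down, which helps with the fragmentation concern above.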

> Again, though, the approach to how you read the data can benefit if you
> can give hints on what you'll do with it.
> 
> For example, when we're proxying between two sockets we leave the data in
> a list of chunks because our sending code can use it in that form anyway.
> When we're receiving an upload, we don't really want a buffer the size of
> the entire upload in memory anyway because we're going to be tossing the
> data to disk.
> 
> Still though, I do wish there was a better way to do the receives because
> even with leaving the data in a chunked list our proxying is slow and the
> primary bottleneck appears to be the recv side of things.

Yeah, there's quite a bit of copying going on in recv() unfortunately.

A better approach might be a modified socket.recv() that writes
directly into a buffer object. The buffer object could have cStringIO
semantics, with support for pre-allocation hints (for the times when you
know total_recv_size) and dynamic expansion through reallocation (for
the times when you don't).
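
Later Python versions added something close to this: socket.recv_into() writes straight into a caller-supplied buffer such as a bytearray. Here is a rough sketch of both cases described above, assuming a blocking stream socket to keep it short. The helper names recv_exact and recv_until_closed are hypothetical; in a non-blocking server the buffer and the byte count would be kept as per-connection state between readiness events.

```python
import socket

def recv_exact(sock, total_size):
    """Known size: receive exactly total_size bytes into one
    pre-allocated buffer, with no intermediate strings."""
    buf = bytearray(total_size)
    view = memoryview(buf)
    received = 0
    while received < total_size:
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("peer closed after %d bytes" % received)
        received += n
    return buf

def recv_until_closed(sock, initial_size=4096):
    """Unknown size: receive until EOF into a buffer that doubles
    (by reallocation) whenever it fills up."""
    buf = bytearray(initial_size)
    received = 0
    while True:
        if received == len(buf):
            buf += bytes(len(buf) or 4096)  # grow the buffer
        # Views are released right after the call, because a bytearray
        # cannot be resized while a memoryview still exports it.
        with memoryview(buf) as whole, whole[received:] as free:
            n = sock.recv_into(free)
        if n == 0:
            break
        received += n
    del buf[received:]  # trim the unused tail
    return buf
```

Doubling keeps the number of reallocations logarithmic in the final size, at the cost of up to 2x over-allocation before the final trim.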

> This is the approach we use, except that we never do the final ''.join
> (well, our framework doesn't, the application might if it makes sense)
> because as a list the data is in suitable form for writing to disk or
> handing off to the send code.

Unfortunately I need the complete data in order to parse and decode the 
XML-RPC payload. :(
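
To make the constraint concrete, this is a small sketch using the module name from later Python versions (xmlrpc.client; it was xmlrpclib at the time of writing). The function name decode_xmlrpc_request is hypothetical. The chunk list is joined exactly once, right before decoding.

```python
import xmlrpc.client

def decode_xmlrpc_request(chunks):
    """Join the received body chunks once, then decode the XML-RPC call."""
    body = b"".join(chunks)  # the one full copy of the payload
    params, method = xmlrpc.client.loads(body)
    return method, params
```

Keeping the data as a list until this point means the payload is copied into one contiguous string only once, rather than on every recv().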

-- 
Troels Walsted Hansen




