Fork, Threading, and Select madness. Please help!

Gordon McMillan gmcm at hypernet.com
Thu Jul 22 23:45:58 EDT 1999


Jesse D. Sightler wrote:

> Ok, I have been working on a website which is now available at
> http://www.biddin.com/ that performs a search of multiple auction
> sites all at once, and then returns the results.  The problem is
> that it is very slow, and it appears that the primary bottleneck is
> in waiting on the search engines at the various sites to begin
> reading data.
> 
> Well, this makes sense because the code to implement the search is
> currently purely sequential with no threading or non-blocking IO
> whatsoever.  
> 
> What I'm getting around to asking (slowly<g>) is, what is the best
> way to implement an algorithm that can retrieve the data from ALL 3
> sites simultaneously, so that the code is never blocked up waiting
> on one particular bad site?  

In theory, select (multiplexing) is the best solution, especially 
since you are just gathering data. But this might well mean trashing 
all of your existing code. I'd recommend using asyncore if you're 
doing this from scratch.
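Below is a minimal sketch of the multiplexing approach in modern Python. It uses the `selectors` module instead of asyncore, which was removed from the standard library in Python 3.12. It assumes each source is an already-connected socket that sends its response and then closes. `gather` is a hypothetical helper name, and a real search client would also have to send its request and parse the HTTP reply. The loop reacts to whatever bytes happen to be ready on any socket, rather than following one request/response exchange at a time.

```python
import selectors
import socket


def gather(socks):
    """Read every socket to EOF using one selector; return the bytes in input order."""
    sel = selectors.DefaultSelector()
    buffers = {}
    for s in socks:
        s.setblocking(False)           # never let one slow source stall the loop
        sel.register(s, selectors.EVENT_READ)
        buffers[s] = bytearray()
    while sel.get_map():               # loop until every source has hit EOF
        for key, _events in sel.select():
            sock = key.fileobj
            try:
                chunk = sock.recv(4096)
            except BlockingIOError:
                continue               # spurious wakeup; wait for the next event
            if chunk:
                buffers[sock] += chunk  # append whatever partial data arrived
            else:
                sel.unregister(sock)   # empty read means EOF: this source is done
                sock.close()
    sel.close()
    return [bytes(buffers[s]) for s in socks]


if __name__ == "__main__":
    # Demo with local socket pairs standing in for three remote sites.
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (_reader, writer) in enumerate(pairs):
        writer.sendall(("results from site %d" % i).encode())
        writer.close()
    print(gather([reader for reader, _writer in pairs]))
```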

If you're using one of the higher-level protocol modules (urllib, 
httplib), a quick glance (I'm no expert on these guys) says 
you'll have to use threads or subprocesses. Multiplexing usually means 
turning your logic inside out (you're no longer driven by meaningful 
exchanges, but by dribs and drabs of data appearing in network 
buffers). So the transformation from blocking sockets to multiplexed 
non-blocking sockets is major surgery.
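For the thread route, here is a minimal sketch that keeps the existing blocking urllib code and simply runs one call per site in a worker thread. It uses `concurrent.futures.ThreadPoolExecutor`, a much later addition to the standard library than this post. `fetch_url` and `fetch_all` are hypothetical names. Returning each site's exception in place of its result, instead of raising, is an assumption about what a search aggregator wants.

```python
import concurrent.futures
import urllib.request


def fetch_url(url, timeout=30):
    """Plain blocking fetch; inside a worker it only blocks its own thread."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetch=fetch_url):
    """Run one blocking fetch per URL concurrently; results keep input order."""
    results = [None] * len(urls)
    workers = max(1, len(urls))  # one thread per site, so a slow site delays only itself
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, url): i for i, url in enumerate(urls)}
        for fut in concurrent.futures.as_completed(futures):
            i = futures[fut]
            try:
                results[i] = fut.result()
            except Exception as exc:  # a bad site shouldn't sink the whole search
                results[i] = exc
    return results
```

Swapping in `concurrent.futures.ProcessPoolExecutor` gives the subprocess variant with the same interface, provided the fetch function is defined at module top level so it can be pickled.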

- Gordon




More information about the Python-list mailing list