Fork, Threading, and Select madness. Please help!
Gordon McMillan
gmcm at hypernet.com
Thu Jul 22 23:45:58 EDT 1999
Jesse D. Sightler wrote:
> Ok, I have been working on a website which is now available at
> http://www.biddin.com/ that performs a search of multiple auction
> sites all at once, and then returns the results. The problem is
> that it is very slow, and it appears that the primary bottleneck is
> in waiting on the search engines at the various sites to begin
> reading data.
>
> Well, this makes sense because the code to implement the search is
> currently pure sequential with no threading or non-blocking IO
> whatsoever.
>
> What I'm getting around to asking (slowly<g>) is, what is the best
> way to implement an algorithm that can retrieve the data from ALL 3
> sites simultaneously, so that the code is never blocked up waiting
> on one particular bad site?
In theory, select (multiplexing) is the best solution, especially
since you are just gathering data. But this might well mean trashing
all of your existing code. I'd recommend using asyncore if you're
doing this from scratch.
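To make the multiplexing idea concrete, here is a minimal sketch using the plain select module rather than asyncore. The function name read_all, the 4096-byte read size, and the timeout value are illustrative assumptions, and the sockets are assumed to be already connected with the requests already sent.

```python
import select
import socket


def read_all(socks, timeout=5.0):
    """Read every socket to EOF, servicing whichever one has data ready.

    Hypothetical helper for illustration: `socks` are connected sockets
    on which the request has already been written.
    """
    buffers = {s: bytearray() for s in socks}
    pending = set(socks)
    while pending:
        # Block until at least one socket is readable, or the timeout expires.
        readable, _, _ = select.select(list(pending), [], [], timeout)
        if not readable:
            break  # nothing arrived in time; stop waiting on the slow ones
        for s in readable:
            chunk = s.recv(4096)
            if chunk:
                buffers[s] += chunk   # a partial piece of one response
            else:
                pending.discard(s)    # empty read: the peer closed the connection
    return {s: bytes(b) for s, b in buffers.items()}
```

Note that the loop never waits on one particular site: a slow server only delays its own buffer, and whatever it has sent by the timeout is still returned.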
If you're using one of the higher-level protocol modules (urllib,
httplib), a quick glance (I'm no expert on these guys) says
you'll have to thread or use subprocesses. Multiplexing usually means
turning your logic inside out (you're no longer driven by meaningful
exchanges, but by dribs and drabs of data appearing in network
buffers). So the transformation from blocking sockets to multiplexed
non-blocking sockets is major surgery.
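For the threaded route, a rough sketch might look like the following. It keeps the blocking, sequential-looking fetch code and simply runs one copy per site. The names fetch and fetch_all are hypothetical, and urllib.request is the current spelling of urllib's URL opener, so treat this as an illustration under those assumptions rather than a drop-in fix.

```python
import threading
import urllib.request


def fetch(url, timeout=10.0):
    """Blocking fetch of one URL (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetch=fetch):
    """Run one blocking fetch per thread and collect the results by URL.

    A failed fetch stores the exception instead of the body, so one bad
    site cannot take the others down with it.
    """
    results = {}

    def worker(url):
        try:
            results[url] = fetch(url)
        except Exception as exc:
            results[url] = exc

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()   # total wait is roughly that of the slowest site
    return results
```

The existing per-site logic stays intact inside fetch, which is why this is usually the smaller change compared with rewriting everything around select.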
- Gordon