Fork, Threading, and Select madness. Please help!

Jesse D. Sightler jsight at mindspring.com
Sat Jul 24 16:45:03 EDT 1999


Thanks for your thoughtful reply.  As it turned out, threading wasn't
nearly as bad a solution as I thought it might be<0.5wink>.  I just
downloaded Python 1.5.2 and recompiled with "./configure --with-threads"
and everything worked out nicely on the FreeBSD box that my webserver
uses.  

Of course, doing this on my Linux box resulted in something that causes
my kernel to blow up in flames a few hours after running the Python
interpreter with any threaded program.  But that's OK, at least it
works with my ISP.

Anyway, threads seem to be helping me make things faster, so right now
that is my solution.  The select idea is probably theoretically better,
but would be a big challenge to implement due to the architecture of the
code.<sigh>
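
For anyone reading this later, here is a minimal sketch of the
one-thread-per-site approach.  It is written for modern Python 3, not
the 1.5.2 code I am actually running, and the names fetch_all and
fake_site are made up for illustration.  The real fetchers would be
whatever blocking call (urllib or similar) each site needs; here they
are simulated with time.sleep so the example runs without a network.

```python
import threading
import time


def fetch_all(fetchers, timeout=10.0):
    """Run each blocking callable in its own thread.

    fetchers maps a site name to a zero-argument callable.
    Returns {name: result}; a fetcher that raised is stored as the
    exception object, and one that misses the overall deadline is
    simply absent from the result.
    """
    results = {}
    lock = threading.Lock()

    def worker(name, fetch):
        try:
            value = fetch()
        except Exception as exc:  # keep one bad site from hiding the rest
            value = exc
        with lock:
            results[name] = value

    threads = [
        threading.Thread(target=worker, args=(name, fetch), daemon=True)
        for name, fetch in fetchers.items()
    ]
    for t in threads:
        t.start()

    # One shared deadline, so total wait is bounded by `timeout`.
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))

    with lock:
        return dict(results)


def fake_site(name, delay):
    """Hypothetical stand-in for a slow auction-site search."""
    def fetch():
        time.sleep(delay)
        return name + " results"
    return fetch
```

With three fake sites that each take 0.3 seconds, fetch_all returns
after roughly 0.3 seconds rather than 0.9, because the waits overlap.
The existing sequential fetching code can stay as it is inside each
callable, which is the main attraction over select.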

Gordon McMillan wrote:
> 
> Jesse D. Sightler wrote:
> 
> > Ok, I have been working on a website which is now available at
> > http://www.biddin.com/ that performs a search of multiple auction
> > sites all at once, and then returns the results.  The problem is
> > that it is very slow, and it appears that the primary bottleneck is
> > in waiting on the search engines at the various sites to begin
> > reading data.
> >
> > Well, this makes sense because the code to implement the search is
> > currently purely sequential with no threading or non-blocking I/O
> > whatsoever.
> >
> > What I'm getting around to asking (slowly<g>) is, what is the best
> > way to implement an algorithm that can retrieve the data from ALL 3
> > sites simultaneously, so that the code is never blocked up waiting
> > on one particular bad site?
> 
> In theory, select (multiplexing) is the best solution, especially
> since you are just gathering data. But this might well mean trashing
> all of your existing code. I'd recommend using asyncore if you're
> doing this from scratch.
> 
> If you're using one of the higher-level protocol modules (urllib,
> httplib), a quick glance (I'm no expert on these guys) says
> you'll have to thread or use subprocesses. Multiplexing usually means
> turning your logic inside out (you're no longer driven by meaningful
> exchanges, but by dribs and drabs of data appearing in network
> buffers). So the transformation from blocking sockets to multiplexed
> non-blocking sockets is major surgery.
> 
> - Gordon
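
To make the "inside out" point concrete, here is a rough sketch of the
select style in modern Python 3.  It assumes the sockets are already
connected and that each peer sends its data and then closes; the name
read_all is invented for this example, and a real HTTP client would
also have to send requests and parse responses piece by piece.

```python
import select
import socket


def read_all(socks, timeout=10.0):
    """Read from several connected sockets at once until each hits EOF.

    socks maps a site name to a connected socket.  `timeout` is an idle
    timeout: if no socket becomes readable for that long, give up and
    return whatever has arrived so far.  Returns {name: bytes}.
    """
    by_sock = {s: name for name, s in socks.items()}
    chunks = {name: [] for name in socks}
    pending = set(by_sock)
    for s in pending:
        s.setblocking(False)

    while pending:
        readable, _, _ = select.select(list(pending), [], [], timeout)
        if not readable:
            break  # nothing arrived within the idle timeout
        for s in readable:
            try:
                data = s.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup, try again on the next pass
            except OSError:
                pending.discard(s)  # connection error: stop watching it
                continue
            if data:
                # Data shows up in arbitrary pieces, in arbitrary order.
                chunks[by_sock[s]].append(data)
            else:
                pending.discard(s)  # empty read means the peer closed

    return {name: b"".join(parts) for name, parts in chunks.items()}
```

Note that the loop is driven by whichever socket happens to have bytes
ready, not by the order in which the sites were queried.  That is why
code built around blocking calls cannot simply be dropped into it.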



