[medusa] Re: tweaks to asyn{core,chat}.py

Sam Rushing rushing@n...
Fri, 19 Nov 1999 19:31:24 -0800 (PST)


Amos Latteier writes:
> One problem Zope has with asyncore results from using worker
> threads in addition to a medusa thread. There are problems when the
> worker thread is ready to go, but the medusa thread is sitting in a
> select call, which potentially can take up to 30 secs to return. We
> get around this by using a select trigger to wake up select, but we
> would like to get rid of the need for a select trigger.

Lowering the timeout value just aggravates the CPU-abuse
problem... imagine a timeout of zero: the CPU would stay pegged
even if the server were completely idle.

Why don't you like the trigger?

For the uninitiated, the 'trigger' is a socket/descriptor that is
always in the readable set for select. When select() is otherwise
'hung' waiting for a timeout, another thread can 'pull the trigger'
and wake up the select call. On Unix, signals can also force select()
to return, but it's not portable.
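In case it's not obvious what "pull the trigger" amounts to, here's a
pipe-only sketch of mine (not the actual medusa code -- the real
trigger also carries a loopback-socket fallback for platforms that
can't select() on a pipe):

```python
import os
import select
import threading

class Trigger:
    """A descriptor that is always in select()'s readable set, so
    another thread can wake up a select() that is sitting in its
    timeout.  This sketch uses a pipe, so it is Unix-only."""

    def __init__(self):
        self._rfd, self._wfd = os.pipe()

    def fileno(self):
        # select() accepts any object with a fileno() method
        return self._rfd

    def pull_trigger(self):
        # writing one byte makes the read end readable, which
        # forces the blocked select() to return immediately
        os.write(self._wfd, b'x')

    def drain(self):
        # called from the select loop after waking up
        os.read(self._rfd, 8192)

trigger = Trigger()

def worker():
    # simulate a worker thread that finishes its job and
    # needs the medusa thread to notice right away
    trigger.pull_trigger()

t = threading.Thread(target=worker)
t.start()
# without the trigger, this select() could sit for the full 30s
r, w, e = select.select([trigger], [], [], 30.0)
t.join()
woke = trigger in r
trigger.drain()
```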

> One way to achieve this is to reduce the timeout on the select call
> to a small enough value that the worker threads can wait for a
> timeout without being overly inconvenienced. This would require the
> select loop being adequately optimized.

Without a redesign that eliminates the calls to readable() and
writable(), I think we're stuck. Curiously, the cleanest 'redesign'
that achieves this effect is to go to coroutines/fibers. But that's a
whole 'nother paradigm... [coroutines combined with completion ports
would be the real killer]

> One optimization Jim suggested is to pass integers to the select
> loop, not descriptor objects. I think that this is what you are
> suggesting as 'fdcache' optimization.

Waaay back (maybe last year) I did something similar, but I used a
separate descriptor-map dictionary. It worked well, but would
sometimes run into a problem where the one map wasn't in sync with the 
other. Now they're the same map... much cleaner.
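For concreteness, the single-map arrangement looks roughly like this
(my sketch, with invented names -- the point is that the dict is keyed
by the bare integer fd, so select() gets plain ints and the very same
dict routes readiness back to the dispatcher objects):

```python
import select

# one shared map, keyed by the integer file descriptor --
# the 'fdcache' idea: no separate descriptor-map to fall
# out of sync with the object map
socket_map = {}   # {fd: dispatcher}

class Dispatcher:
    def __init__(self, sock):
        self.socket = sock
        self._fileno = sock.fileno()
        socket_map[self._fileno] = self

    def readable(self):
        return True

    def writable(self):
        return True

    def handle_read_event(self):
        pass

    def handle_write_event(self):
        pass

def poll(timeout=30.0):
    # build the r/w lists of bare ints straight from the keys;
    # no per-object fileno() calls inside the loop
    r = [fd for fd, obj in socket_map.items() if obj.readable()]
    w = [fd for fd, obj in socket_map.items() if obj.writable()]
    r, w, e = select.select(r, w, [], timeout)
    for fd in r:
        obj = socket_map.get(fd)
        if obj is not None:
            obj.handle_read_event()
    for fd in w:
        obj = socket_map.get(fd)
        if obj is not None:
            obj.handle_write_event()
```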

But there are other areas that need tweaking; the poll() overhead is
no longer at the top of the list. I have modified a Python interpreter
here to do a kind of statistical profiling (every <checkinterval> VM
insns a dictionary of {<code-object>:<count>} is updated). Here's what
made me look at refill_buffer():

[...]
709 <code object recv at 80f8f48, file "../asyncore.py", line 265>
713 <code object found_terminator at 80e8d60, file "test_lb.py", line 35>
748 <code object collect_incoming_data at 81031a8, file "test_lb.py", line 31>
751 <code object handle_read_event at 80f9768, file "../asyncore.py", line 299>
870 <code object pop at 80dbfd0, file "../asynchat.py", line 251>
948 <code object found_terminator at 80f3c68, file "test_lb.py", line 92>
1019 <code object readable at 80f3788, file "../asynchat.py", line 156>
1421 <code object __len__ at 80f3590, file "../asynchat.py", line 242>
1582 <code object writable at 8113d90, file "../asynchat.py", line 160>
1739 <code object more at 81123d0, file "../asynchat.py", line 225>
3258 <code object initiate_send at 8101460, file "../asynchat.py", line 195>
4205 <code object poll at 80f5e08, file "../asyncore.py", line 54>
4211 <code object handle_read at 810e260, file "../asynchat.py", line 77>
4970 <code object refill_buffer at 81143a8, file "../asynchat.py", line 170>
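You can approximate this without patching the interpreter -- instead
of hooking the VM's check interval, sample the running frame on a
timer and count code objects. This is my own rough stand-in, not what
produced the listing above (Unix-only, since it uses setitimer):

```python
import signal
import collections

# {code-object: sample-count}, same shape as the listing above
counts = collections.Counter()

def _sample(signum, frame):
    # the handler receives whatever frame was executing when the
    # timer fired; its f_code is the code object to count
    if frame is not None:
        counts[frame.f_code] += 1

def start_sampling(interval=0.001):
    signal.signal(signal.SIGALRM, _sample)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)

def stop_sampling():
    signal.setitimer(signal.ITIMER_REAL, 0, 0)

def report():
    # print in the same ascending-count format as the listing
    for code, n in sorted(counts.items(), key=lambda kv: kv[1]):
        print('%6d <code object %s, file "%s", line %d>'
              % (n, code.co_name, code.co_filename,
                 code.co_firstlineno))

# quick demonstration: sample a short busy loop
import time
start_sampling(0.001)
t0 = time.time()
while time.time() - t0 < 0.2:
    sum(range(1000))
stop_sampling()
```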

> Another optimization would be to have a system of re-entering the
> select call immediately after returning if nothing is ready--rather
> than checking the results and rebuilding the arguments to the
> select call.

This is a really good idea! It's getting into an interesting area:
trying to be smarter about scheduling the r/w sets. I think there are
some things like that to be found here:

http://www.kegel.com/c10k.html

A problem I see is for users of the event_loop class, which supports
timers. It would be possible (nay, likely) for a timer to modify the
readable/writable condition for one or more objects, which would break
the above fix.
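One way to keep the optimization and still survive timers is to cache
the r/w lists and make anything that might change a readable()/
writable() answer explicitly invalidate the cache. A sketch of mine
(the class and method names are invented, not medusa's):

```python
import select

class CachedPoller:
    """Re-enter select() with cached argument lists; a timer
    callback (or anything else that changes an object's
    readable()/writable() answer) must call invalidate()."""

    def __init__(self, socket_map):
        self.socket_map = socket_map   # {fd: dispatcher}
        self._dirty = True
        self._r = []
        self._w = []

    def invalidate(self):
        self._dirty = True

    def poll(self, timeout):
        if self._dirty:
            # rebuild only when something actually changed
            self._r = [fd for fd, o in self.socket_map.items()
                       if o.readable()]
            self._w = [fd for fd, o in self.socket_map.items()
                       if o.writable()]
            self._dirty = False
        r, w, e = select.select(self._r, self._w, [], timeout)
        if not (r or w):
            # nothing ready: the cached lists are still valid, so
            # the next call re-enters select() with no rebuild cost
            return 0
        for fd in r:
            self.socket_map[fd].handle_read_event()
        for fd in w:
            self.socket_map[fd].handle_write_event()
        return len(r) + len(w)
```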

> > I noticed that ftp_server doesn't have a zombie
> > timeout; is there an obvious reason why not?
> 
> I've wondered about this too.

Lazy Sam. Every time I go to add it, I think - "there should be a
general facility for this" and of course that allows me to put it
off for a while. Lather. Rinse. Repeat.

-Sam