ulimit on open sockets ?

Alex Martelli aleax at mac.com
Sat Apr 14 11:39:17 EDT 2007


Maxim Veksler <hq4ever at gmail.com> wrote:
   ...
> Thank you. I'm attaching the full code so far for reference, sadly it
> still doesn't work. It seems that select.select gets it's count of
> fd's not from the amount passed to it by the sub_list but from the
> kernel (or whatever) count for the process; The main issue here is

It's not a problem of the COUNT of FDs, i.e., how many you're passing to
select; the problem is the value of the _highest_ FD number you can pass.
It's an API-level limitation, not an issue with Python per se: the
select API takes a "bit vector" of N bits, representing a set of FDs in
that way, and N (the FD_SETSIZE constant) is fixed at compile time
(normally to 1024).
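
To make the "bit vector" point concrete, here is a toy model in Python.
It is only an illustration, not how select is implemented: the name
make_fd_set is hypothetical, and the value 1024 is assumed as the
typical limit.

```python
# Toy model of a fixed-size FD set; NOT the real select implementation.
FD_SETSIZE = 1024  # assumed typical value


def make_fd_set(fds):
    """Pack file descriptor numbers into an integer used as a bit vector."""
    bits = 0
    for fd in fds:
        # What matters is the VALUE of each fd, not how many are passed.
        if not 0 <= fd < FD_SETSIZE:
            raise ValueError("fd %d does not fit in %d bits" % (fd, FD_SETSIZE))
        bits |= 1 << fd
    return bits
```

In this model a single descriptor numbered 2000 is rejected, while a
thousand descriptors numbered below 1024 would all fit.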

The poll system call does not have this particular limitation, which is
why select.poll may be better for you.
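
A minimal sketch of the select.poll route, in case it helps. The helper
name wait_readable and the one-second default timeout are my own
assumptions, and select.poll is not available on every platform
(Windows lacks it).

```python
import select
import socket


def wait_readable(socks, timeout_ms=1000):
    """Return the sockets in `socks` that poll reports as readable."""
    poller = select.poll()
    by_fd = {}
    for sock in socks:
        # poll reports raw fd numbers, so remember which socket owns each.
        by_fd[sock.fileno()] = sock
        poller.register(sock, select.POLLIN)
    # poll takes its timeout in milliseconds and blocks for up to that long.
    events = poller.poll(timeout_ms)
    return [by_fd[fd] for fd, event in events if event & select.POLLIN]
```

For a long-running loop you would build the polling object once and
keep it, calling register and unregister as sockets come and go,
rather than rebuilding it on every call as this sketch does.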

Moreover, your code has other performance problems:


> while 1:
>     for select_cap_sockets in slice_by_fd_limit(all_sockets):
>         ready_to_read, ready_to_write, in_error = select.select(
>             select_cap_sockets, [], [], 0)
>         for nb_active_socket in all_sockets:
>             if nb_active_socket in ready_to_read:

A small issue is with the last two lines -- instead of looping directly
on the small "ready-to-read" list, you're looping on the large
all_sockets one and looking each up in the small list -- that's just
throwing performance out of the window, and adding complexity, for no
benefit whatsoever.
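
Something along these lines loops over just the ready sockets. It is a
sketch only: the function name read_ready, the buffer size and the
default timeout are assumptions of mine, not taken from your code.

```python
import select
import socket


def read_ready(sockets, timeout=1.0, bufsize=4096):
    """Read once from each socket that select reports as readable."""
    ready_to_read, _, _ = select.select(sockets, [], [], timeout)
    received = {}
    # Iterate over the (usually short) ready list itself; there is no
    # need to walk every socket and test membership in the ready list.
    for sock in ready_to_read:
        received[sock] = sock.recv(bufsize)
    return received
```

The work per call is then proportional to the number of ready sockets
instead of (all sockets) times (ready sockets).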

The big issue is that you are "ceaselessly polling".  If no socket is
ready to read, you force select to return immediately anyway, and
basically call select again at once.  You churn on the CPU without
surcease, using 100% of it, hogging it for this "busy wait", possibly to
the point of crowding out the kernel from some of the CPU time it needs
to do useful work in the TCP/IP stack.  Busy-wait is a bad thing:
never call select with a timeout of 0 in a tight loop.  This
recommendation also applies to the polling object that you can build
with select.poll, and to any other situation where you're waiting for
another thread or process to deliver some data.  Ideally you should
wait in a blocking way; if that's unfeasible, at least make sure you're
letting some time pass between such calls, by using a small but non-zero
timeout (or even by inserting calls to time.sleep if that's what it
takes).
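
A rough sketch of a loop that waits inside select instead of spinning.
The names pump and handle, the half-second timeout and the idle-round
cutoff are all assumptions made for illustration.

```python
import select
import socket


def pump(sockets, handle, timeout=0.5, max_idle_rounds=3):
    """Wait in select for data; stop after several idle timeouts in a row."""
    active = list(sockets)
    idle_rounds = 0
    while active and idle_rounds < max_idle_rounds:
        # Non-zero timeout: the process sleeps in the kernel until data
        # arrives or the timeout expires, instead of burning CPU.
        ready, _, _ = select.select(active, [], [], timeout)
        if not ready:
            idle_rounds += 1
            continue
        idle_rounds = 0
        for sock in ready:
            data = sock.recv(4096)
            if data:
                handle(sock, data)
            else:
                # Empty read means the peer closed; stop watching it.
                active.remove(sock)
```

A real server would normally pass no timeout at all (block until
something happens) and drop the idle cutoff, which is here only so the
sketch terminates.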

The risk of such "antipatterns" is a good reason why it would be better
to use a well-designed, well-coded, well-debugged existing framework,
such as Twisted, rather than roll your own, btw.  With Twisted, you can
choose among many appropriate implementations of "reactor" (the key
design pattern for async programming) and activate the one that is most
suitable for your needs (including, e.g., one based on epoll, which
gives better performance than poll on suitable operating systems).

If you're adamant on "rolling your own", though, you can find a Python
epoll module at <http://cheeseshop.python.org/pypi/pyepoll/0.2> (it's
said to be in alpha status, though; I believe there are other such
modules around, but pyepoll seems to be the only one on Cheese Shop).


Alex


