Problem with writing fast UDP server

Greg Copeland gtcopeland at gmail.com
Fri Nov 21 00:49:11 EST 2008


On Nov 20, 9:03 am, Krzysztof Retel <Krzysztof.Re... at googlemail.com>
wrote:
> Hi guys,
>
> I am struggling to write a fast UDP server. It has to handle around
> 10000 UDP packets per second. I started building it with non-blocking
> sockets and threads. Unfortunately my approach does not work at all.
> I wrote a simple test case: a client and a server. The client sends
> 2200 packets within 0.137447118759 secs. tcpdump received 2189
> packets, which is not bad at all.
> But the server only handles 700 -- 870 packets when it is
> non-blocking, and only 670 -- 700 with blocking sockets.
> The client and the server are on the same local network, and tcpdump
> shows roughly the correct number of packets received.
>
> I included a bit of the code of the UDP server.
>
> class PacketReceive(threading.Thread):
>     def __init__(self, tname, socket, queue):
>         self._tname = tname
>         self._socket = socket
>         self._queue = queue
>         threading.Thread.__init__(self, name=self._tname)
>
>     def run(self):
>         print 'Started thread: ', self.getName()
>         cnt_msgs = 0
>         while True:
>             try:
>                 data = self._socket.recv(512)
>                 msg = data
>                 cnt_msgs += 1
>                 # self._queue.put(msg)
>                 print 'thread: %s, cnt_msgs: %d' % (self.getName(),
>                                                     cnt_msgs)
>             except socket.error:
>                 # e.g. no data yet on a non-blocking socket
>                 # (assumes "import socket" at module level)
>                 pass
>
> I was also using Queue, but this didn't help either.
> Any idea what I am doing wrong?
>
> I read that the Python socket module was causing some delays with a
> TCP server. The recommendation was to set a socket option to disable
> the delay: "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1)". I couldn't find
> any similar option for UDP sockets.
> Is there anything I have to change in the socket options to make it
> work faster?
> Why can't the server process all incoming packets? Is there a bug in
> the socket layer? By the way, I am using Python 2.5 on Ubuntu 8.10.
>
> Cheers
> K

First and foremost, 10,000 packets per second is not realistic on
10Mb/s Ethernet (which I am assuming you have). The theoretical
maximum is 14,880 frames per second, and that assumes each frame takes
only 84 bytes on the wire, making it nearly useless for data
transport. Using your numbers, each frame requires (90B + 84B) 174B,
which works out to a theoretical maximum of ~7200 frames per second.
These are rough numbers, but I believe you get the point. It's late
here, so I'll double check my numbers tomorrow.
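
As a rough check of the arithmetic above, here is a small sketch. It
is written for a current Python 3 rather than the 2.5 you are running,
and the function name is just something I made up for illustration.
The per-frame sizes are the figures quoted above; the result is an
upper bound that ignores collisions and any other traffic.

```python
def max_frames_per_second(link_bits_per_second, bytes_on_wire_per_frame):
    """Upper bound on frames/s if the link carried nothing but these frames."""
    return link_bits_per_second / (8.0 * bytes_on_wire_per_frame)


TEN_MEGABIT = 10 * 1000 * 1000  # assumed link speed, in bits per second

# Minimum-size frame: 84 bytes on the wire.
min_frame_rate = max_frames_per_second(TEN_MEGABIT, 84)

# Figure used above for your datagrams: 90 + 84 = 174 bytes per frame.
small_datagram_rate = max_frames_per_second(TEN_MEGABIT, 174)

print("84-byte frames:  %d per second" % min_frame_rate)
print("174-byte frames: %d per second" % small_datagram_rate)
```

Passing 100 * 1000 * 1000 as the link speed gives the corresponding
ceiling for 100Mb Ethernet.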

In your case, you would not want to use TCP_NODELAY even if you were
using TCP, as it would actually limit your throughput. UDP has no such
option because each datagram is sent as its own Ethernet frame (or
several, if it exceeds the MTU), which is not true for TCP, as TCP is
a stream. TCP may therefore significantly reduce the number of frames
required for transport, assuming TCP_NODELAY is NOT used. If you want
to increase your throughput, use larger datagrams. If you are on a
reliable connection, which we can safely assume since you are
currently using UDP, TCP without TCP_NODELAY may yield better
performance because of its buffering strategy.
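
One way to get larger datagrams is to batch several small messages
into each one. The following is only a sketch under my own
assumptions: the 2-byte length prefix, the 1400-byte payload cap
(chosen to stay under a typical 1500-byte MTU) and the function names
are all hypothetical, not anything from your code. It is written for
Python 3.

```python
import struct

_LENGTH = struct.Struct("!H")  # assumed framing: 2-byte big-endian length


def pack_records(records, max_datagram=1400):
    """Group byte-string records into payloads of at most max_datagram bytes."""
    datagrams = []
    current = bytearray()
    for record in records:
        framed = _LENGTH.pack(len(record)) + record
        if len(framed) > max_datagram:
            raise ValueError("record does not fit in one datagram")
        if len(current) + len(framed) > max_datagram:
            # The current payload is full; start a new one.
            datagrams.append(bytes(current))
            current = bytearray()
        current += framed
    if current:
        datagrams.append(bytes(current))
    return datagrams


def unpack_records(datagram):
    """Split one datagram payload back into its records."""
    records = []
    offset = 0
    while offset < len(datagram):
        (length,) = _LENGTH.unpack_from(datagram, offset)
        offset += _LENGTH.size
        records.append(datagram[offset:offset + length])
        offset += length
    return records


# 2200 messages of 90 bytes each, as in your test.
messages = [b"x" * 90 for _ in range(2200)]
payloads = pack_records(messages)
print("%d records packed into %d datagrams" % (len(messages), len(payloads)))
```

Each payload would then go out with a single sendto() call, and the
receiver would run unpack_records() on whatever recv() returns. The
trade-off is that one lost datagram now loses every record in it.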

Assuming you are using 10Mb Ethernet, you are nearing its
frame-saturation limits. If you are using 100Mb Ethernet, you'll
obviously have a lot more elbow room, but not nearly as much as one
would hope, because 100Mb is only achievable when frames are
completely filled. It's been a while since I last looked at 100Mb
numbers, but most people are unlikely to see anything near the
theoretical limit, simply because that number comes with so many
caveats, and small frames are its nemesis. Since you are using very
small datagrams, you are wasting a lot of potential throughput. And if
you have other computers on your network, the situation becomes more
difficult still. Additionally, many switches and/or routers have
bandwidth limits of their own, which may or may not pose a wall for
your application. And to make matters worse, you are allocating lots
of buffers (4K) to send/receive 90 bytes of data, creating yet more
work for your computer.
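
To illustrate the buffer point, here is a minimal sketch of a receive
loop that reuses one small buffer and keeps only counters, instead of
allocating a new buffer and printing for every datagram. It is written
for Python 3; the function name is hypothetical, the 512-byte size is
taken from the recv(512) call in your post, and the loopback sockets
are there only so the example runs on its own. It shows the mechanics
and says nothing about what rate you will reach on a real network.

```python
import socket


def receive_datagrams(sock, expected, max_size=512):
    """Receive up to `expected` datagrams into one reusable buffer.

    Returns (datagram_count, total_bytes).
    """
    buf = bytearray(max_size)  # allocated once, reused for every datagram
    view = memoryview(buf)
    count = 0
    total_bytes = 0
    while count < expected:
        try:
            nbytes = sock.recv_into(view)
        except socket.timeout:
            break  # nothing more arrived within the timeout
        count += 1
        total_bytes += nbytes
    return count, total_bytes


# Loopback demo: 50 datagrams of 90 bytes each.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(1.0)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for _ in range(50):
    sender.sendto(b"x" * 90, receiver.getsockname())

count, total_bytes = receive_datagrams(receiver, expected=50)
print("received %d datagrams, %d bytes" % (count, total_bytes))

sender.close()
receiver.close()
```

If you need the payloads afterwards, copy out bytes(view[:nbytes])
before the next recv_into() call overwrites the buffer.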

Options to try:
 * See how TCP measures up for you.
 * Attempt to place multiple data objects within a single datagram,
   thereby optimizing available Ethernet bandwidth.
 * You didn't say if you are CPU-bound, but you are creating a tuple
   and appending it to a list on every datagram. If you are CPU-bound,
   allocating smaller buffers and optimizing your history accounting
   may help.
 * Don't forget that localhost does not suffer from frame limits; it
   is basically testing your memory/bus speed.
 * If this is for local use only, consider using a different IPC
   mechanism: Unix domain sockets or memory-mapped files.
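
For the last option, a Unix domain datagram socket looks much like UDP
from the program's point of view. The sketch below is for Python 3 on
a POSIX system and uses socketpair() only to stay self-contained; two
separate processes would more likely bind() to a filesystem path, and
the message contents are made up for illustration.

```python
import socket

# A connected pair of Unix domain datagram sockets (POSIX only).
parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

messages = [("message %d" % i).encode("ascii") for i in range(5)]
for message in messages:
    parent_end.send(message)

# Datagram boundaries are preserved: each recv() returns one whole message.
received = [child_end.recv(512) for _ in messages]
print(received)

parent_end.close()
child_end.close()
```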


