Problem with writing fast UDP server

Krzysztof Retel Krzysztof.Retel at googlemail.com
Fri Nov 21 11:14:19 EST 2008


On Nov 21, 5:49 am, Greg Copeland <gtcopel... at gmail.com> wrote:
> On Nov 20, 9:03 am, Krzysztof Retel <Krzysztof.Re... at googlemail.com>
> wrote:
>
>
>
> > Hi guys,
>
> > I am struggling to write a fast UDP server. It has to handle around 10000
> > UDP packets per second. I started building it with a non-blocking
> > socket and threads. Unfortunately my approach does not work at all.
> > I wrote a simple test case: a client and a server. The client sends 2200
> > packets within 0.137447118759 secs. tcpdump received 2189 packets,
> > which is not bad at all.
> > But the server only handles 700 -- 870 packets when it is
> > non-blocking, and only 670 -- 700 with blocking sockets.
> > The client and the server are running within the same local network
> > and tcpdump shows a pretty correct number of packets received.
>
> > I included a bit of the code of the UDP server.
>
> > import socket
> > import threading
>
> > class PacketReceive(threading.Thread):
> >     def __init__(self, tname, sock, queue):
> >         threading.Thread.__init__(self, name=tname)
> >         self._tname = tname
> >         self._socket = sock
> >         self._queue = queue
>
> >     def run(self):
> >         print('Started thread: %s' % self.getName())
> >         cnt_msgs = 0  # datagrams received by this thread
> >         while True:
> >             try:
> >                 msg = self._socket.recv(512)
> >             except socket.error:
> >                 # Nothing to read yet (non-blocking socket) or a
> >                 # receive error; try again.
> >                 continue
> >             cnt_msgs += 1
> >             # self._queue.put(msg)
> >             print('thread: %s, cnt_msgs: %d' % (self.getName(),
> >                                                 cnt_msgs))
>
> > I was also using Queue, but this didn't help either.
> > Any idea what I am doing wrong?
>
> > I read that the Python socket module was causing some delays with
> > TCP servers. The recommendation was to set the socket option that
> > disables the delay: "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1)". I
> > couldn't find any similar option for UDP sockets.
> > Is there anything I have to change in the socket options to make it
> > work faster?
> > Why can't the server process all incoming packets? Is there a bug in
> > the socket layer? By the way, I am using Python 2.5 on Ubuntu 8.10.
>
> > Cheers
> > K
>
> First and foremost, you are not being realistic here. Attempting to
> squeeze 10,000 packets per second out of 10Mb/s (assumed) Ethernet is
> not realistic. The maximum theoretical limit is 14,880 frames per
> second, and that assumes each frame is only 84 bytes, making
> it useless for data transport. Using your numbers, each frame requires
> (90B + 84B) 174B, which works out to a theoretical maximum of ~7200
> frames per second. These are obviously some rough numbers but I
> believe you get the point. It's late here, so I'll double check my
> numbers tomorrow.
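
The arithmetic quoted above can be reproduced with a short sketch. It assumes a 10 Mb/s link carrying nothing else and takes the quoted per-frame sizes at face value (84 bytes on the wire for a minimum-size frame, 174 bytes for the 90-byte case). The function and constant names are hypothetical, chosen for illustration only.

```python
def max_frames_per_second(link_bits_per_second, bytes_on_wire_per_frame):
    """Upper bound on frames/s if the link is otherwise idle."""
    return link_bits_per_second / (bytes_on_wire_per_frame * 8.0)

LINK_10MB = 10 * 1000 * 1000    # assumed 10 Mb/s Ethernet
LINK_100MB = 100 * 1000 * 1000  # assumed 100 Mb/s Ethernet

# 84 bytes = 64-byte minimum frame + 8-byte preamble + 12-byte inter-frame gap
min_frame_rate = max_frames_per_second(LINK_10MB, 84)

# 174 bytes per frame is the rough figure used in the quoted reply
rate_10mb = max_frames_per_second(LINK_10MB, 174)
rate_100mb = max_frames_per_second(LINK_100MB, 174)

print('min-size frames on 10 Mb/s:  %d per second' % min_frame_rate)
print('174-byte frames on 10 Mb/s:  %d per second' % rate_10mb)
print('174-byte frames on 100 Mb/s: %d per second' % rate_100mb)
```

Under these assumptions, 10000 packets per second is above the 10 Mb/s bound and well below the 100 Mb/s one, so the link speed actually in use matters a great deal.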
>
> In your case, you would not want to use TCP_NODELAY, even if you were
> to use TCP, as it would actually limit your throughput. UDP does not
> have such an option because each datagram is an ethernet frame - which
> is not true for TCP as TCP is a stream. In this case, use of TCP may
> significantly reduce the number of frames required for transport -
> assuming TCP_NODELAY is NOT used. If you want to increase your
> throughput, use larger datagrams. If you are on a reliable connection,
> which we can safely assume since you are currently using UDP, use of
> TCP without the use of TCP_NODELAY may yield better performance
> because of its buffering strategy.
>
> Assuming you are using 10Mb ethernet, you are nearing its frame-
> saturation limits. If you are using 100Mb ethernet, you'll obviously
> have a lot more elbow room but not nearly as much as one would hope
> because 100Mb is only possible when frames are completely
> filled. It's been a while since I last looked at 100Mb numbers, but
> it's not likely most people will see numbers near its theoretical
> limits simply because that number has so many caveats associated with
> it - and small frames are its nemesis. Since you are using very small
> datagrams, you are wasting a lot of potential throughput. And if you
> have other computers on your network, the situation is made yet more
> difficult. Additionally, many switches and/or routers also have
> bandwidth limits which may or may not pose a wall for your
> application. And to make matters worse, you are allocating lots of
> buffers (4K) to send/receive 90 bytes of data, creating yet more work
> for your computer.
>
> Options to try:
> - See how TCP measures up for you.
> - Attempt to place multiple data objects within a single datagram,
>   thereby optimizing available ethernet bandwidth.
> - You didn't say if you are CPU-bound, but you are creating a tuple and
>   appending it to a list on every datagram. You may find that allocating
>   smaller buffers and optimizing your history accounting helps if
>   you're CPU-bound.
> - Don't forget, localhost does not suffer from frame limits - it's
>   basically testing your memory/bus speed.
> - If this is for local use only, consider using a different IPC
>   mechanism - unix domain sockets or memory mapped files.
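
The suggestion to place multiple data objects within a single datagram could look roughly like the sketch below. It is only an illustration, not anything from the original code: the 2-byte length prefix, the 1400-byte payload budget (assumed to stay under a typical 1500-byte Ethernet MTU) and the names pack_records / unpack_records are all assumptions. It is written for Python 3.

```python
import struct

MAX_DATAGRAM = 1400  # assumed payload budget, below a typical 1500-byte MTU

def pack_records(records, max_size=MAX_DATAGRAM):
    """Group byte-string records into as few datagram payloads as possible.

    Each record is stored as a 2-byte big-endian length followed by the data.
    """
    datagrams = []
    current = b""
    for record in records:
        if len(record) + 2 > max_size:
            raise ValueError("record too large for one datagram")
        framed = struct.pack("!H", len(record)) + record
        # Start a new datagram when this record would not fit any more.
        if current and len(current) + len(framed) > max_size:
            datagrams.append(current)
            current = b""
        current += framed
    if current:
        datagrams.append(current)
    return datagrams

def unpack_records(payload):
    """Split one datagram payload back into its records."""
    records = []
    offset = 0
    while offset < len(payload):
        (length,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        records.append(payload[offset:offset + length])
        offset += length
    return records

# 2200 messages of 90 bytes each, as in the test described earlier
messages = [b"x" * 90] * 2200
payloads = pack_records(messages)
print("%d messages fit into %d datagrams" % (len(messages), len(payloads)))
```

The sender would pass each payload to sendto() and the receiver would call unpack_records() on whatever recv() returns. The trade-off is that one lost datagram now loses every record packed inside it.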

Greg, thanks very much for your reply.
I am not sure what you mean by CPU-bound. How can I find out whether
my server is CPU-bound?
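
For context: a program is usually called CPU-bound when its speed is limited by processor time rather than by waiting for I/O such as the network. One rough way to check is to compare the CPU time a piece of work consumes with the wall-clock time it takes. The sketch below assumes Python 3 (time.process_time and time.perf_counter do not exist in Python 2.5), and cpu_fraction is a hypothetical helper name.

```python
import time

def cpu_fraction(work, *args):
    """Run work(*args) and return CPU time divided by wall-clock time.

    A value near 1.0 suggests the work is CPU-bound; a value near 0.0
    suggests it mostly waits (for example on a socket).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

busy = cpu_fraction(sum, range(3000000))  # pure computation
idle = cpu_fraction(time.sleep, 0.2)      # pure waiting
print("busy loop: %.2f, sleeping: %.2f" % (busy, idle))
```

Watching the server process in top while the client is sending gives a similar picture without changing any code: CPU usage close to 100% of one core points to a CPU-bound receiver.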

May I also ask you for a list of references about sockets and
networking? I just want to develop my knowledge of networking.

Cheers
K


