Simple TCP proxy

Chris Angelico rosuav at gmail.com
Thu Jul 28 18:08:33 EDT 2022


On Fri, 29 Jul 2022 at 07:24, Morten W. Petersen <morphex at gmail.com> wrote:
>
> Forwarding to the list as well.
>
> ---------- Forwarded message ---------
> From: Morten W. Petersen <morphex at gmail.com>
> Date: Thu, Jul 28, 2022 at 11:22 PM
> Subject: Re: Simple TCP proxy
> To: Chris Angelico <rosuav at gmail.com>
>
>
> Well, an increase from 0.1 seconds to 0.2 seconds on "polling" in each
> thread whether or not the connection should become active doesn't seem like
> a big deal.

Maybe, but polling *at all* is the problem here. It shouldn't be
hammering the other server. You'll quickly run into limits that
simply shouldn't exist, because every queued connection keeps
checking whether it's active yet. This is *completely unnecessary*.
I'll reiterate the advice given earlier in this thread (of
conversation): look into the tools available for thread (of execution)
synchronization, such as mutexes (in Python, threading.Lock), events
(threading.Event), and condition variables (threading.Condition). A
poll interval enforces a delay before the thread notices that it's
active, AND causes inactive threads to consume CPU, neither of which
is a good thing.
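To illustrate (names here are illustrative, not taken from your proxy):
a handler that parks on a threading.Event uses no CPU while it waits,
and wakes the instant the event is set, with no poll interval at all.

import threading
import time

# Illustrative gate for a queued connection -- not from the actual proxy.
activate = threading.Event()

def handler(n):
    # Block until released. The thread sleeps in the kernel, consuming
    # no CPU, and wakes immediately when the event is set.
    activate.wait()
    print("Connection", n, "is now active")

threads = [threading.Thread(target=handler, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

time.sleep(0.1)   # all three threads are parked here, idle
activate.set()    # release every waiting thread at once
for t in threads:
    t.join()

Compare that with a 0.1-second poll loop: here the wakeup latency is
effectively zero and the idle cost is zero.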

> And there's also some point where it is pointless to accept more
> connections, and where maybe remedies like accepting known good IPs,
> blocking IPs / IP blocks with more than 3 connections etc. should be
> considered.

Firewalling is its own science. Blocking IPs with too many
simultaneous connections should be decided administratively, not
because your proxy can't handle enough connections.

> I think I'll be getting closer than most applications to an eventual
> ceiling for what Python can handle of threads, and that's interesting and
> could be beneficial for Python as well.

Here's a quick demo of the cost of threads when they're all blocked on
something.

>>> import threading
>>> finish = threading.Condition()
>>> def thrd(cond):
...     with cond: cond.wait()
...
>>> threading.active_count() # Main thread only
1
>>> import time
>>> def spawn(n):
...     start = time.monotonic()
...     for _ in range(n):
...             t = threading.Thread(target=thrd, args=(finish,))
...             t.start()
...     print("Spawned", n, "threads in", time.monotonic() - start, "seconds")
...
>>> spawn(10000)
Spawned 10000 threads in 7.548425202025101 seconds
>>> threading.active_count()
10001
>>> with finish: finish.notify_all()
...
>>> threading.active_count()
1

It takes a bit of time to start ten thousand threads, but after that,
the system is completely idle again until I notify them all and they
shut down.

(Interestingly, it takes four times as long to start 20,000 threads,
suggesting that something in thread spawning has O(n²) cost. Still,
even that leaves the system completely idle once it's done spawning
them.)

If your proxy can handle 20,000 threads, I would be astonished. And
this isn't even close to a thread limit.

Obviously the cost is different if the threads are all doing things,
but if you have thousands of active socket connections, you'll start
finding that there are limitations in quite a few places, depending on
how much traffic is going through them. Ultimately, yes, you will find
that threads restrict you and asynchronous I/O is the only option; but
you can take threads a fairly long way before they are the limiting
factor.
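For when you do hit that point, here's a rough sketch of the
asynchronous equivalent of a byte-forwarding proxy (the addresses and
ports are placeholders, and error handling is omitted):

import asyncio

async def pump(reader, writer):
    # Copy bytes in one direction until EOF, then close the write side.
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle_client(client_r, client_w):
    # Placeholder backend address -- adjust for your setup.
    backend_r, backend_w = await asyncio.open_connection("127.0.0.1", 8080)
    # Shuttle bytes both ways concurrently, one task per direction.
    await asyncio.gather(pump(client_r, backend_w), pump(backend_r, client_w))

async def main():
    server = await asyncio.start_server(handle_client, "0.0.0.0", 8888)
    async with server:
        await server.serve_forever()

# asyncio.run(main())

Each connection here is a pair of coroutines rather than a thread, so
tens of thousands of idle connections cost very little.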

ChrisA
