Simple TCP proxy

Morten W. Petersen morphex at gmail.com
Fri Jul 29 14:54:02 EDT 2022


OK.

Well, I've worked with web hosting in the past, where proxies like Squid
were used to lessen the load on dynamic backends. We also ran a website,
opensourcearticles.com, with articles on Firefox, Thunderbird, etc., that
got quite a bit of traffic.

IIRC, that website was mostly static with some dynamic bits, and it was
heavily cached by Squid.

Most websites don't get a lot of traffic, though, and don't have a big
budget for "website system administration". So that's partly where I'm
going with this: a proxy that can be put in front of a site and handle a
lot of common situations reasonably well.

If I run into problems with threads that can't be managed, then switching
to something like the queue_manager function, which holds the data plus
functions that manage that data and the connections, is an option.
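
One way to read the queue_manager idea is a single thread that owns all
connection state and is woken only when a socket actually has data. The
sketch below assumes that reading and uses the standard selectors module;
RelayManager, add_pair, close_pair and run_once are hypothetical names,
not the proxy's actual code.

```python
import selectors
import socket


class RelayManager:
    """Hypothetical single-threaded relay: one object owns every connection
    and forwards bytes between paired sockets, with no thread per client."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.peer = {}  # socket -> the socket its data is forwarded to

    def add_pair(self, a: socket.socket, b: socket.socket) -> None:
        # Watch both ends; data read from one side is written to the other.
        for s in (a, b):
            s.setblocking(False)
            self.sel.register(s, selectors.EVENT_READ)
        self.peer[a] = b
        self.peer[b] = a

    def close_pair(self, s: socket.socket) -> None:
        # Tear down both directions of a relayed connection.
        other = self.peer.pop(s)
        self.peer.pop(other, None)
        for sock in (s, other):
            self.sel.unregister(sock)
            sock.close()

    def run_once(self, timeout=None) -> None:
        # Sleeps inside select() until some socket is readable, so idle
        # connections cost no CPU and there is no polling interval.
        for key, _ in self.sel.select(timeout):
            sock = key.fileobj
            if sock not in self.peer:
                continue  # its pair was already closed in this batch
            try:
                data = sock.recv(65536)
            except BlockingIOError:
                continue
            except ConnectionError:
                data = b""
            if not data:  # EOF or reset on this side
                self.close_pair(sock)
                continue
            dest = self.peer[sock]
            try:
                # Simplification: block until the bytes are sent. A
                # production relay would buffer them and wait for
                # EVENT_WRITE so one slow client cannot stall the loop.
                dest.setblocking(True)
                dest.sendall(data)
                dest.setblocking(False)
            except OSError:
                self.close_pair(sock)
```

A proxy would accept a client, connect to the backend, call
add_pair(client_sock, backend_sock), and call run_once() in a loop.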

-Morten

On Fri, Jul 29, 2022 at 12:11 AM Chris Angelico <rosuav at gmail.com> wrote:

> On Fri, 29 Jul 2022 at 07:24, Morten W. Petersen <morphex at gmail.com>
> wrote:
> >
> > Forwarding to the list as well.
> >
> > ---------- Forwarded message ---------
> > From: Morten W. Petersen <morphex at gmail.com>
> > Date: Thu, Jul 28, 2022 at 11:22 PM
> > Subject: Re: Simple TCP proxy
> > To: Chris Angelico <rosuav at gmail.com>
> >
> >
> > Well, an increase from 0.1 seconds to 0.2 seconds on "polling" in each
> > thread whether or not the connection should become active doesn't seem
> > like a big deal.
>
> Maybe, but polling *at all* is the problem here. It shouldn't be
> hammering the other server. You'll quickly find that there are limits
> that simply shouldn't exist, because every connection is trying to
> check to see if it's active now. This is *completely unnecessary*.
> I'll reiterate the advice given earlier in this thread (of
> conversation): Look into the tools available for thread (of execution)
> synchronization, such as mutexes (in Python, threading.Lock),
> condition variables (threading.Condition) and events (threading.Event).
> A poll interval enforces a delay before the thread notices that it's
> active, AND causes inactive threads to consume CPU, neither of which is
> a good thing.
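
Applied to the case above, where a connection thread waits until it may
become active, a Condition lets waiting threads sleep until a slot is
freed instead of re-checking every 0.1 s. A minimal sketch, assuming the
goal is a cap on simultaneously active connections; ActiveSlots, handle
and the limit are hypothetical names.

```python
import threading


class ActiveSlots:
    """Hypothetical gate: at most `limit` connections are active at once.
    Threads that must wait sleep on a Condition instead of polling."""

    def __init__(self, limit):
        self.limit = limit
        self.active = 0
        self.cond = threading.Condition()

    def acquire(self):
        with self.cond:
            # wait_for() sleeps until notified, then re-checks the predicate.
            self.cond.wait_for(lambda: self.active < self.limit)
            self.active += 1

    def release(self):
        with self.cond:
            self.active -= 1
            self.cond.notify()  # one slot freed, so wake one waiting thread


def handle(slots, log, conn_id):
    # Hypothetical per-connection thread body.
    slots.acquire()
    try:
        log.append(conn_id)  # placeholder for relaying the connection's data
    finally:
        slots.release()
```

threading.BoundedSemaphore(limit) gives the same counting behaviour
ready-made; the Condition version is shown because it generalizes to any
"may this thread proceed yet?" predicate.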
>
> > And there's also some point where it is pointless to accept more
> > connections, and where maybe remedies like accepting known good IPs,
> > blocking IPs / IP blocks with more than 3 connections etc. should be
> > considered.
>
> Firewalling is its own science. Blocking IPs with too many
> simultaneous connections should be decided administratively, not
> because your proxy can't handle enough connections.
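
If such a rule does live in the proxy rather than the firewall, it can
at least be an explicit setting instead of a side effect of capacity. A
minimal sketch, assuming a simple per-IP cap; PerIPLimiter, try_open and
close are hypothetical names, and 3 is just the number mentioned above.

```python
import threading
from collections import Counter


class PerIPLimiter:
    """Hypothetical policy object: refuse a new connection when the client
    IP already has `max_per_ip` open ones. The limit is a configuration
    value chosen by the administrator."""

    def __init__(self, max_per_ip=3):
        self.max_per_ip = max_per_ip
        self.open = Counter()
        self.lock = threading.Lock()  # accept and close may run in different threads

    def try_open(self, ip):
        # Returns True and counts the connection, or False to reject it.
        with self.lock:
            if self.open[ip] >= self.max_per_ip:
                return False
            self.open[ip] += 1
            return True

    def close(self, ip):
        with self.lock:
            self.open[ip] -= 1
            if self.open[ip] <= 0:
                del self.open[ip]  # keep the table from growing forever
```

In an accept loop this would be `conn, addr = listener.accept()`, then
`if not limiter.try_open(addr[0]): conn.close()`, with
`limiter.close(addr[0])` when the connection ends; `addr[0]` is the host
address for both IPv4 and IPv6 sockets.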
>
> > I think I'll be getting closer than most applications to an eventual
> > ceiling on how many threads Python can handle, and that's interesting
> > and could be beneficial for Python as well.
>
> Here's a quick demo of the cost of threads when they're all blocked on
> something.
>
> >>> import threading
> >>> finish = threading.Condition()
> >>> def thrd(cond):
> ...     with cond: cond.wait()
> ...
> >>> threading.active_count() # Main thread only
> 1
> >>> import time
> >>> def spawn(n):
> ...     start = time.monotonic()
> ...     for _ in range(n):
> ...             t = threading.Thread(target=thrd, args=(finish,))
> ...             t.start()
> ...     print("Spawned", n, "threads in", time.monotonic() - start,
> ...           "seconds")
> ...
> >>> spawn(10000)
> Spawned 10000 threads in 7.548425202025101 seconds
> >>> threading.active_count()
> 10001
> >>> with finish: finish.notify_all()
> ...
> >>> threading.active_count()
> 1
>
> It takes a bit of time to start ten thousand threads, but after that,
> the system is completely idle again until I notify them all and they
> shut down.
>
> (Interestingly, it takes four times as long to start 20,000 threads,
> suggesting that something in thread spawning has O(n²) cost. Still,
> even that leaves the system completely idle once it's done spawning
> them.)
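
To check the scaling observation outside the REPL, the demo can be turned
into a script that times n and then 2n blocked threads. This is a sketch:
timings depend on the OS, Python version and machine, so no particular
numbers are implied, and time_spawn is a hypothetical helper. Unlike the
interactive session, a script can call notify_all() before every thread
has reached wait(), so each thread checks a flag with wait_for() to avoid
a lost wake-up.

```python
import threading
import time


def idle(cond, state):
    # Block on the shared Condition until state["done"] is set. Checking a
    # flag (rather than a bare wait()) means a thread that only gets here
    # after notify_all() still exits instead of sleeping forever.
    with cond:
        cond.wait_for(lambda: state["done"])


def time_spawn(n):
    """Start n blocked threads, return the seconds it took, then release them."""
    finish = threading.Condition()
    state = {"done": False}
    threads = []
    start = time.monotonic()
    for _ in range(n):
        t = threading.Thread(target=idle, args=(finish, state))
        t.start()
        threads.append(t)
    elapsed = time.monotonic() - start
    with finish:
        state["done"] = True
        finish.notify_all()
    for t in threads:
        t.join()
    return elapsed


if __name__ == "__main__":
    for n in (1000, 2000):  # raise to 10000 and 20000 to match the demo
        print(f"{n} threads: {time_spawn(n):.3f} s")
```

If the 2n run takes roughly four times as long as the n run, that is
consistent with quadratic cost; roughly twice as long would suggest
linear cost.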
>
> If your proxy can handle 20,000 threads, I would be astonished. And
> this isn't even close to a thread limit.
>
> Obviously the cost is different if the threads are all doing things,
> but if you have thousands of active socket connections, you'll start
> finding that there are limitations in quite a few places, depending on
> how much traffic is going through them. Ultimately, yes, you will find
> that threads restrict you and asynchronous I/O is the only option; but
> you can take threads a fairly long way before they are the limiting
> factor.
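
For comparison, a minimal asyncio TCP proxy, where a single event loop
serves every connection instead of one thread each. It's a sketch under
stated assumptions: the listen port and backend address are parameters
you supply, there are no connection limits, timeouts or logging, and
pipe, handle_client and serve are hypothetical names.

```python
import asyncio
import functools


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then pass the EOF on."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()  # wait if the receiving side is slow
        if writer.can_write_eof():
            writer.write_eof()  # forward the half-close to the other side
    except OSError:
        pass  # peer reset or vanished; handle_client closes both ends


async def handle_client(client_reader, client_writer, backend_host, backend_port):
    try:
        backend_reader, backend_writer = await asyncio.open_connection(
            backend_host, backend_port)
    except OSError:
        client_writer.close()  # backend unreachable: drop the client
        return
    try:
        # Relay both directions concurrently on the same event loop.
        await asyncio.gather(pipe(client_reader, backend_writer),
                             pipe(backend_reader, client_writer))
    finally:
        client_writer.close()
        backend_writer.close()


async def serve(listen_port, backend_host, backend_port):
    handler = functools.partial(handle_client, backend_host=backend_host,
                                backend_port=backend_port)
    server = await asyncio.start_server(handler, "127.0.0.1", listen_port)
    async with server:
        await server.serve_forever()
```

It would be started with something like
asyncio.run(serve(8000, "127.0.0.1", 8080)), where both ports are
placeholders. Idle connections here cost a suspended coroutine rather
than a thread.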
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>


-- 
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Google+ here https://plus.google.com/107781930037068750156
On Instagram at https://instagram.com/morphexx/

