Recommendations in terms of threading, multi-threading and/or asynchronous processes/programming?

jacob kruger jacob.kruger.work at gmail.com
Sun Jan 8 06:49:38 EST 2023


Ok, the specific use case right now is that I need to set up a process
that pulls the contents of e-mail messages from an IMAP mail server and
populates them into a PostgreSQL database. Since this is the inbox of a
relatively large-scale CRM/support system, it currently holds over 2.5
million e-mails, and it can grow by over 50,000 per day.


I already have the basic process working, using imap_tools, but I
wanted to be able to query the process at run time, without needing to
check the logs or query the database itself while it is running. This
is mainly for the initial population period, since later on I will just
set the code up to run as a cron job, or have it handle time-based
repeats itself on a separate machine.
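
A minimal sketch of run-time querying, assuming a single worker thread:
the worker records its progress in a small lock-protected object, and
the main thread (or whatever console or UI handler you attach) can call
snapshot() at any time without touching the logs or the database.
Progress and process_message are hypothetical names; process_message
only stands in for the real imap_tools-to-PostgreSQL step.

```python
import threading
import time


class Progress:
    """Thread-safe progress counters that any thread can read at run time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.processed = 0
        self.failed = 0
        self.last_uid = None

    def record(self, uid, ok=True):
        # Called by the worker after each message has been handled.
        with self._lock:
            if ok:
                self.processed += 1
            else:
                self.failed += 1
            self.last_uid = uid

    def snapshot(self):
        # Called from the main thread to query status while the worker runs.
        with self._lock:
            return {"processed": self.processed,
                    "failed": self.failed,
                    "last_uid": self.last_uid}


def process_message(uid):
    # Hypothetical stand-in for "fetch one message and insert it into
    # PostgreSQL"; here it only sleeps briefly.
    time.sleep(0.001)


def worker(uids, progress):
    for uid in uids:
        try:
            process_message(uid)
            progress.record(uid)
        except Exception:
            progress.record(uid, ok=False)


progress = Progress()
t = threading.Thread(target=worker, args=(range(1, 201), progress), daemon=True)
t.start()
print(progress.snapshot())  # safe to call at any point while the thread runs
t.join()
print(progress.snapshot())  # {'processed': 200, 'failed': 0, 'last_uid': 200}
```

Returning a copy from snapshot() means the caller never sees a
half-updated set of counters.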


I also wanted the ability to pause or terminate the process while it is
busy batch-processing large chunks of e-mail messages, either by sending
a message to the thread or by setting a global variable that tells it to
end the run once the current item has finished, just in case.
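
The pause/stop idea can be sketched with threading.Event objects in
place of a bare global variable. An Event is a thread-safe flag that
also lets a paused worker block in wait() instead of spinning in a loop.
This is a minimal sketch, assuming the work is split into batches and
the flags are only checked between batches, so the current batch always
finishes. Controller and process_batch are hypothetical names.

```python
import threading
import time


class Controller:
    """Pause/resume/stop flags shared between the main thread and a worker."""

    def __init__(self):
        self.stop_requested = threading.Event()
        self.running = threading.Event()  # set = running, clear = paused
        self.running.set()

    def pause(self):
        self.running.clear()

    def resume(self):
        self.running.set()

    def stop(self):
        self.stop_requested.set()
        self.running.set()  # wake a paused worker so it can notice the stop


def process_batch(batch):
    # Hypothetical stand-in for importing one batch of messages.
    time.sleep(0.01)


def worker(batches, ctl, done):
    for batch in batches:
        ctl.running.wait()               # blocks here while paused
        if ctl.stop_requested.is_set():  # only checked between batches
            break
        process_batch(batch)
        done.append(batch)


ctl = Controller()
done = []
batches = [list(range(i, i + 100)) for i in range(0, 10_000, 100)]
t = threading.Thread(target=worker, args=(batches, ctl, done))
t.start()
ctl.pause()
time.sleep(0.05)
ctl.resume()
ctl.stop()
t.join()
print(f"stopped cleanly after {len(done)} of {len(batches)} batches")
```

Checking the flags between items rather than inside them is what gives
the "finish the current item, then stop" behaviour described above.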


So, for now, I think threading is probably the simplest option to look
into.
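
Since the job is I/O-bound (network reads from IMAP, writes to
PostgreSQL), one common threading layout is a producer/consumer pipeline
joined by a queue.Queue. The sketch below assumes one fetcher thread and
one writer thread; fetcher and writer are hypothetical stand-ins for the
imap_tools and database code, and a None sentinel marks the end of the
stream.

```python
import queue
import threading

SENTINEL = None  # tells the writer that nothing more is coming


def fetcher(uids, q):
    # Hypothetical stand-in for pulling messages from IMAP.
    for uid in uids:
        q.put({"uid": uid, "subject": f"message {uid}"})
    q.put(SENTINEL)


def writer(q, store):
    # Hypothetical stand-in for inserting rows into PostgreSQL.
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        store.append(item["uid"])


q = queue.Queue(maxsize=500)  # bounded, so fetching cannot run far ahead
store = []
threads = [
    threading.Thread(target=fetcher, args=(range(1, 1001), q)),
    threading.Thread(target=writer, args=(q, store)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(store))  # 1000
```

The bounded queue keeps memory use flat on a multi-million-message
backlog, because the fetcher blocks in put() whenever the writer falls
behind.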


Later on, I was also considering some form of low-level monitoring of
UI elements. That is not really related to the initial task, but it
could relate to non-visual gaming interfaces for blind/VI individuals. I
am 100% blind myself, but that's not really relevant in this context.


Stay well


Jacob Kruger
+2782 413 4791
"Resistance is futile...but, acceptance is versatile..."


On 2023/01/06 21:19, Chris Angelico wrote:
> On Sat, 7 Jan 2023 at 04:54, jacob kruger <jacob.kruger.work at gmail.com> wrote:
>> I am just trying to make up my mind about which of the approaches in
>> the subject line I should look into working with/making use of.
>>
>>
>> As in, if I want to be able to trigger multiple/various threads/processes
>> to run in the background, possibly monitoring their states either via
>> an interface or via global variables, while possibly processing other
>> forms of user interaction via the normal/main process, what would be
>> recommended?
>>
> Any. All. Whatever suits your purpose.
>
> They all have different goals, different tradeoffs. Threads are great
> for I/O bound operations; they're easy to work with (especially in
> Python), behave pretty much like just having multiple things running
> concurrently, and generally are the easiest to use. But you'll run
> into limits as your thread count climbs (with a simple test, I started
> seeing delays at about 10,000 threads, with more serious problems at
> 100,000), so it's not well-suited for huge scaling. Also, only one
> thread at a time can run Python code, which limits them to I/O-bound
> tasks like networking.
>
> Multiple processes take a lot more management. You have to carefully
> define your communication channels (for instance, a
> multiprocessing.Queue() to collect results), but they can do CPU-bound
> tasks in parallel. So multiprocessing is a good way to saturate all of
> your CPU cores. Big downsides include it being much harder to share
> information between the processes, and much MUCH higher resource usage
> than threads (with the same test as the above, I ran into limitations
> at just over 500 processes - way fewer than the 10,000 threads!).
>
> Asynchronous I/O runs a single thread in a single process. So like
> multithreading, it's only good for I/O bound tasks like networking.
> It's harder to work with, though, since you have to be very careful to
> include proper await points, and you can stall out the entire event
> loop with one mistake (common culprits being synchronous disk I/O, and
> gethostbyname). But the upside is that you get near-infinite tasks,
> basically just limited by available memory (or other resources).
>
> Use whichever one is right for your needs.
>
> ChrisA
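
Chris's point about stalling the event loop matters here, because
imap_tools and common PostgreSQL drivers such as psycopg2 are
synchronous. One way to call blocking code from asyncio, assuming
Python 3.9+, is asyncio.to_thread(), which runs the call in the default
thread pool while the loop keeps servicing other tasks. Note that this
mixes threads back in rather than being pure async I/O. blocking_fetch
is a hypothetical stand-in, and the Semaphore caps how many blocking
calls run at once.

```python
import asyncio
import time


def blocking_fetch(uid):
    # Hypothetical synchronous call (e.g. an IMAP fetch or a DB insert).
    # Calling this directly inside a coroutine would stall the event loop.
    time.sleep(0.05)
    return uid


async def handle(uid, sem):
    async with sem:
        # Run the blocking call in a worker thread so the loop stays responsive.
        return await asyncio.to_thread(blocking_fetch, uid)


async def main():
    sem = asyncio.Semaphore(10)  # at most 10 blocking calls in flight
    return await asyncio.gather(*(handle(uid, sem) for uid in range(1, 51)))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(len(results), f"{elapsed:.2f}s")
```

asyncio.gather() returns results in the order the coroutines were
passed, so results lines up with the input UIDs.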

