Parallel(?) programming with python

Dennis Lee Bieber wlfraed at ix.netcom.com
Mon Aug 8 13:56:02 EDT 2022


On Mon, 8 Aug 2022 12:47:26 +0200, Andreas Croci <andrea.croci at gmx.de>
declaimed the following:

>I would like to write a program, that reads from the network a fixed 
>amount of bytes and appends them to a list. This should happen once a 
>second.
>

	Ignoring leap seconds, there are 86400 seconds in a day -- how many
bytes are you planning to read each second?

	Maybe more important: is this a constant network connection feeding you
bytes (in which case the bytes available to read are controlled by the
sender, which may be sending continuously and building up a backlog if you
don't drain the stream), or are you planning to open a socket connection,
read n bytes, and close the socket each time?
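	Whichever answer applies, one subtlety is that recv(n) may legally
return fewer than n bytes, so "a fixed amount of bytes" needs a read loop.
A rough sketch of the persistent-connection case (host, port, chunk size,
and the handle callback are placeholders, not anything from the original
post):

```python
import socket
import time

def read_exact(sock, n):
    """Read exactly n bytes from a blocking stream socket."""
    buf = b""
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("sender closed the stream")
        buf += part
    return buf

def poll_forever(host, port, handle, chunk_size=10, period=1.0):
    """Persistent-connection variant: one chunk per second, one socket.
    If the sender transmits faster than chunk_size bytes per period,
    a backlog builds up in the OS socket buffer."""
    with socket.create_connection((host, port)) as sock:
        while True:
            handle(read_exact(sock, chunk_size))
            time.sleep(period)
```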

>Another part of the program should take the list, as it has been filled 
>so far, every 6 hours or so, and do some computations on the data (a FFT).
>

	"6 hours or so"? That leaves you open to all sorts of variable timing.
In any event, a 6 hour interval is better suited to a process started by a
cron job (Linux/Unix) or Task Scheduler (Windows). Having a thread sleep
for 6 hours means no safeguard if the parent process dies at some point
(and if you are keeping the data in an internal list, you lose all that
data too).

>Every so often (say once a week) the list should be saved to a file, 

	This REQUIRES that the process never fail at any point, that the system
never restart, etc. And (see prior paragraphs) how much data are you
accumulating? In one week you have 604,800 "reads". If you are reading 10
bytes each time, that makes about 6MB of data you could potentially lose
(on most modern hardware, 6MB is not a memory concern... even a 32-bit OS
should be able to find space for 100 times that much, 600MB of data...).

	Much better would be to write the data to the file as you read each
chunk. If the file is configured right, a separate process should be able
to do read-only processing of the file even while the writing process is
ongoing. Alternatively, you attempt an open/write/close cycle for each
chunk, which could be blocked while your FFT process has the file open --
you'd have to detect that situation and buffer the read data until a
subsequent open succeeds, at which point you'd write all the backlogged
data.
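	The buffer-until-reopen idea might look like the following sketch. The
function name and single-list backlog are illustrative only, and the
OSError branch matters mainly on platforms (e.g. Windows) where a reader
can hold the file exclusively -- on POSIX an append is rarely blocked:

```python
backlog = []    # chunks we could not write yet

def append_chunk(path, chunk):
    """Append chunk (plus any buffered backlog) to the file at `path`.
    If the open/write fails (reader holds the file), keep the data
    buffered and report failure so the caller retries next time."""
    backlog.append(chunk)
    try:
        with open(path, "ab") as f:
            f.write(b"".join(backlog))
    except OSError:
        return False             # open blocked -- keep the backlog
    backlog.clear()
    return True
```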

	Or you could even have your FFT process copy the data to the long term
file, while the write process just starts a new file when it finds itself
blocked (and the FFT deletes the file it was reading).

>shortened in the front by so many items, and filled further with the 
>data coming from the network. After the first saving of the whole list, 
>only the new part (the data that have come since the last saving) should 
>be appended to the file. A timestamp is in the data, so it's easy to say 
>what is new and what was already there.
>

	Personally, this sounds more suited to something like SQLite3...
Insert new records, with timestamps, as the data is read. The FFT process
selects records from the last data ID it processed previously to the end
of the new data. The SQLite3 database IS the long-term storage. You might
need a second table to hold the FFT process's "last data ID" so that on
start-up it can determine where to begin.
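	A minimal sketch of that scheme (table and column names are invented;
":memory:" stands in for a real database file such as "samples.db"):

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")   # use a file path for real persistence
db.execute("""CREATE TABLE IF NOT EXISTS samples (
                  id    INTEGER PRIMARY KEY AUTOINCREMENT,
                  ts    REAL    NOT NULL,
                  chunk BLOB    NOT NULL)""")
db.execute("""CREATE TABLE IF NOT EXISTS fft_state (
                  key     TEXT    PRIMARY KEY,
                  last_id INTEGER NOT NULL)""")

def store_chunk(chunk):
    """Reader side: called once per second; each commit is durable."""
    with db:                     # connection as context manager commits
        db.execute("INSERT INTO samples (ts, chunk) VALUES (?, ?)",
                   (time.time(), chunk))

def unprocessed_rows():
    """FFT side: fetch rows added since the last run, then bump last_id."""
    row = db.execute(
        "SELECT last_id FROM fft_state WHERE key = 'fft'").fetchone()
    last_id = row[0] if row else 0
    rows = db.execute(
        "SELECT id, ts, chunk FROM samples WHERE id > ? ORDER BY id",
        (last_id,)).fetchall()
    if rows:
        with db:
            db.execute("INSERT OR REPLACE INTO fft_state VALUES ('fft', ?)",
                       (rows[-1][0],))
    return rows
```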

>I'm not sure how to do this properly: can I write a part of a program 
>that keeps doing its job (appending data to the list once every second) 
>while another part computes something on the data of the same list, 
>ignoring the new data being written?
>
	Well, if you really want ONE program, you'll likely be looking at the
threading module (I don't do "async", and your task doesn't seem suited to
async-style callbacks): one thread that does the fetching of data, and a
second that does the FFT processing, which will be sleeping most of the
time.

	But either way, I'd suggest not keeping the data in an internal list;
use some RDBMS to hold the long-term data, accumulating it as you fetch
it, and letting the FFT read from the database for its processing.
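	The two-thread shape might be sketched like this, with fetch_chunk()
and run_fft() as stand-ins for the real network read and FFT code (and a
plain list standing in for the database argued for above):

```python
import threading
import time

def fetch_chunk():
    return b"\x00" * 10          # stand-in for the real network read

def run_fft(snapshot):
    pass                         # stand-in for the real FFT (e.g. numpy.fft)

stop = threading.Event()
samples = []                     # better: an SQLite table, as argued above
lock = threading.Lock()

def reader(period=1.0):
    """Append one chunk per `period` seconds until told to stop."""
    while not stop.is_set():
        chunk = fetch_chunk()
        with lock:
            samples.append(chunk)
        stop.wait(period)        # interruptible sleep

def analyzer(period=6 * 3600):
    """Wake every `period` seconds and FFT a snapshot of the data."""
    while not stop.wait(period): # sleeps most of the time
        with lock:
            snapshot = list(samples)  # copy, so new appends are ignored
        run_fft(snapshot)

for target in (reader, analyzer):
    threading.Thread(target=target, daemon=True).start()
```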


-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed at ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/

