Long running process - how to speed up?

Avi Gross avigross at verizon.net
Sat Feb 19 16:44:38 EST 2022


Indeed not a clear request. Timing is everything but there are times ...
For many purposes, people read the entire CSV in one gulp into some data structure such as a pandas DataFrame. The file is then closed and all later processing works on the in-memory data.
Of course you can easily read one line at a time in Python, parse it on commas, do any other processing, and act on one row at a time or in small batches, so you never need huge amounts of memory. But other methods that read in the entire data set at once are often better optimized and faster, and some operations on the data are faster when done in vectorized fashion using add-ons like numpy and pandas. We have no idea what is being used, and none of this explains a need for some form of sleep.
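To illustrate the row-at-a-time approach, here is a minimal sketch using the standard-library csv module; the sample data and the "sum one column" processing are made up for the example, standing in for whatever the real file and per-row work are:

```python
import csv
import io

# Hypothetical sample standing in for the real 900k-row file.
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n")

total = 0
reader = csv.DictReader(sample)  # parses the header row, then splits each line on commas
for row in reader:
    # Act on one row at a time -- memory use stays small no matter the file size.
    total += int(row["value"])

print(total)  # 60
```

With a real file you would replace the StringIO with `open("data.csv", newline="")`. The pandas equivalent would be `pd.read_csv(...)` followed by a vectorized `df["value"].sum()`, which is usually faster on large files at the cost of holding everything in memory.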
Multi-processing helps only if you can make steps in the processing run in parallel without interfering with each other or producing results out of order. Yes, you could read in the data, assign say 10,000 rows to each worker, and hand out more as workers finish, if done quite carefully. The results might need to be carefully combined, and any shared variables might need locks and so on. Not necessarily worth it if the data is not too large and the per-row calculations are small.
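A minimal sketch of that batching idea with the standard-library multiprocessing module, assuming the rows really are independent; the `work` function and the list of integers here are placeholders for the real per-row computation and the parsed CSV rows:

```python
from multiprocessing import Pool

def work(row):
    # Placeholder per-row computation; assumes no shared state between rows.
    return row * 2

if __name__ == "__main__":
    rows = list(range(100))  # stand-in for rows parsed from the CSV
    with Pool(4) as pool:
        # chunksize hands each worker a batch of rows at once,
        # cutting down on inter-process communication overhead.
        results = pool.map(work, rows, chunksize=25)
    print(sum(results))
```

Note that `Pool.map` returns results in input order, which sidesteps the out-of-order problem, but any locking or combining of shared state is still on you.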

And it remains unclear where you want to sleep or why. Parallelism can be important if the sleep is to wait for the user to respond to something while processing continues in the background. Is it possible that whatever you are calling to do processing has some kind of sleep within it and you may be calling it as often as per row? In that case, ask why it does that and can you avoid that? Yes, running in parallel may let you move forward but again, it has to be done carefully and having thousands of processes sleeping at the same time may be worse!
I note badly defined questions get horrible answers. Mine included.
-----Original Message-----
From: Alan Gauld <learn2program at gmail.com>
To: python-list at python.org
Sent: Sat, Feb 19, 2022 7:33 am
Subject: Fwd: Re: Long running process - how to speed up?

On 19/02/2022 11:28, Shaozhong SHI wrote:

> I have a csv file of 932956 rows

That's not a lot in modern computing terms.

> and have to have time.sleep in a Python
> script. 

Why? Is it a requirement by your customer? Your manager?
time.sleep() is not usually helpful if you want to do
things quickly.

> It takes a long time to process.

What is a "long time"? minutes? hours? days? weeks?

It should take a million times as long as it takes to
process one row. But you have given no clue what you
are doing in each row.
- reading a database?
- reading from the network? or the internet?
- writing to a database? or the internet?
- performing highly complex math operations?

Or perhaps the processing load is in analyzing the totality
of the data after reading it all? A very different type
of problem. But we just don't know.

All of these factors will affect performance.

> How can I speed up the processing? 

It all depends on the processing.
You could try profiling your code to see where the time is spent.

> Can I do multi-processing?

Of course. But there is no guarantee that will speed things
up if there is a bottleneck on a single resource somewhere.
But it might be possible to divide and conquer and get better
speed. It all depends on what you are doing. We can't tell.

We cannot answer such a vague question with any specific
solution.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

-- 
https://mail.python.org/mailman/listinfo/python-list

