Long running process - how to speed up?

Avi Gross avigross at verizon.net
Sun Feb 20 13:40:33 EST 2022


David,
I would like to assume that the processing needed is quite a bit more than calculating the square of each X.
But since we are getting negligible information on anything useful, it is hard to keep the discussion productive.
I am trying to imagine a scenario with a million rows in a CSV file (an unusual format for a single column of data, since there are no commas to separate anything) where you read in the data and want to generate just the square of each number, possibly in a jumbled order, as threads are not guaranteed to do anything but interleave!
Parallelism can be used where the data is segmented properly and the algorithm is adjusted to fit, but the overhead can be substantial. For the trivial task mentioned, which I have to hope is not the actual task, you can get quite decent speed by reading the data into a numpy array, producing the squares with a single vectorized operation, and simply printing the result.
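
A minimal sketch of that vectorized approach (the filename and one-column layout are my assumptions, not from the original post):

import numpy as np

# Load a one-column file of numbers; loadtxt reads one value per line.
data = np.loadtxt("numbers.csv")

# A single vectorized operation squares the whole array at C speed,
# with no Python-level loop and no multiprocessing overhead.
print(data ** 2)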
The original question here is turning out to be mysterious, as it began by asking how to speed up some slow process that has never been explained, plus some mumbling about calling a sleep function. I note that some parallel algorithms require a variant of that: some parts must wait for other parts to complete, and either arrange to stay dormant until signaled or schedule themselves to wake regularly and check whether things are ready for them to resume. Sleeping is a very common occurrence in time-shared systems.
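
As a sketch of that dormant-until-signaled pattern, one thread can block on a threading.Event instead of sleeping and polling (the names and scenario are my own illustration):

import threading

ready = threading.Event()

def worker():
    # Block here without burning CPU, unlike a sleep-and-poll loop,
    # until some other part of the program signals readiness.
    ready.wait()
    print("resuming now that the data is ready")

t = threading.Thread(target=worker)
t.start()

# ... the producing side finishes its work, then wakes the waiter ...
ready.set()
t.join()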


-----Original Message-----
From: Shaozhong SHI <shishaozhong at gmail.com>
To: Mats Wichmann <mats at wichmann.us>
Cc: python-list at python.org
Sent: Sun, Feb 20, 2022 1:05 pm
Subject: Re: Long running process - how to speed up?

On Sat, 19 Feb 2022 at 19:44, Mats Wichmann <mats at wichmann.us> wrote:

> On 2/19/22 05:09, Shaozhong SHI wrote:
> > Can it be divided into several processes?
> > Regards,
> > David
>
> The answer is: "maybe".  Multiprocessing doesn't happen for free; you
> have to figure out how to divide the task up, requiring thought and
> effort. We can't guess to what extent the problem you have is amenable
> to multiprocessing.
>
> Google for "dataframe" and "multiprocessing" and you should get some
> hits (in my somewhat limited experience in this area, people usually
> load the csv data into Pandas before they get started working with it).


I am trying this approach:

import multiprocessing as mp

def my_func(x):
    # Note: x**x raises x to the power x; for squares this would be x**2.
    print(x**x)

def main():
    # One worker per CPU core; the with-block closes the pool cleanly.
    with mp.Pool(mp.cpu_count()) as pool:
        pool.map(my_func, [4, 2, 3])

if __name__ == "__main__":
    main()

I modified the script and set off a test run.

However, I have no idea whether this approach will be faster than the
conventional approach.
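
One way to check would be to time both versions on the same data; a rough sketch (the data size and chunksize here are placeholders of mine):

import multiprocessing as mp
import time

def my_func(x):
    return x ** 2

def main():
    data = list(range(1_000_000))

    # Plain serial loop as the baseline.
    t0 = time.perf_counter()
    serial = [my_func(x) for x in data]
    print(f"serial: {time.perf_counter() - t0:.3f}s")

    # Pool version; a large chunksize cuts inter-process messaging overhead.
    t0 = time.perf_counter()
    with mp.Pool(mp.cpu_count()) as pool:
        parallel = pool.map(my_func, data, chunksize=10_000)
    print(f"pool:   {time.perf_counter() - t0:.3f}s")

    # Pool.map returns results in input order, so the two lists match.
    assert parallel == serial

if __name__ == "__main__":
    main()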

Does anyone have an idea?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list

