multiprocessing shows no benefit

Jason jasonhihn at gmail.com
Wed Oct 18 12:21:38 EDT 2017


On Wednesday, October 18, 2017 at 12:14:30 PM UTC-4, Ian wrote:
> On Wed, Oct 18, 2017 at 9:46 AM, Jason  wrote:
> > # When I change line 19 to True to use the multiprocessing stuff, it all slows down.
> >
> > from multiprocessing import Process, Manager, Pool, cpu_count
> > from timeit import default_timer as timer
> >
> > def f(a, b):
> >     # dict_words is a module-level dict defined elsewhere in the script
> >     return dict_words[a] - b
> 
> Since the computation is so simple my suspicion is that the run time
> is dominated by IPC, in other words the cost of sending objects back
> and forth outweighs the gains you get from parallelization.
> 
> What happens if you remove dict_words from the Manager and pass
> dict_words[a] across instead of a? Also, I'm not sure why dict_keys
> is a managed list to begin with, since it only appears to be handled
> by the main process.

You can try that variant by changing line 17 of the script :-) It's 10 times faster. Roughly, it looks like the sketch below.
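A minimal sketch of that variant, with dict_words as a plain dict in the parent process and only its values shipped to the workers (the dict contents and b here are stand-ins, not the real script's data):

from multiprocessing import Pool

def f(value, b):
    # value is dict_words[a], already looked up in the parent,
    # so no Manager proxy round-trip is needed per item
    return value - b

if __name__ == '__main__':
    dict_words = {'spam': 3, 'eggs': 7, 'ham': 11}  # stand-in data
    b = 1
    with Pool() as pool:
        results = pool.starmap(f, ((dict_words[a], b) for a in dict_words))
    print(results)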

Well, given the many objects to be iterated over, I was hoping pool.map() would distribute them across the processors, so that each processor gets len(dict_words)/cpu_count() items to process. The actual computation is much longer than a single subtraction; currently I can process about 16k items per second on a single core. My target is to get those 16k items processed in 0.25s.
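From the docs, pool.map() does split the iterable into chunks, and its chunksize argument controls how many items each worker receives per message, so a large chunksize amortizes the IPC cost. A rough sketch of what I'm after, with a stand-in work function and stand-in items:

from multiprocessing import Pool, cpu_count
from timeit import default_timer as timer

def work(value):
    # stand-in for the real per-item computation, which is much longer
    return value - 1

if __name__ == '__main__':
    items = list(range(16000))  # roughly the 16k items mentioned above
    start = timer()
    with Pool(cpu_count()) as pool:
        # one big chunk per worker: each process receives its whole
        # share of the items in a single message, minimizing IPC
        chunk = max(1, len(items) // cpu_count())
        results = pool.map(work, items, chunksize=chunk)
    print('%d items in %.3fs' % (len(results), timer() - start))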
