multiprocessing problems

DoxaLogos doxalogos at gmail.com
Wed Jan 20 11:10:34 EST 2010


On Jan 19, 10:33 am, DoxaLogos <doxalo... at gmail.com> wrote:
> On Jan 19, 10:26 am, Adam Tauno Williams <awill... at opengroupware.us>
> wrote:
>
> > > I decided to play around with the multiprocessing module, and I'm
> > > having some strange side effects that I can't explain.  It makes me
> > > wonder if I'm just overlooking something obvious or not.  Basically, I
> > > have a script that parses through a lot of files doing search and
> > > replace on key strings inside each file.  I decided to split the work
> > > up into multiple processes, one per processor core (4 total).  I've
> > > tried many different ways of doing this, from using a pool to calling
> > > out separate processes, but the result has been the same: the computer
> > > crashes from endless process spawning.
>
> > Are you hitting a ulimit error?  The number of processes you can create
> > is probably limited.
>
> > TIP: close os.stdin on your subprocesses.
>
> > > Here's the guts of my latest incarnation.
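> > > from multiprocessing import Process
> > >
> > > def ProcessBatch(files):
> > >     # Start one process per file, then wait for all of them to finish.
> > >     p = []
> > >     for file in files:
> > >         # args must be a tuple: (file,), not the bare string file
> > >         p.append(Process(target=ProcessFile, args=(file,)))
> > >     for x in p:
> > >         x.start()
> > >     for x in p:
> > >         x.join()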
> > > Now, the function calling ProcessBatch looks like this:
> > > def ReplaceIt(files):
> > >     processFiles = []
> > >     for replacefile in files:
> > >         if CheckSkipFile(replacefile):
> > >             processFiles.append(replacefile)
> > >             if len(processFiles) == 4:
> > >                 ProcessBatch(processFiles)
> > >                 processFiles = []
> > >     # check for leftover files once the main loop is done and process them
> > >     if processFiles:
> > >         ProcessBatch(processFiles)
>
> > According to this you will create processes in sets of four, but an
> > unknown number of sets of four.
>
> What would be the proper way to do only a set of 4, stop, then do
> another set of 4?  I'm trying to process only 4 files at a time before
> doing another set of 4.
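A minimal sketch of the "set of 4, stop, then another set of 4" pattern asked about above, assuming Python 3. The names process_file, batches, process_batch and replace_it, and the file names, are hypothetical; process_file only stands in for the poster's ProcessFile, whose body is not shown. Each batch is started and then fully joined before the next batch begins:

```python
from multiprocessing import Process


def process_file(path):
    # Hypothetical stand-in for the poster's ProcessFile; the real one
    # would do the search and replace inside the file.
    print("processing", path)


def batches(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(files):
    # One Process per file; args must be a tuple, hence (path,).
    workers = [Process(target=process_file, args=(path,)) for path in files]
    for w in workers:
        w.start()
    # Block until every process in this batch has exited.
    for w in workers:
        w.join()


def replace_it(files, batch_size=4):
    # The next batch is only started after the previous one is joined.
    for batch in batches(files, batch_size):
        process_batch(batch)


if __name__ == "__main__":
    # Hypothetical file list, for illustration only.
    replace_it(["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"])
```

Passing args=path instead of args=(path,) would unpack the string into one argument per character. The downside of strict batches is that three finished processes sit idle while the slowest file in the batch completes; a Pool of four workers hands out the next file as soon as one is free.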

I found my problems.  One thing I did was follow the test queue
example in the documentation, but the biggest problem turned out to be
a pool instantiated globally in my script, which was causing most of the
endless process spawning even with the if __name__ == "__main__":
block.
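For reference, a minimal sketch (Python 3; the names process_file and replace_it and the file list are hypothetical) of a layout that keeps the pool out of global scope: the Pool is only created from code reached through the if __name__ == "__main__": guard. With the spawn start method (the default on Windows), each child imports the main script, so any module-level code, including a global Pool(...), runs again in every child.

```python
from multiprocessing import Pool


def process_file(path):
    # Hypothetical stand-in for the poster's ProcessFile (search and
    # replace); it just returns the path so the parent can report it.
    return path


def replace_it(files, workers=4):
    # The pool is created inside a function, not at module level, so
    # importing this module in a child process does not create a new pool.
    with Pool(processes=workers) as pool:
        # At most `workers` files are handled at once; map returns the
        # results in the same order as `files`.
        return pool.map(process_file, files)


if __name__ == "__main__":
    # Only the parent process reaches this block.
    done = replace_it(["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"])
    print(done)
```

Using processes=4 also answers the "four at a time" question without explicit batching, since each worker picks up the next file as soon as it finishes its current one.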



More information about the Python-list mailing list