Code For Five Threads To Process Multiple Files?

tdahsu at gmail.com
Fri May 23 10:25:10 EDT 2008


On May 23, 12:20 am, Dennis Lee Bieber <wlfr... at ix.netcom.com> wrote:
> On Thu, 22 May 2008 11:03:48 -0700 (PDT), tda... at gmail.com declaimed the
> following in comp.lang.python:
>
> > Ah, well, I didn't get any other responses, but here's what I've done:
>
>         Apparently the direct email from my work address did not get through
> (I don't have group posting ability from work).
>
> > loopCount = 0
> >                 for l in range(len(self.filesToProcess)):
> >                     threads = []
> >                     try:
>
> > threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount
> > +l])))
>
>         Python lists index from 0... So this will be 0+0, first entry in the
> file list
>
>
>
> > threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount
> > +2])))
>
>         This is 0+2, THIRD entry in the file list -- you've just skipped
> over the second entry...
>
> > threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount
> > +3])))
>
> > threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount
> > +4])))
>
> > threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount
> > +5])))
>
>         Very ugly... Also going to fail for other reasons... Consider:
>
> filestoprocess = [ 'file1', 'file2', 'file3' ]
> for jnk in range(len(filestoprocess)):  #this will loop three times!
>                                         #jnk = 0, 1, 2
>
>         You proceed to create FIVE threads (or try to) when there are only
> THREE files... It will fail as soon as it tries loopCount+3 (fourth
> entry in a three element list)
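
[Editor's sketch, not part of the original message.] One way to avoid indexing past the end of the list is to slice it into batches of at most five, so a short list simply yields a short batch. The helper name `batches` and the sample file names are assumptions made for illustration only.

```python
def batches(items, size=5):
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        # Slicing never raises IndexError; the last batch is just shorter.
        yield items[start:start + size]


files = ['file1', 'file2', 'file3']
for batch in batches(files):
    print(batch)
```

With three files the loop body runs once and sees all three names; with twelve it would run three times (five, five, two).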
>
> >                         msg = "Processing file...\n"
> >                         for thread in threads:
> >                             wx.CallAfter(self.textctrl03.write(msg),
> > thread.start())
>
>         Is this running as the main controller of some GUI? if so...
>
> >                         for thread in threads:
> >                             thread.join()
>
>         Your GUI will essentially freeze since it can't process events
> (including screen updates) until the entire function you are in returns
> to the event handler... But .join() blocks until the specified thread
> really finishes...
>
> >                         loopCount += 5
> >                     except IndexError:
> >                         pass
>
>         BAD style -- if you are going to trap an exception, you should do
> something with it... But then, the only reason you would GET this
> exception is because the preceding code is looping too many times
> relative to the number of files...
>
>         As shown, with three files, you will create the first thread (0) for
> first file, skip the second file creating the second thread (1) for the
> third file, and raise an exception on trying to create the third thread
> (2) when you try to access a fourth file in the list.  The exception
> will be raised -- SKIPPING over the thread.start() calls, and skipping
> the thread.join() calls. You then ignore the error, and go back to the
> start of the loop where the index is now "1"... AND reset the thread
> list, so threads 0&1 are forgotten, never started, never joined, garbage
> collected...
>
>         Again, you now create a thread (0) giving it the second file (since
> loopCount was never incremented, and the first thread is using loopCount
> + <loopindex>), create thread (1) giving it the third file, raise the
> exception... repeat
>
>
>
> > It works, and it works well.  It starts five threads, and processes
> > five files at a time.  (In the "self.processFiles" I read the whole
> > file into memory using readlines(), which works well.)
>
>         It only works as long as loopCount+5 is less than the number of
> files in the list... AND at that, it skips one file and double processes
> another...
>
> > Of course, now the wx.CallAfter function doesn't work... I get
> > "TypeError: 'NoneType' object is not callable" for every time it is
> > run...
>
>         Probably because it wants you to supply it with one or two
> /callable/ functions... but you are actually calling the functions and
> passing it the results of the called functions (and they aren't
> returning anything -- None).
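
[Editor's sketch, not part of the original message.] The same "called it instead of passing it" pattern also appears in the threading.Thread(target=self.processFiles(...)) lines quoted earlier. The standard-library example below shows the difference; `process_file` and the thread names are made up for the demonstration.

```python
import threading

calls = []


def process_file(name):
    # Record which thread actually ran the work.
    calls.append((name, threading.current_thread().name))


# Mistake: process_file('file1') runs right here, in the calling thread,
# and its return value (None) is what the Thread receives as target.
wrong = threading.Thread(target=process_file('file1'), name='worker-wrong')
wrong.start()   # target is None, so the thread has nothing to do
wrong.join()

# Intended: pass the function itself and its arguments separately.
right = threading.Thread(target=process_file, args=('file2',),
                         name='worker-right')
right.start()
right.join()

print(calls)
```

The first entry is recorded by the calling thread, not by 'worker-wrong'; only the second runs inside its worker. By the same reasoning, the wx call would presumably need the form wx.CallAfter(self.textctrl03.write, msg), with thread.start() invoked on its own line.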
>
>         Ignoring GUI stuff... here is a simple one-job threadpool algorithm
> -- you have to plug in the file list and the actual processing work. It
> creates n threads, and those threads pull the work off of a common
> queue; the main program only has to fill the queue with the work to be
> done, and stuff a sentinel value onto the queue when it wants the
> threads to die -- which would be before shutdown of the program (create
> the pool at start-up, leave the threads blocked on the .get() until you
> need one to process...).
>
> -=-=-=-=-=-=-=-
> #
> #       Example code for a pooled thread file processor
> #       NOT EXECUTABLE as is -- there is no code to obtain
> #       the list of files to be processed; and the processor
> #       just sleeps...
>
> import threading
> import Queue
> import time         #just for demo sleep
>
> NUMTHREADS = 5
> SENTINAL = object()
>
> workQueue = Queue.Queue()
>
> def fileProc():         #function that handles processing of the files
>     while True:
>         fname = workQueue.get()
>         if fname is SENTINAL:
>             workQueue.put(SENTINAL)    #recycle sentinal for next
>             break
>         print "Processing %s" % fname
>         time.sleep(3)   #replace with real file processing
>
> threadList = []
> for ti in range(NUMTHREADS):    #create worker threads
>     t = threading.Thread(target=fileProc)
>     t.start()
>     threadList.append(t)
>
> for fn in listOfFiles:  #queue up the file names to be worked
>     workQueue.put(fn)   #need to expand to include how names are
>                         #obtained
>
> workQueue.put(SENTINAL) #signal that no more files are to be worked
>
> for t in threadList:
>     t.join()            #wait for each thread to exit (ensures main
>                         #doesn't exit before all threads finish
>                         #processing)
>
> --
>         Wulfraed        Dennis Lee Bieber               KD6MOG
>         wlfr... at ix.netcom.com              wulfr... at bestiaria.com
>                 HTTP://wlfraed.home.netcom.com/
>         (Bestiaria Support Staff:               web-a... at bestiaria.com)
>                 HTTP://www.bestiaria.com/
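
[Editor's sketch, not part of the original message.] The quoted pool uses Python 2 spellings (the Queue module, the print statement). A rough Python 3 equivalent of the same recycled-sentinel idea might look like the following; `run_pool`, its `process` parameter, and the generated file names are assumptions added so the example can run on its own.

```python
import queue
import threading

NUM_THREADS = 5
SENTINEL = object()


def run_pool(file_names, process, num_threads=NUM_THREADS):
    """Feed file_names to num_threads workers via a shared queue."""
    work_queue = queue.Queue()

    def worker():
        while True:
            name = work_queue.get()
            if name is SENTINEL:
                work_queue.put(SENTINEL)   # recycle sentinel for next worker
                break
            process(name)                  # replace with real file processing

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for name in file_names:                # queue up the work
        work_queue.put(name)
    work_queue.put(SENTINEL)               # signal that no more work is coming
    for t in threads:
        t.join()                           # wait for every worker to exit


done = []
run_pool(['file%d' % i for i in range(12)], done.append)
print(sorted(done))
```

Each name is handed to exactly one worker, so every file is processed once no matter how the list length compares with the thread count. In a GUI program the join() calls would still block the event loop, so this form suits a command-line run or a background controller thread.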

Thanks for the information!  I can definitely see what you're talking
about, and the Exception is only "pass" right now while I am working
on the code.

However, it does process every file (it doesn't skip the second one),
and I'm guessing that this is because it loops so many times?  I guess
that means I am successful in spite of myself!  ;-)  (This wouldn't be
the first time...  ;-)  )

I REALLY appreciate your insights!!


