Code For Five Threads To Process Multiple Files?

tdahsu at gmail.com tdahsu at gmail.com
Thu May 22 14:03:48 EDT 2008


On May 21, 11:41 am, tda... at gmail.com wrote:
> On May 21, 11:13 am, "A.T.Hofkamp" <h... at se-162.se.wtb.tue.nl> wrote:
>
>
>
> > On 2008-05-21, tda... at gmail.com <tda... at gmail.com> wrote:
>
> > > I'd appreciate any help.  I've got a list of files in a directory, and
> > > I'd like to iterate through that list and process each one.  Rather
> > > than do that serially, I was thinking I should start five threads and
> > > process five files at a time.
>
> > > Is this a good idea?  I picked the number five at random... I was
>
> > Depends what you are doing.
> > If you are mainly reading/writing files, there is not much to gain, since 1
> > process will already push the disk IO system to its limit. If you do a lot of
> > processing, then more threads than the number of processors is not much use. If
> > you have more 'bursty' behavior (first do a lot of reading, then a lot of
> > processing, then again reading, etc), then the system may be able to do some
> > scheduling and keep both the processors and the file system busy.
>
> > I cannot really give you advice on threading, I have never done that. You may
> > want to consider an alternative, namely multi-tasking at OS level. If you can
> > easily split the files over a number of OS processes (written in Python), you
> > can make the Python program really simple, and let the OS handle the
> > task-switching between the programs.
>
> > Sincerely,
> > Albert
>
> Albert,
>
> Thanks for your response - I appreciate your time!
>
> I am mainly reading and writing files, so it seems like it might not
> be a good idea.  What if I read the whole file into memory first, and
> operate on it there?  They are not large files...
>
> Either way, I'd hope that someone might respond with an example, as
> then I could test and see which is faster!
>
> Thanks again.

Ah, well, I didn't get any other responses, but here's what I've done:

    loopCount = 0
    for l in range(len(self.filesToProcess)):
        threads = []
        try:
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount + l])))
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount + 2])))
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount + 3])))
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount + 4])))
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount + 5])))
            msg = "Processing file...\n"
            for thread in threads:
                wx.CallAfter(self.textctrl03.write(msg), thread.start())
            for thread in threads:
                thread.join()
            loopCount += 5
        except IndexError:
            pass

It works, and it works well.  It starts five threads, and processes
five files at a time.  (In the "self.processFiles" I read the whole
file into memory using readlines(), which works well.)
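
One thing worth checking in the snippet above: target=self.processFiles(...)
calls processFiles right away and hands its return value to Thread, so the
files are probably being processed one after another in the calling thread
rather than five at a time.  A minimal sketch of passing the callable and its
argument separately follows.  It assumes Python 3, and process_file,
process_in_batches, and the results list are hypothetical stand-ins, not
names from the original program:

```python
import threading


def process_file(path, results, lock):
    # Hypothetical stand-in for self.processFiles: record which thread
    # handled which file instead of doing real work.
    with lock:
        results.append((path, threading.current_thread().name))


def process_in_batches(paths, batch_size=5):
    """Process paths in groups of batch_size, one thread per file."""
    results = []
    lock = threading.Lock()
    for start in range(0, len(paths), batch_size):
        # Slicing never raises IndexError, so the last short batch just works.
        batch = paths[start:start + batch_size]
        threads = [
            # Pass the function itself plus args; do NOT call it here.
            threading.Thread(target=process_file, args=(path, results, lock))
            for path in batch
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    return results


if __name__ == "__main__":
    for path, thread_name in process_in_batches(
            ["file%d.txt" % i for i in range(12)]):
        print(path, "handled by", thread_name)
```

Slicing the list per batch also avoids the hand-written index offsets, so no
file is skipped or handled twice.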

Of course, now the wx.CallAfter function doesn't work... I get
"TypeError: 'NoneType' object is not callable" every time it
runs...
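
A likely explanation for the TypeError: wx.CallAfter takes a callable followed
by its arguments, but self.textctrl03.write(msg) is evaluated on the spot, so
what gets queued is its return value (None).  The sketch below reproduces that
without wxPython.  call_after, run_pending, and write are hypothetical
stand-ins for wx.CallAfter, the GUI event loop, and the text control's write
method:

```python
pending = []
log = []


def call_after(func, *args, **kwargs):
    # Hypothetical stand-in for wx.CallAfter: queue func for a later call.
    pending.append((func, args, kwargs))


def run_pending():
    # Stand-in for the GUI event loop draining the queued calls.
    while pending:
        func, args, kwargs = pending.pop(0)
        func(*args, **kwargs)


def write(msg):
    # Stand-in for self.textctrl03.write; it returns None.
    log.append(msg)


# Buggy form: write(...) runs immediately and None is what gets queued.
call_after(write("Processing file...\n"))
try:
    run_pending()
    error = None
except TypeError as exc:
    error = str(exc)

# Corrected form: pass the callable and its argument separately.
call_after(write, "Processing file...\n")
run_pending()

print(error)
print(len(log))
```

Under that reading, the call in the original loop would become
wx.CallAfter(self.textctrl03.write, msg), with thread.start() moved to its own
line, since passing thread.start() as an argument also calls it immediately.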


