Threaded Design Question

half.italian at gmail.com
Fri Aug 10 19:25:02 EDT 2007


On Aug 9, 9:45 pm, "Mark T" <nos... at nospam.com> wrote:
> <half.ital... at gmail.com> wrote in message
>
> news:1186683909.797328.68770 at i13g2000prf.googlegroups.com...
>
>
>
> > Hi all!  I'm implementing one of my first multithreaded apps, and have
> > gotten to a point where I think I'm going off track from a standard
> > idiom.  Wondering if anyone can point me in the right direction.
>
> > The script will run as a daemon and watch a given directory for new
> > files.  Once it determines that a file has finished moving into the
> > watch folder, it will kick off processing on that file.  Several of
> > these jobs could be running at any given time, up to a maximum number
> > of threads.
>
> > Here's how I have it designed so far.  The main thread starts a
> > Watch(threading.Thread) class that loops and searches a directory for
> > files.  It has been passed a Queue.Queue() object (watch_queue), and
> > as it finds new files in the watch folder, it adds the file name to
> > the queue.
>
> > The main thread then grabs an item off the watch_queue and kicks off
> > processing on that file using another class, Worker(threading.Thread).
>
> > My problem is with communicating between the threads about which files
> > are currently being processed or are already sitting in the watch_queue,
> > so that the Watch thread doesn't keep adding files that are already
> > queued or in progress.  For example: Watch() finds a file to be
> > processed and adds it to the queue.  The main thread sees the file on
> > the queue, pops it off, and begins processing.  Now the file has been
> > removed from the watch_queue, and the Watch() thread has no way of
> > knowing that a Worker() thread is processing it and shouldn't pick it
> > up again, so it will see the file as new and add it to the queue again.
> > P.S. The file is deleted from the watch folder after it has finished
> > processing, so that's how I'll know which files still need processing
> > in the long term.
>
> > I made definite progress by creating two queues...watch_queue and
> > processing_queue, and then used lists within the classes to store the
> > state of which files are processing/watched.
>
> > I think I could pull it off, but it got confusing very quickly trying
> > to keep each thread's list and the queue in sync with one another.  The
> > easiest solution I can see would be if my threads could read an item
> > from the queue without removing it, and only remove it when I tell them
> > to.  The Watch() thread could then just follow what items are on the
> > watch_queue to know which files to add, and the Worker() thread could
> > deliberately remove the item from the watch_queue once it has finished
> > processing it.
>
> > Now that I'm writing this out, I can see a solution: overriding or
> > wrapping Queue.Queue.get() to give me the behavior I mention above.
>
> > I've noticed .join() and .task_done(), but I'm not sure how to use
> > them properly.  Any suggestions would be greatly appreciated.
>
> > ~Sean
>
> Just rename the file.  We've used that technique for years in a similar
> application at my work, where a service looks for files with a particular
> extension to appear in a directory.  When the service sees a file, it
> renames it to a different extension and spins off a thread to process the
> contents.
>
> -Mark T.
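
To make sure I understood the rename trick, I sketched it out roughly like
this (Python 2 module names, as elsewhere in the thread; WATCH_DIR, the
.new/.working extensions, and process_file() are made-up placeholders, not
anything from Mark's actual service):

import os
import glob
import time
import threading

WATCH_DIR = '/tmp/watch'   # placeholder watch folder
MAX_WORKERS = 4

def process_file(path):
    """Placeholder for the real per-file work; remove the file when done."""
    pass

def watch_loop():
    while True:
        for path in glob.glob(os.path.join(WATCH_DIR, '*.new')):
            # The active count includes this watcher thread itself.
            if threading.activeCount() - 1 >= MAX_WORKERS:
                break
            claimed = path[:-len('.new')] + '.working'
            try:
                # Renaming "claims" the file, so later scans never see it.
                os.rename(path, claimed)
            except OSError:
                continue   # someone else claimed it first
            threading.Thread(target=process_file, args=(claimed,)).start()
        time.sleep(1)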

I ended up taking this route for the most part.  The worker thread first
moves the file to be processed into a temp directory, so the watch thread
never sees it again.  I still had to implement my StateQueue(Queue.Queue)
subclass so I could add a method that returns all the items on the queue
without popping them off.
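
In case it helps anyone else, the subclass itself is tiny.  A minimal sketch
(the snapshot() name here is just a stand-in, and it reaches into
Queue.Queue's internal self.mutex and self.queue attributes, which are an
implementation detail of the stdlib class):

import Queue

class StateQueue(Queue.Queue):
    """A Queue that can report its current contents without consuming them."""

    def snapshot(self):
        # Grab the same lock Queue.Queue uses internally so we copy the
        # contents consistently while other threads put()/get().
        self.mutex.acquire()
        try:
            return list(self.queue)   # self.queue is the underlying deque
        finally:
            self.mutex.release()

The idea is that the watch thread skips anything already in snapshot(), and
the worker can call task_done() when it finishes an item so a join() on the
queue still behaves as expected.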

Thanks all for your great ideas.  My current response to
multi-threading... PITA!

~Sean



