A little threading problem

Jeremy Jones zanesdad at bellsouth.net
Thu Dec 2 07:58:17 EST 2004


Alban Hertroys wrote:

> Jeremy Jones wrote:
>
>> (not waiting, because it already did happen).  What is it exactly 
>> that you are trying to accomplish?  I'm sure there is a better approach.
>
>
> I think I saw at least a bit of the light, reading up on readers and 
> writers (A colleague showed up with a book called "Operating system 
> concepts" that has a chapter on process synchronization).
> It looks like I should be writing and reading 3 Queues instead of 
> trying to halt and pause the threads explicitly. That looks a lot 
> easier...
>
> Thanks for pointing out the problem area.

That's actually along the lines of what I was going to recommend after 
getting more detail on what you are doing.  A couple of things that may 
(or may not) help you are:

* the Queue class in the Python standard library has a "maxsize" 
parameter.  When you create a queue, you can specify how large you want 
it to grow.  You can have your three threads busily parsing XML and 
extracting data from it and putting it into a queue and when there are a 
total of "maxsize" items in the queue, the next put() call (to put data 
into the queue) will block until the consumer thread has reduced the 
number of items in the queue.  I've never used 
xml.parsers.xmlproc.xmlproc.Application, but looking at the data, it 
seems to resemble a SAX parser, so you should have no problem putting 
(potentially blocking) calls to the queue into your handler.  The only 
thing this really buys you is that you won't have to hold the whole XML 
file in memory.
* the get method on a queue object has a "block" flag.  You can 
effectively poll your queues something like this:

#untested code
import Queue
import time

#a_done, b_done and c_done are just checks to see if that particular
#document is done (they have to be re-checked as the parsers finish,
#otherwise these loops never end)
while not (a_done and b_done and c_done):
    got_a, got_b, got_c = False, False, False
    item_a, item_b, item_c = None, None, None
    while (not a_done) and (not got_a):
        try:
            #the 0 says don't block and raise an Empty exception if
            #there's nothing there
            item_a = queue_a.get(0)
            got_a = True
        except Queue.Empty:
            time.sleep(.3)
    while (not b_done) and (not got_b):
        try:
            item_b = queue_b.get(0)
            got_b = True
        except Queue.Empty:
            time.sleep(.3)
    while (not c_done) and (not got_c):
        try:
            item_c = queue_c.get(0)
            got_c = True
        except Queue.Empty:
            time.sleep(.3)
    put_into_database_or_whatever(item_a, item_b, item_c)

This will allow you to deal with one item at a time and if the xml files 
are different sizes, it should still work - you'll just pass None to 
put_into_database_or_whatever for that particular file.
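
To make the "maxsize" point a bit more concrete, here is a minimal 
sketch of the whole arrangement.  Treat it as an illustration under 
assumptions, not as the way to do it: plain lists stand in for your 
parsers, and DONE, producer() and collect() are names I made up.  It 
also swaps the polling loop for blocking get() calls plus a sentinel 
that each producer puts on its queue when it has finished.  The import 
copes with either spelling of the module name (Queue or queue).

```python
import threading
try:
    import queue            # newer spelling of the module name
except ImportError:
    import Queue as queue   # older spelling

# Hypothetical sentinel: a producer puts this on its queue when it is done.
DONE = object()

def producer(q, items):
    # Stand-in for a parser handler.  put() blocks while the queue
    # already holds maxsize items, so a fast producer cannot run ahead.
    for item in items:
        q.put(item)
    q.put(DONE)

def collect(sources, maxsize=2):
    # One bounded queue and one producer thread per source.
    queues = [queue.Queue(maxsize) for _ in sources]
    threads = [threading.Thread(target=producer, args=(q, src))
               for q, src in zip(queues, sources)]
    for t in threads:
        t.start()

    done = [False] * len(queues)
    rows = []
    while True:
        row = []
        for i, q in enumerate(queues):
            item = None
            if not done[i]:
                item = q.get()      # blocks until this producer delivers
                if item is DONE:
                    done[i] = True
                    item = None     # a finished source contributes None
            row.append(item)
        if all(done):
            break                   # the last row holds nothing but None
        rows.append(tuple(row))

    for t in threads:
        t.join()
    return rows

if __name__ == "__main__":
    for row in collect([[1, 2, 3], [10], [100, 200]]):
        print(row)
```

Each tuple in the result is what you would hand to 
put_into_database_or_whatever; sources that run out early show up as 
None in the later tuples.  This assumes None is never a real item.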

HTH.

Jeremy Jones



More information about the Python-list mailing list