Threading with queues

Gib Bogle g.bogle at auckland.no.spam.ac.nz
Mon Dec 21 18:12:28 EST 2009


Hi,
I'm learning Python, jumping in the deep end with a threading application.  I 
came across an authoritative-looking site that recommends using queues for 
threading in Python.
http://www.ibm.com/developerworks/aix/library/au-threadingpython/index.html
The author provides example code that fetches data from several web sites, using 
threads.  I have modified his code slightly, just adding a couple of print 
statements and passing an ID number to the thread.

#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com", 
"http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
#"""Threaded Url Grab"""
   def __init__(self, queue,i):
     threading.Thread.__init__(self)
     self.queue = queue
     self.num = i
     print "Thread: ",self.num

   def run(self):
     while True:
       #grabs host from queue
       host = self.queue.get()
       print "num, host: ",self.num,host
       #grabs urls of hosts and prints first 1024 bytes of page
       url = urllib2.urlopen(host)
       print url.read(1024)

       #signals to queue job is done
       self.queue.task_done()

start = time.time()
def main():

   #spawn a pool of threads, and pass them queue instance
   for i in range(5):
     t = ThreadUrl(queue,i)
     t.setDaemon(True)
     t.start()

  #populate queue with data
     for host in hosts:
       queue.put(host)

  #wait on the queue until everything has been processed
     queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

Executed on Windows with Python 2.5, this program doesn't do what it is meant 
to do, which is to fetch data from each site once.  Instead, it processes the 
first host in the list 5 times, the next 4 times, and so on, with the last 
processed just once.  I don't know whether the code is simply wrong (which 
seems unlikely) or the behaviour on my system differs from that on AIX (which 
also seems unlikely).

Naively, I would have expected the queue to ensure that each of its members is 
processed only once.  Is there a simple change that will make this code behave 
as intended?  Or is the author out to lunch?
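A note on the expectation above: a Queue does hand each item passed to put() to exactly one get(), so duplicates have to come from items being put more than once. In the script as it appears in this post, the "for host in hosts:" loop and "queue.join()" are indented at the same level as the body of "for i in range(5):" (the "#populate" and "#wait" comments sit further left, which hides this), so they run once per thread started and the hosts are enqueued repeatedly. The sketch below is an assumption-labelled rewrite, not the IBM article's code: it starts the pool first, enqueues each host once outside the spawning loop, then joins. The names work, processed, worker and the hypothetical fetch() (a stand-in for urllib2.urlopen(host).read(1024), so it runs offline) are mine; it is written to run on Python 2.6+ and 3.x.

```python
# Sketch: a fixed-size worker pool fed exactly once from a single queue.
import threading
try:
    import queue                # Python 3 module name
except ImportError:
    import Queue as queue       # Python 2.6/2.7 module name

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

work = queue.Queue()
processed = []                  # (worker_id, host) pairs, in completion order
processed_lock = threading.Lock()

def fetch(host):
    # Hypothetical stand-in for urllib2.urlopen(host).read(1024);
    # it does no network I/O so the sketch runs offline.
    return "contents of " + host

def worker(num):
    while True:
        host = work.get()       # blocks; each queued item reaches one get()
        try:
            fetch(host)
            with processed_lock:
                processed.append((num, host))
        finally:
            work.task_done()    # exactly one task_done() per get()

def main():
    # 1. Start the pool once.
    for i in range(5):
        t = threading.Thread(target=worker, args=(i,))
        t.daemon = True         # don't keep the process alive after main()
        t.start()

    # 2. Enqueue every host once, outside the thread-spawning loop.
    for host in hosts:
        work.put(host)

    # 3. Block until task_done() has been called for every item put().
    work.join()

main()
for num, host in processed:
    print("num, host: %d %s" % (num, host))
```

Which worker picks up which host varies from run to run, but each host should appear exactly once in the output.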

Cheers
Gib


