Thread locking question.

Pascal Chambon chambon.pascal at wanadoo.fr
Sat May 9 12:32:41 EDT 2009


grocery_stocker wrote:
> On May 9, 8:36 am, Piet van Oostrum <p... at cs.uu.nl> wrote:
>   
>>>>>>> grocery_stocker <cdal... at gmail.com> (gs) wrote:
>>>>>>>               
>>> gs> The following code gets data from 5 different websites at the "same
>>> gs> time".
>>> gs> #!/usr/bin/python
>>> gs> import Queue
>>> gs> import threading
>>> gs> import urllib2
>>> gs> import time
>>> gs> hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
>>> gs>          "http://ibm.com", "http://apple.com"]
>>> gs> queue = Queue.Queue()
>>> gs> class MyUrl(threading.Thread):
>>> gs>     def __init__(self, queue):
>>> gs>         threading.Thread.__init__(self)
>>> gs>         self.queue = queue
>>> gs>     def run(self):
>>> gs>         while True:
>>> gs>             host = self.queue.get()
>>> gs>             if host is None:
>>> gs>                 break
>>> gs>             url = urllib2.urlopen(host)
>>> gs>             print url.read(1024)
>>> gs>             #self.queue.task_done()
>>> gs> start = time.time()
>>> gs> def main():
>>> gs>     for i in range(5):
>>> gs>         t = MyUrl(queue)
>>> gs>         t.setDaemon(True)
>>> gs>         t.start()
>>> gs>     for host in hosts:
>>> gs>         print "pushing", host
>>> gs>         queue.put(host)
>>> gs>     for i in range(5):
>>> gs>         queue.put(None)
>>> gs>     t.join()
>>> gs> if __name__ == "__main__":
>>> gs>     main()
>>> gs>     print "Elapsed Time: %s" % (time.time() - start)
>>> gs> How does the parallel download work if each thread has a lock? When
>>> gs> the program opens www.yahoo.com, it places a lock on the thread,
>>> gs> right? If so, then doesn't that mean the other 4 sites have to wait
>>> gs> for the thread to release the lock?
>>>       
>> No. Where does it set a lock? There is only a short lock period in the queue
>> when an item is put in the queue or got from the queue. And of course we
>> have the GIL, but this is released as soon as a long-running operation is
>> started - in this case when the Internet communication is done.
>> --
>>     
>
> Maybe I'm being a bit daft, but what prevents the data from www.yahoo.com
> from being mixed up with the data from www.google.com? Doesn't using
> queue() prevent the data from being mixed up?
Hello

Each thread has its own connection to the internet (its own TCP/IP 
connection, port number, etc.), so the incoming data will never get mixed 
up between threads on the way in.

The only risk comes from data structures that you explicitly share between 
the threads, like the queue here, which they all access.
But the queue is thread-safe, so there is no problem there (another data 
structure might give bugs if it is not explicitly locked before use).
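
To illustrate the "explicitly locked before use" part, here is a minimal 
sketch, not taken from the original script. It assumes Python 3, and the 
names counts, counts_lock, record and worker, as well as the host string, 
are made up for the example. A plain dict shared by several threads is 
guarded by a threading.Lock around each read-modify-write:

```python
import threading

counts = {}                      # plain dict shared by all threads (hypothetical)
counts_lock = threading.Lock()   # guards every read-modify-write on counts

def record(host, nbytes):
    # Without the lock, two threads could both read the old value and
    # one of the two updates would be lost.
    with counts_lock:
        counts[host] = counts.get(host, 0) + nbytes

def worker(host):
    for _ in range(1000):
        record(host, 1)

threads = [threading.Thread(target=worker, args=("example.invalid",))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait for every worker before reading counts
```

The queue does this kind of locking internally, which is why the original 
script needs no explicit lock around get() and put().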

On the other hand, output will get mixed on the console (stdout), since each 
thread can write to it at any moment. It's likely that the sources of 
all the pages will be interleaved on your screen, yep. ^^
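
One possible way around the mixed console output, sketched here under 
assumptions rather than as a fix to the original script: let the workers 
put their results on a second queue and let only the main thread print. 
This assumes Python 3 (the module is named queue there). fake_fetch is a 
made-up stand-in for the urllib2 call so that the sketch runs without a 
network, and worker and fetch_all are hypothetical names too.

```python
import queue
import threading

def fake_fetch(host):
    # Hypothetical stand-in for urllib2.urlopen(host).read(1024); no network.
    return "<html>page from %s</html>" % host

def worker(jobs, results):
    while True:
        host = jobs.get()
        if host is None:          # sentinel: no more work for this thread
            break
        results.put((host, fake_fetch(host)))

def fetch_all(hosts, nthreads=5):
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(nthreads)]
    for t in threads:
        t.start()
    for host in hosts:
        jobs.put(host)
    for _ in threads:
        jobs.put(None)            # one sentinel per worker
    for t in threads:
        t.join()                  # wait for every worker, not just the last
    # Each worker put exactly one result per host, so this never blocks.
    return dict(results.get() for _ in hosts)

pages = fetch_all(["http://yahoo.com", "http://google.com"])
for host in sorted(pages):
    # Only the main thread prints, so the pages cannot interleave.
    print(host, pages[host])
```

A shared threading.Lock held around each print would also work. The 
results queue just keeps all printing in one place.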

Regards,
pascal