Thread locking question.

Piet van Oostrum piet at cs.uu.nl
Sat May 9 15:43:57 EDT 2009


>>>>> grocery_stocker <cdalten at gmail.com> (gs) wrote:

[snip]

>gs> Maybe I'm being a bit daft, but what prevents the data from www.yahoo.com
>gs> from being mixed up with the data from www.google.com? Doesn't using
>gs> queue() prevent the data from being mixed up?

Nothing in your script prevents the data from getting mixed up. Some
experimentation suggests that the print statements might be atomic, but
I can't find anything about that in the Python documentation, and I don't
think you should count on it. I would expect a print not to be atomic
when it does blocking I/O.

If I make your example more complete, printing the documents completely,
like:

    def run(self):
        while True:
            host = self.queue.get()
            if host is None:
                break
            url = urllib2.urlopen(host)
            while True:
                txt = url.read(1024)
                if not txt:
                    break
                print txt,

then the documents will get mixed up in the output. Likewise, if you
want to put them in a shared data structure, you must use locking when
you insert them (for example, you could put them in another Queue).

The Queue you use here only prevents the URLs from getting mixed up; it
has no effect on the further processing.
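
To illustrate the "another Queue" idea, here is a minimal sketch, not
your actual script. It is written so that it runs on Python 3 (module
queue; on Python 2 the module is called Queue). The fetch() function is
a hypothetical stand-in for urllib2.urlopen(host).read() so that the
example needs no network, and the names hosts, results and collect are
my own assumptions. Each worker puts one complete (host, text) pair
into the results Queue, so a document is never interleaved with another.

```python
import queue
import threading


def fetch(host):
    # Hypothetical stand-in for a real download; returns fake text.
    return ("document from %s\n" % host) * 3


class MyUrl(threading.Thread):
    def __init__(self, hosts, results):
        threading.Thread.__init__(self)
        self.hosts = hosts        # Queue of hosts still to fetch
        self.results = results    # Queue of finished (host, text) pairs

    def run(self):
        while True:
            host = self.hosts.get()
            if host is None:      # sentinel: no more work
                break
            # One put() per whole document; Queue does the locking.
            self.results.put((host, fetch(host)))


def collect(host_list, nworkers=3):
    hosts = queue.Queue()
    results = queue.Queue()
    threads = [MyUrl(hosts, results) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for h in host_list:
        hosts.put(h)
    for _ in threads:
        hosts.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have finished, so empty() is reliable here.
    docs = {}
    while not results.empty():
        host, text = results.get()
        docs[host] = text
    return docs
```

Only the main thread prints or otherwise processes the collected
documents, so no extra lock is needed around the output.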

As I told my students two days ago: you shouldn't do thread programming
unless you have thoroughly studied the subject.

By the way, there is another flaw in your program: you do the join only
on the last spawned thread. Because the threads are daemonic, all other
threads that are still working will be killed prematurely when the main
program exits after this one thread finishes.

The code should be like this:

    def main():
        threads = []
        for i in range(5):
            t = MyUrl(queue)
            t.setDaemon(True)
            t.start()
            threads.append(t)
    ...
        for t in threads:
            t.join()

Or just don't make them daemonic.
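
A self-contained sketch of that pattern, under my own assumptions: the
worker function, the done list and run_all are hypothetical names, and
the "work" is just recording the item. It is written for Python 3
(module queue). Every thread gets its own None sentinel and every
thread is joined, so none of them is cut off when the main thread ends.

```python
import queue
import threading


def worker(tasks, done, lock):
    while True:
        item = tasks.get()
        if item is None:          # sentinel: stop this worker
            break
        with lock:                # protect the shared list
            done.append(item)


def run_all(items, nthreads=5):
    tasks = queue.Queue()
    done = []
    lock = threading.Lock()
    threads = []
    for i in range(nthreads):
        t = threading.Thread(target=worker, args=(tasks, done, lock))
        t.daemon = True           # same effect as setDaemon(True)
        t.start()
        threads.append(t)
    for item in items:
        tasks.put(item)
    for t in threads:
        tasks.put(None)           # one sentinel per thread
    for t in threads:
        t.join()                  # wait for every thread, not just one
    return done
```

Calling run_all(range(20)) returns all twenty items, in whatever order
the threads happened to process them.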
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org


