[Tutor] Threads

orbitz orbitz at ezabel.com
Tue Nov 16 21:56:28 CET 2004


urllib is blocking, so you can't really use it with non-blocking code.
The urlopen() call could take a while, and even once data is on the
socket, the read will most likely still block, which is not going to
help you.  You'll have to use a non-blocking url api to make the most
of your time.
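
To give a rough idea of what a non-blocking fetch looks like, here is a
sketch using nothing but the socket module.  It hand-rolls an HTTP/1.0
GET and skips header parsing and error handling entirely, so treat it
as a sketch of the idea, not a real client:

###
import select
import socket
import sys

def start_nonblocking_get(host, path="/"):
    """Kick off an HTTP GET without blocking on the read.  Returns the
    socket; the caller select()s on it to learn when data arrives."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(0)
    try:
        s.connect((host, 80))
    except socket.error:
        pass   # expected: a non-blocking connect finishes in the background
    select.select([], [s], [])   # writable means the connect completed
    s.send("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host))
    return s

s = start_nonblocking_get("python.org")
while 1:
    select.select([s], [], [])      # sleep until data is actually there
    data = s.recv(1024)             # won't block: select said it's ready
    if not data:
        break                       # server closed the connection
    sys.stdout.write(data)
###

(Note that the DNS lookup inside connect() still blocks, which is part
of why a real non-blocking url api is more work than it looks.)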

Danny Yoo wrote:

>On Mon, 15 Nov 2004, Terry Carroll wrote:
>
>>On Mon, 15 Nov 2004, orbitz wrote:
>>
>>>I guess you didn't read what I said. I suggested using
>>>async/non-blocking sockets so you may have multiple downloads going.
>>>Feel free to google what they are.
>>
>>I read it; I misunderstood it.
>
>
>Hi everyone,
>
>
>[text cut]
>
>I have to admit: I haven't done too much asynchronous stuff myself yet.
>A learning experience!  *grin*
>
>
>Let's look at an example of doing something like asynchronous http
>downloads, using the select.select() call.  It sounds like the problem
>is to retrieve a bunch of pages by url simultaneously.  We try doing
>things in parallel to improve network throughput, and to account for
>certain web pages coming in more slowly than others.
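>
>For contrast, the plain sequential approach, one urlopen() after the
>other, makes every slow page hold up all the pages behind it:
>
>###
>import urllib
>
># One at a time: total time is the sum of all the individual fetches.
>for url in ["http://python.org", "http://www.pythonware.com/daily/"]:
>    print urllib.urlopen(url).read()
>###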
>
>
>We can use urllib.urlopen() to grab a bunch of web pages.  Since the
>object it returns looks just like a file, we can use it as part of a
>'select' event loop.  For example:
>
>###
>>>> import urllib
>>>> import select
>>>> f = urllib.urlopen("http://python.org")
>>>> ready_in, ready_out, ready_exc = select.select([f], [], [])
>###
>
>When we use select.select(), the first list that comes back holds the
>file objects that are ready to be read().  That is what makes it
>useful here: it tells us exactly which files have some content to read.
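>
>One more detail worth knowing: select.select() also takes an optional
>fourth argument, a timeout in seconds.  If nothing becomes ready in
>time, all three lists come back empty, so we can poll instead of
>waiting forever:
>
>###
>ready_in, ready_out, ready_exc = select.select([f], [], [], 5.0)
>if not ready_in:
>    print "nothing to read after five seconds"
>###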
>
>
>Here's some demonstration code of using select together with the file-like
>objects that come off of urlopen():
>
>
>######
>"""A small demonstration on grabbing pages asynchronously.
>
>Danny Yoo (dyoo at hkn.eecs.berkeley.edu)
>
>urllib.urlopen() provides a file-like object, but we can still get at
>the underlying socket.
>
>"""
>
>import select
>import sys
>import urllib
>
>
>class PageGrabber:
>    def __init__(self):
>        self._urls = {}
>        self._inFiles = {}
>        self._outFiles = {}
>
>
>    def add(self, url, outputFile):
>        """Adds a new url to be grabbed.  We start writing the output
>        to the outputFile."""
>        openedFile = urllib.urlopen(url)
>        fileno = openedFile.fileno()
>        self._inFiles[fileno] = openedFile
>        self._urls[fileno] = url
>        self._outFiles[fileno] = outputFile
>
>
>    def writeOutAllPages(self):
>        """Waits until all the url streams are written out to their
>        respective outFiles."""
>        while self._urls:
>            ins, outs, errs = select.select(self._inFiles.keys(), [], [])
>            for in_fileno in ins:
>                all_done = self._writeBlock(in_fileno)
>                if all_done:
>                    self._dropUrl(in_fileno)
>
>
>    def _dropUrl(self, in_fileno):
>        del self._urls[in_fileno]
>        self._inFiles[in_fileno].close()
>        del self._inFiles[in_fileno]
>        del self._outFiles[in_fileno]
>
>
>
>    def _writeBlock(self, in_fileno, block_size=1024):
>        """Write out the next block.  If no more blocks are available,
>        returns True.  Else, returns False."""
>        next_block = self._inFiles[in_fileno].read(block_size)
>        self._outFiles[in_fileno].write(next_block)
>        if next_block:
>            return False
>        else:
>            return True
>
>
>
>######################################################################
>## The rest here is just test code.  I really should be using unittest
>## but I got impatient.  *grin*
>
>class TracedFile:
>    """A small file just to trace when things are getting written.
>    Used just for debugging purposes."""
>    def __init__(self, name, file):
>        self.name = name
>        self.file = file
>
>    def write(self, bytes):
>        sys.stderr.write("%s is writing.\n" % self.name)
>        self.file.write(bytes)
>
>
>if __name__ == '__main__':
>    p = PageGrabber()
>    p.add("http://python.org",
>          TracedFile("python.org", sys.stdout))
>    p.add("http://www.pythonware.com/daily/",
>          TracedFile("daily python url", sys.stdout))
>    p.writeOutAllPages()
>######
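>
>And since I mentioned unittest above, here is a rough sketch of what a
>test for _writeBlock() might look like, using StringIO objects as
>stand-ins for the url streams:
>
>###
>import unittest
>from StringIO import StringIO
>
>class WriteBlockTest(unittest.TestCase):
>    def testWriteBlock(self):
>        p = PageGrabber()
>        p._inFiles[0] = StringIO("hello world")
>        p._outFiles[0] = StringIO()
>        # The first call finds data, so the stream isn't done yet.
>        self.assertEqual(p._writeBlock(0), False)
>        self.assertEqual(p._outFiles[0].getvalue(), "hello world")
>        # The second call hits end-of-stream and reports done.
>        self.assertEqual(p._writeBlock(0), True)
>
>if __name__ == '__main__':
>    unittest.main()
>###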
>
>
>The code is really rough and not factored well yet, but it's a starting
>point.  I hope this helps!
>
>_______________________________________________
>Tutor maillist  -  Tutor at python.org
>http://mail.python.org/mailman/listinfo/tutor
>


