urllib so slow

Havok python.newsgroup at REMOVE.paulnilsson.com
Sat Feb 8 16:40:37 EST 2003


On Sat, 08 Feb 2003 22:28:38 +0100, an infinite amount of monkeys
hijacked the computer of Paul Nilsson
<python.newsgroup at REMOVE.paulnilsson.com> and wrote:

>On Sat, 8 Feb 2003 14:43:15 -0600, an infinite amount of monkeys
>hijacked the computer of Skip Montanaro <skip at pobox.com> and wrote:
>
>>
>>    Paul> Can someone tell me why urllib is so slow? The code below takes
>>    Paul> over 12 seconds to execute just for the google webpage!
>>
>>Perhaps something platform dependent or you just hit a slow combination of
>>google server and/or network congestion?  On my Mac OS X system connected
>>via cable modem I get respectable results:
>
>I get the same result loading pages from my Linux box over the 100Mb
>connection so I don't think it can be a networking problem. Perhaps it
>is a Win98-specific problem.
>
>I eventually found this on sf which is now closed:
>
>http://sourceforge.net/tracker/index.php?func=detail&aid=508157&group_id=5470&atid=105470
>
>Nice to hear it works for you though :)
>
>Cheers, Paul

There was a fix on sf for Python 2.0 by milosoftware (698929). I had to
make a slight modification for it to work with 2.2.2. I'll post the
solution here in case others have the problem. c.l.p is a lot easier to
search than sourceforge!

(BTW this buffering made it 500 times faster for me!)



import httplib

# The following code works around the "bufferless" operation of
# HTTPResponse. Its __init__ sets self.fp to sock.makefile('rb', 0),
# which in fact sets the receive buffer to size 1. This causes so
# much CPU overhead that network performance is slowed down to
# unacceptable levels.
# This hack can only be used if you are sure that the server will
# either close the connection after sending the file, or send a valid
# content-length header so that the response object will not attempt
# to read past EOF (which may cause deadlock).
class FastHTTPResponse(httplib.HTTPResponse):
    def __init__(self, sock, debuglevel=0, strict=0):
        httplib.HTTPResponse.__init__(self, sock, debuglevel, strict)
        self.fp = sock.makefile('rb', 8192)

# Tell the httplib that we want to use our hack.
httplib.HTTPConnection.response_class = FastHTTPResponse
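
For anyone who wants to see the effect without going through httplib or a
real server, here is a rough sketch that pushes the same data through a
local socket pair and reads it back line by line, once with
makefile('rb', 0) and once with makefile('rb', 8192). It assumes a recent
Python with socket.socketpair() and time.perf_counter(); read_lines and
payload are names made up for this illustration, and the timings will
differ from machine to machine.

```python
import socket
import threading
import time


def read_lines(bufsize, payload):
    """Send payload over a local socket pair and read it back line by line.

    bufsize is passed straight to makefile(): 0 means unbuffered,
    8192 is the buffer size used in the workaround above.
    """
    reader, writer = socket.socketpair()

    def send():
        writer.sendall(payload)
        writer.close()  # closing the writer gives the reader its EOF

    sender = threading.Thread(target=send)
    sender.start()

    fp = reader.makefile('rb', bufsize)
    start = time.perf_counter()
    lines = []
    while True:
        line = fp.readline()
        if not line:
            break
        lines.append(line)
    elapsed = time.perf_counter() - start

    fp.close()
    reader.close()
    sender.join()
    return lines, elapsed


# 2000 lines of 80 bytes each, roughly the size of a small web page.
payload = (b"x" * 79 + b"\n") * 2000

slow_lines, slow = read_lines(0, payload)
fast_lines, fast = read_lines(8192, payload)
print("unbuffered: %.4fs   buffered: %.4fs" % (slow, fast))
```

Both calls return the same lines; only the time spent reading differs. In
the unbuffered case each readline() has to fetch data from the socket in
tiny pieces, which is the overhead the comment above describes.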




