urllib (and urllib2) read all data from page on open()?

Bengt Richter bokr at oz.net
Mon Mar 14 14:30:47 EST 2005


On Mon, 14 Mar 2005 14:48:25 -0000, "Alex Stapleton" <alexs at advfn.com> wrote:

>Whilst it might be able to do what I want, I feel this to be a flaw in urllib
>that should be fixed, or at least added to a bug list somewhere, so I can at
>least pretend someone other than me cares.
>
Someone cares about top-posting. Please don't ;-)

>-----Original Message-----
>From: Swaroop C H [mailto:g2swaroop at yahoo.com]
>Sent: 14 March 2005 14:45
>To: Alex Stapleton
>Subject: RE: urllib (and urllib2) read all data from page on open()?
>
>
>--- Alex Stapleton <alexs at advfn.com> wrote:
>> Except wouldn't it have already read the entire file when it opened,
>> or does that occur on the first read()? Also, will the data returned
>> from handle.read(100) be raw HTTP? In which case, what if the
>> encoding is chunked or gzipped?
>
>Maybe the httplib module can help you.
>From http://docs.python.org/lib/httplib-examples.html :
>
>    >>> import httplib
>    >>> conn = httplib.HTTPConnection("www.python.org")
>    >>> conn.request("GET", "/index.html")
>    >>> r1 = conn.getresponse()
>    >>> print r1.status, r1.reason
>    200 OK
>    >>> data1 = r1.read()
>    >>> conn.request("GET", "/parrot.spam")
>    >>> r2 = conn.getresponse()
>    >>> print r2.status, r2.reason
>    404 Not Found
>    >>> data2 = r2.read()
>    >>> conn.close()
>
>As far as I can understand, you can read() data only when you want
>to.
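
A quick way to check that claim is a minimal sketch, assuming Python 3 (where httplib is now called http.client). It serves a hypothetical 1000-byte page from a throwaway local server and reads it back 100 bytes at a time. getresponse() parses only the status line and headers, and the body is handed over as you call read(n). The buffered socket reader underneath may already have pulled some of the body off the wire, though. The names BODY, Handler and read_in_pieces are made up for the demo.

```python
import http.client
import http.server
import threading

BODY = b"x" * 1000  # hypothetical 1000-byte page, served locally for the demo


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def read_in_pieces(size=100):
    # Bind to an ephemeral port on localhost so no real site is contacted.
    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("GET", "/")
        resp = conn.getresponse()  # status line and headers are parsed here
        first = resp.read(size)    # at most `size` body bytes, no HTTP headers
        rest = resp.read()         # whatever is left of the body
        conn.close()
        return resp.status, first, rest
    finally:
        server.shutdown()
        server.server_close()
```

Calling read_in_pieces(100) should give status 200, a first piece of at most 100 bytes, and a first + rest that equals the whole page.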
>
>Caveat:
>There's a warning that says "This module defines classes which
>implement the client side of the HTTP and HTTPS protocols. It is
>normally not used directly -- the module urllib uses it to handle
>URLs that use HTTP and HTTPS."
>
>HTH,
>
>Swaroop C H
>Blog: http://www.swaroopch.info
>Book: http://www.byteofpython.info
>
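
On the chunked/gzip question: httplib (http.client in Python 3) strips the chunked Transfer-Encoding framing itself, so read(100) gives you body bytes rather than raw HTTP. A gzip Content-Encoding, however, is passed through untouched. Below is a minimal sketch, again assuming Python 3. A hypothetical local server sends a gzipped page in 64-byte HTTP chunks, and the client inflates it incrementally with zlib while reading 100 bytes at a time. TEXT, GzipChunkedHandler and fetch_gunzipped are made-up names for the demo.

```python
import gzip
import http.client
import http.server
import threading
import zlib

TEXT = b"spam and eggs\n" * 200  # hypothetical page content for the demo


class GzipChunkedHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer coding needs HTTP/1.1

    def do_GET(self):
        payload = gzip.compress(TEXT)
        self.send_response(200)
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Transfer-Encoding", "chunked")
        self.send_header("Connection", "close")
        self.end_headers()
        # Each chunk on the wire: hex size, CRLF, data, CRLF.
        for i in range(0, len(payload), 64):
            piece = payload[i:i + 64]
            self.wfile.write(b"%x\r\n" % len(piece) + piece + b"\r\n")
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the body

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_gunzipped(size=100):
    server = http.server.HTTPServer(("127.0.0.1", 0), GzipChunkedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("GET", "/")
        resp = conn.getresponse()
        encoding = resp.getheader("Content-Encoding")
        # http.client removes the chunk framing but not the gzip encoding,
        # so inflate incrementally; 16 + MAX_WBITS selects the gzip format.
        inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out = []
        while True:
            piece = resp.read(size)
            if not piece:
                break
            out.append(inflater.decompress(piece))
        out.append(inflater.flush())
        conn.close()
        return encoding, b"".join(out)
    finally:
        server.shutdown()
        server.server_close()
```

fetch_gunzipped() should report "gzip" as the Content-Encoding and return the original TEXT. Decompressing piece by piece keeps memory bounded even when the page is large.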

Regards,
Bengt Richter



More information about the Python-list mailing list