urllib (and urllib2) read all data from page on open()?

Mon Mar 14 09:48:25 EST 2005

Whilst httplib might be able to do what I want, I feel this is a flaw in
urllib that should be fixed, or at least added to a bug list somewhere so I
can at least pretend someone other than me cares.

-----Original Message-----
From: Swaroop C H [mailto:g2swaroop at yahoo.com]
Sent: 14 March 2005 14:45
To: Alex Stapleton
Subject: RE: urllib (and urllib2) read all data from page on open()?

--- Alex Stapleton <alexs at advfn.com> wrote:
> Except wouldn't it have already read the entire file when it opened,
> or does that occur on the first read()? Also, will the data returned
> from handle.read(100) be raw HTTP? In which case, what if the
> encoding is chunked or gzipped?

Maybe the httplib module can help you.
From http://docs.python.org/lib/httplib-examples.html :

    >>> import httplib
    >>> conn = httplib.HTTPConnection("www.python.org")
    >>> conn.request("GET", "/index.html")
    >>> r1 = conn.getresponse()
    >>> print r1.status, r1.reason
    200 OK
    >>> data1 = r1.read()
    >>> conn.request("GET", "/parrot.spam")
    >>> r2 = conn.getresponse()
    >>> print r2.status, r2.reason
    404 Not Found
    >>> data2 = r2.read()
    >>> conn.close()

As far as I can tell, the body is not read until you call read(), so
you can fetch the data only when you want to.
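To check the read(100) question without touching a real site, here is a minimal sketch. It assumes Python 3, where httplib is named http.client, and it serves a made-up 1000-byte body from a throwaway local server using chunked transfer encoding. The names BODY, DemoHandler and fetch_in_pieces are mine, not part of any library. As far as I know, read(size) hands back body bytes with the chunk framing already removed, so it is not raw HTTP.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"x" * 1000  # made-up payload for the demo


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a chunked body."""
    protocol_version = "HTTP/1.1"  # chunked encoding needs HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        # Write the body as 256-byte chunks: "<hex length>\r\n<data>\r\n"
        for i in range(0, len(BODY), 256):
            piece = BODY[i:i + 256]
            self.wfile.write(b"%x\r\n%s\r\n" % (len(piece), piece))
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the body

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_in_pieces(host, port, path="/", size=100):
    """Read the response body `size` bytes at a time."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()  # status line and headers are parsed here
        pieces = []
        while True:
            piece = resp.read(size)  # body bytes are pulled only on demand
            if not piece:
                break
            pieces.append(piece)
        return resp.status, pieces
    finally:
        conn.close()


# Port 0 asks the OS for any free port on the loopback interface.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, pieces = fetch_in_pieces("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()
```

Joining the pieces should give back BODY exactly, with no chunk-size lines mixed in.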

Caveat:
There's a warning that says "This module defines classes which
implement the client side of the HTTP and HTTPS protocols. It is
normally not used directly -- the module urllib uses it to handle
URLs that use HTTP and HTTPS."
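On the gzipped half of the question, a rough sketch under the same assumptions (Python 3, where urllib's opener lives in urllib.request, and a throwaway local server with a made-up payload). As far as I know, urllib does not undo "Content-Encoding: gzip" for you, so read() returns the compressed bytes and you have to decompress them yourself, which zlib can do incrementally. TEXT, GzipHandler and read_gzipped are names invented for this example.

```python
import gzip
import threading
import urllib.request
import zlib
from http.server import BaseHTTPRequestHandler, HTTPServer

TEXT = b"spam and eggs\n" * 200  # made-up payload for the demo
GZIPPED = gzip.compress(TEXT)


class GzipHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that always sends a gzip-compressed body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(GZIPPED)))
        self.end_headers()
        self.wfile.write(GZIPPED)

    def log_message(self, *args):
        pass  # keep the demo quiet


def read_gzipped(url, size=16):
    """Read `size` bytes at a time and decompress as the data arrives."""
    # An empty ProxyHandler keeps any proxy environment variables out of the way.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    raw, text = [], []
    with opener.open(url) as resp:
        while True:
            piece = resp.read(size)  # still compressed at this point
            if not piece:
                break
            raw.append(piece)
            text.append(decoder.decompress(piece))
    text.append(decoder.flush())
    return b"".join(raw), b"".join(text)


# Port 0 asks the OS for any free port on the loopback interface.
server = HTTPServer(("127.0.0.1", 0), GzipHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
raw, text = read_gzipped("http://127.0.0.1:%d/" % server.server_port)
server.shutdown()
server.server_close()
```

Here raw should still start with the gzip magic bytes, while text should match the original payload.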

HTH,

Swaroop C H
Blog: http://www.swaroopch.info
Book: http://www.byteofpython.info