n00b with urllib2: How to make it handle cookie automatically?

est electronixtar at gmail.com
Sun Feb 24 06:41:01 EST 2008


On Feb 23, 2:42 am, Rob Wolfe <r... at smsnet.pl> wrote:
> est <electronix... at gmail.com> writes:
> > Hi all,
>
> > I need urllib2 to perform a series of HTTP requests with the cookies from
> > the PREVIOUS request (like our browsers usually do). Many people suggest I
> > use some library (e.g. pycURL) instead, but I guess it's good practice
> > for a Python beginner to DIY something rather than use existing tools.
>
> > So my problem is how to expand the urllib2 class
>
> > from cookielib import CookieJar
> > class SmartRequest():
> >     cj=CookieJar()
> >     def __init__(self, strUrl, strContent=None):
> >         self.Request    =   urllib2.Request(strUrl, strContent)
> >         self.cj.add_cookie_header(self.Request)
> >         self.Response   =   urllib2.urlopen(Request)
> >         self.cj.extract_cookies(self.Response, self.Request)
> >     def url
> >     def read(self, intCount):
> >         return self.Response.read(intCount)
> >     def headers(self, strHeaderName):
> >         return self.Response.headers[strHeaderName]
>
> > The code does not work because each time SmartRequest is instantiated,
> > object 'cj' is cleared. How to avoid that?
> > The only stupid solution I figured out is to use a global CookieJar
> > object. Is there any way to handle all this INSIDE the class?
>
> > I am totally new to OOP & python programming, so could anyone give me
> > some suggestions? Thanks in advance
>
> Google for urllib2.HTTPCookieProcessor.
>
> HTH,
> Rob

Wow, thank you Rob Wolfe! Your reply was the shortest yet the most
helpful! I solved this problem with the following code.

import urllib2
from cookielib import CookieJar

class HTTPRefererProcessor(urllib2.BaseHandler):
    """Add Referer header to requests.

    This only makes sense if you use each RefererProcessor for a single
    chain of requests only (so, for example, if you use a single
    HTTPRefererProcessor to fetch a series of URLs extracted from a single
    page, this will break).

    There's a proper implementation of this in module mechanize.

    """
    def __init__(self):
        self.referer = None

    def http_request(self, request):
        if ((self.referer is not None) and
            not request.has_header("Referer")):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        self.referer = response.geturl()
        return response

    https_request = http_request
    https_response = http_response

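def main():
    # One CookieJar shared by every request made through the opener.
    cj = CookieJar()
    opener = urllib2.build_opener(
        urllib2.HTTPCookieProcessor(cj),
        HTTPRefererProcessor(),
    )
    # Make this opener the default used by urllib2.urlopen().
    urllib2.install_opener(opener)

    # url1 and url2 stand for the pages to fetch in sequence;
    # they are not defined in this snippet.
    urllib2.urlopen(url1)
    urllib2.urlopen(url2)

if __name__ == "__main__":
    main()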

And it's working great!
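
For the original question of keeping the cookie handling INSIDE a class
instead of using a global CookieJar, a minimal sketch might look like
the one below. The class name CookieSession and its open() method are
made up for illustration and are not part of urllib2. The try/except
import is only there so that the same sketch also runs on Python 3,
where the modules were renamed.

```python
try:  # Python 2 module names, as used in this thread
    import urllib2
    from cookielib import CookieJar
except ImportError:  # Python 3 renamed the modules
    import urllib.request as urllib2
    from http.cookiejar import CookieJar


class CookieSession(object):
    """Hypothetical helper: one CookieJar and one opener per instance."""

    def __init__(self):
        # The jar lives on the instance, so no global is needed.
        self.cj = CookieJar()
        # HTTPCookieProcessor sends the stored cookies with each request
        # and stores the cookies set by each response in the same jar.
        self.opener = urllib2.build_opener(
            urllib2.HTTPCookieProcessor(self.cj)
        )

    def open(self, url, data=None):
        # Every request made through this method shares self.cj.
        return self.opener.open(url, data)
```

Calling open() twice on the same CookieSession object should then send
the cookies from the first response along with the second request,
without touching the module-wide opener that install_opener() sets.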

Once again, thanks everyone!
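
P.S. A side note on the SmartRequest class quoted above: a class
attribute such as cj=CookieJar() is created once, when the class
statement runs, and is then shared by all instances, so it is not
cleared on each instantiation. The failure more likely came from
urllib2.urlopen(Request), where Request is an undefined name inside
__init__ (self.Request was probably intended). The class names in this
small sketch are made up to show the difference.

```python
try:  # Python 2 module name, as in this thread
    from cookielib import CookieJar
except ImportError:  # Python 3 renamed it
    from http.cookiejar import CookieJar


class SharedJar(object):
    # Class attribute: evaluated once when the class statement runs,
    # then shared by every instance.
    cj = CookieJar()


class OwnJar(object):
    def __init__(self):
        # Instance attribute: a fresh jar for each object.
        self.cj = CookieJar()


first, second = SharedJar(), SharedJar()
shared = first.cj is second.cj        # both names refer to one jar

third, fourth = OwnJar(), OwnJar()
separate = third.cj is not fourth.cj  # two distinct jars
```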



More information about the Python-list mailing list