n00b with urllib2: How to make it handle cookies automatically?

7stud bbxx789_05ss at yahoo.com
Sun Feb 24 16:46:22 EST 2008


On Feb 24, 4:41 am, est <electronix... at gmail.com> wrote:
> On Feb 23, 2:42 am, Rob Wolfe <r... at smsnet.pl> wrote:
>
>
>
> > est <electronix... at gmail.com> writes:
> > > Hi all,
>
> > > I need urllib2 to perform a series of HTTP requests with the cookies from
> > > the PREVIOUS request (like our browsers usually do). Many people suggest I
> > > use some library (e.g. pycURL) instead, but I guess it's good practice
> > > for a Python beginner to DIY something rather than use existing tools.
>
> > > So my problem is how to extend the urllib2 classes:
>
> > > import urllib2
> > > from cookielib import CookieJar
> > >
> > > class SmartRequest(object):
> > >     # one CookieJar used to carry cookies between requests
> > >     cj = CookieJar()
> > >     def __init__(self, strUrl, strContent=None):
> > >         self.Request = urllib2.Request(strUrl, strContent)
> > >         self.cj.add_cookie_header(self.Request)
> > >         self.Response = urllib2.urlopen(self.Request)
> > >         self.cj.extract_cookies(self.Response, self.Request)
> > >     def url(self):
> > >         return self.Response.geturl()
> > >     def read(self, intCount):
> > >         return self.Response.read(intCount)
> > >     def headers(self, strHeaderName):
> > >         return self.Response.headers[strHeaderName]
>
> > > The code does not work because each time SmartRequest is instantiated,
> > > the object 'cj' is cleared. How do I avoid that?
> > > The only stupid solution I figured out is to use a global CookieJar
> > > object. Is there any way to handle all this INSIDE the class?
>
> > > I am totally new to OOP & Python programming, so could anyone give me
> > > some suggestions? Thanks in advance.
>
> > Google for urllib2.HTTPCookieProcessor.
>
> > HTH,
> > Rob
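A quick aside for anyone finding this in the archive: a minimal sketch of
what Rob is pointing at could look like the following (the URLs are just
placeholders, not from the original posts). An opener built with
urllib2.HTTPCookieProcessor stores the cookies from each response in its
CookieJar and sends them back on every later request made through the
same opener.

    import urllib2
    from cookielib import CookieJar

    cj = CookieJar()    # holds the cookies between requests
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    # Cookies set by the first response are replayed automatically on the
    # second request, because both go through the same opener and jar.
    response = opener.open("http://www.example.com/login")
    data = response.read()
    response = opener.open("http://www.example.com/members")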
>
> Wow, thank you Rob Wolfe! Your reply is the shortest yet most helpful! I
> solved this problem with the following code.
>
> import urllib2
> from cookielib import CookieJar
>
> class HTTPRefererProcessor(urllib2.BaseHandler):
>     """Add Referer header to requests.
>
>     This only makes sense if you use each RefererProcessor for a single
>     chain of requests only (so, for example, if you use a single
>     HTTPRefererProcessor to fetch a series of URLs extracted from a single
>     page, this will break).
>
>     There's a proper implementation of this in module mechanize.
>
>     """
>     def __init__(self):
>         self.referer = None
>
>     def http_request(self, request):
>         if ((self.referer is not None) and
>             not request.has_header("Referer")):
>             request.add_unredirected_header("Referer", self.referer)
>         return request
>
>     def http_response(self, request, response):
>         self.referer = response.geturl()
>         return response
>
>     https_request = http_request
>     https_response = http_response
>
> def main():
>     cj = CookieJar()
>     opener = urllib2.build_opener(
>         urllib2.HTTPCookieProcessor(cj),
>         HTTPRefererProcessor(),
>     )
>     urllib2.install_opener(opener)
>
>     urllib2.urlopen(url1)
>     urllib2.urlopen(url2)
>
> if __name__ == "__main__":
>     main()
>
> And it's working great!
>
> Once again, thanks everyone!
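Another aside, not part of est's post: urllib2.install_opener() only swaps
the module-level default opener, which is why the bare urllib2.urlopen(url1)
calls above pick up the cookie handling. If you would rather not touch
global state, you can call the opener directly; a sketch with placeholder
URLs:

    import urllib2
    from cookielib import CookieJar

    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(CookieJar()))

    # placeholder URLs, just to make the sketch self-contained
    url1 = "http://www.example.com/page1"
    url2 = "http://www.example.com/page2"

    # no install_opener(): use the opener object itself, and the cookies
    # from the first response are still sent with the second request
    opener.open(url1)
    opener.open(url2)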

How does the HTTPRefererProcessor class do anything useful for you?
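As far as I can tell, the cookie handling in your opener comes entirely from
HTTPCookieProcessor; the Referer handler only remembers the URL of the last
response and adds it as a Referer header on the next request. A rough way to
see that, assuming the HTTPRefererProcessor class from your post is already
defined and using placeholder URLs:

    import urllib2

    ref = HTTPRefererProcessor()                # the class quoted above
    ref.referer = "http://www.example.com/a"    # what http_response() records

    req = urllib2.Request("http://www.example.com/b")
    req = ref.http_request(req)                 # what the opener calls for us
    print req.get_header("Referer")             # -> http://www.example.com/a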


