n00b with urllib2: How to make it handle cookie automatically?

est electronixtar at gmail.com
Sun Feb 24 23:23:20 EST 2008


On Feb 25, 5:46 am, 7stud <bbxx789_0... at yahoo.com> wrote:
> On Feb 24, 4:41 am, est <electronix... at gmail.com> wrote:
>
> > On Feb 23, 2:42 am, Rob Wolfe <r... at smsnet.pl> wrote:
>
> > > est <electronix... at gmail.com> writes:
> > > > Hi all,
>
> > > > I need urllib2 to perform a series of HTTP requests with the cookies from
> > > > the PREVIOUS request (like our browsers usually do). Many people suggest I
> > > > use some library (e.g. pycURL) instead, but I guess it's good practice
> > > > for a Python beginner to DIY something rather than use existing tools.
>
> > > > So my problem is how to extend the urllib2 class:
>
> > > > import urllib2
> > > > from cookielib import CookieJar
> > > > class SmartRequest():
> > > >     cj=CookieJar()
> > > >     def __init__(self, strUrl, strContent=None):
> > > >         self.Request    =   urllib2.Request(strUrl, strContent)
> > > >         self.cj.add_cookie_header(self.Request)
> > > >         self.Response   =   urllib2.urlopen(self.Request)
> > > >         self.cj.extract_cookies(self.Response, self.Request)
> > > >     def url
> > > >     def read(self, intCount):
> > > >         return self.Response.read(intCount)
> > > >     def headers(self, strHeaderName):
> > > >         return self.Response.headers[strHeaderName]
>
> > > > The code does not work because each time SmartRequest is instantiated,
> > > > object 'cj' is cleared. How can I avoid that?
> > > > The only stupid solution I figured out is to use a global CookieJar
> > > > object. Is there any way to handle all this INSIDE the class?
>
> > > > I am totally new to OOP & python programming, so could anyone give me
> > > > some suggestions? Thanks in advance
>
> > > Google for urllib2.HTTPCookieProcessor.
>
> > > HTH,
> > > Rob
>
> > Wow, thank you Rob Wolfe! Your reply is the shortest yet the most helpful! I
> > solved this problem with the following code.
>
> > import urllib2
> > from cookielib import CookieJar
>
> > class HTTPRefererProcessor(urllib2.BaseHandler):
> >     """Add Referer header to requests.
>
> >     This only makes sense if you use each RefererProcessor for a single
> >     chain of requests only (so, for example, if you use a single
> >     HTTPRefererProcessor to fetch a series of URLs extracted from a single
> >     page, this will break).
>
> >     There's a proper implementation of this in module mechanize.
>
> >     """
> >     def __init__(self):
> >         self.referer = None
>
> >     def http_request(self, request):
> >         if ((self.referer is not None) and
> >             not request.has_header("Referer")):
> >             request.add_unredirected_header("Referer", self.referer)
> >         return request
>
> >     def http_response(self, request, response):
> >         self.referer = response.geturl()
> >         return response
>
> >     https_request = http_request
> >     https_response = http_response
>
> > def main():
> >     cj = CookieJar()
> >     opener = urllib2.build_opener(
> >         urllib2.HTTPCookieProcessor(cj),
> >         HTTPRefererProcessor(),
> >     )
> >     urllib2.install_opener(opener)
>
> >     urllib2.urlopen(url1)
> >     urllib2.urlopen(url2)
>
> > if __name__ == "__main__":
> >     main()
>
> > And it's working great!
>
> > Once again, thanks everyone!
>
> How does the class HTTPRefererProcessor do anything useful for you?

Well, it's more browser-like. Maybe I should have snipped the
HTTPRefererProcessor code for this discussion.
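
For anyone finding this thread later, here is a minimal sketch of how the
cookie jar and the Referer handler could be kept INSIDE a class, as asked in
the original question. It is only an illustration under some assumptions: it
uses the Python 3 module names (urllib.request and http.cookiejar in place of
urllib2 and cookielib), and the names RefererProcessor and BrowserSession are
made up for this example, not part of any library.

```python
# Assumption: Python 3, where urllib2 became urllib.request and
# cookielib became http.cookiejar.
import urllib.request
from http.cookiejar import CookieJar


class RefererProcessor(urllib.request.BaseHandler):
    """Send the URL of the previous response as the Referer header.

    Like the handler quoted above, this only makes sense for a single
    chain of requests.
    """

    def __init__(self):
        self.referer = None

    def http_request(self, request):
        # Only add the header if we have seen a response and the caller
        # did not set a Referer explicitly.
        if self.referer is not None and not request.has_header("Referer"):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        # Remember where we just were, for the next request.
        self.referer = response.geturl()
        return response

    https_request = http_request
    https_response = http_response


class BrowserSession:
    """Hypothetical wrapper: one cookie jar and one opener per instance."""

    def __init__(self):
        self.cookies = CookieJar()
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies),
            RefererProcessor(),
        )

    def open(self, url, data=None):
        # HTTPCookieProcessor sends cookies stored by earlier responses
        # and records any new ones in self.cookies.
        return self.opener.open(url, data)
```

Usage would look something like session = BrowserSession() followed by
session.open(url1) and session.open(url2), with url1 and url2 being whatever
pages you need to fetch in sequence. Because the opener is stored on the
instance instead of being installed globally with install_opener, two sessions
do not share cookies.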


