n00b with urllib2: How to make it handle cookie automatically?
est
electronixtar at gmail.com
Sun Feb 24 06:41:01 EST 2008
On Feb 23, 2:42 am, Rob Wolfe <r... at smsnet.pl> wrote:
> est <electronix... at gmail.com> writes:
> > Hi all,
>
> > I need urllib2 do perform series of HTTP requests with cookie from
> > PREVIOUS request(like our browsers usually do ). Many people suggest I
> > use some library(e.g. pycURL) instead but I guess it's good practise
> > for a python beginner to DIY something rather than use existing tools.
>
> > So my problem is how to expand the urllib2 class
>
> > from cookielib import CookieJar
> > class SmartRequest():
> > cj=CookieJar()
> > def __init__(self, strUrl, strContent=None):
> > self.Request = urllib2.Request(strUrl, strContent)
> > self.cj.add_cookie_header(self.Request)
> > self.Response = urllib2.urlopen(Request)
> > self.cj.extract_cookies(self.Response, self.Request)
> > def url
> > def read(self, intCount):
> > return self.Response.read(intCount)
> > def headers(self, strHeaderName):
> > return self.Response.headers[strHeaderName]
>
> > The code does not work because each time SmartRequest is initiated,
> > object 'cj' is cleared. How to avoid that?
> > The only stupid solution I figured out is use a global CookieJar
> > object. Is there anyway that could handle all this INSIDE the class?
>
> > I am totally new to OOP & python programming, so could anyone give me
> > some suggestions? Thanks in advance
>
> Google for urllib2.HTTPCookieProcessor.
>
> HTH,
> Rob- Hide quoted text -
>
> - Show quoted text -
Wow, thank you Rob Wolfe! Your reply is shortest yet most helpful! I
solved this problem by the following code.
class HTTPRefererProcessor(urllib2.BaseHandler):
"""Add Referer header to requests.
This only makes sense if you use each RefererProcessor for a
single
chain of requests only (so, for example, if you use a single
HTTPRefererProcessor to fetch a series of URLs extracted from a
single
page, this will break).
There's a proper implementation of this in module mechanize.
"""
def __init__(self):
self.referer = None
def http_request(self, request):
if ((self.referer is not None) and
not request.has_header("Referer")):
request.add_unredirected_header("Referer", self.referer)
return request
def http_response(self, request, response):
self.referer = response.geturl()
return response
https_request = http_request
https_response = http_response
def main():
cj = CookieJar()
opener = urllib2.build_opener(
urllib2.HTTPCookieProcessor(cj),
HTTPRefererProcessor(),
)
urllib2.install_opener(opener)
urllib2.urlopen(url1)
urllib2.urlopen(url2)
if "__main__" == __name__:
main()
And it's working great!
Once again, thanks everyone!
More information about the Python-list
mailing list