cookielib

John J. Lee jjl at pobox.com
Sat Aug 11 07:25:01 EDT 2007


Boris Ozegovic <ninja.krmenadl at nes.com> writes:

> Hi
>
> I have an HTTP client which accepts cookies.  If the client already has a
> cookie, but that cookie has expired, the server sends it a new cookie, and
> in the response object's Set-Cookie: header everything is fine, but if I
> resend the request, the client sends the expired cookie, and not the new
> one.  In the cookiejar there is only the new, valid cookie, and if I use a
> regular browser everything is fine.  The code is as follows:
>
> import os
> import urllib
> import urllib2
> import cookielib
>
> urlopen = urllib2.urlopen
> Request = urllib2.Request
> cj = cookielib.LWPCookieJar()
> COOKIEFILE = 'cookies.lwp'
> 	
> if os.path.isfile(COOKIEFILE):
>     # if we have a cookie file already saved
>     # then load the cookies into the Cookie Jar
>     cj.load(COOKIEFILE)
>
> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
> urllib2.install_opener(opener)
> url = "http://localhost:8000/files/index.html"
> params = {'question':question}
> data = urllib.urlencode(params)
> req = Request(url, data)  # keep the request object so it can be opened
> try:
>     response = urlopen(req)
>     etc.
>
> Only if I create a new request object is the new cookie sent, but I don't
> want to create a new object.  Any idea?

You seem to suggest in your first paragraph that you call urlopen more
than once, but in the code you show, you call it only once.

This is the way HTTP cookies work: you can't send back a cookie before
you've received it.  So, you need two HTTP requests: one to get the
cookie, and a second to send it back again.
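
To illustrate that round trip without touching the network, here is a
minimal sketch.  It uses the Python 3 module names (http.cookiejar and
urllib.request are the renamed cookielib and urllib2).  The FakeResponse
class, the example.com URL, and the cookie value are made up for the
illustration; a real response would come from urlopen.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only info() is needed."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


def cookie_round_trip():
    url = "http://example.com/"  # placeholder URL, never actually fetched
    cj = http.cookiejar.CookieJar()

    # First request: the jar is empty, so no Cookie header is added.
    first = urllib.request.Request(url)
    cj.add_cookie_header(first)
    before = first.get_header("Cookie")

    # Pretend the server answered the first request with a Set-Cookie header.
    headers = email.message.Message()
    headers["Set-Cookie"] = "session=abc123"
    cj.extract_cookies(FakeResponse(headers), first)

    # Second request: now the jar has a cookie to send back.
    second = urllib.request.Request(url)
    cj.add_cookie_header(second)
    after = second.get_header("Cookie")
    return before, after
```

Calling cookie_round_trip() should return a pair whose first element is
None (nothing to send yet) and whose second element is the cookie
received in between.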

However, some websites arrange things so that a single attempt to load
a URL results in multiple HTTP requests.  One common way to do that is
by using an "HTTP Refresh" -- look at the top of the HTML to see if
there's a META tag something like this:

<meta http-equiv="refresh" content="1" />

This means that the browser should reload the page after a pause of
one second.  Often, the reloaded page then does not contain another
meta refresh tag, so that the page does not reload every second
forever (the server knows not to include the tag because it received
the cookie).

Appropriate Python code can support this tag, so two HTTP requests
needn't necessarily mean two urlopen calls.  The mechanize library
handles refresh redirections, and supports essentially the same
interface as urllib2 and cookielib (the union of the two interfaces):

http://wwwsearch.sourceforge.net/mechanize/
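
If you only want to check by hand whether a page carries such a tag,
something along these lines may be enough.  This is a rough sketch using
the standard library's html.parser (Python 3 naming), not how mechanize
does it; the MetaRefreshFinder class and find_meta_refresh function are
names invented for this example, and the parsing of the content
attribute is deliberately simplistic.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Records the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.refresh = None  # becomes (delay_seconds, url_or_None)

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # content looks like "1" or "0; url=http://..."
        delay, _, rest = (attrs.get("content") or "").partition(";")
        try:
            seconds = float(delay.strip())
        except ValueError:
            return  # malformed delay: treat as no refresh
        rest = rest.strip()
        url = None
        if rest.lower().startswith("url="):
            url = rest[4:].strip("'\" ") or None
        self.refresh = (seconds, url)


def find_meta_refresh(html):
    """Return (delay, url) for the first meta refresh tag, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refresh
```

A url of None means "reload the same page", as with the tag shown
above; the caller would then issue the second request itself after the
given delay.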


John


