urllib to cache 301 redirections?

John Nagle nagle at animats.com
Mon Jul 16 15:34:00 EDT 2007


O.R.Senthil Kumaran wrote:
> Thank you for the reply, Mr. John and I apologize for a very late response
> from my end.
> 
> * John J. Lee <jjl at pobox.com> [2007-07-06 18:53:09]:
> 
> 
>>"O.R.Senthil Kumaran" <orsenthil at users.sourceforge.net> writes:
>>
>>
>>>Hi,
>>>There is an Open Tracker item against urllib2 library python.org/sf/735515
>>
>>>I am not completely getting what "cache - redirection" implies and what should
>>>be done with the urllib2 module. Any pointers?
>>
>>When a 301 redirect occurs after a request for URL U, via
>>urllib2.urlopen(U), urllib2 should remember the result of that
>>redirection, viz a second URL, V.  Then, when another
>>urllib2.urlopen(U) takes place, urllib2 should send an HTTP request
>>for V, not U.  urllib2 does not currently do this.  (Obviously the
>>cache -- that is, the dictionary or whatever that stores the mapping
>>from URLs U to V -- should not be maintained by function urlopen
>>itself.  Perhaps it should live on the redirect handler.)
>>
> 
> 
> I spent a little time thinking about a solution and figured out that the
> following changes to HTTPRedirectHandler, might be helpful in implementing
> this.
> 
> class HTTPRedirectHandler(BaseHandler):
>     # ... omitted ...
>     # Initialize a dictionary to hold cache.
> 
>     def __init__(self):
>         self.cache = {}
> 
> 
>     # Handle 301 errors separately, in a different method which maintains a
>     # cache.
> 
>     def http_error_301(self, req, fp, code, msg, headers):
> 
>         if req in self.cache:
>             # Look for loop, if a particular url appears in both key and value
>             # then there is loop and return HTTPError
>             if len(set(self.cache.keys()) & set(self.cache.values())) > 0:
>                 raise HTTPError(req.get_full_url(), code,
>                         self.inf_msg + msg, headers, fp)
>             return self.cache[req]
> 
>         self.cache[req] = self.http_error_302(req,fp,code,msg, headers)
>         return self.cache[req]
> 
> 
> John, let me know your comments on this approach.
> I have not yet tested this code in a real scenario with a 301 redirect.
> If it's okay, I shall test it and submit a patch for the tracker item.

    That assumes you're reusing the same object to reopen another URL.

    Is this thread-safe?

    That's also an inefficient way to test for an empty dictionary.
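
    A rough sketch of one way around those points, not a tested patch:
    keep the U -> V map in a separate object that several handlers and
    threads can share, guard it with a lock, and find loops by walking
    the chain from the requested URL. It is written against
    urllib.request, where urllib2's HTTPRedirectHandler lives in
    Python 3. RedirectCache, CachingRedirectHandler, remember() and
    resolve() are names made up for this example.

```python
import threading
import urllib.request


class RedirectCache:
    """Thread-safe map from a URL to the target of its 301 redirect."""

    def __init__(self):
        self._lock = threading.Lock()
        self._targets = {}

    def remember(self, url, target):
        # Record that `url` permanently redirects to `target`.
        with self._lock:
            self._targets[url] = target

    def resolve(self, url):
        # Follow cached redirects starting at `url`.  Revisiting a URL
        # means the cached chain loops, so give up with ValueError.
        seen = {url}
        with self._lock:
            while url in self._targets:
                url = self._targets[url]
                if url in seen:
                    raise ValueError("redirect loop at %s" % url)
                seen.add(url)
        return url


class CachingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that records 301 targets in a shared cache."""

    def __init__(self, cache):
        self.cache = cache

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler decide whether the redirect is allowed;
        # it raises HTTPError when it is not.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if code == 301 and new_req is not None:
            self.cache.remember(req.get_full_url(), newurl)
        return new_req
```

    The caller would build an opener with
    urllib.request.build_opener(CachingRedirectHandler(cache)) and pass
    cache.resolve(url) to opener.open(), so a repeated request for U
    goes straight to V. Whether a 301 should be cached for every request
    method is left open here.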

					John Nagle


