urllib to cache 301 redirections?

O.R.Senthil Kumaran orsenthil at users.sourceforge.net
Mon Jul 16 15:03:19 EDT 2007


Thank you for the reply, Mr. John, and I apologize for the very late response
on my end.

* John J. Lee <jjl at pobox.com> [2007-07-06 18:53:09]:

> "O.R.Senthil Kumaran" <orsenthil at users.sourceforge.net> writes:
> 
> > Hi,
> > There is an Open Tracker item against urllib2 library python.org/sf/735515
> 
> > I am not completely getting what "cache - redirection" implies and what should
> > be done with the urllib2 module. Any pointers?
> 
> When a 301 redirect occurs after a request for URL U, via
> urllib2.urlopen(U), urllib2 should remember the result of that
> redirection, viz a second URL, V.  Then, when another
> urllib2.urlopen(U) takes place, urllib2 should send an HTTP request
> for V, not U.  urllib2 does not currently do this.  (Obviously the
> cache -- that is, the dictionary or whatever that stores the mapping
> from URLs U to V -- should not be maintained by function urlopen
> itself.  Perhaps it should live on the redirect handler.)
> 
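
As a rough illustration of the mapping described above (URL U remembered as
redirecting to URL V), the cache can be sketched as a small standalone class.
This is only a sketch under assumptions: the names RedirectCache, record and
resolve are hypothetical and are not part of urllib2.

```python
class RedirectCache:
    """Remember permanent (301) redirects as a mapping from URL U to URL V."""

    def __init__(self):
        self._targets = {}  # original URL -> redirected URL

    def record(self, url, target):
        """Store the fact that `url` permanently redirects to `target`."""
        self._targets[url] = target

    def resolve(self, url):
        """Follow cached redirects from `url` and return the final URL.

        Raises ValueError if the cached entries form a loop.
        """
        seen = {url}
        while url in self._targets:
            url = self._targets[url]
            if url in seen:
                raise ValueError("redirect loop involving %r" % url)
            seen.add(url)
        return url


cache = RedirectCache()
cache.record("http://example.com/u", "http://example.com/v")
print(cache.resolve("http://example.com/u"))      # http://example.com/v
print(cache.resolve("http://example.com/other"))  # unchanged: nothing cached
```

A second request for U would then be sent to whatever resolve(U) returns, and
a URL with no cached entry is passed through untouched.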

I spent a little time thinking about a solution and figured out that the
following changes to HTTPRedirectHandler might be helpful in implementing
this.

class HTTPRedirectHandler(BaseHandler):
    # ... omitted ...

    def __init__(self):
        # Cache of permanent redirects: original URL -> redirected URL.
        self.cache = {}

    # Handle 301 errors in a separate method that maintains the cache.
    def http_error_301(self, req, fp, code, msg, headers):
        url = req.get_full_url()
        newurl = headers.get('location') or headers.get('uri')
        if newurl:
            newurl = urlparse.urljoin(url, newurl)
            # Look for a loop: if following the cached redirects from the
            # new URL leads back to this URL, raise HTTPError.
            seen = newurl
            while seen != url and seen in self.cache:
                seen = self.cache[seen]
            if seen == url:
                raise HTTPError(url, code, self.inf_msg + msg, headers, fp)
            self.cache[url] = newurl
        return self.http_error_302(req, fp, code, msg, headers)
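
For comparison, here is a hedged sketch of how a cache kept on the redirect
handler could also be consulted before a request is sent, which is the
behaviour described in the quoted text. It is written against Python 3's
urllib.request (the successor of urllib2) rather than urllib2 itself. The
class name CachingRedirectHandler and the helper remember are hypothetical,
and loop handling is left out of this sketch.

```python
from urllib.parse import urljoin
from urllib.request import HTTPRedirectHandler, Request, build_opener


class CachingRedirectHandler(HTTPRedirectHandler):
    """Sketch: remember 301 targets and reuse them on later requests."""

    # Run before HTTPHandler's own request pre-processing (order 500),
    # so headers such as Host are derived from the rewritten URL.
    handler_order = 400

    def __init__(self):
        self.cache = {}  # original URL -> permanently redirected URL

    def remember(self, req, headers):
        """Record the 301 target announced in `headers`, if any."""
        location = headers.get("location") or headers.get("uri")
        if location and req.get_method() in ("GET", "HEAD"):
            self.cache[req.full_url] = urljoin(req.full_url, location)

    def http_error_301(self, req, fp, code, msg, headers):
        self.remember(req, headers)
        # Let the stock redirect logic perform the actual redirection.
        return super().http_error_302(req, fp, code, msg, headers)

    def http_request(self, req):
        # Pre-processing hook: runs before the request is sent, so a cached
        # target replaces the original URL without contacting it again.
        target = self.cache.get(req.full_url)
        if target is not None:
            req.full_url = target
        return req

    https_request = http_request


handler = CachingRedirectHandler()
opener = build_opener(handler)  # used in place of the default redirect handler
```

With this arrangement, the first opener.open(U) that receives a 301 fills the
cache, and a later opener.open(U) on the same opener is rewritten to V in
http_request before anything goes over the wire.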


John, let me know your comments on this approach.
I have not yet tested this code in a real scenario with a 301 redirect.
If it's okay, I shall test it and submit a patch for the tracker item.

Thanks,
Senthil



-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org



More information about the Python-list mailing list