setting Referer for urllib.urlretrieve

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Aug 9 10:41:05 EDT 2009


On Sun, 09 Aug 2009 06:13:38 -0700, samwyse wrote:

> Here's what I have so far:
> 
> import urllib
> 
> class AppURLopener(urllib.FancyURLopener):
>     version = "App/1.7"
>     referrer = None
>     def __init__(self, *args):
>         urllib.FancyURLopener.__init__(self, *args)
>         if self.referrer:
>             addheader('Referer', self.referrer)
> 
> urllib._urlopener = AppURLopener()
> 
> Unfortunately, the 'Referer' header potentially varies for each url that
> I retrieve, and the way the module is written, I can't change the calls
> to __init__ or open. The best idea I've had is to assign a new value to
> my class variable just before calling urllib.urlretrieve(), but that
> just seems ugly.  Any ideas?  Thanks.

[Aside: an int variable is an int. A str variable is a str. A list 
variable is a list. A class variable is a class. You probably mean a 
class attribute, not a variable. If other languages want to call it a 
variable, or a sausage, that's their problem.]

If you're prepared for a bit of extra work, you could take over all the 
URL handling instead of relying on automatic openers. This will give you 
much finer control, but it will also require more effort on your part. 
The basic idea is, instead of installing openers, and then ask the urllib 
module to handle the connection, you handle the connection yourself:

make a Request object using urllib2.Request
make an Opener object using urllib2.build_opener
call opener.open(request) to connect to the server
deal with the connection (retry, fail or read)

Essentially, you use the Request object instead of a URL, and you would 
add the appropriate referer header to the Request object.

Another approach, perhaps a more minimal change than the above, would be 
something like this:

# untested
class AppURLopener(urllib.FancyURLopener):
    version = "App/1.7"
    def __init__(self, *args):
        urllib.FancyURLopener.__init__(self, *args)
    def add_referrer(self, url=None):
        if url:
            addheader('Referer', url)

urllib._urlopener = AppURLopener()
urllib._urlopener.add_referrer("http://example.com/")



-- 
Steven



More information about the Python-list mailing list