setting Referer for urllib.urlretrieve

samwyse samwyse at gmail.com
Mon Aug 10 11:21:49 EDT 2009


On Aug 9, 9:41 am, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> On Sun, 09 Aug 2009 06:13:38 -0700,samwysewrote:
> > Here's what I have so far:
>
> > import urllib
>
> > class AppURLopener(urllib.FancyURLopener):
> >     version = "App/1.7"
> >     referrer = None
> >     def __init__(self, *args):
> >         urllib.FancyURLopener.__init__(self, *args)
> >         if self.referrer:
> >             addheader('Referer', self.referrer)
>
> > urllib._urlopener = AppURLopener()
>
> > Unfortunately, the 'Referer' header potentially varies for each url that
> > I retrieve, and the way the module is written, I can't change the calls
> > to __init__ or open. The best idea I've had is to assign a new value to
> > my class variable just before calling urllib.urlretrieve(), but that
> > just seems ugly.  Any ideas?  Thanks.
>
> [Aside: an int variable is an int. A str variable is a str. A list
> variable is a list. A class variable is a class. You probably mean a
> class attribute, not a variable. If other languages want to call it a
> variable, or a sausage, that's their problem.]
>
> If you're prepared for a bit of extra work, you could take over all the
> URL handling instead of relying on automatic openers. This will give you
> much finer control, but it will also require more effort on your part.
> The basic idea is, instead of installing openers, and then ask the urllib
> module to handle the connection, you handle the connection yourself:
>
> make a Request object using urllib2.Request
> make an Opener object using urllib2.build_opener
> call opener.open(request) to connect to the server
> deal with the connection (retry, fail or read)
>
> Essentially, you use the Request object instead of a URL, and you would
> add the appropriate referer header to the Request object.
>
> Another approach, perhaps a more minimal change than the above, would be
> something like this:
>
> # untested
> class AppURLopener(urllib.FancyURLopener):
>     version = "App/1.7"
>     def __init__(self, *args):
>         urllib.FancyURLopener.__init__(self, *args)
>     def add_referrer(self, url=None):
>         if url:
>             addheader('Referer', url)
>
> urllib._urlopener = AppURLopener()
> urllib._urlopener.add_referrer("http://example.com/")

Thanks for the ideas.  I'd briefly considered something similar to
your first idea, implementing my own version of urlretrieve to accept
a Request object, but it does seem like a good bit of work.  Maybe
over Labor Day.  :)

The second idea is pretty much what I'm going to go with for now.  The
program that I'm writing is almost a clone of wget, but it fixes some
personal dislikes with the way recursive retrievals are done.  (Or
maybe I just don't understand wget's full array of options well
enough.)  This means that my referrer changes as I bounce up and down
the hierarchy, which makes this less convenient.  Still, it does seem
more convenient that re-writing the module from scratch.



More information about the Python-list mailing list