retrieving https pages

ncf nothingcanfulfill at gmail.com
Tue Jul 19 02:40:51 EDT 2005


It might be checking the browser's User-agent header. Your best bet
would be to use something to record the headers your browser sends
out, and mimic those in Python.
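
One rough way to see what a browser sends is to point it at a tiny local
server that echoes the request headers back. This is only a sketch under
some assumptions: it uses Python 3's http.server, the names HeaderEcho,
captured and make_server are made up for illustration, and it speaks plain
HTTP, so it shows the browser's general headers (User-Agent, Accept and so
on) rather than exactly what goes to the HTTPS site.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Header lists from each request seen so far, newest last (illustrative name).
captured = []


class HeaderEcho(BaseHTTPRequestHandler):
    """Hypothetical handler that echoes the client's request headers."""

    def do_GET(self):
        # self.headers holds the request headers as parsed from the client.
        pairs = list(self.headers.items())
        captured.append(pairs)
        body = "\n".join("%s: %s" % pair for pair in pairs).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr.
        pass


def make_server(port=0):
    # Port 0 asks the OS for any free port; read it from server_address.
    return HTTPServer(("127.0.0.1", port), HeaderEcho)
```

To try it, run make_server(8000).serve_forever() and open
http://127.0.0.1:8000/ in the browser; the page lists the headers it sent.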

If you look at the source code for urllib (in IDLE I think you can press
Alt+M and type in "urllib"), under the URLopener definition (the base
class of FancyURLopener), you should see something like self.addheaders
(not on a box to check it right now, but it's in the constructor, I
remember that much).

Just set the headers you want sent (the same ones your browser sends) by
assigning to that attribute from your script, e.g.:

import urllib
opener = urllib.FancyURLopener()
# addheaders is a list of (name, value) tuples sent with every request
opener.addheaders = [('User-agent', 'blah'), ('Header2', 'val'), ('monkey', 'bone')]
# do the other stuff here :P
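
For what it's worth, on Python 3 the same idea can probably be sketched
with urllib.request.build_opener, whose opener objects also carry an
addheaders list. The header values, the BROWSER_HEADERS name and the
make_opener helper below are placeholders I made up, not anything taken
from a real browser:

```python
import urllib.request

# Placeholder values; substitute whatever your browser actually sends.
BROWSER_HEADERS = [
    ("User-Agent", "blah"),
    ("Accept", "text/html"),
]


def make_opener(headers=BROWSER_HEADERS):
    """Build an opener that sends the given headers with every request."""
    opener = urllib.request.build_opener()
    # Overwrite the default list, which only holds a Python-urllib User-agent.
    opener.addheaders = list(headers)
    return opener


opener = make_opener()
# Network call left commented out; the URL is only an example.
# html = opener.open("https://example.com/").read()
```

Assigning a fresh list (rather than appending) drops the default
Python-urllib User-agent, so only one User-Agent entry is left in the list.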

HTH

-Wes



