waiting for html to load: a followup

Josh joshl at commenspace.org
Thu Aug 26 12:53:44 EDT 2004


Hi - A couple days ago I posted asking for help on how to download a 
pushed file.  I am trying to write a script to download a bunch of links 
from a page that takes a while to load.

I managed to get just about everything done using python to load IE, but
aside from not really liking that style, I couldn't figure out how to 
have python download the pushed file, or how to read IE headers into 
python (the headers point to the download location).

Anyway, I decided to forget IE and I am now trying to use urllib2 to 
open up the page, read it, etc.  My problem is the page has a built-in 
refresh and I don't know how to have python re-read the page until it's 
ready to hand over the links.

An example of the page is:
http://edcw2ks23.cr.usgs.gov/Website/zipship/waiting.jsp?areaList=49.0,47.0,-122.0,-124.08&prodList=NED,

I believe I need to read the header, grab the cookie session id, and add 
it back to the header.  I can do all this, but I'm stuck on probably 
very simple syntax to re-read the page rather than open a new 
connection, if that makes sense (I'm new to http as well as python).
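
A minimal sketch of the cookie step described above, written for Python 3, where urllib2 became urllib.request. The names cookie_header and make_session_opener, and the JSESSIONID value in the usage note, are hypothetical and chosen only for illustration; the real server may name its session cookie differently.

```python
from http.cookiejar import CookieJar
from http.cookies import SimpleCookie
from urllib.request import HTTPCookieProcessor, build_opener


def cookie_header(set_cookie_value):
    """Turn a Set-Cookie header value into a value for the Cookie header.

    Attributes such as Path are dropped; only name=value pairs are kept.
    """
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    return "; ".join(
        "%s=%s" % (name, morsel.value) for name, morsel in parsed.items()
    )


def make_session_opener():
    """Build an opener that stores cookies and resends them automatically."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar
```

With these assumptions, cookie_header("JSESSIONID=abc123; Path=/") gives "JSESSIONID=abc123", which avoids slicing a fixed number of characters off the header. The opener from make_session_opener() should make the manual step unnecessary, since each open() call on it sends back whatever cookies earlier responses set.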


My code snippets:

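import urllib2
from time import sleep

# url is the waiting.jsp address shown above
myreq = urllib2.Request(url)
myreq.add_header('User-Agent',
                 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
opener = urllib2.build_opener()

# First request: pick up the session cookie from the response headers.
feeddata = opener.open(myreq)
headers = feeddata.info()
cookie = headers['set-cookie']
cookie = cookie[:-8]  # drop the last 8 characters of the Set-Cookie value
myreq.add_header('Cookie', cookie)

# Request the page again, this time with the cookie attached.
x = 0
while x < 10:
     feeddata = opener.open(myreq)
     data = feeddata.read()
     print data[1600:1650]
     print '\n\n\n\n*****************Using Cookie: %s' % cookie
     print '****************Header info:  \n',headers
     sleep(3)
     x = x+1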
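
The loop above could be restructured so that it stops as soon as the page is ready. This is only a sketch in Python 3, and it assumes the "built-in refresh" is a meta-refresh tag in the HTML; if the page refreshes through a Refresh header or JavaScript instead, the readiness check would need to change. The names poll_until_ready and META_REFRESH are hypothetical.

```python
import re
import time

# Assumption: the waiting page marks "not ready yet" with a meta-refresh tag.
META_REFRESH = re.compile(r'<meta[^>]+http-equiv=["\']?refresh', re.IGNORECASE)


def poll_until_ready(fetch, max_tries=10, delay=3.0, sleep=time.sleep):
    """Call fetch() until the returned HTML has no meta-refresh tag.

    fetch is any zero-argument callable that returns the page as a str.
    Returns the final HTML, or None if the page was still refreshing
    after max_tries attempts.
    """
    for attempt in range(max_tries):
        html = fetch()
        if not META_REFRESH.search(html):
            return html
        # Wait before the next attempt, but not after the last one.
        if attempt < max_tries - 1:
            sleep(delay)
    return None
```

Here fetch might wrap a cookie-aware opener, for example lambda: opener.open(url).read().decode(), so that every attempt reuses the same session. Passing the fetch and sleep functions in keeps the loop testable without a network connection.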

Any help greatly appreciated.  Thanks in advance, and when I know what 
I'm doing I'll repay the favors.

-Josh


