Refreshing of urllib.urlopen()

Steve Holden steve at holdenweb.com
Wed Feb 3 23:02:35 EST 2010


Michael Gruenstaeudl wrote:
> Hi,
> I am fairly new to Python and need advice on the urllib.urlopen()
> function. The website I am trying to open automatically refreshes after
> 5 seconds and remains stable thereafter. With urllib.urlopen().read() I
> can only read the initial but not the refreshed page. How can I access
> the refreshed page via urlopen().read()? I have already tried inserting
> a time.sleep() call before invoking .read() (see below), but this does
> not work.
> 
> page=urllib.urlopen(url)
> time.sleep(20)
> htmltext=page.readlines()
> 
When you say the page "refreshes" after 5 seconds, does it do so by
redirecting the browser to the same address with new content?

I suspect this is the case. Otherwise the server would be holding the
connection open to send more data, and page.readlines() would not return,
because it would never see the "end of file" on the incoming network
stream.

You can find this out by examining the page's headers. If

   page.headers['Refresh']

exists and has a value (like "5; url=http://<same url>"), then a browser
refresh is being used.
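
A minimal sketch of that check, not part of the original reply. It assumes
the header value has the usual "<seconds>; url=<target>" shape shown above;
parse_refresh is a hypothetical helper name. Fetching the header with
page.headers.get('Refresh') rather than by indexing avoids a KeyError when
the server did not send one.

```python
def parse_refresh(value):
    """Split a Refresh header value into (delay_seconds, url_or_None).

    Assumes the common '<seconds>; url=<target>' form; anything else
    (for example an empty delay) raises ValueError from float().
    """
    delay_part, _, rest = value.partition(';')
    delay = float(delay_part.strip())
    rest = rest.strip()
    url = None
    if rest.lower().startswith('url='):
        # Drop the 'url=' prefix and any quotes around the target.
        url = rest[4:].strip().strip('\'"')
    return delay, url or None


# Typical use with an already opened page object:
#   value = page.headers.get('Refresh')   # None if the header is absent
#   if value:
#       delay, target = parse_refresh(value)
```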

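If that's so, then sleeping before calling .read() cannot help, because
urlopen() has already requested the initial response. The only way to
access the updated content is to re-open the URL after the delay and read
it again.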
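
The re-open step might look like the sketch below. It is written for
Python 3, where urllib.urlopen() lives at urllib.request.urlopen();
read_refreshed and its opener/sleep parameters are hypothetical names,
added so the function can be tried without a network connection. It only
follows a Refresh sent as an HTTP header, and only once.

```python
import time
from urllib.parse import urljoin
from urllib.request import urlopen  # Python 3 home of the old urllib.urlopen


def read_refreshed(url, opener=urlopen, sleep=time.sleep):
    """Read url; if it sends a Refresh header, wait and read the target."""
    page = opener(url)
    refresh = page.headers.get('Refresh')
    if not refresh:
        # No header-based refresh: the first response is all there is.
        return page.read()
    page.close()

    # Assumed header shape: '<seconds>; url=<target>' (target optional).
    delay_part, _, target = refresh.partition(';')
    target = target.strip()
    if target.lower().startswith('url='):
        target = target[4:].strip()
    else:
        target = ''

    sleep(float(delay_part))
    # An empty or relative target resolves against the original URL.
    return opener(urljoin(url, target)).read()
```

Calling read_refreshed(url) with the defaults performs two real requests
separated by the delay the server asked for.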

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/



