Fetching a clean copy of a changing web page

John Nagle nagle at animats.com
Mon Jul 16 01:00:36 EDT 2007


    I'm reading the PhishTank XML file of active phishing sites,
at "http://data.phishtank.com/data/online-valid/". This changes
frequently, it's big (about 10MB right now), and it sits on a busy server.
So once in a while I get a bogus copy of the file because the file
was rewritten while being sent by the server.

    Any good way to deal with this, short of reading it twice
and comparing?
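One alternative to always reading twice is to validate each download and fetch again only when the check fails. The following is a minimal sketch that rests on assumptions the post does not state. It assumes the feed is served as plain, uncompressed XML. It also assumes that a copy damaged by a mid-transfer rewrite will usually either come up short of the server's Content-Length or fail to parse as XML. This is a heuristic, not a guarantee. The function names `http_fetch`, `looks_complete`, and `fetch_clean_copy` are hypothetical.

```python
import http.client
import time
import urllib.request
import xml.etree.ElementTree as ET


def http_fetch(url, timeout=60):
    """Download url once; return (body, declared Content-Length or None)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
        try:
            body = resp.read()
        except http.client.IncompleteRead as e:
            # The connection ended early; keep what arrived so the
            # length check in looks_complete() rejects it.
            body = e.partial
    return body, (int(length) if length is not None else None)


def looks_complete(body, declared_length):
    """Heuristic check (an assumption, not a guarantee) that a copy is intact."""
    # A body shorter or longer than the declared length is damaged.
    if declared_length is not None and len(body) != declared_length:
        return False
    # A file rewritten mid-transfer will usually not be well-formed XML.
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


def fetch_clean_copy(url, fetch=http_fetch, max_tries=3, delay=5.0):
    """Fetch url until a copy passes looks_complete(), or give up."""
    for attempt in range(1, max_tries + 1):
        body, declared_length = fetch(url)
        if looks_complete(body, declared_length):
            return body
        if attempt < max_tries:
            time.sleep(delay)  # give the server time to finish rewriting
    raise IOError("no intact copy of %s after %d tries" % (url, max_tries))
```

Usage would be something like `data = fetch_clean_copy("http://data.phishtank.com/data/online-valid/")`. In the common case this costs one download, with a second only when the check fails. Reading twice and comparing always costs two downloads. The `fetch` parameter exists so the retry logic can be exercised without a network.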

				John Nagle


