begin to parse a web page not entirely downloaded

Leif K-Brooks eurleif at ecritters.biz
Thu Feb 8 12:54:29 EST 2007


k0mp wrote:
> Is there a way to retrieve a web page and before it is entirely
> downloaded, begin to test if a specific string is present and if yes
> stop the download ?
> I believe that urllib.openurl(url) will retrieve the whole page before
> the program goes to the next statement.

Use urllib.urlopen(), but call .read() with a smallish argument, e.g.:

 >>> import urllib
 >>> foo = urllib.urlopen('http://google.com')
 >>> foo.read(512)
'<html><head> ...

foo.read(512) will return as soon as 512 bytes have been received (or
fewer, if the page ends first). You can keep calling it until it returns
an empty string, indicating that there's no more data to be read.
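
To stop as soon as the string turns up, the read-in-chunks idea can be
wrapped in a loop. What follows is only a rough sketch, not a tested
recipe: the name stream_contains and its parameters are made up for
illustration, and it is written for Python 3, where the data comes back
as bytes. It keeps the last few bytes of each chunk so that a match
straddling two chunks is not missed.

```python
def stream_contains(stream, needle, chunk_size=512):
    """Return True as soon as `needle` (bytes) shows up in `stream`.

    `stream` is any object with a read(n) method that returns bytes,
    such as the response object returned by urlopen(). Hypothetical
    helper, written for illustration only.
    """
    if not needle:
        return True  # an empty needle trivially matches

    keep = len(needle) - 1  # bytes to carry over between chunks
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # Empty result: no more data, and the needle was never seen.
            return False
        window = tail + chunk
        if needle in window:
            # Found it; the caller can now close the stream early.
            return True
        # Remember the end of this window in case the needle is split
        # across the boundary with the next chunk.
        tail = window[-keep:] if keep else b""
```

Presumably you would pass it the object returned by
urllib.request.urlopen(url) in Python 3 (or urllib.urlopen(url) in
Python 2, with a str needle and '' in place of b""), and close the
connection once it returns, so the rest of the page is never fetched.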


