Read all available pages on a website

Michael Foord fuzzyman at gmail.com
Mon Sep 13 09:31:58 EDT 2004


Brad Tilley <bradtilley at usa.net> wrote in message news:<ci2qnl$2jq$1 at solaris.cc.vt.edu>...
> Is there a way to make urllib or urllib2 read all of the pages on a Web 
> site? For example, say I wanted to read each page of www.python.org into 
> separate strings (a string for each page). The problem is that I don't 
> know how many pages are at www.python.org. How can I handle this?
> 
> Thanks,
> 
> Brad

I can highly recommend the BeautifulSoup parser for helping you to
extract all the links - should make it a doddle. (You want to check
that you only follow links that stay within www.python.org, of course -
the standard library urlparse module should help with that.)
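
Something along these lines ought to do it (untested - read_site is
just my own sketch, and I'm assuming a BeautifulSoup version where you
can pass the page text straight to the constructor and call findAll
and Tag.get):

import urllib2
import urlparse
from BeautifulSoup import BeautifulSoup

def read_site(start_url):
    """Fetch every reachable page on one host; return {url: page text}."""
    host = urlparse.urlparse(start_url)[1]      # netloc, e.g. 'www.python.org'
    to_visit = [start_url]
    pages = {}
    while to_visit:
        url = to_visit.pop()
        if url in pages:
            continue                            # already fetched
        try:
            html = urllib2.urlopen(url).read()
        except IOError:
            continue                            # skip pages that fail to fetch
        pages[url] = html
        for anchor in BeautifulSoup(html).findAll('a'):
            link = anchor.get('href')
            if link is None:
                continue
            # make the link absolute and drop any #fragment
            absolute = urlparse.urldefrag(urlparse.urljoin(url, link))[0]
            # only follow links that stay on the same host
            if urlparse.urlparse(absolute)[1] == host:
                to_visit.append(absolute)
    return pages

pages = read_site('http://www.python.org/')
print len(pages), 'pages read'

For a real crawl you'd also want to check the Content-Type before
parsing (no point souping images) and be polite about robots.txt, but
that's the basic shape of it.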

Regards,


Fuzzy

http://www.voidspace.org.uk/atlantibots/pythonutils.html
