read all available pages on a Website
Michael Foord
fuzzyman at gmail.com
Mon Sep 13 09:31:58 EDT 2004
Brad Tilley <bradtilley at usa.net> wrote in message news:<ci2qnl$2jq$1 at solaris.cc.vt.edu>...
> Is there a way to make urllib or urllib2 read all of the pages on a Web
> site? For example, say I wanted to read each page of www.python.org into
> separate strings (a string for each page). The problem is that I don't
> know how many pages are at www.python.org. How can I handle this?
>
> Thanks,
>
> Brad
I can highly recommend the BeautifulSoup parser for helping you to
extract all the links - should make it a doddle. (You want to check
that you only follow links within www.python.org, of course - the
standard library urlparse should help with that.)
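The approach above - extract the links from each page, then keep only the ones that stay on the original host - can be sketched with just the standard library. BeautifulSoup would do the parsing more robustly, but for illustration a minimal sketch using html.parser and urllib.parse (modern Python 3 names; the original post predates these modules' current layout) looks like this. The `same_site_links` helper and the sample HTML are made up for the example:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags as the parser feeds through HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def same_site_links(base_url, html):
    """Return absolute URLs found in `html` that stay on base_url's host."""
    parser = LinkExtractor()
    parser.feed(html)
    host = urlparse(base_url).netloc
    urls = []
    for href in parser.links:
        absolute = urljoin(base_url, href)  # resolve relative links
        if urlparse(absolute).netloc == host:  # drop off-site links
            urls.append(absolute)
    return urls


page = '<a href="/about/">About</a> <a href="http://example.com/">Off-site</a>'
print(same_site_links("http://www.python.org/", page))
# -> ['http://www.python.org/about/']
```

A full crawl would wrap this in a loop: keep a `seen` set and a queue of unvisited URLs, fetch each with urllib.request (urllib2 in the Python 2 of the time), collect its text, and enqueue any same-site links not yet seen - that sidesteps needing to know the page count in advance.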
Regards,
Fuzzy
http://www.voidspace.org.uk/atlantibots/pythonutils.html