search an entire website given the homepage URL

Fredrik Lundh fredrik at pythonware.com
Tue Apr 25 13:20:42 EDT 2006


"Bell, Kevin" wrote:

> I know I can use urllib2 to fetch a page with urllib2.urlopen(url),
> but I'm unsure how to then go through all the pages that are linked from
> it while staying in the same domain.  If I want to search through the
> entire python website given the homepage, how would I go about it?

use a search engine (try the search box in the upper right corner).

using a spider to download the entire site just so you can "search through
it" is bloody impolite.

if you have a valid reason to download portions of the site, use wget's mirror
function, or some similar tool, and be nice.  there's a tool called "websucker"
in the Tools/webchecker directory of the standard Python distribution that can
also be used to mirror portions of a site:

    http://svn.python.org/view/python/trunk/Tools/webchecker/
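
for the "linked pages, but still in the domain" part of the question, here is
a minimal sketch in modern Python 3 (not urllib2, and not how websucker itself
works).  the names LinkParser, same_site_links and crawl are made up for this
illustration, and crawl takes a caller-supplied fetch(url) function, so the
sketch itself never touches the network:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(base_url, html):
    """Return absolute http(s) links in html that share base_url's host."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop any "#fragment" part.
        url, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if url not in links:
                links.append(url)
    return links


def crawl(start_url, fetch, max_pages=20):
    """Breadth-first walk of one host; fetch(url) must return HTML as str."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in same_site_links(url, fetch(url)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

a real fetch function would presumably wrap urllib.request.urlopen, check
robots.txt first (urllib.robotparser can do that), and sleep between requests.
the max_pages cap is there so a mistake doesn't hammer the server.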

</F>