Not necessarily related to python Web Crawlers
defn noob
circularfunc at yahoo.se
Sat Jul 5 05:07:04 EDT 2008
Just crawling is super easy; it's indexing and searching that are hard.
Just start at yahoo.com, scrape out all the links, and then for every
page you reach, visit every link on it.
I wrote a crawler in 15 lines of code, but all it did was visit the
sites; it didn't index them or anything.
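
For illustration, here is a rough sketch of that kind of visit-only
crawler. It is not the original 15-line script; it assumes Python 3 and
only the standard library, and the names LinkParser, extract_links,
fetch and crawl, the max_pages limit and the 10-second timeout are all
made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for all links in html, without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def fetch(url):
    """Download a page and return its text (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetch=fetch):
    """Breadth-first crawl; returns the URLs visited, in order."""
    seen = {start_url}          # URLs already queued, to avoid loops
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue            # skip pages that fail to load
        visited.append(url)
        for link in extract_links(url, html):
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl("http://yahoo.com", max_pages=20) would visit up to 20
pages and return their URLs. The fetch argument is there so you can try
the loop against a dict of fake pages first. A real crawler should also
respect robots.txt and pause between requests, which this sketch does
not do.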
You could probably write a faster one in C++, but if you are new to it,
doing it in Python will let you experiment and learn faster.
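
To give a feel for the indexing and searching side, here is a toy
inverted index in Python 3. It is only a sketch: the names tokenize,
build_index and search are invented for this example, and it assumes
the page text is already in memory as a dict mapping URL to text. It
leaves out everything that makes the real problem hard, such as
ranking, stemming and storing an index too big for memory.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

You would feed build_index the text of the pages a crawler fetched and
then call search(index, "some words") to get the matching URLs back,
unranked.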
Some links:
http://infolab.stanford.edu/~backrub/google.html
http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html
http://www.example-code.com/python/pythonspider.asp
http://www.example-code.com/python/spider_simpleCrawler.asp
More information about the Python-list mailing list