web crawler help?

Carl kingprad at mail.com
Mon Sep 9 10:52:10 EDT 2002


I just wanted to add that a lot of sites have a robots.txt file
in the root directory listing the pages or directories a crawler
shouldn't trawl through. It's polite to honor it if you're grabbing
tons of pages from a server. It's probably fine to ignore if you're not
using much server time and are only doing a few simple tasks.
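
For what it's worth, here is a minimal sketch of checking robots.txt
with the standard library's robot parser (urllib.robotparser in
Python 3; older versions ship it as a top-level robotparser module).
The sample rules, the "MyCrawler" agent name, and the example.com URLs
are made up for illustration. A real crawler would most likely point
the parser at the site's own file with set_url() and read() rather than
parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(SAMPLE_ROBOTS_TXT)

# "MyCrawler" is an assumed user-agent name; use whatever your crawler sends.
for url in ("http://example.com/index.html",
            "http://example.com/private/data.html"):
    if parser.can_fetch("MyCrawler", url):
        print("ok to fetch:", url)
    else:
        print("skipping (disallowed):", url)
```

Calling can_fetch() before every request keeps the check in one place,
so the crawler skips disallowed paths without any special-casing
elsewhere.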


