web crawler help?

Carl kingprad at mail.com
Mon Sep 9 10:52:10 EDT 2002

I also wanted to mention that a lot of sites have a robots.txt file
in the root directory listing the paths a crawler shouldn't trawl
through. It's polite to honor it if you're grabbing tons of pages from
a server. It's probably fine to ignore if you're not using much server
time and are only doing a few simple tasks.
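
For what it's worth, here is a rough sketch of how a crawler might check
robots.txt before fetching, using the standard library's
urllib.robotparser module. The robots.txt contents, the "MyCrawler"
user-agent name, the example.com URLs, and the helper function names are
all made up for illustration; the rules are inlined so the snippet runs
without touching the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, inlined so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def make_parser(robots_text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that robots.txt lets this user agent fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = make_parser(ROBOTS_TXT)
candidates = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/notes.html",
    "http://www.example.com/cgi-bin/search",
]
to_crawl = allowed_urls(parser, "MyCrawler", candidates)
print(to_crawl)
```

Against a live site you would instead call set_url() with the site's
robots.txt address and then read(), which downloads and parses the file
in one go.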

More information about the Python-list mailing list