web crawler in python

my name gm41lu53r at gmail.com
Wed Dec 9 19:39:34 EST 2009


I'm planning to write a web crawler in Python, but I have a question
about how I should design it. My goal is speed and the most efficient
use of the hardware and bandwidth I have available.

As of now I have a dual 2.4 GHz Xeon box with 4 GB RAM, a 500 GB SATA
drive, and a 20 Mbps bandwidth cap (for now), running FreeBSD.

What would be the best way to design the crawler? Should I use the
thread module? Would I be able to max out this connection with the
hardware listed above using Python threads?

Thank you kindly.
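
For concreteness, the threaded design asked about above could be sketched
roughly as follows. This is only an illustration under assumptions, not a
recommended or benchmarked design: it uses the threading, queue, and
urllib.request modules of a current Python 3 standard library rather than
the older thread module, and the names fetch_url, crawl, and num_workers
are hypothetical. It does no link extraction, politeness delays, or
robots.txt handling.

```python
import queue
import threading
import urllib.request


def fetch_url(url, timeout=10):
    """Download one page and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(urls, fetch=fetch_url, num_workers=20):
    """Fetch every URL using a fixed pool of worker threads.

    Returns a dict mapping each URL to its body, or to the exception
    that was raised while fetching it. num_workers is an arbitrary
    starting value (an assumption), to be tuned by measurement.
    """
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:  # sentinel: no more work for this thread
                tasks.task_done()
                return
            try:
                outcome = fetch(url)
            except Exception as exc:  # keep the worker alive on bad URLs
                outcome = exc
            with lock:
                results[url] = outcome
            tasks.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return results
```

Each worker spends most of its time blocked on network I/O, and CPython
releases the interpreter lock during blocking I/O, so the downloads can
overlap. Whether a pool like this saturates a 20 Mbps link depends on page
sizes and server latency, so it would have to be measured on the actual
workload.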

