Web Crawler - Python or Perl?

Stefan Behnel stefan_ml at behnel.de
Mon Jun 9 14:12:01 EDT 2008


disappearedng at gmail.com wrote:
> 1) I/O issues: my biggest resource constraint will be the bandwidth
> bottleneck.
> 2) Efficiency issues: The crawlers have to be fast, robust and as
> "memory efficient" as possible. I am running all of my crawlers on
> cheap PCs with about 500 MB RAM and P3 to P4 processors.
> 3) Compatibility issues: Most of these crawlers will run on Unix
> (FreeBSD), so there should exist a pretty good compiler that can
> optimize my code under these environments.

You should rethink your requirements. You expect to be I/O bound, so why do
you require a good "compiler"? Especially when asking about two interpreted
languages...
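
To illustrate the I/O-bound point, here is a minimal sketch using only the
standard library. The `fake_fetch` and `crawl` names, the 0.2 second delay,
and the example URLs are assumptions made up for this illustration; the sleep
merely stands in for waiting on the network, it is not a real download.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for an HTTP request: the thread only waits."""
    time.sleep(delay)  # simulates time spent blocked on the network
    return url, delay


def crawl(urls, workers):
    """Fetch all URLs with a pool of `workers` threads; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in the same order as the input URLs
        results = list(pool.map(fake_fetch, urls))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["http://example.com/page%d" % i for i in range(4)]
    _, one_thread = crawl(urls, workers=1)
    _, four_threads = crawl(urls, workers=4)
    print("1 thread:  %.2fs" % one_thread)
    print("4 threads: %.2fs" % four_threads)
```

When the time goes into waiting rather than computing, overlapping the waits
helps far more than faster generated code would.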

Consider using lxml (with Python). It has pretty much everything you need for
a web crawler, supports threaded parsing directly from HTTP URLs, and it's
plenty fast and pretty memory efficient.

http://codespeak.net/lxml/
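
As a rough sketch of the link-extraction step of a crawler with lxml: the
`extract_links` helper and the sample page below are hypothetical, and the
filtering rules (only `<a href>` links, only http/https, no duplicates) are
assumptions chosen for the example rather than anything lxml prescribes.

```python
from urllib.parse import urlsplit

import lxml.html


def extract_links(html_text, base_url):
    """Return the absolute http(s) links found in <a href="..."> elements."""
    doc = lxml.html.fromstring(html_text)
    # Resolve relative hrefs such as "/a" against the page's own URL.
    doc.make_links_absolute(base_url)

    links = []
    seen = set()
    # iterlinks() yields (element, attribute, link, pos) for every link-like
    # attribute in the document, including images and stylesheets.
    for element, attribute, link, _pos in doc.iterlinks():
        if element.tag != "a" or attribute != "href":
            continue
        if urlsplit(link).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        if link not in seen:
            seen.add(link)
            links.append(link)
    return links


if __name__ == "__main__":
    page = (
        '<html><body><a href="/a">A</a>'
        '<a href="http://other.org/b">B</a>'
        '<img src="/logo.png"></body></html>'
    )
    for url in extract_links(page, "http://example.com/dir/"):
        print(url)
```

To read a page straight from the network instead of from a string,
`lxml.html.parse()` also accepts a URL in place of a file name.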

Stefan


