Web Crawler - Python or Perl?

Nick Craig-Wood nick at craig-wood.com
Mon Jun 9 16:30:49 EDT 2008


disappearedng at gmail.com <disappearedng at gmail.com> wrote:
>  I am currently planning to write my own web crawler. I know Python but
>  not Perl, and I am interested in knowing which of these two is the
>  better choice given the following scenario:
> 
>  1) I/O issues: my biggest resource constraint will be a bandwidth
>  bottleneck.
>  2) Efficiency issues: The crawlers have to be fast, robust and as
>  "memory efficient" as possible. I am running all of my crawlers on
>  cheap PCs with about 500 MB of RAM and P3 to P4 processors.
>  3) Compatibility issues: Most of these crawlers will run on Unix
>  (FreeBSD), so there should exist a pretty good compiler that can
>  optimize my code under these environments.
> 
>  What are your opinions?

Use Python with Twisted.

A friend and I wrote a crawler.  Our first attempt was plain
Python.  Our second attempt used Twisted.  Twisted absolutely blew
the socks off our first attempt - mainly because you can fetch 100s or
1000s of pages simultaneously, without threads.
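
To give a feel for fetching many pages at once without threads, here is
a minimal sketch.  It is not the crawler described above and it does not
use Twisted: it swaps in the standard library's asyncio, which follows
the same single-threaded event loop idea.  The names fetch, crawl,
run_demo and max_in_flight are made up for illustration, and fetch only
sleeps instead of making a real HTTP request, so it runs without network
access.

```python
import asyncio


# Hypothetical stand-in for a real HTTP request: it only sleeps, so the
# example needs no network access.
async def fetch(url, delay=0.05):
    await asyncio.sleep(delay)  # while this waits, other fetches progress
    return url, "<html>body of %s</html>" % url


async def crawl(urls, max_in_flight=100):
    # Cap the number of simultaneous fetches so a small machine or a
    # thin link is not swamped.
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded_fetch(url):
        async with limit:
            return await fetch(url)

    # Every fetch runs in the same thread, interleaved by the event loop.
    results = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(results)


def run_demo(n=500):
    # Assumed example URLs; nothing is actually downloaded.
    urls = ["http://example.com/page%d" % i for i in range(n)]
    return asyncio.run(crawl(urls))
```

Calling run_demo(500) should finish in a fraction of the time 500
sequential 0.05 second waits would take, because up to max_in_flight
fetches are waiting at any one moment.  In a real crawler the sleep
would be replaced by a non-blocking HTTP request.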

Python with Twisted will satisfy points 1-3.  You'll have to get your
head around its asynchronous nature, but once you do you'll be writing
a killer crawler ;-)

As for Perl - once upon a time I would have done this with Perl, but I
wouldn't go back now!

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


