HTML parsing/scraping & python

Sanjay Arora sanjay.k.arora at gmail.com
Wed Nov 30 04:25:56 EST 2005


We are looking to select the language & toolset more suitable for a
project that requires getting data from several web-sites in real-
time....html parsing/scraping. It would require full emulation of the
browser, including handling cookies, automated logins & following
multiple web-link paths. Multiple threading would be a plus but not
requirement.

Some solutions were suggested:

Perl:

LWP::Simple
WWW::Mechanize
HTML::Parser

Curl & libcurl:

Can you suggest solutions for python? Pros & Cons using Perl vs. Python?
Why Python?

Pointers to  various other tools & their comparisons  with python
solutions will be most appreciated. Anyone who is knowledgeable about
the application subject, please do share your knowledge to help us do
this right.

With best regards.
Sanjay.




More information about the Python-list mailing list