Is Python good for web crawlers?

Tue Feb 7 16:37:04 EST 2006

Paul Rubin wrote:
> Generally I use urllib.read() to get
> the whole html page as a string, then process it from there.  I just
> look for the substrings I'm interested in, making no attempt to
> actually parse the html into a DOM or anything like that.
 >
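
For reference, the substring approach described above might look roughly like the sketch below. This is only my guess at it, not Paul's actual code: the helper names `fetch` and `extract_between` and the sample page are made up for illustration, and in current Python the fetch would go through `urllib.request.urlopen(url).read()`.

```python
from urllib.request import urlopen


def fetch(url):
    """Return the whole page as one string (defined for completeness, not called here)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_between(page, start_marker, end_marker):
    """Return the text between two marker substrings, or None if either is missing."""
    start = page.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = page.find(end_marker, start)
    if end == -1:
        return None
    return page[start:end]


# Hypothetical sample page, used instead of a real download.
SAMPLE = (
    "<html><head><title>Example page</title></head>"
    "<body><p>Hello</p></body></html>"
)

print(extract_between(SAMPLE, "<title>", "</title>"))
```

No DOM is built here; the page is just a string, and the markers have to match the source text exactly.
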
BeautifulSoup works *really* well when you want to parse the source
(e.g. when you don't want to use string matching, or when the structures
you're looking for are a bit too complicated for simple string
matching/substring search).

The API of the package is extremely simple, straightforward and... obvious.
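
To give an idea, here is a small sketch that collects link targets from a page. It assumes the current `bs4` package (`pip install beautifulsoup4`), whose import path differs from older releases, and the sample markup and the `extract_links` helper are invented for the example. If `bs4` is not installed, the sketch falls back to the standard library's `html.parser` so that it still runs.

```python
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4
except ImportError:
    BeautifulSoup = None  # fall back to the stdlib parser below


class _LinkCollector(HTMLParser):
    """Stdlib fallback: records the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href of every <a> element, in document order."""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        return [a["href"] for a in soup.find_all("a", href=True)]
    collector = _LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical sample: mixed quote styles, varying attribute order, and a
# <link> tag whose href is not a hyperlink.
SAMPLE = """
<html><head><link rel="stylesheet" href="style.css"></head>
<body>
<a href="/a.html">A</a>
<a class="nav" href='/b.html'><b>B</b></a>
</body></html>
"""

print(extract_links(SAMPLE))
```

A plain search for `href="` would pick up the stylesheet and miss the single-quoted link, which is the kind of case where parsing starts to pay off.
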