Is Python good for web crawlers?

Xavier Morel xavier.morel at masklinn.net
Tue Feb 7 16:37:04 EST 2006


Paul Rubin wrote:
> Generally I use urllib.read() to get
> the whole html page as a string, then process it from there.  I just
> look for the substrings I'm interested in, making no attempt to
> actually parse the html into a DOM or anything like that.
 >
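For comparison, the substring approach described above might look roughly like the sketch below. The helper name `extract_between` and the sample page are made up for illustration; in current Python the page text would presumably come from something like `urllib.request.urlopen(url).read()`, but a literal string keeps the example offline.

```python
def extract_between(page, start_marker, end_marker):
    """Return the text between the first start_marker and the next
    end_marker, or None if either marker is missing."""
    start = page.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = page.find(end_marker, start)
    if end == -1:
        return None
    return page[start:end]


# Hypothetical page content standing in for a downloaded document.
page = (
    "<html><head><title>Example page</title></head>"
    "<body><p>Hello</p></body></html>"
)

title = extract_between(page, "<title>", "</title>")
print(title)
```

This is fine as long as the markers are fixed and unambiguous; it gets fragile once attributes, nesting or whitespace start to vary.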
BeautifulSoup works *really* well when you want to parse the source
(e.g. when you don't want to use string matching, or when the structures
you're looking for are a bit too complicated for simple string
matching/substring search).

The API of the package is extremely simple, straightforward and... obvious.
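A minimal sketch of what that looks like, assuming the current `beautifulsoup4` package (imported as `bs4`) rather than the release that was available when this was posted; the HTML snippet and the class names in it are invented for the example.

```python
from bs4 import BeautifulSoup  # assumption: the modern beautifulsoup4 package

# Hypothetical page: we only want the links inside the "content" div,
# which is awkward to express as a plain substring search.
html = """
<html><body>
  <div class="nav"><a href="/about">About</a></div>
  <div class="content">
    <a href="/posts/1">First post</a>
    <a href="/posts/2">Second post</a>
  </div>
</body></html>
"""

# Parse with the standard library's parser so no extra backend is needed.
soup = BeautifulSoup(html, "html.parser")

# Find the div by tag name and class, then collect its links.
content = soup.find("div", class_="content")
links = [(a["href"], a.get_text()) for a in content.find_all("a")]
print(links)
```

Here `find` returns the first matching tag and `find_all` returns every match beneath it, so the navigation link outside the "content" div is skipped without any string slicing.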


