Is Python good for web crawlers?
Xavier Morel
xavier.morel at masklinn.net
Tue Feb 7 16:37:04 EST 2006
Paul Rubin wrote:
> Generally I use urllib.read() to get
> the whole html page as a string, then process it from there. I just
> look for the substrings I'm interested in, making no attempt to
> actually parse the html into a DOM or anything like that.
>
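For reference, the approach quoted above can be sketched roughly as follows. This is only an illustration under a few assumptions: the quoted "urllib.read()" presumably means something like urlopen(url).read() (urllib.request.urlopen in Python 3), and the extract_between helper and the sample page are made up for the example, not anything from Paul's code.

```python
# Sketch of the substring approach; the sample page and the
# extract_between helper are hypothetical, for illustration only.
from urllib.request import urlopen  # Python 3 location of urlopen


def fetch(url):
    """Return the whole page as one string (needs network; not called below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j]


# A literal string stands in for fetch(url) so the sketch runs offline.
page = "<html><head><title>Example page</title></head><body></body></html>"
title = extract_between(page, "<title>", "</title>")
print(title)
```

This is fine as long as the markers you search for are fixed strings; it gets fragile once attributes, whitespace or nesting start to vary.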
BeautifulSoup works *really* well when you want to parse the source
(e.g. when you don't want to use string matching, or when the structures
you're looking for are a bit too complicated for a simple substring
search).
The API of the package is extremely simple, straightforward and... obvious.
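A minimal sketch of what that looks like, assuming the current bs4 package (installed as beautifulsoup4) rather than the 2006 release, and using a made-up HTML snippet in place of a fetched page:

```python
# Sketch only: assumes the third-party bs4 package is installed;
# the HTML below is a hypothetical sample, not a real page.
from bs4 import BeautifulSoup

html = """
<html><body>
<ul>
  <li><a href="/a.html">First</a></li>
  <li><a href="/b.html" class="ext">Second</a></li>
  <li><a>No target</a></li>
</ul>
</body></html>
"""

# "html.parser" selects the parser from the standard library.
soup = BeautifulSoup(html, "html.parser")

# Collect (href, link text) for every <a> tag that has an href attribute.
links = [(a["href"], a.get_text()) for a in soup.find_all("a", href=True)]
print(links)
```

The extra class attribute on the second link and the anchor without an href are exactly the kind of variation that trips up substring matching, while the parsed tree handles them without special cases.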