How's python's web scraping capabilities (vs LWP) ...

Kent Johnson kent at kentsjohnson.com
Sat Apr 8 22:04:30 EDT 2006


ArKane wrote:
> Hello all,
> 
> I've been hacking away at Perl for a few months now, mainly using the
> LWP module for web scraping. Its capabilities include support for
> HTTPS and proxies, authentication, cookies (including the ability to
> automatically import Internet Explorer cookies), etc.

urllib2 (in the standard library) handles most of this: HTTPS, proxies,
authentication, and (together with cookielib) cookies. It gets you the
HTML from a site. To pull data out of the HTML, try BeautifulSoup.
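
A rough sketch of what that could look like, under a few assumptions: it
is written for Python 3, where urllib2's functionality lives in
urllib.request and cookielib became http.cookiejar. The helper names
(make_opener, fetch, extract_links), the proxy URL and the credentials
are all made up for illustration. To keep it dependency-free, the
parsing step uses the standard library's html.parser as a stand-in for
BeautifulSoup, which is a third-party package and would be the more
forgiving choice on messy real-world HTML.

```python
import urllib.request
from http.cookiejar import CookieJar
from html.parser import HTMLParser


def make_opener(proxy_url=None, auth=None):
    """Build an opener with cookie, proxy and basic-auth support.

    `auth` is a hypothetical (url, user, password) tuple.
    Returns (opener, cookie_jar).
    """
    jar = CookieJar()
    # Cookies set by responses are stored in `jar` and sent back later.
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if proxy_url:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    if auth:
        url, user, password = auth
        mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        mgr.add_password(None, url, user, password)
        handlers.append(urllib.request.HTTPBasicAuthHandler(mgr))
    return urllib.request.build_opener(*handlers), jar


def fetch(opener, url, timeout=10):
    """Download a page through the opener and return it as text."""
    with opener.open(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Typical use would be something like
opener, jar = make_opener(proxy_url="http://proxy.example:8080")
followed by extract_links(fetch(opener, "https://example.com/")).
HTTPS works through the same opener as long as Python was built with
SSL support.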

Kent


