Suitable Python code to scrape specific details from web pages.

alex23 wuwei23 at gmail.com
Mon Aug 18 01:04:30 EDT 2014


On 13/08/2014 7:28 AM, Roy Smith wrote:
> Second, if you're going to be parsing web pages, trying to use regexes
> is a losing game.  You need something that knows how to parse HTML.  The
> canonical answer is lxml (http://lxml.de/), but Beautiful Soup
> (http://www.crummy.com/software/BeautifulSoup/) is less intimidating to
> use.

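lxml also ships a BeautifulSoup-backed parser (lxml.html.soupparser), so
you can easily mix and match approaches: let BeautifulSoup cope with the
broken markup, then query the resulting tree with lxml's own API: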

http://lxml.de/elementsoup.html
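
A minimal sketch of what that looks like, assuming both lxml and
beautifulsoup4 are installed. The sample markup and the extract_hrefs
helper are made up for illustration; only soupparser.fromstring() and
xpath() come from lxml itself.

```python
# Requires the third-party packages lxml and beautifulsoup4.
from lxml.html import soupparser

# Hypothetical sample page; note the second <a> tag is never closed.
SAMPLE_HTML = """
<html><body>
<p>See <a href="http://lxml.de/">lxml</a> and
<a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</p>
</body></html>
"""


def extract_hrefs(html_text):
    """Parse with BeautifulSoup (via lxml), then query the tree with XPath."""
    # BeautifulSoup does the lenient parsing; the result is an lxml tree.
    root = soupparser.fromstring(html_text)
    # From here on the ordinary lxml API applies, XPath included.
    return root.xpath("//a/@href")


if __name__ == "__main__":
    for href in extract_hrefs(SAMPLE_HTML):
        print(href)
```

The attraction is that you get BeautifulSoup's tolerance for bad markup
without giving up XPath for the actual extraction.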



