Suitable Python code to scrape specific details from web pages.

Roy Smith roy at panix.com
Tue Aug 12 17:28:15 EDT 2014


In article <a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3 at googlegroups.com>,
 Simon Evans <musicalhacksaw at yahoo.co.uk> wrote:

> Dear Programmers,
> I have been looking at the You tube 'Web Scraping Tutorials' of Chris Reeves. 
> I have tried a few of his python programs in the Python27 command prompt, but 
> altered them from accessing data using links say from the Dow Jones index, to 
> accessing the details I would be interested in accessing from the 'Racing 
> Post' on a daily basis. Anyhow, the code it returns is not in the example I 
> am going to give, is not the information I am seeking, instead of returning 
> the given odds on a horse, it only returns a [], which isn't much use. 
> I would be glad if you could tell me where I am going wrong. 

Rather than comment on your specific code (but, thank you for posting 
it), I'll make a couple of more generic suggestions.

First, if you're doing anything with fetching web pages, install the 
wonderful requests module (http://docs.python-requests.org/en/latest/).  
It's so much easier to work with than urllib.

Second, if you're going to be parsing web pages, trying to use regexes 
is a losing game.  You need something that knows how to parse HTML.  The 
canonical answer is lxml (http://lxml.de/), but Beautiful Soup 
(http://www.crummy.com/software/BeautifulSoup/) is less intimidating to 
use.



More information about the Python-list mailing list