Suitable Python code to scrape specific details from web pages.

Denis McMahon denismfmcmahon at gmail.com
Wed Aug 13 10:53:41 EDT 2014


On Tue, 12 Aug 2014 13:00:30 -0700, Simon Evans wrote:

> in accessing from the 'Racing Post' on a daily basis. Anyhow, the code

Following is some starter code. You will have to look at the output,
compare it to the web page, and work out how you want to process it
further. Note that I use BeautifulSoup and requests. The output is the
HTML for each cell in the table, with a line of "+" characters at each
table row break. I suggest you look at the BeautifulSoup documentation
at http://www.crummy.com/software/BeautifulSoup/bs4/doc/ to work out how
to select the table cells that contain the data you are interested in,
and how to extract it.

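#!/usr/bin/python
"""
Program to extract data from racingpost.
"""

from bs4 import BeautifulSoup
import requests

# A string literal cannot contain a raw line break, so the URL is built
# from two adjacent literals that Python joins into one string.
url = ("http://www.racingpost.com/horses2/cards/card.sd?"
       "race_id=607466&r_date=2014-08-13#raceTabs=sc_")
r = requests.get(url)

if r.status_code == 200:
    # Name the parser explicitly so the result does not depend on which
    # optional parsers happen to be installed.
    soup = BeautifulSoup(r.content, "html.parser")
    table = soup.find("table", id="sc_horseCard")
    if table is None:
        # find() returns None when nothing matches.
        print("No table with id sc_horseCard found")
    else:
        for row in table.find_all("tr"):
            for cell in row.find_all("td"):
                print(cell)
            print("+++++++++++++++++++++++++++++++++++++")
else:
    print("HTTP Status %s" % r.status_code)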
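
As one possible next step, here is a minimal sketch of pulling the text
out of each cell instead of printing the raw HTML. It runs against a
small made-up table, not the real page: the sample markup, its columns,
and the helper name rows_as_text are assumptions for illustration only,
and the real card will need different handling once you have inspected
its output.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the columns of the real table will differ.
SAMPLE_HTML = """
<table id="sc_horseCard">
  <tr><th>No.</th><th>Horse</th></tr>
  <tr><td>1</td><td><a href="/horses/1"><b>Example Runner</b></a></td></tr>
  <tr><td>2</td><td><a href="/horses/2"><b>Another Horse</b></a></td></tr>
</table>
"""


def rows_as_text(html, table_id="sc_horseCard"):
    """Return one list per table row holding the text of its td cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for row in table.find_all("tr"):
        # get_text() drops the markup; strip=True trims surrounding whitespace.
        cells = [cell.get_text(" ", strip=True) for cell in row.find_all("td")]
        if cells:  # rows made only of th cells yield no td cells; skip them
            rows.append(cells)
    return rows


rows = rows_as_text(SAMPLE_HTML)
for cells in rows:
    print(cells)
```

To try this on the live page, r.content from the starter code could be
passed in place of SAMPLE_HTML, and the lists then indexed by whichever
column positions turn out to hold the data you want.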

-- 
Denis McMahon, denismfmcmahon at gmail.com


