Parsing Baseball Stats
Anthra Norell
anthra.norell at tiscalinet.ch
Tue Jul 25 07:20:17 EDT 2006
Hi,
Below is your solution, ready to run. Put get_statistics () in a loop that feeds it the names from your file, makes an output file
name from each one and passes both the output file name and 'statistics' to file_statistics ().
Cheers,
Frederic
----- Original Message -----
From: <ankitdesai at gmail.com>
Newsgroups: comp.lang.python
To: <python-list at python.org>
Sent: Monday, July 24, 2006 5:48 PM
Subject: Parsing Baseball Stats
> I would like to parse a couple of tables within an individual player's
> SHTML page. For example, I would like to get the "Actual Pitching
> Statistics" and the "Translated Pitching Statistics" portions of Babe
> Ruth page (http://www.baseballprospectus.com/dt/ruthba01.shtml) and
> store that info in a CSV file.
>
> Also, I would like to do this for numerous players whose IDs I have
> stored in a text file (e.g.: cobbty01, ruthba01, speaktr01, etc.).
> These IDs should change the URL to get the corresponding player's
> stats. Is this doable and if yes, how? I have only recently finished
> learning Python (used the book: How to Think Like a Computer Scientist:
> Learning with Python). Thanks for your help...
>
> --
> http://mail.python.org/mailman/listinfo/python-list
import SE, urllib
Tag_Stripper = SE.SE ('"~<.*?>~= " "~<[^>]*~=" "~[^<]*>~=" ')
CSV_Maker = SE.SE (' "~\s+~=(9)" ')
# SE is the hacker's Swiss army knife. You find it in the Cheese Shop.
# It strips your tags and puts in the CSV separator, and if you needed other
# translations, it would do those too in two lines of code.
# If you don't want tabs, define the CSV_Maker accordingly, putting
# your separator in the place of '(9)':
# CSV_Maker = SE.SE ('"~\s+~=,"') # Now it's a comma
def get_statistics (name_of_player):

    statistics = {
        # Uncomment those you want
        # 'Actual Batting Statistics' : [],
        'Actual Pitching Statistics' : [],
        # 'Advanced Batting Statistics' : [],
        'Advanced Pitching Statistics' : [],
        # 'Fielding Statistics as Center Fielder' : [],
        # 'Fielding Statistics as First Baseman' : [],
        # 'Fielding Statistics as Left Fielder' : [],
        # 'Fielding Statistics as Pitcher' : [],
        # 'Fielding Statistics as Right Fielder' : [],
        # 'Statistics as DH/PH/Other' : [],
        # 'Translated Batting Statistics' : [],
        # 'Translated Pitching Statistics' : [],
    }

    url = 'http://www.baseballprospectus.com/dt/%s.shtml' % name_of_player
    htm_page = urllib.urlopen (url)
    htm_lines = htm_page.readlines ()
    htm_page.close ()

    current_list = None
    for line in htm_lines:
        text_line = Tag_Stripper (line).strip ()
        if line.startswith ('<h3'):
            if statistics.has_key (text_line):
                current_list = statistics [text_line]
                current_list.append (text_line)
            else:
                current_list = None
        else:
            if current_list != None:
                if text_line:
                    current_list.append (CSV_Maker (text_line))

    return statistics
def show_statistics (statistics):
    for category in statistics:
        for record in statistics [category]:
            print record
        print
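def file_statistics (file_name, statistics):
    f = file (file_name, 'w')   # create or overwrite the output file
    for category in statistics:
        f.write ('%s\n' % category)
        # The first list entry repeats the category heading, so skip it
        for line in statistics [category][1:]:
            f.write ('%s\n' % line)
    f.close ()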
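
The loop mentioned at the top of this message could look roughly like the sketch below. It is only an illustration under a few assumptions: the ID file is taken to hold IDs separated by commas and/or whitespace, the output file name is simply the player ID plus '.csv', and the helper names read_player_ids and process_players are made up for this example. The two worker functions are passed in as arguments so the sketch stands on its own.

```python
def read_player_ids(path):
    """Return the player IDs in a text file, split on commas and whitespace."""
    f = open(path)
    try:
        text = f.read()
    finally:
        f.close()
    # Treat commas like whitespace so both 'a, b' and one-ID-per-line work
    return text.replace(',', ' ').split()


def process_players(ids_path, get_statistics, file_statistics):
    """Fetch and save the statistics of every player listed in ids_path.

    get_statistics and file_statistics are expected to behave like the
    functions of the same name in the listing above.
    Returns the list of output file names that were written.
    """
    written = []
    for player_id in read_player_ids(ids_path):
        statistics = get_statistics(player_id)
        output_name = player_id + '.csv'   # assumed naming scheme
        file_statistics(output_name, statistics)
        written.append(output_name)
    return written
```

With the listing above loaded, a call such as process_players ('players.txt', get_statistics, file_statistics) would then write one file per player; 'players.txt' is a placeholder for whatever your ID file is called.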