Parsing Baseball Stats

Anthra Norell anthra.norell at tiscalinet.ch
Tue Jul 25 07:20:17 EDT 2006


Hi,

      Below is your solution, ready to run. Call get_statistics () in a loop that feeds it the names from your file, builds an output
file name from each name, and passes that file name together with the returned 'statistics' to file_statistics ().

Cheers,

Frederic


----- Original Message -----
From: <ankitdesai at gmail.com>
Newsgroups: comp.lang.python
To: <python-list at python.org>
Sent: Monday, July 24, 2006 5:48 PM
Subject: Parsing Baseball Stats


> I would like to parse a couple of tables within an individual player's
> SHTML page. For example, I would like to get the "Actual Pitching
> Statistics" and the "Translated Pitching Statistics" portions of Babe
> Ruth page (http://www.baseballprospectus.com/dt/ruthba01.shtml) and
> store that info in a CSV file.
>
> Also, I would like to do this for numerous players whose IDs I have
> stored in a text file (e.g.: cobbty01, ruthba01, speaktr01, etc.).
> These IDs should change the URL to get the corresponding player's
> stats. Is this doable and if yes, how? I have only recently finished
> learning Python (used the book: How to Think Like a Computer Scientist:
> Learning with Python). Thanks for your help...
>

import SE, urllib

Tag_Stripper = SE.SE ('"~<.*?>~= " "~<[^>]*~=" "~[^<]*>~=" ')
CSV_Maker    = SE.SE (' "~\s+~=(9)" ')

# SE is the hacker's Swiss army knife. You can find it in the Cheese Shop.
# It strips your tags and puts in the CSV separator, and if you needed other
# translations, it would do those too in two lines of code.
#       If you don't want tabs, define the CSV_Maker accordingly, putting
# your separator in the place of '(9)':
# CSV_Maker = SE.SE ('"~\s+~=,"')  # Now it's a comma

def get_statistics (name_of_player):

   statistics = {

   # Uncomment those you want
   #   'Actual Batting Statistics'              : [],
        'Actual Pitching Statistics'             : [],
   #   'Advanced Batting Statistics'            : [],
   #   'Advanced Pitching Statistics'           : [],
   #   'Fielding Statistics as Center Fielder'  : [],
   #   'Fielding Statistics as First Baseman'   : [],
   #   'Fielding Statistics as Left Fielder'    : [],
   #   'Fielding Statistics as Pitcher'         : [],
   #   'Fielding Statistics as Right Fielder'   : [],
   #   'Statistics as DH/PH/Other'              : [],
   #   'Translated Batting Statistics'          : [],
        'Translated Pitching Statistics'         : [],

   }

   url = 'http://www.baseballprospectus.com/dt/%s.shtml' % name_of_player
   htm_page = urllib.urlopen (url)
   htm_lines = htm_page.readlines ()
   htm_page.close ()
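   current_list = None   # list being filled, or None outside a wanted table
   for line in htm_lines:
      text_line = Tag_Stripper (line).strip ()
      if line.startswith ('<h3'):
         # An <h3> heading opens a new table; collect it only if selected above
         if text_line in statistics:
            current_list = statistics [text_line]
            current_list.append (text_line)
         else:
            current_list = None
      elif current_list is not None and text_line:
         # Non-empty line inside a selected table: store it as a CSV record
         current_list.append (CSV_Maker (text_line))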

   return statistics


def show_statistics (statistics):
   for category in statistics:
      for record in statistics [category]:
         print record
      print


def file_statistics (file_name, statistics):
   f = open (file_name, 'w')
   for category in statistics:
      f.write ('%s\n' % category)
      # Skip element 0: get_statistics () already stored the heading there
      for line in statistics [category][1:]:
         f.write ('%s\n' % line)
   f.close ()
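
The loop described at the top of this message could be sketched roughly as
follows. This is only an illustration, not part of the original script: the
helper names read_player_ids and process_players, the '.csv' suffix and the
out_dir parameter are all assumptions, and the ID file is assumed to hold IDs
separated by commas and/or whitespace. The fetch and save functions are passed
in as arguments so the loop can be tried without SE or a network connection.

```python
import os
import re


def read_player_ids(path):
    """Return the player IDs found in a text file.

    Assumption: IDs are separated by commas and/or whitespace
    (e.g. 'cobbty01, ruthba01' or one ID per line).
    """
    with open(path) as f:
        return [p for p in re.split(r'[,\s]+', f.read()) if p]


def process_players(ids, fetch, save, out_dir='.'):
    """Fetch each player's statistics and save them to '<id>.csv'.

    fetch(player_id)            -> statistics (e.g. get_statistics)
    save(file_name, statistics) -> writes the file (e.g. file_statistics)
    Returns the list of output file names, in order.
    """
    written = []
    for player_id in ids:
        out_name = os.path.join(out_dir, player_id + '.csv')
        save(out_name, fetch(player_id))
        written.append(out_name)
    return written
```

With the functions above it might be called as
process_players (read_player_ids ('players.txt'), get_statistics, file_statistics),
where 'players.txt' is a made-up file name. Note that file_statistics () takes
the file name first and the statistics second.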

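For readers who do not have SE installed, the two translations that
Tag_Stripper and CSV_Maker perform can be approximated with the standard re
module. This is a stand-in under assumptions, not SE's own behaviour: the names
strip_tags and to_csv_row are invented here, and the sketch only removes
complete tags that open and close on the same line, whereas the Tag_Stripper
definition above also has rules for tag fragments.

```python
import re

TAG_RE = re.compile(r'<[^>]*>')   # one complete tag, e.g. <h3> or </td>
WS_RE = re.compile(r'\s+')        # any run of whitespace


def strip_tags(line):
    """Replace every complete tag with a space and trim the ends."""
    return TAG_RE.sub(' ', line).strip()


def to_csv_row(text, sep='\t'):
    """Turn each run of whitespace into one separator (tab by default)."""
    return WS_RE.sub(sep, text)
```

As with CSV_Maker, every run of whitespace becomes a separator, so a field
that itself contains a space would be split in two.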



