Parsing Baseball Stats

Paul McGuire ptmcg at austin.rr._bogus_.com
Mon Jul 24 12:04:10 EDT 2006


<ankitdesai at gmail.com> wrote in message
news:1153756130.949228.182160 at p79g2000cwp.googlegroups.com...
> I would like to parse a couple of tables within an individual player's
> SHTML page. For example, I would like to get the "Actual Pitching
> Statistics" and the "Translated Pitching Statistics" portions of Babe
> Ruth page (http://www.baseballprospectus.com/dt/ruthba01.shtml) and
> store that info in a CSV file.
>
> Also, I would like to do this for numerous players whose IDs I have
> stored in a text file (e.g.: cobbty01, ruthba01, speaktr01, etc.).
> These IDs should change the URL to get the corresponding player's
> stats. Is this doable and if yes, how? I have only recently finished
> learning Python (used the book: How to Think Like a Computer Scientist:
> Learning with Python). Thanks for your help...
>
Yes, this is doable. Pyparsing and BeautifulSoup are both worth looking into
for pulling the tables out of each page.
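
As a rough illustration of the overall shape (IDs -> URLs -> table rows ->
CSV), here is a minimal sketch. It uses only the standard library
(html.parser and csv) rather than Pyparsing or BeautifulSoup, so it is a
stand-in for either, not a recommendation over them. Everything about the
page layout is an assumption: I have not checked whether the stats are
really in <table> elements, the URL pattern is inferred from the single
example above, and the names player_url, read_player_ids, TableParser and
rows_to_csv, plus the sample HTML with its placeholder numbers, are made up
for this example.

```python
import csv
import io
from html.parser import HTMLParser

# Assumed URL pattern, inferred from the one example URL in the question.
BASE_URL = "http://www.baseballprospectus.com/dt/{player_id}.shtml"


def player_url(player_id):
    """Build the stats page URL for one player ID (e.g. 'ruthba01')."""
    return BASE_URL.format(player_id=player_id.strip())


def read_player_ids(lines):
    """Return the non-blank IDs, assuming one ID per line in the text file."""
    return [line.strip() for line in lines if line.strip()]


class TableParser(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell text.

    Assumes well-formed, non-nested tables with explicit closing tags;
    real pages may need something more forgiving (e.g. BeautifulSoup).
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table> seen
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def rows_to_csv(rows, out):
    """Write table rows to an open text file (or any file-like object)."""
    csv.writer(out, lineterminator="\n").writerows(rows)


# Made-up placeholder page, NOT real data from the site.
SAMPLE_HTML = """
<table>
<tr><th>YEAR</th><th>W</th><th>L</th></tr>
<tr><td>1900</td><td>1</td><td>2</td></tr>
</table>
"""

if __name__ == "__main__":
    for pid in read_player_ids(["cobbty01\n", "ruthba01\n", "speaktr01\n"]):
        print(player_url(pid))

    parser = TableParser()
    parser.feed(SAMPLE_HTML)
    buf = io.StringIO()
    rows_to_csv(parser.tables[0], buf)
    print(buf.getvalue())
```

In real use you would pass an open file to read_player_ids and rows_to_csv
instead of the in-memory stand-ins, and pick out the two tables you want
from parser.tables (by position or by inspecting their header rows). The
step that actually fetches each page is left out on purpose.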

Also, take care not to run afoul of the terms of service for this site
(included below).  A strict interpretation of them probably prohibits what
you intend to do.

-- Paul


Restrictions

You agree to not use the Service:

* to upload, post, email, transmit or otherwise make available (A) any
content that you do not have a right to make available and any commercial
publication or exploitation of the Service or any content provided in
connection therewith is specifically prohibited and anyone wishing to do so
must first request and receive prior written permission from PEV to do so;
(B) any content that infringes any patent, trademark, trade secret,
copyright or other proprietary rights ("Rights") of any party; (C) any
unsolicited or unauthorized advertising, promotional materials, "junk mail,"
"spam," "chain letters," "pyramid schemes," or any other form of
solicitation except for vendors so authorized to do so; or (D) any content
that is unlawful, harmful, threatening, abusive, harassing, tortious,
defamatory, vulgar, obscene, libelous, invasive of another's privacy,
hateful, or racially, ethnically or otherwise objectionable;

* to forge headers or otherwise manipulate identifiers in order to disguise
the origin of any content transmitted through or made available through the
Service;

* to collect or store personal data about other users; or deleting or
revising any content (including, but not limited to, legal notices) posted
by PEV or attempting to decipher, decompile, disassemble or reverse engineer
any of the software or content provided through, comprising or making up the
Service;

* to use or attempt to use any engine, software, tool, agent or other device
or mechanism (including without limitation browsers, spiders, robots,
avatars or intelligent agents) to navigate or search this Site other than
the search engine and search agents available from Experience on this Site
and other than generally available third-party web browsers (e.g., Netscape
Navigator, Microsoft Explorer) or;

* to intentionally or unintentionally violate any applicable local, state,
national or international law.




