Re: Parsing Baseball Stats



Hi,

Below is your solution, ready to run. Put get_statistics () in a loop that feeds it the player IDs from your file, makes an output
file name from each ID, and passes both that file name and the returned 'statistics' to file_statistics ().
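The loop described above might look like the following minimal sketch. It assumes the ID file separates IDs with commas and/or whitespace, as in the example 'cobbty01, ruthba01, speaktr01', and that you want one output file per player named '<id>.csv'. read_player_ids, process_players and that naming scheme are hypothetical, not part of the listing. get_statistics and file_statistics are passed in as arguments so the loop can be tried without network access. The code is plain enough to run under Python 2.6+ or Python 3.

```python
import re


def read_player_ids(ids_file_name):
    """Return the player IDs in the file, e.g. ['cobbty01', 'ruthba01'].

    Assumption: IDs are separated by commas and/or whitespace (newlines too).
    """
    with open(ids_file_name) as f:
        # Split on any run of commas/whitespace and drop empty leftovers.
        return [pid for pid in re.split(r'[,\s]+', f.read()) if pid]


def process_players(ids_file_name, get_statistics, file_statistics,
                    out_template='%s.csv'):
    """Fetch and file the statistics of every player listed in ids_file_name.

    get_statistics and file_statistics are the functions from the listing,
    passed in so the loop can be exercised with stand-ins.
    out_template is a hypothetical naming scheme: one output file per player.
    Returns the list of output file names written.
    """
    written = []
    for player_id in read_player_ids(ids_file_name):
        statistics = get_statistics(player_id)   # one page fetch per player
        out_name = out_template % player_id      # e.g. 'ruthba01.csv'
        file_statistics(out_name, statistics)    # note the argument order
        written.append(out_name)
    return written
```

With the listing loaded, a call would be process_players ('player_ids.txt', get_statistics, file_statistics), where 'player_ids.txt' is a placeholder for your own file.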

Cheers,

Frederic


----- Original Message -----
From: <ankitdesai@xxxxxxxxx>
Newsgroups: comp.lang.python
To: <python-list@xxxxxxxxxx>
Sent: Monday, July 24, 2006 5:48 PM
Subject: Parsing Baseball Stats


I would like to parse a couple of tables within an individual player's
SHTML page. For example, I would like to get the "Actual Pitching
Statistics" and the "Translated Pitching Statistics" portions of Babe
Ruth's page (http://www.baseballprospectus.com/dt/ruthba01.shtml) and
store that info in a CSV file.

Also, I would like to do this for numerous players whose IDs I have
stored in a text file (e.g.: cobbty01, ruthba01, speaktr01, etc.).
These IDs should change the URL to get the corresponding player's
stats. Is this doable and if yes, how? I have only recently finished
learning Python (used the book: How to Think Like a Computer Scientist:
Learning with Python). Thanks for your help...


import urllib   # Python 2; in Python 3, urlopen lives in urllib.request
import SE

Tag_Stripper = SE.SE (r'"~<.*?>~= " "~<[^>]*~=" "~[^<]*>~=" ')
CSV_Maker = SE.SE (r' "~\s+~=(9)" ')

# SE is the hacker's Swiss army knife. You can find it in the Cheese Shop (PyPI).
# It strips your tags and puts in the CSV separator, and if you needed other
# translations, it would do those too, still in two lines of code.
# If you don't want tabs, define CSV_Maker accordingly, putting
# your separator in place of the tab code '(9)':
# CSV_Maker = SE.SE (r'"~\s+~=,"') # Now it's a comma

def get_statistics (name_of_player):

    statistics = {

        # Uncomment those you want
        # 'Actual Batting Statistics' : [],
        'Actual Pitching Statistics' : [],
        # 'Advanced Batting Statistics' : [],
        'Advanced Pitching Statistics' : [],
        # 'Fielding Statistics as Center Fielder' : [],
        # 'Fielding Statistics as First Baseman' : [],
        # 'Fielding Statistics as Left Fielder' : [],
        # 'Fielding Statistics as Pitcher' : [],
        # 'Fielding Statistics as Right Fielder' : [],
        # 'Statistics as DH/PH/Other' : [],
        # 'Translated Batting Statistics' : [],
        # 'Translated Pitching Statistics' : [],

    }

    url = 'http://www.baseballprospectus.com/dt/%s.shtml' % name_of_player
    htm_page = urllib.urlopen (url)
    htm_lines = htm_page.readlines ()
    htm_page.close ()
    current_list = None   # list collecting the rows of the current wanted table
    for line in htm_lines:
        text_line = Tag_Stripper (line).strip ()
        if line.startswith ('<h3'):
            # A section heading: collect what follows only if it is wanted.
            if text_line in statistics:
                current_list = statistics [text_line]
                current_list.append (text_line)   # item [0] is the heading
            else:
                current_list = None
        elif current_list is not None and text_line:
            # A non-empty line inside a wanted section becomes one CSV row.
            current_list.append (CSV_Maker (text_line))

    return statistics


def show_statistics (statistics):
    for category in statistics:
        for record in statistics [category]:
            print record
        print


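def file_statistics (file_name, statistics):
    f = open (file_name, 'w')   # 'w' creates/overwrites; use 'a' to append instead
    for category in statistics:
        f.write ('%s\n' % category)
        # Skip item [0], the heading, which was just written as the category.
        for line in statistics [category][1:]:
            f.write ('%s\n' % line)
    f.close ()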
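If SE isn't available, the same idea can be approximated with the standard library. The sketch below is written for Python 3 and is not SE itself: strip_tags applies three re substitutions meant to mirror the three rules in Tag_Stripper, collect_sections mirrors the loop in get_statistics but works on a list of lines already read (so it can be tried offline), and write_csv uses the csv module instead of joining fields with tabs. All three function names are hypothetical. Like the \s+ rule in CSV_Maker, splitting on whitespace will break apart a cell that itself contains a space.

```python
import csv
import re

# Rough stand-ins for the three SE rules in Tag_Stripper (an assumption, not SE):
_COMPLETE_TAG = re.compile(r'<.*?>')   # a whole tag on one line -> a space
_TAG_START = re.compile(r'<[^>]*$')    # a tag left open at the end of a line -> dropped
_TAG_END = re.compile(r'^[^<]*>')      # the tail of a tag opened on an earlier line -> dropped


def strip_tags(line):
    """Return line with HTML tags removed, including tags split across lines."""
    line = _COMPLETE_TAG.sub(' ', line)
    line = _TAG_START.sub('', line)
    return _TAG_END.sub('', line)


def collect_sections(html_lines, wanted):
    """Return {heading: [rows]} for the <h3> headings listed in wanted.

    An <h3> line opens a section; every non-empty line after it, up to the
    next <h3>, becomes one row of cells split on whitespace. Unlike the
    listing, the heading itself is not stored in the row list.
    """
    sections = {name: [] for name in wanted}
    current = None
    for line in html_lines:
        text = strip_tags(line).strip()
        if line.startswith('<h3'):
            current = sections.get(text)   # None for unwanted sections
        elif current is not None and text:
            current.append(text.split())
    return sections


def write_csv(file_name, sections):
    """Write each section as a one-cell heading row followed by its data rows."""
    with open(file_name, 'w', newline='') as f:   # newline='' as the csv docs advise
        writer = csv.writer(f)
        for heading, rows in sections.items():
            writer.writerow([heading])
            writer.writerows(rows)
```

To use it on a live page, pass collect_sections the lines read from urllib.request.urlopen (url); in Python 3 these arrive as bytes and must be decoded to str first, so check the page's encoding before choosing a codec.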

