Re: Newbie..Needs Help




----- Original Message -----
From: "Graham Feeley" <grahamjfeeley@xxxxxxxxxxxxxxx>
Newsgroups: comp.lang.python
To: <python-list@xxxxxxxxxx>
Sent: Friday, July 28, 2006 5:11 PM
Subject: Re: Newbie..Needs Help


Thanks, Nick, for the reply.
Of course my first post was a general posting to see if someone would be
able to help.
Here is the website which holds the data I require:
http://www.aapracingandsports.com.au/racing/raceresultsonly.asp?storydate=27/07/2006&meetings=bdgo

The fields required are as follows
NSW Tab
# Win Place
2 $4.60 $2.40
5 $2.70
1 $1.30
Quin $23.00
Tri $120.70
Field names are
Date ( not important )
Track................= Bendigo
RaceNo............on web page
Res1st...............2
Res2nd..............5
Res3rd..............1
Div1..................$4.60
DivPlc...............$2.40
Div2..................$2.70
Div3..................$1.30
DivQuin.............$23.00
DivTrif...............$120.70
As you can see there are a total of 6 meetings involved, and I would need to
put in this parameter (=bdgo) or (=gosf); these are the meeting tracks.

Hope this is more enlightening
Regards
graham


Graham,

Only a few days ago I gave someone a push who had a very similar problem. I handed him code ready to run. I am doing it again for
you.
The site you use is much harder to interpret than the other one was, so I took the opportunity to experimentally stretch
the envelope of a new brainchild of mine: a stream editor called SE. It is new, so I also take the opportunity to demo it.
One correspondent in the previous exchange was Paul McGuire, the author of 'pyparsing'. He made a good case for using 'pyparsing'
in situations like yours. Unlike a stream editor, a parser reads structure in addition to data and can relate the data to its
context.
Analyzing the tables, I noticed that they are poorly structured: the first column contains both data and ids. Some records are
shorter than others, so column ids have to be guessed and hard-coded. Missing data is sometimes a dash, sometimes nothing. The
inconsistencies seem to be consistent, though, down the eight tables of the page, so they can be formalized with some confidence
that they are systematic. If Paul could spend some time on this, I'd be much interested to see how he would handle the relative
disorder.
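
To make the point about short records and mixed markers for missing data concrete, here is a minimal sketch of how such ragged records could be brought to a fixed width before fields are read by position. It is written for a current Python 3 interpreter, uses only built-ins and has nothing to do with SE. The name normalize_record, the width of 6 and the sample rows are made up for illustration.

```python
def normalize_record(fields, width, missing="-"):
    """Return exactly `width` fields, with blanks and gaps shown as `missing`."""
    # An empty or white-space-only cell becomes the same marker as a dash.
    cleaned = [f.strip() or missing for f in fields]
    # Pad short records on the right. This assumes cells are only ever
    # missing at the end of a row, which has to be checked against the page.
    cleaned += [missing] * (width - len(cleaned))
    return cleaned[:width]

# Invented sample rows: a full one, one with blank cells, one cut short.
rows = [
    ["2", "$4.60", "$2.40", "$4.50", "$2.30"],
    ["5", "", "$2.70", " ", "$2.60", "-"],
    ["1", "$1.30"],
]
table = [normalize_record(r, 6) for r in rows]
for row in table:
    print("\t".join(row))
```

Once every record has the same length, hard-coded column indexes at least fail in a predictable way when the page layout changes.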
Another thought: the time one invests in developing a program should not exceed the time it can save overall (not talking
about recreational programming). Web pages justify an extra measure of caution, because they may change at any time, and each
time they do, the reader stops working and an unscheduled revision takes priority.

So, here is your program. I wrote it so you can copy the whole thing to a file. Next, copy SE from the Cheese Shop. Unzip it and put
both SE.PY and SEL.PY where your Python programs are. Then 'execfile' the code in an IDLE window, call
display_horse_race_data ('Bendigo', '27/07/2006') and see what happens. You'll have to wait ten seconds or so.
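
Since most of that wait is the download, it may help during development to keep a local copy of the page and read it from disk on later runs. The following is only a sketch of that idea in current Python 3 (the listing below is Python 2). The names fetch, load_page, fake_fetcher and the cache file name are hypothetical, and the demonstration uses a stand-in fetcher so nothing is actually downloaded.

```python
import os
import tempfile
from urllib.request import urlopen

def fetch(url):
    """Download `url` and return the raw bytes of the response body."""
    with urlopen(url) as page:
        return page.read()

def load_page(url, cache_path, fetcher=fetch):
    """Return the page bytes, downloading only if no local copy exists yet."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()
    data = fetcher(url)
    with open(cache_path, "wb") as f:
        f.write(data)
    return data

# Demonstration with a stand-in fetcher that records how often it is called.
calls = []
def fake_fetcher(url):
    calls.append(url)
    return b"<html>race results</html>"

cache_file = os.path.join(tempfile.mkdtemp(), "bdgo.html")
first = load_page("http://example.invalid/results", cache_file, fake_fetcher)
second = load_page("http://example.invalid/results", cache_file, fake_fetcher)
print(first == second, len(calls))
```

Deleting the cache file forces a fresh download, which is what you want once the site has published new results.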

Regards

Frederic

######################################################################################

TRACKS = { 'New Zealand'  : '',
           'Bendigo'      : 'bdgo',
           'Gosford'      : 'gosf',
           'Northam'      : 'nthm',
           'Port Augusta' : 'pta',
           'Townsville'   : 'town',
         }


# This function does it all once all functions are loaded. If nothing shows, the
# page has no data.

def display_horse_race_data (track, date, clip_summary = 100):

    """
       track: e.g. 'Bendigo' or 'bdgo'
       date: e.g. '27/07/2006'
       clip_summary: each table has a long summary header.
                     the argument says how much of it to show.
    """

    # A capitalized name is looked up in TRACKS; a lower-case code is used as is.
    if track [0].isupper ():
        if TRACKS.has_key (track):
            track = TRACKS [track]
        else:
            print 'No such track %s' % track
            return
    open ()
    header, records = get_horse_race_data (track, date)
    show_records (header, records, clip_summary)



######################################################################################


import SE, urllib

_is_open = 0

def open ():

    global _is_open

    if not _is_open:   # Skip repeat calls

        global Data_Filter, Null_Data_Marker, Tag_Stripper, Space_Deflator, CSV_Maker

        # Making the following Editors is a step-by-step process, adding one element at a time and
        # looking at what it does and what should be done next.
        # Get pertinent data segments
        header       = ' "~(?i)Today\'s Results - .+?<div style="padding-top:5px;">~==*END*OF*HEADER*" '
        race_summary = ' "~(?i)Race [1-9].*?</font><br>~==" '
        data_segment = ' "~(?i)<table border=0 width=100% cellpadding=0 cellspacing=0>(.|\n)*?</table>~==*END*OF*SEGMENT*" '
        Data_Filter = SE.SE (' <EAT> ' + header + race_summary + data_segment)

        # Some data items are empty. Fill them with a dash.
        mark_null_data = ' "~(?i)>\s*&nbsp;\s*</td>~=>-" '
        Null_Data_Marker = SE.SE (mark_null_data + ' "&nbsp;= " ')

        # Dump the tags
        eat_tags     = ' "~<(.|\n)*?>~=" '
        eat_comments = ' "~<!--(.|\n)*?-->~=" '
        Tag_Stripper = SE.SE (eat_tags + eat_comments + ' (13)= ')

        # Visual inspection is easier without all those tabs and empty lines
        Space_Deflator = SE.SE ('"~\n[\t ]+~=(10)" "~[\t ]+\n=(10)" | "~\n+~=(10)"')

        # Translating line breaks to tabs will make a tab-delimited CSV
        CSV_Maker = SE.SE ( '(10)=(9)' )

        _is_open = 1   # Block repeat calls



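def close ():

    """Call close () if you want to free up memory"""

    global _is_open
    global Data_Filter, Null_Data_Marker, Tag_Stripper, Space_Deflator, CSV_Maker
    if _is_open:
        del Data_Filter, Null_Data_Marker, Tag_Stripper, Space_Deflator, CSV_Maker
        _is_open = 0   # Let a later open () rebuild the editors
    urllib.urlcleanup ()
    # The modules stay imported: 'del urllib' inside this function would make
    # the name local and break the urlcleanup () call above.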



def get_horse_race_data (track, date):

    """track: 'bdgo', 'gosf' or another code from TRACKS
       date: e.g. '27/07/2006'
       The website shows partial data or none at all, probably depending on
       race schedules. The relevance of the date in the url is unclear.
    """

    def make_url (track, date):
        return 'http://www.aapracingandsports.com.au/racing/raceresultsonly.asp?storydate=%s&meetings=%s' % (date, track)

    page = urllib.urlopen (make_url (track, date))
    p = page.read ()
    page.close ()
    # When developing the program, don't get the file from the internet on
    # each call. Download it and read it from the hard disk.

    raw_data = Data_Filter (p)
    raw_data_marked = Null_Data_Marker (raw_data)
    raw_data_no_tags = Tag_Stripper (raw_data_marked)
    raw_data_compact = Space_Deflator (raw_data_no_tags)
    data = CSV_Maker (raw_data_compact)
    header, tables = data.split ('*END*OF*HEADER*', 1)
    records = tables.split ('*END*OF*SEGMENT*')
    # The text after the last segment marker is not a record, so drop it
    return header, records [:-1]



def show_record (record, clip_summary = 100):

    """clip_summary: None will display it all"""

    # The records all have 55 fields.
    # These are the relevant indexes:
    SUMMARY               =  0
    FIRST                 =  8
    FIRST_NSWTAB_WIN      =  9
    FIRST_NSWTAB_PLACE    = 10
    FIRST_TABCORP_WIN     = 11
    FIRST_TABCORP_PLACE   = 12
    FIRST_UNITAB_WIN      = 13
    FIRST_UNITAB_PLACE    = 14
    SECOND                = 15
    SECOND_NSWTAB_PLACE   = 17
    SECOND_TABCORP_PLACE  = 19
    SECOND_UNITAB_PLACE   = 21
    THIRD                 = 22
    THIRD_NSWTAB_PLACE    = 23
    THIRD_TABCORP_PLACE   = 24
    THIRD_UNITAB_PLACE    = 25
    QUIN_NSWTAB_PLACE     = 28
    QUIN_TABCORP_PLACE    = 30
    QUIN_UNITAB_PLACE     = 32
    EXACTA_NSWTAB_PLACE   = 35
    EXACTA_TABCORP_PLACE  = 37
    EXACTA_UNITAB_PLACE   = 39
    TRI_NSWTAB_PLACE      = 41
    TRI_TABCORP_PLACE     = 42
    TRI_UNITAB_PLACE      = 43
    DDOUBLE_NSWTAB_PLACE  = 46
    DDOUBLE_TABCORP_PLACE = 48
    DDOUBLE_UNITAB_PLACE  = 50
    SUB_SCR_NSW           = 52
    SUB_SCR_TABCORP       = 53
    SUB_SCR_UNITAB        = 54

    if clip_summary is None:
        print record [SUMMARY]
    else:
        print record [SUMMARY] [:clip_summary] + '...'
    print

    # Your specification:
    # Date ( not important )             -> In url and summary of first record
    # Track................= Bendigo     -> In url and summary of first record
    # RaceNo............on web page      -> In summary (index of record + 1?)
    # Res1st...............2
    # Res2nd..............5
    # Res3rd..............1
    # Div1..................$4.60
    # DivPlc...............$2.40
    # Div2..................$2.70
    # Div3..................$1.30
    # DivQuin.............$23.00
    # DivTrif...............$120.70

    print 'Res1st  > %s' % record [FIRST]
    print 'Res2nd  > %s' % record [SECOND]
    print 'Res3rd  > %s' % record [THIRD]
    print 'Div1    > %s' % record [FIRST_NSWTAB_WIN]
    print 'DivPlc  > %s' % record [FIRST_NSWTAB_PLACE]
    print 'Div2    > %s' % record [SECOND_NSWTAB_PLACE]
    print 'Div3    > %s' % record [THIRD_NSWTAB_PLACE]
    print 'DivQuin > %s' % record [QUIN_NSWTAB_PLACE]
    print 'DivTrif > %s' % record [TRI_NSWTAB_PLACE]

    # Add others as you like from the list of index names above



def show_records (header, records, clip_summary = 100):

    print '\n%s\n' % header
    for record in records:
        show_record (record.split ('\t'), clip_summary)
        print '\n'


##########################################################################
#
# show_records (header, records, 74) displays:
#
# Today's Results - 27/07/2006 BENDIGO
#
# Race 1 results:Carlsruhe Roadhouse Mdn Plate $11,000 2yo Maiden 1400m Appr...
#
# Res1st > 2
# Res2nd > 5
# Res3rd > 1
# Div1 > $4.60
# DivPlc > $2.40
# Div2 > $2.70
# Div3 > $1.30
# DivQuin > $23.00
# DivTrif > $120.70
#
#
# Race 2 results:Gerard K. House P/L Mdn Plate $11,000 3yo Maiden 1400m Appr...
#
# Res1st > 6
# Res2nd > 7
# Res3rd > 5
# Div1 > $3.50
# DivPlc > $1.60
# Div2 > $2.60
# Div3 > $1.40
# DivQuin > $18.60
# DivTrif > $75.80
#
#
# Race 3 results:Richard Cambridge Printers Mdn $11,000 3yo Maiden 1400m Appr...
#
# Res1st > 11
# Res2nd > 12
# Res3rd > 1
# Div1 ...
#
# ... etc
#
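
For readers without SE at hand, the same sequence of steps (mark empty cells, strip tags, deflate white space, then pick fields by index) can be approximated with the standard re module. This is a rough sketch in current Python 3, not SE's implementation. The HTML fragment, the name html_to_fields and the six index constants are invented for the example and do not match the 55-field layout of the real page.

```python
import re

def html_to_fields(html):
    """Reduce one HTML table to a flat list of cell texts."""
    # Empty cells hold only &nbsp;. Mark them with a dash so they keep their slot.
    text = re.sub(r"(?i)>\s*&nbsp;\s*</td>", ">-</td>", html)
    text = text.replace("&nbsp;", " ")
    # Drop comments first, then all remaining tags.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.S)
    text = re.sub(r"<.*?>", "", text, flags=re.S)
    # One field per non-empty line, stripped of surrounding white space.
    return [line.strip() for line in text.splitlines() if line.strip()]

# Invented fragment with one empty cell and one comment.
sample = """
<table border=0>
<tr>
  <td>2</td>
  <td>$4.60</td>
  <td>$2.40</td>
</tr>
<tr>
  <td>5</td>
  <td> &nbsp; </td>
  <td>$2.70</td>
</tr>
<!-- end of race -->
</table>
"""

# Hypothetical positions within this small sample only.
FIRST, FIRST_WIN, FIRST_PLACE, SECOND, SECOND_WIN, SECOND_PLACE = range(6)

fields = html_to_fields(sample)
result = {
    "Res1st": fields[FIRST],
    "Div1": fields[FIRST_WIN],
    "DivPlc": fields[FIRST_PLACE],
    "Res2nd": fields[SECOND],
    "Div2": fields[SECOND_PLACE],
}
print("\t".join(fields))
```

Returning a dictionary instead of printing makes it easier to write the values to a file or database later. The hard-coded indexes remain the fragile part, as noted above.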


