Re: Perl script to extract data from webpage? (knucklehead newbie).
From: Gavin Williams (williams.gavin_at_comcast.net)
Date: 06/24/04
- Next message: Gavin Williams: "Re: Perl socket on linux won't accept connections from windows clients"
- Previous message: Gavin Williams: "Problem with memory when using "threads" with Perl 5.8 on Windows System"
- In reply to: Ryan Haskell: "Perl script to extract data from webpage? (knucklehead newbie)."
- Next in thread: Ryan Haskell: "Re: Perl script to extract data from webpage? (knucklehead newbie)."
- Reply: Ryan Haskell: "Re: Perl script to extract data from webpage? (knucklehead newbie)."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 24 Jun 2004 15:36:12 -0400
$& is a special Perl variable that holds the string matched by the last
successful pattern match. In your example that match is /[0-9]{1,3}\%/,
a pattern that means "one to three digits followed by a % character".
For the humidity row, $& is therefore "40%", and the chop() call that
follows removes the trailing "%" to leave 40.
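To see this on its own, here is a minimal sketch that runs the same pattern against the humidity line from your HTML. The $line variable is my own stand-in for $_ (the original script presumably reads the page line by line), so treat it as an illustration rather than a piece of weather4.pl:

```perl
use strict;
use warnings;

# Sample line copied from the HTML quoted in the question.
my $line = '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>40%</TD></TR>';

my $humidity;
if ( $line =~ /[0-9]{1,3}\%/ ) {
    $humidity = $&;     # the whole matched text: the digits plus the "%"
    chop($humidity);    # chop removes the last character, here the "%"
    print "Humidity: $humidity\n";
}
```

The "2" in obsInfo2 is not picked up because it is not followed by a "%", so the first place the pattern can match is "40%".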
Maybe an easier way of writing that same section of code would be:
# true if $_ contains "CLASS=obsInfo2>" followed by a 1-3 digit number and
# a "%", concluded by a "</TD>"
if ( /CLASS=obsInfo2>([0-9]{1,3}\%)<\/TD>/i ) {
    print "Humidity: $+\n" ;
}
# Note that I had to use \ to escape the / in </TD>, or it would have been
# interpreted as the end of the pattern.
# The "i" after the pattern makes the match case-insensitive.
# "$+" is another special Perl variable; it returns the text captured by
# the last ( ) group of the last successful match (with a single group,
# as here, that is the same as $1)
# "$&" returns the entire matched string
# "$`" returns everything before the matched string
# "$'" returns everything after the matched string
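If it helps, here is a small sketch that prints all four of those variables for the humidity line. Again, $line and the other variable names are just mine for illustration; the values are copied into ordinary variables first so they are easy to print:

```perl
use strict;
use warnings;

# Sample line copied from the HTML quoted in the question.
my $line = '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>40%</TD></TR>';

my ( $group, $whole, $before, $after );
if ( $line =~ /CLASS=obsInfo2>([0-9]{1,3}\%)<\/TD>/i ) {
    # Save the match variables right away; the next successful
    # match would overwrite them.
    ( $group, $whole, $before, $after ) = ( $+, $&, $`, $' );

    print "captured group : $group\n";     # text inside the ( )
    print "entire match   : $whole\n";     # everything the pattern matched
    print "before match   : $before\n";    # text to the left of the match
    print "after match    : $after\n";     # text to the right of the match
}
```

With only one pair of parentheses in the pattern, $1 would give the same value as $+.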
To get pressure, you might add:
# true if $_ contains the string "inches"; uses ".*" as a wildcard match
# for the text we want to return
if ( /inches/i && /CLASS=obsInfo2>(.*)<\/TD>/i ) {
    print "Pressure: $+\n" ;
}
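Since MRTG wants a bare number, you may prefer to capture only the digits rather than the whole cell. The following is a sketch under a couple of assumptions: $line is my own stand-in for $_, and the sample line ends after "and" because that is how the pressure row appears in your post (it may only be your newsreader wrapping it). For that reason the pattern does not require the "</TD>" to be on the same line:

```perl
use strict;
use warnings;

# Pressure line as it appears in the quoted HTML (the cell continues
# on the next line in the post, so there is no </TD> here).
my $line = '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>29.79 inches and';

my $pressure;
# Capture digits, a decimal point, and more digits, but only when
# they are followed by the word "inches".
if ( $line =~ /CLASS=obsInfo2>([0-9]+\.[0-9]+)\s+inches/i ) {
    $pressure = $1;
    print "Pressure: $pressure\n";
}
```

Anchoring on the word "inches" is what keeps this from matching the visibility row ("10.0 miles"), which has the same CLASS=obsInfo2 attribute.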
"Ryan Haskell" <ryan_haskell@hotmail.com> wrote in message
news:e426fc0.0406231146.13f59167@posting.google.com...
> Hello folks. I regret to announce that my understanding of Perl is
> virtually nonexistent, and I'm looking for a little instruction. My
> goal is to utilize a Perl script to extract specific numeric data from
> various web pages, and then feed that data to MRTG for graphing
> purposes. I have this running now using a script I found elsewhere,
> and am using it to pull current temperature for my area from
> www.weather.com and create a graph. Now I want to use the same
> technique for other data elsewhere. Problem is, I can't figure out
> how to modify this perl script to find the data of interest in a given
> page, because I don't understand how the script actually locates the
> data. The script itself is available from
>
> http://howto.aphroland.de/HOWTO/MRTG/Scripts/weather4.pl
>
> and here is a short excerpt from it, where the script parses the html
> page from www.weather.com for the humidity data:
>
> if ( /\%/ && /obsInfo2/ && ! /WIDTH/ ) {
> if (/[0-9]{1,3}\%/) {
> if ( $debug == 1 ) {
> unless ( $& ) { die "Cannot determine the humidity!\n"; }
> $humidity = $&;
> chop ($humidity);
> print "Humidity: $humidity\n";
>
>
>
> And below is the relevant section of the html code from
> www.weather.com that is being parsed:
>
>
> <BR>
> <TABLE BORDER=0 CELLPADDING=0 WIDTH=100% CELLSPACING=0>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1 WIDTH=40%>UV Index:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>3 Low</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Dew Point:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>51°F</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Humidity:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>40%</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Visibility:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>10.0 miles</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Pressure:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>29.79 inches and
> rising</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Wind:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>From the North at 13 gusting
> to 18 mph</TD></TR>
>
>
> I can see that "%" and "obsInfo2" are text strings found within the
> html page on either side of the desired value, but I'm not clear on
> how the perl script pulls the actual value (in this case 40) out of
> the data and assigns it to the $humidity variable. How would I modify
> the perl script if I wanted to get, for example, the pressure instead?
> (which is 29.79 in the html example above.) I think if I could
> understand how this variable matching/assignment is occurring, I could
> then use this script to fetch almost any number from any web page,
> right?
>
> For another example, let's say I wanted to pull the value for "Heat
> Index" off the NWS Weather page at:
>
> http://weather.noaa.gov/weather/current/KVDF.html
>
> What would I do?
>
> Thanks for any help!
> Ryan Haskell