Re: Perl script to extract data from webpage? (knucklehead newbie).

From: Gavin Williams (williams.gavin_at_comcast.net)
Date: 06/24/04


Date: Thu, 24 Jun 2004 15:36:12 -0400

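Your $& is a special Perl variable that holds the string matched by the
last successful pattern match, which in your example is /[0-9]{1,3}\%/.
That pattern basically says "match a number 1 to 3 digits long followed by
a "%" character". On the humidity line, $& is therefore "40%", and
chop($humidity) then removes the last character (the "%"), leaving 40. The
outer test, /\%/ && /obsInfo2/ && ! /WIDTH/, just picks out the right line
of HTML: one that contains a "%", the text "obsInfo2", and no "WIDTH".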
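To make that concrete, here is a minimal, self-contained sketch of the same humidity logic run against the humidity line from your page. The sub name humidity_from_line is hypothetical (it is not in weather4.pl), and it takes one line as an argument instead of relying on the script's loop over $_.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the weather4.pl excerpt: returns the
# humidity digits from one line of HTML, or undef if the line doesn't match.
sub humidity_from_line {
    my ($line) = @_;
    local $_ = $line;    # the excerpt tests $_ implicitly

    # Same filters as the script: a "%", the obsInfo2 class, and no WIDTH
    return undef unless /\%/ && /obsInfo2/ && !/WIDTH/;
    return undef unless /[0-9]{1,3}\%/;

    my $humidity = $&;   # the whole match, e.g. "40%"
    chop($humidity);     # chop drops the final character, the "%"
    return $humidity;
}

my $line = '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>40%</TD></TR>';
print "Humidity: ", humidity_from_line($line), "\n";    # Humidity: 40
```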

Maybe an easier way of writing that same section of code would be:

# true if $_ contains "CLASS=obsInfo2>" followed by a 1-3 digit number and a
# "%", concluded by a "</TD>"

if ( /CLASS=obsInfo2>([0-9]{1,3}\%)<\/TD>/i ) {
  print "Humidity: $+\n" ;
}

  # Note that I had to use \ to "quote" the / in </TD>, or it would have
  # been interpreted as the end of the pattern.
  # I also added an "i" after the pattern to make the match case-insensitive.

  # "$+" is another special Perl variable; it returns the text matched by
  # the last set of ( ) that matched in the last successful match
  # "$&" returns the entire matched string
  # "$`" returns everything before the matched string
  # "$'" returns everything after the matched string
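Here's a small sketch showing what each of those variables holds after matching the humidity line. It uses a slightly modified pattern (my assumption, not the script's code) with the "%" moved outside the ( ), so the captured value is just the number MRTG wants and no chop is needed.

```perl
use strict;
use warnings;

# One line from the quoted weather.com HTML
my $line = '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>40%</TD></TR>';

my ( $whole, $before, $after, $last_group );
if ( $line =~ /CLASS=obsInfo2>([0-9]{1,3})\%<\/TD>/i ) {
    $whole      = $&;    # entire match: "CLASS=obsInfo2>40%</TD>"
    $before     = $`;    # text before the match: "<TD ALIGN=LEFT VALIGN=TOP "
    $after      = $';    # text after the match: "</TR>"
    $last_group = $+;    # last capture group (same as $1 here): "40"
}

print "whole match:  $whole\n";
print "before match: $before\n";
print "after match:  $after\n";
print "last group:   $last_group\n";
```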

To get pressure, you might add:

# true if $_ contains the string "inches"; ".*" acts as a wildcard that
# captures the text we want to return

if ( /inches/i && /CLASS=obsInfo2>(.*)<\/TD>/i ) {
  print "Pressure: $+\n" ;
}
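If you read the whole page into one string instead of testing one line at a time, you can generalize this into "find the value cell that follows a given label", which is roughly what you'd need for other fields. The sketch below assumes the weather.com table layout quoted below (obsInfo1 label cells followed by obsInfo2 value cells); cell_after_label is a hypothetical helper, and other pages such as the NOAA one will need a different pattern. Regex scraping like this breaks whenever the site changes its HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: given the page HTML, return the text of the
# obsInfo2 cell that follows the cell containing $label, or undef.
sub cell_after_label {
    my ( $html, $label ) = @_;
    # \Q...\E quotes any regex metacharacters in the label;
    # .*? is non-greedy, so we stop at the first following obsInfo2 cell;
    # /s lets . match newlines, in case a cell wraps onto the next line.
    if ( $html =~ /\Q$label\E<\/TD>.*?CLASS=obsInfo2>(.*?)<\/TD>/is ) {
        return $1;
    }
    return undef;
}

# Sample copied from the quoted weather.com HTML (assumed layout)
my $html = <<'HTML';
<TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Humidity:</TD>
<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>40%</TD></TR>
<TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Pressure:</TD>
<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>29.79 inches and
rising</TD></TR>
HTML

my $cell = cell_after_label( $html, 'Pressure:' );  # "29.79 inches and\nrising"
my ($pressure) = $cell =~ /([0-9]+(?:\.[0-9]+)?)/;  # leading number only
print "Pressure: $pressure\n";                      # Pressure: 29.79
```

The second match pulls out only the leading number, which is the form MRTG needs; the same helper called with 'Humidity:' returns "40%".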

"Ryan Haskell" <ryan_haskell@hotmail.com> wrote in message
news:e426fc0.0406231146.13f59167@posting.google.com...
> Hello folks. I regret to announce that my understanding of Perl is
> virtually nonexistent, and I'm looking for a little instruction. My
> goal is to utilize a Perl script to extract specific numeric data from
> various web pages, and then feed that data to MRTG for graphing
> purposes. I have this running now using a script I found elsewhere,
> and am using it to pull current temperature for my area from
> www.weather.com and create a graph. Now I want to use the same
> technique for other data elsewhere. Problem is, I can't figure out
> how to modify this perl script to find the data of interest in a given
> page, because I don't understand how the script actually locates the
> data. The script itself is available from
>
> http://howto.aphroland.de/HOWTO/MRTG/Scripts/weather4.pl
>
> and here is a short excerpt from it, where the script parses the html
> page from www.weather.com for the humidity data:
>
> if ( /\%/ && /obsInfo2/ && ! /WIDTH/ ) {
> if (/[0-9]{1,3}\%/) {
> if ( $debug == 1 ) {
> unless ( $& ) { die "Cannot determine the humidity!\n"; }
> $humidity = $&;
> chop ($humidity);
> print "Humidity: $humidity\n";
>
>
>
> And below is the relevant section of the html code from
> www.weather.com that is being parsed:
>
>
> <BR>
> <TABLE BORDER=0 CELLPADDING=0 WIDTH=100% CELLSPACING=0>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1 WIDTH=40%>UV Index:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>3&nbsp;Low</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Dew Point:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>51&deg;F</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Humidity:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>40%</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Visibility:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>10.0 miles</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Pressure:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>29.79 inches and
> rising</TD></TR>
> <TR><TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1>Wind:</TD>
> <TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2>From the North at 13 gusting
> to 18&nbsp;mph</TD></TR>
>
>
> I can see that "&" and "obsInfo2" are text strings found within the
> html page on either side of the desired value, but I'm not clear on
> how the perl script pulls the actual value (in this case 40) out of
> the data and assigns it to the $humidity variable. How would I modify
> the perl script if I wanted to get, for example, the pressure instead?
> (which is 29.79 in the html example above.) I think if I could
> understand how this variable matching/assignment is occurring, I could
> then use this script to fetch almost any number from any web page,
> right?
>
> For another example, let's say I wanted to pull the value for "Heat
> Index" off the NWS Weather page at:
>
> http://weather.noaa.gov/weather/current/KVDF.html
>
> What would I do?
>
> Thanks for any help!
> Ryan Haskell
