Need some help filtering thru results
- From: mickalo@xxxxxxxxxxxxxxx (Mike Blezien)
- Date: Wed, 30 Aug 2006 12:10:19 -0500
Hello,
We need to grab some data from a webpage fetched via the LWP module. The code is below, along with
the $resultdata it returns. I need to regex out various pieces of data, indicated by the [ ] brackets; see below for further explanation.
My regex skills are not very strong and I need some help figuring out the best way to do this.
===============================================================
#!/usr/bin/perl
BEGIN { open (STDERR, ">./mandy_error.log"); }
use CGI::Carp qw(fatalsToBrowser);
use CGI qw(:standard);
use HTTP::Request;
use LWP::UserAgent;
use strict;
my $agent = "Thunder Rain Scraper";
my $adminemail = 'mickalo@xxxxxxxxxxxxxxx';
my $urltofetch = 'http://www.mandy.com/1/jobs2.cfm?terr=usny&skill=crw&paid=no&p=';
my $resultdata = fetch_results($urltofetch);
print header();
if(defined($resultdata))
{
# process resulting data returned
$resultdata =~ s/&amp;/&/ig;
$resultdata =~ s/&nbsp;/ /ig;
LOOP:
for my $lines ( split(/\n/,$resultdata) )
{
if($lines =~ /<tr class=\"main\"/i) # THIS IS NOT WORKING.
{
# DO STUFF HERE -
}
}
}
else
{
print qq~\nNo Result Data Returned\r\n~;
}
print qq~\nProcess Completed\n~;
exit();
sub fetch_results {
my $url = shift();
# MAIN
my $ua = LWP::UserAgent->new; # create a new LWP agent
$ua->from($adminemail); # set HTTP From
$ua->agent($agent); # set Agent-Name
# retrieve the file from $url
my $request = HTTP::Request->new(GET => $url);
my $response = $ua->request($request);
# return content
if ($response->is_success()) { return $response->content(); }
else { return undef; }
}
__END__
===================================================================
Now, from the data returned, we need to filter out everything except the part from <!-- START GRABBING RESULT HERE -->
to <!-- END RESULT GRABBING HERE -->. Within that part I need to grab the data inside the [ ] brackets. I inserted those brackets for clarification; they're not normally there. I need to go through each <tr class="main"> (.*?)</tr> table row up to the closing </table>.
######################################################################################
# FILTER TO RESULTS
.... A BUNCH HEADER STUFF HERE ....
# START TABLE HERE
<table border="0" width="100%" cellpadding="5" cellspacing="0">
<tr class="dbluetoppedbox" bgcolor="#E6EFF8"><td valign="TOP">
<span class="main">Vacancy</span>
</td><td valign="TOP"><span class="main">Employer</span>
</td><td valign="TOP" nowrap><span class="main">
Where (Ad posted)</span></td>
<td valign="TOP"><span class="main">Duration</span></td>
<td valign="TOP" nowrap><span class="main">Pay</span></td>
</tr>
<!-- START GRABBING RESULT HERE -->
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18327933]">
[Camera Operator/ Video Editor]</a></td><td valign="TOP">[BigbreakNy]</td>
<td valign="TOP">[Manhattan and Union ]([30 Aug ])</td>
<td valign="TOP">[ASAP / A few days of shooting]</td><td valign="TOP">[Lo/no]</td>
</tr>
# NEXT ROW CELL
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18326674]">
[Video Sub]</a></td><td valign="TOP">[Blue Man Group]</td><td valign="TOP">[New York (30 Aug)]
</td><td valign="TOP">[ASAP / open ended]</td><td valign="TOP">[Paid]</td></tr>
# NEXT ROW CELL
.......
<!-- END RESULT GRABBING HERE -->
</table>
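
One possible way to approach this, as a rough sketch only. It assumes the two marker comments really appear in the fetched page exactly as written above (if I only added them for illustration, the first match would need a different anchor), and that every row has the same five cells as the sample. The names parse_rows and clean and the hash keys are made up for the example, and $sample is just the two rows above with the [ ] brackets removed. The main idea is to match against the whole string with the /s modifier rather than line by line, since each row spans several lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip nested tags, collapse whitespace, trim both ends.
sub clean {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/<[^>]+>//g;
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical helper: returns one hash ref per <tr class="main"> row found
# between the two marker comments (assumed to be present in the page).
sub parse_rows {
    my ($html) = @_;

    # Keep only the region between the markers; /s lets "." match newlines.
    my ($block) = $html =~ /<!--\s*START GRABBING RESULT HERE\s*-->(.*?)<!--\s*END RESULT GRABBING HERE\s*-->/is
        or return;

    my @rows;
    # Match against the whole block, because one row spans several lines.
    while ( $block =~ m{<tr\s+class="main"[^>]*>(.*?)</tr>}gis ) {
        my $row = $1;

        # The first cell holds the link: capture the href and the link text.
        my ( $href, $title ) = $row =~ m{<a\s+href="([^"]*)"[^>]*>(.*?)</a>}is
            or next;

        # Collect every cell, drop the link cell, pad to four remaining cells.
        my @cells = $row =~ m{<td[^>]*>(.*?)</td>}gis;
        shift @cells;
        push @cells, '' while @cells < 4;
        my ( $employer, $where, $duration, $pay ) = map { clean($_) } @cells;

        # Split a trailing "(30 Aug)" style date off the location, if present.
        my $posted = '';
        if ( $where =~ /^(.*?)\s*\(\s*([^)]*?)\s*\)$/ ) {
            ( $where, $posted ) = ( $1, $2 );
        }

        push @rows, {
            href     => $href,
            title    => clean($title),
            employer => $employer,
            where    => $where,
            posted   => $posted,
            duration => $duration,
            pay      => $pay,
        };
    }
    return @rows;
}

# Sample input: the two rows from this post, without the [ ] brackets.
my $sample = <<'HTML';
<!-- START GRABBING RESULT HERE -->
<tr class="main"><td valign="TOP"><a href="jobs3.cfm?v=18327933">
Camera Operator/ Video Editor</a></td><td valign="TOP">BigbreakNy</td>
<td valign="TOP">Manhattan and Union (30 Aug )</td>
<td valign="TOP">ASAP / A few days of shooting</td><td valign="TOP">Lo/no</td>
</tr>
<tr class="main"><td valign="TOP"><a href="jobs3.cfm?v=18326674">
Video Sub</a></td><td valign="TOP">Blue Man Group</td><td valign="TOP">New York (30 Aug)
</td><td valign="TOP">ASAP / open ended</td><td valign="TOP">Paid</td></tr>
<!-- END RESULT GRABBING HERE -->
</table>
HTML

for my $job ( parse_rows($sample) ) {
    print join( ' | ', map { "$_=$job->{$_}" }
          qw(href title employer where posted duration pay) ), "\n";
}
```

In the script above, parse_rows($resultdata) would take the place of the split(/\n/, ...) loop. Regexes like these are fragile if the page layout changes, so a real HTML parser from CPAN may turn out to be the safer route.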
Mike(mickalo)Blezien
===============================
Thunder Rain Internet Publishing
Providing Internet Solution that Work
http://www.thunder-rain.com
===============================