HTML::TokeParser and Matching




I have a webpage in which I need to look for the string 'Matches: n' (where n is any number). The HTML is not well structured, so I am having difficulty parsing the right part. If the match count is not zero, I want to grab the remaining text (HTML below).


use strict;
use warnings;
use LWP::Simple;
use HTML::TokeParser;

# First, download the page with LWP::Simple's get().
# get() returns undef on failure; $! is not a reliable error message here.
my $content = get("http://www.somewebpage.com/id=396")
    or die "Could not fetch the page\n";
my $stream = HTML::TokeParser->new(\$content)
    or die "Could not parse the page\n";

# Print the text of every <td> cell that mentions "Matches".
while ($stream->get_tag("td")) {
    my $text = $stream->get_trimmed_text("/td");
    if ($text =~ /Matches/) {
        print "$text\n";
    }
}


____HTML___
<tr><td>Matches:</td><td>3</td></table>
<hr size=1>
<table width="100%" cellpadding=0 cellspacing=0>
<tr><td nowrap><br>
I want to GRAB THIS PART <br>
<img hspace=2 src="/gif/dot.gif" alt=" "><A HREF="http://www.somepage.com">Link1</A><br>
</td><td nowrap><br><br> 01/31/2007<br></td>
<tr>

<td nowrap><br>
I want to GRAB THIS ALSO<br>
<img hspace=2 src="/gif/dot.gif" alt=" ">I want to GRAB THIS ALSO<br>
<img hspace=4 src="/gif/dot.gif" alt=" ">I want to GRAB THIS ALSO<br>
<img hspace=6 src="/gif/dot.gif" alt=" "><A HREF="www.www.com">Link2</A><br>
</td><td nowrap><br><br><br><br> 02/01/2007+<br></td>
<tr>
<td nowrap><br>
.......

</table>
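
One possible way to approach this, as a minimal sketch only: read the count from the cell that follows the "Matches:" label, and if it is non-zero, keep walking the same token stream and collect the text of every later cell. The helper name grab_after_matches is made up for illustration, the inline HTML is a shortened copy of the sample above standing in for the downloaded page, and the sketch assumes the count always sits in the <td> right after the label, as it does in the sample.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Shortened copy of the sample page (assumption: the real page has the same shape).
my $html = <<'HTML';
<table>
<tr><td>Matches:</td><td>3</td></table>
<hr size=1>
<table width="100%" cellpadding=0 cellspacing=0>
<tr><td nowrap><br>
I want to GRAB THIS PART <br>
<img hspace=2 src="/gif/dot.gif" alt=" "><A HREF="http://www.somepage.com">Link1</A><br>
</td><td nowrap><br><br> 01/31/2007<br></td>
</table>
HTML

# Hypothetical helper: returns the match count followed by the text of
# every non-empty <td> that comes after the count cell.
sub grab_after_matches {
    my ($page) = @_;
    my $stream = HTML::TokeParser->new(\$page)
        or die "Could not parse the page\n";

    # Step 1: find the "Matches:" label, then read the number from the next cell.
    my $count;
    while ($stream->get_tag('td')) {
        next unless $stream->get_trimmed_text('/td') =~ /^Matches:/;
        $stream->get_tag('td') or last;
        ($count) = $stream->get_trimmed_text('/td') =~ /(\d+)/;
        last;
    }

    # No label found, no number found, or zero matches: nothing to grab.
    return (0) unless $count;

    # Step 2: the stream is now positioned after the count cell, so every
    # further <td> belongs to the results that follow it.
    my @cells;
    while ($stream->get_tag('td')) {
        my $text = $stream->get_trimmed_text('/td');
        push @cells, $text if length $text;
    }
    return ($count, @cells);
}

my ($count, @cells) = grab_after_matches($html);
print "Matches: $count\n";
print "  $_\n" for @cells;
```

Note that get_trimmed_text also picks up link text and the date cells, so some extra filtering (for example skipping cells that look like dates) may still be needed depending on which parts are really wanted.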