Re: scalar / hash problem in HTML::Parser
- From: Jim Gibson <jimsgibson@xxxxxxxxx>
- Date: Tue, 26 Feb 2008 10:42:22 -0800
In article <1204001360.5977.37.camel@edoras>, Tim Bowden
<tim.bowden@xxxxxxxxxxxxxx> wrote:
I need to find a way to get HTML::Parser to return the text between the tag
caught by the start_h handler and its matching closing tag. Could
someone please point me in the right direction?
Cut-down code thus far:
#!/usr/bin/perl -wT
use strict;
use HTML::Parser;
my %choices;
my $file = 'test_snippet';
my $parser = HTML::Parser->new(api_version => 3,
    start_h => [\&start, "tagname, attr, "],
);  # I think I need to add something after attr, to get
    # what I want, but not sure what to add
sub start {
    my ($tag, $attr, $tagged_text) = @_;  # $tagged_text should get
                                          # whatever we pass after attr in start_h
    print "we got: $tag\t$attr\t$tagged_text\n";
    for (keys %{$attr}) {
        my $value = (${$attr}{$_});
        # do something with $tagged_text if we had it
    }
}
$parser->parse_file($file) or die "couldn't parse file";
## end
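Define "text" and "end" handlers in addition to the start handler. The
parser reports events as it reads the input, so the enclosed text is not
yet available when the start handler runs; it arrives afterwards as
separate text events. In the text handler, save up the provided text.
In the end handler, process what has been saved.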
Here is a program that saves up the text for embedded tags:
#!/usr/local/bin/perl
use strict;
use warnings;
use HTML::Parser;
my( %choices, %text, $tag, @tags);
my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [\&start, "tagname"],
    end_h   => [\&end, "tagname"],
    text_h  => [\&text, "text"],
);
my $input = do { local $/; <DATA> };
print "Input:\n$input\n\n";
$parser->parse($input) or die "couldn't parse input";
$parser->eof;  # signal end of input so any buffered text is flushed
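sub start {
    $tag = shift;
    push(@tags, $tag);
    print "Start tag <$tag>\n";
}
sub end {
    $tag = shift;
    # No text may have been seen for this tag; avoid an "uninitialized" warning.
    my $collected = defined $text{$tag} ? $text{$tag} : '';
    print "End tag </$tag>, text=\"$collected\"\n";
    $text{$tag} = '';
    pop @tags;
    $tag = $tags[-1];  # back to the enclosing tag (undef at top level)
}
sub text {
    my( $piece ) = @_;
    return unless defined $tag;  # text outside any open tag
    print "Text for <$tag>: \"$piece\"\n";
    $text{$tag} .= $piece;
}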
__DATA__
<html>
<body>
<t1>This is the text
enclosed by tag t1
<t2>This is tag t2 text</t2>
More tag t1 text.
</t1>
</body>
</html>
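Since the original question also wanted the attributes alongside the enclosed text, here is a rough sketch of one way to keep the two together. It is a variation on the program above, not a tested drop-in: the @open and @done arrays and the frame layout (tag, attr, text) are names I made up for illustration, and it assumes well-nested input with no unclosed elements such as <br>. It uses the "attr" and "dtext" argspecs, and each element's text is also passed up to its parent when the element closes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical bookkeeping for this sketch:
#   @open - stack of elements that are currently open
#   @done - finished elements, in the order their end tags were seen
my (@open, @done);

my $parser = HTML::Parser->new(
    api_version => 3,
    # Remember the tag name and attribute hash when an element opens.
    start_h => [ sub { push @open, { tag => $_[0], attr => $_[1], text => '' } },
                 "tagname, attr" ],
    # Append decoded text to the innermost open element, if there is one.
    text_h  => [ sub { $open[-1]{text} .= $_[0] if @open }, "dtext" ],
    end_h   => [ \&end, "tagname" ],
);

sub end {
    my ($tag) = @_;
    # Ignore end tags that do not match the innermost open element.
    return unless @open && $open[-1]{tag} eq $tag;
    my $frame = pop @open;
    push @done, $frame;
    # Hand the text up so the parent also sees its children's text.
    $open[-1]{text} .= $frame->{text} if @open;
}

$parser->parse('<p class="a">one <b id="x">two</b> three</p>');
$parser->eof;

for my $el (@done) {
    my $attrs = join ' ',
        map { "$_=$el->{attr}{$_}" } sort keys %{ $el->{attr} };
    print "<$el->{tag}> [$attrs] text=\"$el->{text}\"\n";
}
```

For the sample string this should report the <b> element first (id=x, text "two") and then the <p> element (class=a, text "one two three"). Keeping one frame per open element, rather than one hash entry per tag name, also means two nested elements with the same name do not share a buffer.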
--
Jim Gibson