Re: scalar / hash problem in HTML::Parser



In article <1204001360.5977.37.camel@edoras>, Tim Bowden
<tim.bowden@xxxxxxxxxxxxxx> wrote:

I need to find a way to get HTML::Parser to return the text between the
tag caught by the start_h handler and the related closing tag. Could
someone please point me in the right direction?

Cut down code thus far:

#!/usr/bin/perl -wT
use strict;
use HTML::Parser;

my %choices;
my $file = 'test_snippet';
my $parser = HTML::Parser->new(api_version => 3,
    start_h => [\&start, "tagname, attr, "],
); # I think I need to add something after attr, to get
   # what I want, but not sure what to add

sub start {
    my ($tag, $attr, $tagged_text) = @_; # $tagged_text should get
                                         # whatever we pass after attr in start_h
    print "we got: $tag\t$attr\t$tagged_text\n";
    for (keys %{$attr}){
        my $value = (${$attr}{$_});
        # do something with $tagged_text if we had it
    }
}
$parser->parse_file($file) or die "couldn't parse file";
## end

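Define "text" and "end" handlers in addition to the start handler. The
start handler is called before the enclosed text has been parsed, so it
cannot receive that text. In the text handler, save up the provided
text. Process the saved text in the end handler.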

Here is a program that saves up the text for embedded tags:

#!/usr/local/bin/perl
use strict;
use warnings;
use HTML::Parser;

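# %text accumulates the text seen per tag name, @tags is the stack of
# open tags, and $tag is the innermost open tag (undef outside any tag).
my( %text, $tag, @tags );
my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [\&start, "tagname"],
    end_h   => [\&end,   "tagname"],
    text_h  => [\&text,  "text"],
);

my $input = do { local $/; <DATA> };
print "Input:\n$input\n\n";

$parser->parse($input) or die "couldn't parse input";
$parser->eof;    # signal end of input so buffered text is flushed

sub start {
    $tag = shift;
    push(@tags, $tag);
    print "Start tag <$tag>\n";
}

sub end {
    $tag = shift;
    # A tag with no text at all never got an entry in %text.
    my $saved = defined $text{$tag} ? $text{$tag} : '';
    print "End tag </$tag>, text=\"$saved\"\n";
    $text{$tag} = '';
    pop @tags;
    $tag = $tags[-1];    # back to the enclosing tag, if any
}

sub text {
    my( $piece ) = @_;
    return unless defined $tag;    # skip text outside any open tag
    print "Text for <$tag>: \"$piece\"\n";
    $text{$tag} .= $piece;
}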

__DATA__
<html>
<body>
<t1>This is the text
enclosed by tag t1
<t2>This is tag t2 text</t2>
More tag t1 text.
</t1>
</body>
</html>
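
The program above keys the saved text by tag name, so two elements with the same name share one buffer, and the attributes from the start handler are not kept. A variation that may be closer to the original question is to keep one record per open element on a stack. The following is only a sketch under that assumption: extract_elements() and the tag/attr/text record layout are hypothetical names made up for this example, not part of HTML::Parser. It uses the "dtext" argspec so that entities arrive decoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns one hash per closed element, in the
# order the elements were closed. Each hash holds the tag name, the
# attribute hash from the start tag, and all text enclosed by the element.
sub extract_elements {
    my ($html) = @_;
    my (@open, @done);    # stack of open elements; finished records

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tagname, $attr) = @_;
            push @open, { tag => $tagname, attr => $attr, text => '' };
        }, "tagname, attr" ],
        text_h => [ sub {
            my ($piece) = @_;
            # Append to every open element, so an outer element's text
            # includes the text of the elements nested inside it.
            $_->{text} .= $piece for @open;
        }, "dtext" ],
        end_h => [ sub {
            my ($tagname) = @_;
            # Find the innermost open element with this name.
            my ($i) = grep { $open[$_]{tag} eq $tagname } reverse 0 .. $#open;
            return unless defined $i;    # stray end tag: ignore it
            # Remove that element and any unclosed ones above it (e.g. <br>).
            my ($el) = splice @open, $i;
            push @done, $el;
        }, "tagname" ],
    );
    $parser->parse($html);
    $parser->eof;
    return @done;
}

my $html = '<p class="a">one <b id="x">two</b> three</p>';
for my $el (extract_elements($html)) {
    print "<$el->{tag}> text=\"$el->{text}\"\n";
}
```

For the sample string this prints the record for b first and then the one for p, since elements are reported when they close. Elements that are never closed are dropped rather than reported; whether that is acceptable depends on the HTML being parsed.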

--
Jim Gibson



