Re: Possible bug in HTML::Parser
- From: Bart Lateur <bart.lateur@xxxxxxxxxx>
- Date: Wed, 16 Nov 2005 08:37:40 GMT
Mark wrote:
><DT><A HREF="http://www.google.com" ADD_DATE="1101144594"
>ID="rdf:#$.GjDP">Google (search engine)</A>
>
>The decoded text passed to the handler by HTML::Parser
>would be "Google (search engine".
I've tried it with HTML::TokeParser::Simple, which is built on top of
HTML::Parser, and it comes out well:
use HTML::TokeParser::Simple;

my $html = <<'--';
<DT><A HREF="http://www.google.com" ADD_DATE="1101144594"
ID="rdf:#$.GjDP">Google (search engine)</A>
--

my $p = HTML::TokeParser::Simple->new( \$html );
while ( my $token = $p->get_token ) {
    # one token per line, so the individual pieces are easy to tell apart
    print $token->as_is, "\n";
}
This prints:
<DT>
<A HREF="http://www.google.com" ADD_DATE="1101144594"
ID="rdf:#$.GjDP">
Google (search engine)
</A>
>Any ideas whether this is a bug in HTML::Parser, or should I
>take another look at my code?
My guess is that you are only getting part of the text, and that you have
to be patient: there is no guarantee at all that all of the text will
come out in one chunk. So the next time the text handler gets called,
the rest will probably come out... or at least part of it.
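If chunking is indeed the cause, one way to cope with it is to collect the text between the start and end of the element instead of trusting any single call to the text handler. The following is only a rough sketch under that assumption, not your actual code: the variable names ($in_link, $buffer, @titles) and the handler bodies are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser;

my $html = <<'--';
<DT><A HREF="http://www.google.com" ADD_DATE="1101144594"
ID="rdf:#$.GjDP">Google (search engine)</A>
--

# Hypothetical state for this sketch: are we inside an <A> element,
# and what text has been seen since it opened?
my $in_link = 0;
my $buffer  = '';
my @titles;

my $p = HTML::Parser->new(
    api_version => 3,
    # "tagname" is lower-cased by the parser, so compare against 'a'
    start_h => [ sub {
        my ($tagname) = @_;
        if ( $tagname eq 'a' ) { $in_link = 1; $buffer = ''; }
    }, 'tagname' ],
    # the text handler may be called several times for one run of text,
    # so append each piece rather than overwrite
    text_h => [ sub {
        my ($dtext) = @_;
        $buffer .= $dtext if $in_link;
    }, 'dtext' ],
    # only at </A> is the collected text known to be complete
    end_h => [ sub {
        my ($tagname) = @_;
        if ( $tagname eq 'a' && $in_link ) {
            push @titles, $buffer;
            $in_link = 0;
        }
    }, 'tagname' ],
);

$p->parse($html);
$p->eof;    # flush anything the parser is still holding back

print "$_\n" for @titles;
```

Alternatively, calling $p->unbroken_text(1) before parsing asks HTML::Parser to report each block of text in one piece, which may be enough on its own if you don't need to track the element boundaries yourself.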
--
Bart.