Re: Possible bug in HTML::Parser



Mark wrote:

><DT><A HREF="http://www.google.com"; ADD_DATE="1101144594"
>ID="rdf:#$.GjDP">Google (search engine)</A>
>
>The decoded text passed to the handler by HTML::Parser
>would be "Google (search engine".

I've tried it with HTML::TokeParser::Simple, which is built on top of
HTML::Parser, and it comes out well:

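use strict;
use warnings;
use HTML::TokeParser::Simple;

# Single-quoted here-doc, so "$." in the ID attribute is not interpolated.
my $html = << '--';
<DT><A HREF="http://www.google.com"; ADD_DATE="1101144594"
ID="rdf:#$.GjDP">Google (search engine)</A>
--

my $p = HTML::TokeParser::Simple->new( \$html );

# Print each token on its own line, exactly as it appeared in the source.
while ( my $token = $p->get_token ) {
    print $token->as_is, "\n";
}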

This prints:

<DT>
<A HREF="http://www.google.com"; ADD_DATE="1101144594"
ID="rdf:#$.GjDP">
Google (search engine)
</A>

>Any ideas whether this is a bug in HTML::Parser, or should I
>take another look at my code?

My guess is that you are only seeing part of the text, and you have to be
patient: there is no guarantee at all that a run of text will come out in
one chunk, so HTML::Parser may call the text handler more than once for
it. So probably the next time the text handler gets called, the rest will
come out... or at least part of it.
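
If that is what is happening, one way around it is to collect the chunks
yourself until the closing tag shows up. Here is a minimal sketch, not
your actual code: the @links and $buf names are made up, the markup is a
shortened version of your bookmark line, and the document is fed to the
parser in two pieces on purpose so that the text cannot arrive in a
single event.

```perl
use strict;
use warnings;
use HTML::Parser;

my @links;    # finished link texts (hypothetical name)
my $buf;      # text collected so far; undef while outside an <A> element

my $p = HTML::Parser->new(
    api_version => 3,
    # Start collecting when an <A> tag opens.
    start_h => [ sub { $buf = '' if $_[0] eq 'a' }, 'tagname' ],
    # Append every decoded text chunk; this may run several times per link.
    text_h  => [ sub { $buf .= $_[0] if defined $buf }, 'dtext' ],
    # Only at </A> is the text known to be complete.
    end_h   => [ sub {
        return unless $_[0] eq 'a' && defined $buf;
        push @links, $buf;
        undef $buf;
    }, 'tagname' ],
);

# Feed the input in two pieces, splitting it in the middle of the link text.
$p->parse('<DT><A HREF="http://www.google.com">Google (search ');
$p->parse('engine)</A>');
$p->eof;

print "$_\n" for @links;
```

However the parser decides to split the text events, the string pushed
onto @links at the end tag is the whole link text. HTML::Parser also has
an unbroken_text attribute ($p->unbroken_text(1)) that asks it to report
each block of text in one piece, which may be simpler if you don't need
the text as early as possible.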

--
Bart.