Re: how to capture multiple lines?
From: Tassilo v. Parseval (tassilo.parseval_at_rwth-aachen.de)
Date: 03/29/04
- Next message: lvirden_at_yahoo.com: "Re: Loss of privledges in a perl app"
- Previous message: Anno Siegel: "Re: how to capture multiple lines?"
- In reply to: Geoff Cox: "Re: how to capture multiple lines?"
- Next in thread: Gunnar Hjalmarsson: "Re: how to capture multiple lines?"
- Reply: Gunnar Hjalmarsson: "Re: how to capture multiple lines?"
- Reply: Geoff Cox: "Re: how to capture multiple lines?"
Date: 29 Mar 2004 13:02:51 GMT
Thus spoke Geoff Cox:
> On Mon, 29 Mar 2004 13:53:51 +0200, Gunnar Hjalmarsson
> <noreply@gunnar.cc> wrote:
>
>
>>That does not set it to default. This does:
>>
>> $/ = "\n";
>
> The best I can get is as follows
>
> sub para {
>
> local ($/ = "\0a\0d");
>
> my ($linepara) = @_;
> $linepara =~ /<p>(.*?)<\/p>/s;
> # print ("\$1 = $1 \n");
> print OUT ("<tr><td colspan=2>" . $1 . "<\/td><\/tr> \n");
> $/ = "";
> }
>
> Now, this does get the
> <p> jahjsdkaljk al
> asdjk aksdj klad
> kajsd akl </p>
>
> text, but it also gets some lines which I do not want and which I do
> not get if I do not use $/, so I am a bit lost. I am tempted to post
> the whole code, but that would be asking too much!
>
> I would like to use the slurp approach but am not sure how to do it.
> As I parse through an HTML file and find the first line of the first
> <p> block of text, how do I get that text and put it into a file, and
> then, when I find the second <p> block, put it in the right place? I
> do not want to put all the <p> text together; the blocks appear at
> different places in the HTML file.
If I understand you right, you want to grab everything that appears in
<p> tags? Here's an example using HTML::Parser:
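#! /usr/bin/perl -w

package MyParser;
use strict;
use base qw/HTML::Parser/;

our $in_para;    # true while we are inside a <p> element

# called for every start tag
sub start {
    my (undef, $tagname) = @_;
    $in_para = 1 if $tagname eq 'p';
}

# called for every end tag
sub end {
    my (undef, $tagname) = @_;
    $in_para = 0 if $tagname eq 'p';
}

# called for ordinary text between tags
sub text {
    my (undef, $text) = @_;
    print $text if $in_para;
}

package main;

my $p = MyParser->new;    # no arguments: version 2 style callback methods
$p->parse_file("file.html")
    or die "Can't parse file.html: $!\n";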
It's dead simple: you create a subclass of HTML::Parser (MyParser) that
overrides the start(), end() and text() methods. The start() method
simply sets the global variable $in_para to a true value when it
encounters a <p> start tag, and end() sets it back to false when </p>
is encountered. The method text() is triggered for ordinary text and
only prints it when $in_para is true.
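To try the callbacks without a file on disk, a parser can also be fed a
string through parse() followed by eof(), both standard HTML::Parser
methods. The sketch below is a hypothetical variant of MyParser (the
names ParaCollector and @paragraphs and the sample markup are made up
for illustration): instead of printing, it pushes the text of each <p>
element into its own array element, so the blocks stay apart instead of
being run together.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ParaCollector;
use base qw/HTML::Parser/;

# Hypothetical variant of MyParser: collect the text of each <p>
# element into its own array element instead of printing it.
our ($in_para, @paragraphs);

sub start {
    my (undef, $tagname) = @_;
    if ($tagname eq 'p') {
        $in_para = 1;
        push @paragraphs, '';    # open a new slot for this paragraph
    }
}

sub end {
    my (undef, $tagname) = @_;
    $in_para = 0 if $tagname eq 'p';
}

sub text {
    my (undef, $text) = @_;
    # text may arrive in several chunks, so append rather than assign
    $paragraphs[-1] .= $text if $in_para;
}

package main;

# Hypothetical sample input: a paragraph spanning several lines,
# some text outside any <p>, and a second paragraph.
my $html = <<'HTML';
<h1>Title</h1>
<p> jahjsdkaljk al
asdjk aksdj klad
kajsd akl </p>
<div>not wanted</div>
<p>second block</p>
HTML

my $p = ParaCollector->new;    # no arguments: version 2 style callbacks
$p->parse($html);
$p->eof;                       # flush anything still buffered

print "[$_]\n" for @ParaCollector::paragraphs;
```

Each array element can then be written out wherever it belongs, which
is what the original question was after.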
This solution is very robust (it does not care how a paragraph is split
across lines or whether the <p> tag carries attributes), and since the
basic skeleton is only a few lines, it is easily extensible. You will
most probably want to change the text() method so that it prints into a
file instead. If you want to grab everything between <p> and </p>
(including other tags), you must extend start() and end() a bit to print
their last argument (which is the original text of the tag as it
appeared in the HTML file). Something like:
sub start {
    my (undef, $tagname, undef, undef, $origtext) = @_;
    # print nested start tags; the <p> itself is not printed
    print $origtext if $in_para;
    $in_para = 1 if $tagname eq 'p';
}

sub end {
    my (undef, $tagname, $origtext) = @_;
    # clear the flag first so that </p> itself is not printed
    $in_para = 0 if $tagname eq 'p';
    print $origtext if $in_para;
}
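Putting both suggestions together (print into a file, keep inner tags),
here is a sketch of how each paragraph could be written out as one table
row in the style of Geoff's para() routine. It assumes one <tr> per
paragraph; the package name ParaToRows, the $out filehandle variable and
the sample markup are hypothetical, and an in-memory filehandle stands
in for the real output file. Since HTML lets a new <p> implicitly close
the previous one, a new <p> also flushes any paragraph left open.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ParaToRows;
use base qw/HTML::Parser/;

# $out: filehandle the rows go to (hypothetical, set by the caller)
# $buf: the current paragraph, inner tags included
our ($in_para, $buf, $out);

# Write the buffered paragraph as one table row, then leave paragraph mode.
sub _flush {
    print {$out} "<tr><td colspan=2>$buf</td></tr>\n" if $in_para;
    $in_para = 0;
}

sub start {
    my (undef, $tagname, undef, undef, $origtext) = @_;
    if ($tagname eq 'p') {
        _flush();                 # a new <p> ends any paragraph left open
        ($in_para, $buf) = (1, '');
    }
    elsif ($in_para) {
        $buf .= $origtext;        # keep start tags nested inside <p>
    }
}

sub end {
    my (undef, $tagname, $origtext) = @_;
    if    ($tagname eq 'p') { _flush() }
    elsif ($in_para)        { $buf .= $origtext }   # keep nested end tags
}

sub text {
    my (undef, $text) = @_;
    $buf .= $text if $in_para;
}

package main;

# Hypothetical sample input
my $html = <<'HTML';
<p>First <b>bold</b> paragraph
over two lines</p>
<div>skipped</div>
<p>Second</p>
HTML

# An in-memory file stands in for the real output file here
open my $fh, '>', \my $rows or die "Cannot open output: $!";
$ParaToRows::out = $fh;

my $p = ParaToRows->new;    # no arguments: version 2 style callbacks
$p->parse($html);
$p->eof;
ParaToRows::_flush();       # in case the last <p> was never closed
close $fh;

print $rows;
```

In a real script you would open a file on disk instead, and because rows
are written as each </p> is seen, they come out in document order.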
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval