Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ?
- From: anno4000@xxxxxxxxxxxxxxxxxxxxxx
- Date: 31 Jan 2007 12:51:21 GMT
Gerwin <g.h.vandoorn@xxxxxxxxx> wrote in comp.lang.perl.misc:
> On Jan 31, 12:01 pm, "Gerwin" <g.h.vando...@xxxxxxxxx> wrote:
> > Hi,
> > I'm using HTML::Parser to strip HTML tags from my files. I noticed
> > that //<![cdata[ ... //]]> and the JavaScript between those markers
> > are not stripped. Any idea how to do this?
> > -Gerwin
>
> Well, I made a regex to do it:
> $content =~ s/(\/\/<!\[.*\/\/]]>)//;
> Is this efficient? If not, what is?
Why do you think efficiency matters?
At this point you should be concerned with effectiveness: Does it
match what it is supposed to match, no more and no less? Since I
don't know the variability of the pattern I can't tell. The fact
you are matching only one opening "[" but two closing "]" is a bit
dubious. Shouldn't the string "cdata" be checked somewhere?
Worry about efficiency when your program turns out to be slow.
If that happens, I dare say it won't be this regex that is
responsible.
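
As a rough illustration of the effectiveness checks above, here is one way the substitution could be tightened. This is only a sketch under assumptions: it assumes the markers always look like "//<![CDATA[" and "//]]>", that upper or lower case may occur, and that a block may span several lines. The sub name strip_cdata_blocks is made up for this example. Note also that the original pattern uses a greedy ".*" without /s, so it cannot cross newlines and, on a single line, would swallow everything between the first opening marker and the last closing one.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove every
# "//<![CDATA[ ... //]]>" span from a string.
sub strip_cdata_blocks {
    my ($content) = @_;
    $content =~ s{
        //<!\[CDATA\[   # opening marker, literal CDATA and both brackets
        .*?             # shortest possible body, may span lines under /s
        //\]\]>         # closing marker with both brackets
    }{}gisx;
    return $content;
}

my $html = "a//<![CDATA[\nvar x = 1;\n//]]>b//<![cdata[ y //]]>c";
print strip_cdata_blocks($html), "\n";   # prints "abc"
```

The /g flag removes every block rather than only the first, /i accepts "cdata" in either case, and the non-greedy ".*?" stops at the nearest closing marker so text between two blocks survives.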
Anno