Re: HTML::Parser: how to remove "//<![CDATA[ ... //]]>"?



Gerwin <g.h.vandoorn@xxxxxxxxx> wrote in comp.lang.perl.misc:
> On Jan 31, 12:01 pm, "Gerwin" <g.h.vando...@xxxxxxxxx> wrote:
>> Hi,
>>
>> I'm using HTML::Parser to strip HTML tags from my files. I noticed
>> that //<![CDATA[ ... //]]> and the JavaScript between those markers
>> are not stripped. Any idea how to do this?
>>
>> -Gerwin
>
> Well, I made a regex to do it:
>
> $content =~ s/(\/\/<!\[.*\/\/]]>)//;
>
> Is this efficient? If not, what is?

Why do you think efficiency matters?

At this point you should be concerned with effectiveness: Does it
match what it is supposed to match, no more and no less? Since I
don't know the variability of the pattern I can't tell. The fact
you are matching only one opening "[" but two closing "]" is a bit
dubious. Shouldn't the string "cdata" be checked somewhere?

Worry about efficiency when your program turns out to be slow.
If that happens, I dare say it won't be this regex that is
responsible.

Anno
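
Following up on Anno's point about effectiveness, here is a minimal sketch of a tighter substitution. It is an illustration, not code from the thread, and it assumes that the markers may appear in any case, that whitespace may sit between "//" and "<![CDATA[", and that a file can hold several blocks, each spanning several lines. It checks "CDATA" explicitly, matches both closing brackets literally, and uses a non-greedy ".*?" with the /s and /g modifiers. Two things in the original pattern are worth knowing: without /s, "." does not match a newline, so a multi-line block is not matched at all; and without /g, only the first match in the string is removed. The function name strip_cdata_blocks is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every //<![CDATA[ ... //]]> block from a string.
# Assumes markers may be in any case and blocks may span several lines.
sub strip_cdata_blocks {
    my ($content) = @_;
    $content =~ s{
        //\s*<!\[CDATA\[   # opening marker, with "CDATA" checked explicitly
        .*?                # shortest possible body, so two blocks are not merged
        //\s*\]\]>         # closing marker: two "]" followed by ">"
    }{}gisx;               # g: every block, i: any case, s: "." matches newlines, x: allow comments
    return $content;
}

my $html = <<'HTML';
<script type="text/javascript">
//<![CDATA[
var a = 1;
//]]>
</script>
<p>Hello</p>
HTML

print strip_cdata_blocks($html);
```

If the real goal is to drop the script code entirely, HTML::Parser's ignore_elements method (for example, calling $p->ignore_elements('script') before parsing) suppresses everything between <script> and </script>, which may make the regex unnecessary.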