Re: "negative" regexp



Petr Vileta (stoupa@xxxxxxxxxxxxx) wrote on VCCLXV September MCMXCIII in
<URL:news:fnoj8a$1hin$1@xxxxxxxxxxxxxxx>:
;; I have a problem constructing a regexp; this is beyond me ;-) Please help.
;;
;; I have a string
;;
;; $string="<a href='big.gif'><img src='small.gif'></a><br><a href='abc.htm'><b>click</b></a>";
;;
;; and I need to remove all html tags except <img ...>. The result should be
;;
;; $string="<img src='small.gif'>click";
;;
;; Now I do it this way
;;
;; # replace < with Ctrl-B and > with Ctrl-E for all <img> tags
;; $string=~s/<img\s+(.+?)>/\cb$1\ce/g;
;; # remove all html tags
;; $string=~s/<.+?>//g;
;; # replace back all Ctrl-B with <
;; $string=~s/\cb/</g;
;;
;; # replace back all Ctrl-E with >
;; $string=~s/\ce/>/g;
;;
;; but maybe there is another way.


Well, for your example,

s/<(?!img)[^>]*>//g

ought to do it (untested).

But that assumes no '>' is present inside a tag, which doesn't have
to be the case.
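
As a quick check, here is a minimal sketch that runs the substitution above against the sample string from the question, and then against an input showing the '>' caveat. The sub name strip_tags_except_img and the second input are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps the substitution from the post.
sub strip_tags_except_img {
    my ($html) = @_;
    # Delete every <...> whose '<' is not immediately followed by "img".
    $html =~ s/<(?!img)[^>]*>//g;
    return $html;
}

my $string = "<a href='big.gif'><img src='small.gif'></a><br>"
           . "<a href='abc.htm'><b>click</b></a>";
print strip_tags_except_img($string), "\n";    # <img src='small.gif'>click

# The caveat: a '>' inside an attribute value ends the match too early,
# so part of the tag is left behind.
print strip_tags_except_img("<a title='x>y'>text</a>"), "\n";    # y'>text
```

Note that (?!img) also spares any tag whose name merely starts with "img", and it is case sensitive; (?!img\b) together with the /i modifier would be a tighter variant, if that matters for your input.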

The "right" way to do it is to use a proper HTML parser.

Get one from your nearest CPAN.
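
For illustration only, a sketch of the parser approach using HTML::Parser (one of several HTML parsing modules on CPAN; the post does not name a specific one, so that choice is an assumption). The sub name keep_only_img is hypothetical. Only start-tag and text events are handled; end tags, comments and everything else have no handler and are dropped.

```perl
use strict;
use warnings;
use HTML::Parser ();    # from CPAN, not part of core Perl

# Hypothetical helper: keep text and <img ...> tags, drop all other markup.
sub keep_only_img {
    my ($html) = @_;
    my $out = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Start tags: copy the original tag text through only for <img>.
        start_h => [
            sub {
                my ($tagname, $text) = @_;
                $out .= $text if $tagname eq 'img';
            },
            'tagname, text',
        ],
        # Plain text between tags is copied through unchanged.
        text_h => [ sub { $out .= $_[0] }, 'text' ],
    );
    $parser->parse($html);
    $parser->eof;
    return $out;
}

my $string = "<a href='big.gif'><img src='small.gif'></a><br>"
           . "<a href='abc.htm'><b>click</b></a>";
print keep_only_img($string), "\n";    # <img src='small.gif'>click
```

Because the parser tokenizes attributes itself, a '>' inside a quoted attribute value is not mistaken for the end of the tag.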



Abigail
--
perl -swleprint -- -_=Just\ another\ Perl\ Hacker


