Re: "negative" regexp
- From: Abigail <abigail@xxxxxxxxxx>
- Date: 30 Jan 2008 02:32:46 GMT
Petr Vileta (stoupa@xxxxxxxxxxxxx) wrote on VCCLXV September MCMXCIII in
<URL:news:fnoj8a$1hin$1@xxxxxxxxxxxxxxx>:
;; I have a problem constructing a regexp; this one is beyond me ;-) Please help.
;;
;; I have the string
;;
;; $string="<a href='big.gif'><img src='small.gif'></a><br><a href='abc.htm'><b>click</b></a>";
;;
;; and I need to remove all html tags except <img ...>. The result should be
;;
;; $string="<img src='small.gif'>click";
;;
;; Currently I do it this way:
;;
;; # replace < with Ctrl-B and > with Ctrl-E for all <img> tags
;; $string=~s/<img\s+(.+?)>/\cb$1\ce/g;
;; # remove all html tags
;; $string=~s/<.+?>//g;
;; # replace back all Ctrl-B with <
;; $string=~s/\cb/</g;
;;
;; # replace back all Ctrl-E with >
;; $string=~s/\ce/>/g;
;;
;; but maybe there is another way.

Well, for your example,

    s/<(?!img)[^>]*>//g

ought to do it (untested).
But that assumes no '>' is present inside a tag, which doesn't have
to be the case: a quoted attribute value may contain one.
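
To make that concrete, here is a minimal sketch that runs the substitution on the string from the question and then on a tag with a '>' inside a quoted attribute. The second input and the variable names are made up for illustration, and the stricter variant at the end is only a suggested tweak, not part of the original answer.

```perl
use strict;
use warnings;

# Sample input from the question.
my $string = "<a href='big.gif'><img src='small.gif'></a><br>"
           . "<a href='abc.htm'><b>click</b></a>";

# Remove every tag that does not start with "<img".
(my $stripped = $string) =~ s/<(?!img)[^>]*>//g;
print "$stripped\n";    # <img src='small.gif'>click

# Hypothetical input: a '>' inside a quoted attribute value.
# [^>]* stops at the first '>', so part of the tag is left behind.
my $tricky = "<a title='a > b'>x</a>";
(my $broken = $tricky) =~ s/<(?!img)[^>]*>//g;
print "[$broken]\n";    # [ b'>x]

# Suggested tweak (an assumption): \b stops names such as <imgfoo>
# from being kept, and /i also keeps <IMG ...>.
my $mixed = "<IMG src='x.gif'><imgfoo>text</imgfoo>";
(my $strict = $mixed) =~ s/<(?!img\b)[^>]*>//gi;
print "$strict\n";      # <IMG src='x.gif'>text
```

The negative lookahead only inspects the characters right after '<', so none of these variants cures the quoted '>' problem.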
The "right" way to do it is to use a proper HTML parser.
Get one from your nearest CPAN.
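
As a rough sketch of what that could look like, here is one way to do it with HTML::Parser. The choice of module is an assumption (the answer above does not name one), and the handler layout and variable names are illustrative only. The idea is to copy the source text of <img> start tags and of plain text to the output, and to register no handlers for anything else, so everything else is dropped.

```perl
use strict;
use warnings;
use HTML::Parser ();    # from CPAN; assumed to be installed

# Sample input from the question.
my $string = "<a href='big.gif'><img src='small.gif'></a><br>"
           . "<a href='abc.htm'><b>click</b></a>";

my $out = '';

my $parser = HTML::Parser->new(
    api_version => 3,
    # Start tags: keep the original source text of <img ...> only.
    start_h => [
        sub {
            my ($tagname, $text) = @_;
            $out .= $text if $tagname eq 'img';
        },
        'tagname, text',
    ],
    # Plain text between tags is copied through unchanged.
    text_h => [ sub { $out .= $_[0] }, 'text' ],
    # No end, comment or default handlers: those events are ignored.
);

$parser->parse($string);
$parser->eof;

print "$out\n";    # <img src='small.gif'>click
```

Because the parser tokenizes quoted attribute values itself, a '>' inside quotes should not cut a tag short the way it does with [^>]*.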
Abigail
--
perl -swleprint -- -_=Just\ another\ Perl\ Hacker