Re: Complex regex help



Omega -1911 wrote:

Hi Rob & Dani,

Thanks for your help! I will try the suggestion you made, Rob, and as soon as
I finish typing this, I'll try Dani's code. Someone named Chen Ken contacted
me off-list and provided the following regex, which appeared to work. Please
let me know what you think:

my( $title, $event) = $data_string =~
m|([^>]*)(?:</FONT></b>)([^\]]*)([^<]*)|;

Hello Dave

You will need help to use HTML::TreeBuilder, as it's fairly complex, and to give
that help we need fuller information on the HTML you're processing. Can you post
a bigger chunk of it or, better still, the URL it comes from?

The regex doesn't look right at all. The (?: .. ) around the closing font and
bold tags has no effect, and the ] in the character class needn't be escaped,
since it comes first in the class: [^]] works just as well. Apart from that, it
will grab everything from EVENT up to the end of the Ref # value into $event,
and the closing ] into $3, which is then discarded. Not good at all.
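A quick way to check those points, sketched against a made-up string: the real
HTML wasn't posted, so $data below is a hypothetical sample that only imitates
the title / EVENT / Ref # layout described above. It runs the posted regex and
a tidied version side by side and prints every capture group.

```perl
use strict;
use warnings;

# HYPOTHETICAL sample: the real HTML was never posted, so this string only
# imitates the shape discussed in the thread (title, EVENT text, Ref #).
my $data = '<b><FONT>My Title</FONT></b> EVENT: Concert [Ref # 1234]<br>';

# The regex as posted, keeping all three capture groups.
my @orig = $data =~ m|([^>]*)(?:</FONT></b>)([^\]]*)([^<]*)|;

# Same pattern without the redundant (?: ) and with ] unescaped; it is the
# first character after ^ in the negated class, so it is taken literally.
my @tidy = $data =~ m|([^>]*)</FONT></b>([^]]*)([^<]*)|;

# Both lines print: 'My Title' | ' EVENT: Concert [Ref # 1234' | ']'
print "orig: ", join(' | ', map { "'$_'" } @orig), "\n";
print "tidy: ", join(' | ', map { "'$_'" } @tidy), "\n";
```

The two patterns behave identically, and the third group holds only the
stray ], which the original two-variable assignment silently throws away.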

Against my better judgement I could offer

my @stuff = $data =~ />\s*([^<>]+)\s*</g;

which will return all the text between the HTML tags. It will fall down,
though, if one of the fields contains something like <i>...</i>, because that
field's text will then be broken into multiple segments. Better all round to
use a proper parser.
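Here's a sketch of how that one-liner behaves on two hypothetical fragments
(invented for illustration, not your data): plain table cells come out
cleanly, but an <i>...</i> inside a field splits it into pieces.

```perl
use strict;
use warnings;

# HYPOTHETICAL fragments; the real HTML was never posted.
my $plain  = '<td>My Title</td><td>Concert</td>';
my $italic = '<td>My <i>Big</i> Title</td>';

# The one-liner from above: each run of non-tag text between '>' and '<'.
my @plain_bits  = $plain  =~ />\s*([^<>]+)\s*</g;
my @italic_bits = $italic =~ />\s*([^<>]+)\s*</g;

print join('|', @plain_bits),  "\n";   # My Title|Concert
print join('|', @italic_bits), "\n";   # My |Big|Title
```

Note also that [^<>]+ is greedy, so whitespace just before a tag stays in the
capture (the first piece above is 'My ' with its trailing space); the second
\s* never gets a chance to strip it.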

HTH,

Rob
