Trying to craft a regexp

From: Manuel Vázquez Acosta (manu_at_chasqui.cu)
Date: 10/30/03


To: php-general@lists.php.net
Date: Wed, 29 Oct 2003 22:07:59 -0500

Hi all:

I'm trying to find every simple mail address in an HTML document that is not
inside an A tag.

I have tried this regexp:
(?<!mailto\:)(\w+@\w+(?:\.\w+)+)(?![^<]*?</a>)

But it's not working as I expect, because the only address in my test HTML is:

<a href=mailto:manu_like@yahoo.com class="link-home">My address</a>
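
A minimal sketch for reproducing the behaviour, under some assumptions: it uses Python's re module as a stand-in for PCRE, it spells the lookbehind prefix as "mailto:", and the second input string and the helper name find_addresses are hypothetical, added only for illustration. It runs the pattern against the test HTML above and against a string whose address is outside any A tag.

```python
import re

# Pattern of the posted shape, with the lookbehind prefix spelled "mailto:".
# Assumption: Python's re module stands in for PCRE in this sketch.
PATTERN = re.compile(r'(?<!mailto:)(\w+@\w+(?:\.\w+)+)(?![^<]*?</a>)')

# The test HTML from the message: its only address sits inside an A tag.
inside_anchor = '<a href=mailto:manu_like@yahoo.com class="link-home">My address</a>'

# Hypothetical extra input with one address outside any A tag.
outside_anchor = '<p>Write to someone@example.com for details.</p>'


def find_addresses(html):
    """Return every address the pattern accepts in the given HTML string."""
    # With a single capturing group, findall returns the captured strings.
    return PATTERN.findall(html)


print(find_addresses(inside_anchor))   # []
print(find_addresses(outside_anchor))  # ['someone@example.com']
```

On these two inputs the negative lookahead is what rejects the address inside the A tag: the lookbehind alone only blocks a match that starts directly after "mailto:", so the engine can still try a match starting one character later. The lookaround syntax used here is also accepted by PCRE, so the pattern should carry over to PHP's preg_match_all, though that is not tested in this sketch.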

Any tips?
Manu.


