Wild card link matching



Hi everyone,

I need to write a regex that parses some HTML text and outputs all links
whose text (the text that appears on the screen) matches a given expression.

E.g. findLinks(html, '(.*)o(.*)') called on the HTML

<a>one</a>
<a>three</a>
<a>two</a>

should return two matches, <a>one</a> and <a>two</a>.

I'm fairly new to regexes. At the moment I have:

'/<a[^><]*href\s*=\s*[^>]*>'.$regex.'<\/a>/'

(I'm only interested in tags that have an href attribute.)

This pattern greedily matches the entire input string.

How do I make the </a> match non-greedy? I've read that (.*?)<\/a>
makes the match non-greedy, but that doesn't let me constrain the form of
the link text.
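
To make the problem concrete, here is a minimal sketch in Python (the language is an assumption, since the pattern above would work the same way in any PCRE-style engine). It reproduces the greedy match and then shows one possible workaround: isolate each link with a character class that cannot cross a tag boundary, then test the expression against the link text alone. The names SAMPLE and find_links and the href values are made up for illustration, and the sketch assumes link text contains no nested tags and attribute values contain no '>' character.

```python
import re

# Hypothetical sample input: three links on one line, each with an href.
SAMPLE = '<a href="1">one</a><a href="2">three</a><a href="3">two</a>'

# The pattern from the post with '(.*)o(.*)' spliced in. The greedy (.*)
# groups are free to run across '</a><a ...>', so a single match swallows
# everything from the first '<a' to the last '</a>'.
greedy = re.search(r'<a[^><]*href\s*=\s*[^>]*>' + '(.*)o(.*)' + r'</a>', SAMPLE)


def find_links(html, text_regex):
    """Return each <a href=...>text</a> whose text fully matches text_regex."""
    # Step 1: isolate one link at a time. [^<]* cannot contain '<', so the
    # captured text always stops at the first closing tag.
    link = re.compile(r'<a\b[^<>]*\bhref\s*=[^<>]*>([^<]*)</a>', re.IGNORECASE)
    # Step 2: apply the caller's expression to the captured text only,
    # so its own .* can never reach into a neighbouring link.
    text = re.compile(text_regex)
    return [m.group(0) for m in link.finditer(html) if text.fullmatch(m.group(1))]


if __name__ == '__main__':
    print(greedy.group(0) == SAMPLE)          # the one match spans all three links
    print(find_links(SAMPLE, '(.*)o(.*)'))    # only the links for 'one' and 'two'
```

Splitting the work into two passes sidesteps the greedy versus non-greedy question: the caller's expression only ever sees one link's text at a time. For real-world HTML a proper parser is likely to be more robust than any single regex.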

Thanks

Taras
