Re: Regexp to match an URL in an HTML <a href=""></a> tag

From: Andy R (andrew_rowland_at_hotmail.com)
Date: 11/15/03

    Date: Sat, 15 Nov 2003 11:58:12 GMT
    
    

    "Charles Nadeau" <charlesnadeau@hotmail.com> wrote in message
    news:bp483h$1gv0$1@nwall1.odn.ne.jp...
    > Hello,
    >
    > I am trying to craft a regular expression to filter a URL from an <a
    > href=""></a> tag and the one I have doesn't seem right.
    > I use the regular expression from this snippet of code:
    >
    > foreach my $message (@messages)
    > {
    > my @match=($message->decoded=~/\bhref="(http.*)">.*/gi);
    >
    > foreach my $match(@match)
    > {
    > print $match,"\n";
    > }
    >
    > }
    >
    > but it doesn't lead to results that are exactly what I need. An excerpt of
    > what I get as an output looks like:
    >
    > http://2%30%33.197.%3204.1%355/mout/
    > http://www.superrxsalesman.info/aff1/?mulish
    > http://www.superrxsalesman.info/aff1/?acme
    > http://www.superrxsalesman.info/aff1/?blister
    > http://www.superrxsalesman.info/aff1/?samba
    > http://www.superrxsalesman.info/aff1/?depot"><font color="#0033CC
    > http://www.superrxsalesman.info/aff1/?procter"><font color="#0033CC
    > http://www.superrxsalesman.info/aff1/?use"><font color="#0033CC
    > http://www.superrxsalesman.info/aff1/?butane"><font color="#0033CC
    > http://www.superrxsalesman.info/aff1/?fiche"><font color="#0033CC
    >
    > The first 5 lines are exactly what I want but I don't understand why in
    > the following lines I get characters after and including ". I want
    > basically to keep what is in between the "" of the <href=""> tag.
    > Could anybody tell me what is wrong with my regular expression?
    > Thanks!
    >
    > Charles
    >
    > --
    > Charles-E. Nadeau Ph.D
    > http://radio.weblogs.com/0111823/

    Your .* is greedy, so it runs on to the last "> on the line. Put a ?
    after the * to make the match non-greedy, i.e.:

    my @match=($message->decoded=~/\bhref="(http.*?)">.*/gi);

    Should work, though I've not tested it.
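
    As a rough sketch of the difference, the snippet below runs the greedy
    and non-greedy patterns against a sample line. The sample strings and the
    variable names ($line, $two_links and so on) are made up for
    illustration; they are not from the original messages. The [^"]* variant
    is an alternative I am assuming would suit this job, not part of the
    suggestion above.

```perl
use strict;
use warnings;

# Hypothetical sample line, modelled on the output quoted in the question.
my $line = '<a href="http://www.superrxsalesman.info/aff1/?depot">'
         . '<font color="#0033CC">x</font></a>';

# Greedy: .* grabs as much as it can, then backs off to the LAST "> on the line.
my @greedy = $line =~ /\bhref="(http.*)">.*/gi;

# Non-greedy: .*? grows one character at a time and stops at the FIRST ">.
my @lazy = $line =~ /\bhref="(http.*?)">.*/gi;

# Alternative (an assumption, not from the thread): a negated character
# class can never run past the closing quote of the attribute.
my @negated = $line =~ /\bhref="(http[^"]*)"/gi;

# Hypothetical line with two links, the second carrying an extra attribute.
my $two_links = '<a href="http://a.example/">A</a> '
              . '<a href="http://b.example/" target="_blank">B</a>';

# The trailing .* swallows the rest of the line, so /g finds only one link.
my @lazy_two = $two_links =~ /\bhref="(http.*?)">.*/gi;

# Without the trailing .* every href on the line is returned.
my @negated_two = $two_links =~ /\bhref="(http[^"]*)"/gi;

print "greedy:      $_\n" for @greedy;
print "lazy:        $_\n" for @lazy;
print "negated:     $_\n" for @negated;
print "lazy, two:   $_\n" for @lazy_two;
print "negated two: $_\n" for @negated_two;
```

    Note that the trailing .* in the pattern eats the rest of the line after
    the first match, so with /g a second link on the same line is skipped.
    Dropping it, and matching up to the closing quote rather than up to ">,
    also copes with tags that carry further attributes after the href.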

    Andy R

