Re: Regexp: Lazy match workaround?

From: R. Rajesh Jeba Anbiah (ng4rrjanbiah_at_rediffmail.com)
Date: 06/07/04


Date: 6 Jun 2004 21:04:54 -0700

nobull@mail.com wrote in message news:<4dafc536.0406050852.58675c7e@posting.google.com>...
> ng4rrjanbiah@rediffmail.com (R. Rajesh Jeba Anbiah) wrote in message news:<abc4d8b8.0406042324.2e93ecfc@posting.google.com>...
> > Brian McCauley <nobull@mail.com> wrote in message news:<u98yf3e0cx.fsf@wcl-l.bham.ac.uk>...
> > > consider
> > >
> > > ' A C !' =~ /(\w.*?)+.*!/;
> > >
> > > Here the repeated group matches only 'A'. It does not match the 'C'
> > > because the non-greediness of the '*?' is more important than the
> > > greediness of the '+'.
> >
> > Again, many thanks to all the experts. I understand what you mean,
> > for example in the following case:
> > Target string: XabcABCX
> > Regex Pattern: /X(abc)+X/i
> > Matches : XabcABCX, ABC
> > NOT: XabcABCX, abc, ABC
> >                ^^^
> > Here, only the 'ABC' gets matched, but not the first 'abc'. This
> > behavior is indeed a bit difficult to understand :-(
>
> Indeed it would be - but that is not what happens. Go back and
> re-read what Anno said.
>
> The repeated capturing subexpression /(abc)/i does indeed match and
> capture both 'abc' and then also 'ABC'. But upon completion of the
> pattern match the special variable $1 (or the first element of the
> list-context value of the m// operator) will contain the _last_ thing
> to be captured (i.e. 'ABC').

    Indeed a nice explanation. Many thanks for all your comments and
help.
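
    For anyone who wants to check the behavior described above, here is a
minimal sketch. It assumes Python's re module rather than Perl or PCRE
(the variable names are mine, not from the thread); as far as I know it
follows the same rule, so a repeated capturing group reports only its
last iteration.

```python
import re

# Brian's example: the lazy '.*?' inside the group matches as little as
# possible, so the repeated group ends up holding only 'A'.
m1 = re.search(r'(\w.*?)+.*!', ' A C !')
first = m1.group(1)   # 'A'

# My example: the group matches 'abc' and then 'ABC', but only the last
# iteration is kept once the match is complete.
m2 = re.search(r'X(abc)+X', 'XabcABCX', re.IGNORECASE)
whole = m2.group(0)   # 'XabcABCX'
last = m2.group(1)    # 'ABC'
```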

> The only way you could see that 'abc' had been captured would be to
> look at the value of $1 part way through the pattern match operation.
> This is where (?{}) would come in.

    Thanks for pointing that out. But, as someone said, it is not
available in PCRE. Thanks.
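
    Without (?{}), one possible workaround is to capture the whole
repeated run with an outer group and then scan that run in a second
pass. This is only a sketch under my own assumptions: it uses Python's
re module (which, like PCRE, has no embedded code blocks), and the names
outer and pieces are hypothetical. Nobody in the thread proposed it.

```python
import re

target = 'XabcABCX'

# Step 1: capture the entire repetition; the inner group is non-capturing.
outer = re.search(r'X((?:abc)+)X', target, re.IGNORECASE)

# Step 2: scan the captured run to list every individual repetition.
pieces = re.findall(r'abc', outer.group(1), re.IGNORECASE) if outer else []
# pieces is ['abc', 'ABC']
```

    The same two-pass idea should carry over to any engine that has a
"find all matches" function.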