Re: did I get greedy quantifiers wrong ?



Thanks a lot, Paul.

For this pattern:
$str = 'mississippi';
$str =~ m/m(.*i)(.*pi)/;

My initial understanding was that .*i would match all the way to the last i
in the string. That would indeed be the case if .*i were not followed by .*pi.
Do you agree?
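
To make that comparison concrete, here is a minimal sketch that runs the pattern with and without the trailing (.*pi). The variable names $alone, $one and $two are only illustrative and do not come from the thread.

```perl
use strict;
use warnings;

my $str = 'mississippi';

# (.*i) with nothing after it: .* can run all the way to the final i.
my ($alone) = $str =~ m/m(.*i)/;
print "alone is $alone\n";     # alone is ississippi

# (.*i) followed by (.*pi): the first group has to give characters
# back until the second group can match as well.
my ($one, $two) = $str =~ m/m(.*i)(.*pi)/;
print "one is $one\n";         # one is ississi
print "two is $two\n";         # two is ppi
```

So the guess holds: on its own, (.*i) reaches the last i, and it stops one i earlier only because (.*pi) still needs a "pi" to consume.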


On 31 May 2007 06:11:45 -0700, Paul Lalli <mritty@xxxxxxxxx> wrote:
On May 31, 6:02 am, sharan.basa...@xxxxxxxxx (Sharan Basappa) wrote:
> I seem to be having some conceptual problem with greedy quantifiers.
> My understanding is that a greedy quantifier matches as much as possible
> while still allowing the rest of the regex to match.

90% correct. The other 10% is that the match starts left-to-right.
It will start with the first part of the string that can match, and
match as much of *that* as possible. It will not search the rest of
the string to see if a longer match is possible later. For example:

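$string = 'abbabbbbba';
$string =~ /(b+)/;
In this case, $1 will be set to 'bb', because that is the longest run of
b's at the *first* position where a match is possible, even though the
engine would have found 'bbbbb' had it kept searching. (Note the + rather
than *: b* can match zero b's, so /(b*)/ would succeed at the very start
of the string and leave $1 empty.)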
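
A small sketch of the leftmost rule, for anyone who wants to run it. It uses Perl's @- array to report where each match started; the variable names are illustrative only. It also tries the zero-width case, since b* is allowed to match no b's at all.

```perl
use strict;
use warnings;

my $string = 'abbabbbbba';

# b+ needs at least one b, so the first possible start is offset 1.
my ($plus) = $string =~ /(b+)/;
my $plus_start = $-[1];
print "b+ matched '$plus' at offset $plus_start\n";   # 'bb' at offset 1

# b* may match zero b's, so it already succeeds at offset 0,
# in front of the leading 'a', with an empty capture.
my ($star) = $string =~ /(b*)/;
my $star_start = $-[1];
print "b* matched '$star' at offset $star_start\n";   # '' at offset 0
```

In both cases the engine takes the first position that works and never looks ahead for the longer run of five b's.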


> But look at the following example:
> $str = 'mississippi';
> $str =~ m/m(.*i)(.*pi)/;
> print "one is $1 \n";
> print "two is $2 \n";
>
> $str = 'mississippi';
> $str =~ m/m(.*i?)(.*pi)/;

This doesn't mean what you think it means. This tells Perl that the
second token, (.*i?), matches as much of anything as it can, followed
by 0 or 1 i's. The ? does not apply to the .* unless you put it right
after the *. Compare and contrast with (.*?i), which means to match as
little of anything as possible, followed by exactly one i.
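
The difference is easy to see side by side. This is only a sketch; the names $g1, $g2, $l1 and $l2 are made up for the illustration.

```perl
use strict;
use warnings;

my $str = 'mississippi';

# (.*i?) : greedy .* followed by an OPTIONAL i.
my ($g1, $g2) = $str =~ m/m(.*i?)(.*pi)/;
print "greedy with optional i: one is $g1, two is $g2\n";
# one is ississip, two is pi

# (.*?i) : NON-greedy .* followed by exactly one i.
my ($l1, $l2) = $str =~ m/m(.*?i)(.*pi)/;
print "non-greedy:             one is $l1, two is $l2\n";
# one is i, two is ssissippi
```

With (.*i?) the i is optional, so the first group is free to end in p. With (.*?i) the first group stops at the earliest i it can, and the greedy (.*pi) takes everything else.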

> print "one is $1 \n";
> print "two is $2 \n";
>
> In the first code snippet, I expected first regex (.*i) to match till
> ississip

Right there is the problem. Your token is (.*i). That is, the last
character of this token must be an i. It can't end with a p. That
doesn't match. The .* matches as much as it can while still leaving an
'i' for the literal i in the token, and while still leaving enough of
the string for (.*pi) to match afterwards.

> and leave pi for (.*pi) regex.
>
> But what I get as the output of this script is :
>
> one is ississi
> two is ppi
> one is ississip
> two is pi
>
> Why is it that Perl is leaving ppi to the second regex when it could
> continue to the first p?

It can't. The token ends in an i. 'i' must be the last thing that
(.*i) matches.
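
One way to picture this is to try the candidates for the first group by hand, longest first. The loop below is a hand-rolled simplification for illustration, not how Perl's regex engine is actually implemented, and all variable names in it are hypothetical.

```perl
use strict;
use warnings;

my $str  = 'mississippi';
my $rest = substr($str, 1);    # what is left after the leading m

my ($one, $two);
# Try every possible length for group 1, longest first, roughly the
# way a greedy .* gives characters back one at a time.
for (my $len = length $rest; $len >= 1; $len--) {
    my $candidate = substr($rest, 0, $len);
    next unless $candidate =~ /i\z/;     # (.*i) must end in an i
    my $tail = substr($rest, $len);
    if ($tail =~ /\A(.*pi)/) {           # can (.*pi) match what is left?
        ($one, $two) = ($candidate, $1);
        last;
    }
    print "rejected: '$candidate' leaves '$tail'\n";
}
print "one is $one\n";
print "two is $two\n";
```

The only candidate rejected is 'ississippi', which leaves nothing for (.*pi). Candidates ending in p are skipped outright because they do not end in i, so the first one that works is 'ississi', leaving 'ppi'.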

Paul Lalli

