Re: did I get greedy quantifiers wrong ?



On May 31, 6:02 am, sharan.basa...@xxxxxxxxx (Sharan Basappa) wrote:
I seem to be having some conceptual problem with greedy quantifiers.
My understanding is that a greedy quantifier matches as much as
possible while still allowing the rest of the regex to match.

90% correct. The other 10% is that the match starts left-to-right.
It will start with the first part of the string that can match, and
match as much of *that* as possible. It will not search the rest of
the string to see if a longer match is possible later. For example:

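$string = 'abbabbbbba';
$string =~ /(b+)/;
In this case, $1 will be set to 'bb', because that is the *first*
run of b's it could find, even though if it had continued, it
would have been able to find 'bbbbb' later. (Note the + rather than
*: with /(b*)/, $1 would be the empty string, because b* can match
zero b's and so succeeds immediately at the leading 'a'.)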
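
A minimal sketch of the leftmost rule, for anyone who wants to run it.
The variable names and the sort-by-length step are my own illustration,
not something from the original question; the last part shows one
possible way to get the longest run when that is what you actually want.

```perl
use strict;
use warnings;

my $string = 'abbabbbbba';

# Leftmost match: the engine stops at the first place the pattern succeeds.
my ($first) = $string =~ /(b+)/;

# b* is allowed to match zero characters, so it succeeds at position 0
# (in front of the leading 'a') and captures the empty string.
my ($star) = $string =~ /(b*)/;

# To get the longest run, collect every run with /g and compare lengths.
my @runs = $string =~ /(b+)/g;
my ($longest) = sort { length($b) <=> length($a) } @runs;

print "first:   '$first'\n";     # first:   'bb'
print "star:    '$star'\n";      # star:    ''
print "longest: '$longest'\n";   # longest: 'bbbbb'
```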


But look at the following example:
$str = 'mississippi';
$str =~ m/m(.*i)(.*pi)/;
print "one is $1 \n";
print "two is $2 \n";

$str = 'mississippi';
$str =~ m/m(.*i?)(.*pi)/;

This doesn't mean what you think it means. This tells Perl that the
group (.*i?) can match as much of anything as it can, followed by
0 or 1 i's. That ? applies to the i, not to the .*. To make the .*
non-greedy, you have to put the ? right after the *. Compare and
contrast with (.*?i), which means to match as little of anything as
possible, followed by exactly one i.
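
Here is a small sketch that runs the three variants side by side. The
labels and the %result hash are names I made up for the comparison;
the first two patterns are the ones from the question, and the third is
the non-greedy form mentioned above.

```perl
use strict;
use warnings;

my $str = 'mississippi';

# Each entry: a descriptive label and the pattern to try.
my @patterns = (
    [ 'greedy, i required' => qr/m(.*i)(.*pi)/  ],
    [ 'greedy, i optional' => qr/m(.*i?)(.*pi)/ ],
    [ 'non-greedy'         => qr/m(.*?i)(.*pi)/ ],
);

my %result;
for my $p (@patterns) {
    my ($label, $re) = @$p;
    # In list context the match returns the captured groups ($1, $2).
    if (my @groups = $str =~ $re) {
        $result{$label} = [@groups];
        print "$label: one is '$groups[0]', two is '$groups[1]'\n";
    }
}
# greedy, i required: one is 'ississi', two is 'ppi'
# greedy, i optional: one is 'ississip', two is 'pi'
# non-greedy: one is 'i', two is 'ssissippi'
```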

print "one is $1 \n";
print "two is $2 \n";

In the first code snippet, I expected first regex (.*i) to match till
ississip

Right there is a problem. Your token is (.*i). That is, the last
character of this token must be an i. It can't end with a p. That
doesn't match. The .* first grabs as much as it can, then backs off
until an 'i' follows it and the rest of the regex, (.*pi), can still
match. That happens at the 'i' just before 'ppi'.

and leave pi for (.*pi) regex.

But what I get as the output of this script is :

one is ississi
two is ppi
one is ississip
two is pi

Why is it that Perl is leaving ppi to the second regex when it could
continue to the first p

It can't. The token ends in an i. 'i' must be the last thing that
(.*i) matches.
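
If it helps, here is a hand-rolled sketch of the choice the engine ends
up making for m(.*i)(.*pi). This is NOT how Perl's regex engine is
implemented; it is a simplified simulation (the loop, $rest, and
$chosen are my own illustration) that tries candidates for $1 from
longest to shortest, keeps only those ending in 'i', and checks whether
what is left can still satisfy (.*pi).

```perl
use strict;
use warnings;

my $str  = 'mississippi';
my $rest = substr($str, 1);    # everything after the literal leading 'm'
my $chosen;

# Try candidates for (.*i) from longest to shortest, as a greedy .* would.
for (my $len = length($rest); $len >= 1; $len--) {
    my $one = substr($rest, 0, $len);
    next unless $one =~ /i\z/;          # the group must end in an 'i'

    my $two = substr($rest, $len);      # what is left for (.*pi)
    if ($two =~ /\A.*pi/) {
        print "'$one' + '$two': match\n";
        $chosen = $one;
        last;
    }
    print "'$one' + '$two': fail, back off\n";
}
# 'ississippi' + '': fail, back off
# 'ississi' + 'ppi': match
```

'ississip' never shows up as a candidate, because it does not end in
an 'i'.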

Paul Lalli
