Re: Question about perlreref - are {n} and {n}? different?



Dr.Ruud wrote:
usenet@xxxxxxxxxxxxxxx wrote:

perlreref::QUANTIFIERS says:

Quantifiers are greedy by default -- match the longest leftmost.
   Maximal  Minimal  Allowed range
   -------  -------  -------------
   {n,m}    {n,m}?   Must occur at least n times but no more than m times

The 'Must occur ... no more than m times' is not accurate.

It is accurate, when you realize that it is talking about the characters that were actually part of the matched string. Characters outside the matched string are irrelevant when the match succeeds.
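
A minimal sketch of that point, not taken from the original posts (the helper name matched_length is made up for illustration). It measures only the text that took part in the match, using the @- and @+ offset arrays, and shows that this length stays between n and m even when the subject string is longer:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: length of the text matched by $re in $str,
# or nothing at all if the pattern does not match.
sub matched_length {
    my ($re, $str) = @_;
    return unless $str =~ /$re/;
    return $+[0] - $-[0];    # end offset minus start offset of the whole match
}

my $s = 'a' x 100;           # more than 50 characters
print matched_length('a{10,50}',  $s), "\n";   # greedy:     prints 50
print matched_length('a{10,50}?', $s), "\n";   # non-greedy: prints 10
```

The 50 (or 90) characters left over are simply outside the match, which is why the match still succeeds.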

#!/usr/bin/perl -w
use strict;

  my $s = 'a'x100; # is more than 50 times

  sub run {
    local ($,, $\) = (' ', "\n");
    my $re; ($re, $_) = @_;
    s/$re/$1/;
    print length, length($1);
  }

  run 'a{10,50}?(.*)'  , $s;
  run 'a{10,50}?(.*?)a', $s;
  run 'a{10,50}?(.*?)' , $s;
  run 'a{10,50}(.*?)'  , $s;
  run 'a{10,50}(.*)'   , $s;

output:
90 90
89 0
90 0
50 0
50 50

run 'a{10,50}?(.*)'  , $s;
  First part matches minimum; 'a'x10.
  Second part matches rest of string; 'a'x90.
  Replacing first+second with just second = 'a'x90.
  Expected result of "90 90" = yes.

run 'a{10,50}?(.*?)a', $s;
  First part matches minimum; 'a'x10.
  Second part matches the null string.
  Third part matches 11th a.
  Replacing first+second+third with just second leaves the
  89 characters that were not part of the overall match = 'a'x89.
  Expected result of "89 0" = yes.

run 'a{10,50}?(.*?)' , $s;
  First part matches minimum; 'a'x10.
  Second part matches the null string.
  Replacing first+second with just second leaves the
  90 characters that were not part of the overall match = 'a'x90.
  Expected result of "90 0" = yes.

run 'a{10,50}(.*?)'  , $s;
  First part matches maximum; 'a'x50.
  Second part matches the null string.
  Replacing first+second with just second leaves the
  50 characters that were not part of the overall match = 'a'x50.
  Expected result of "50 0" = yes.

run 'a{10,50}(.*)'   , $s;
  First part matches maximum; 'a'x50.
  Second part matches rest of string; 'a'x50.
  Replacing first+second with just second = 'a'x50.
  Expected result of "50 50" = yes.

The s/$re/$1/ just confuses things.  This is better:

#!/usr/bin/perl -w
use strict;

  my $s = 'a'x100; # is more than 50 times

  sub run {
    my $re; ($re, $_) = @_;
    /$re/ or return;   # $1, $2 and $' are only meaningful after a successful match
    print "\$1='$1' \$2='$2' rest=|$'|\n";
  }

  run '(a{10,50}?)(.*)'  , $s;
  run '(a{10,50}?)(.*?)a', $s;
  run '(a{10,50}?)(.*?)' , $s;
  run '(a{10,50})(.*?)'  , $s;
  run '(a{10,50})(.*)'   , $s;

output:
$1='aaaaaaaaaa' $2='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' rest=||
$1='aaaaaaaaaa' $2='' rest=|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|
$1='aaaaaaaaaa' $2='' rest=|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|
$1='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' $2='' rest=|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|
$1='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' $2='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' rest=||


This shows that /a{10,50}?/ matches the first 10 characters of the
string and /a{10,50}/ matches the first 50 characters of the string.
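
If the intent is instead to reject any string with more than m repetitions, the pattern has to be anchored so that no characters can be left outside the match. A small sketch, assuming that stricter reading is what was wanted (the helper name whole_string_ok is invented here):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true only if the ENTIRE string is 10 to 50 'a's.
# \A and \z pin the match to the very start and very end of the string.
sub whole_string_ok {
    my ($str) = @_;
    return $str =~ /\Aa{10,50}\z/ ? 1 : 0;
}

print whole_string_ok('a' x 100), "\n";   # prints 0: more than 50
print whole_string_ok('a' x 50),  "\n";   # prints 1: within range
print whole_string_ok('a' x 9),   "\n";   # prints 0: fewer than 10
```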

	-Joe