FAQ 6.19 What good is "\G" in a regular expression?
- From: PerlFAQ Server <brian@xxxxxxxxxxxxxx>
- Date: Wed, 30 Aug 2006 12:03:02 -0700
This is an excerpt from the latest version of perlfaq6.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
6.19: What good is "\G" in a regular expression?
You use the "\G" anchor to start the next match on the same string where
the last match left off. The regular expression engine cannot skip over
any characters to find the next match with this anchor, so "\G" is
similar to the beginning of string anchor, "^". The "\G" anchor is
typically used with the "g" flag. It uses the value of pos() as the
position to start the next match. As the match operator makes successive
matches, it updates pos() with the position of the next character past
the last match (or the first character of the next match, depending on
how you like to look at it). Each string has its own pos() value.
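As a small sketch of that bookkeeping (the variable names and sample strings here are illustrative, not part of the FAQ), you can record pos() after each successful match and check that a second string keeps its own, separate position:

```perl
use strict;
use warnings;

my $first  = "aXbXc";
my $second = "XX";

# One scalar-context /g match on $second leaves pos($second) at 1.
my $matched = $second =~ m/X/g;

# Each successful /g match on $first moves pos($first) to the offset
# just past the end of that match.
my @positions;
while ( $first =~ m/X/g ) {
    push @positions, pos($first);
}

# Matching against $first did not disturb the position of $second.
my $second_pos = pos($second);

print "pos() of first string after each match: @positions\n";   # 2 4
print "pos() of second string: $second_pos\n";                  # 1
```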
Suppose you want to match all of the consecutive pairs of digits in a string
like "1122a44" and stop matching when you encounter non-digits. You want
to match 11 and 22, but the letter "a" shows up between 22 and 44 and you
want to stop at "a". Simply matching pairs of digits skips over the "a"
and still matches 44.
    $_ = "1122a44";
    my @pairs = m/(\d\d)/g;   # qw( 11 22 44 )
If you use the \G anchor, you force the match after 22 to start with the
"a". The regular expression cannot match there since it does not find a
digit, so the next match fails and the match operator returns the pairs
it already found.
    $_ = "1122a44";
    my @pairs = m/\G(\d\d)/g;   # qw( 11 22 )
You can also use the "\G" anchor in scalar context. You still need the
"g" flag.
    $_ = "1122a44";
    while( m/\G(\d\d)/g )
        {
        print "Found $1\n";
        }
After the match fails at the letter "a", perl resets pos() and the next
match on the same string starts at the beginning.
    $_ = "1122a44";
    while( m/\G(\d\d)/g )
        {
        print "Found $1\n";
        }

    print "Found $1 after while" if m/(\d\d)/g; # finds "11"
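To see the reset directly, you can inspect pos() once the loop ends. This is only a sketch with illustrative variable names; it relies on the documented behavior that a failed match without the "c" flag leaves pos() undefined.

```perl
use strict;
use warnings;

my $string = "1122a44";

# Record where each successful match left off.
my @seen;
while ( $string =~ m/\G(\d\d)/g ) {
    push @seen, pos($string);    # 2, then 4
}

# The third attempt failed at "a". Without the "c" flag that failure
# clears pos(), so it is undefined here.
my $after_failure = pos($string);

print "positions during the loop: @seen\n";
print "pos() after the loop is ",
    ( defined $after_failure ? $after_failure : "undef" ), "\n";
```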
You can disable pos() resets on fail with the "c" flag. Subsequent
matches start where the last successful match ended (the value of pos())
even if a match on the same string has failed in the meantime. In this
case, the match after the while() loop starts at the "a" (where the last
match stopped), and since it does not use any anchor it can skip over
the "a" to find "44".
    $_ = "1122a44";
    while( m/\G(\d\d)/gc )
        {
        print "Found $1\n";
        }

    print "Found $1 after while" if m/(\d\d)/g; # finds "44"
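The same kind of check shows what the "c" flag preserves. In this sketch (variable names are illustrative), pos() still holds the end of the last successful match after the loop fails, and the unanchored match resumes from there:

```perl
use strict;
use warnings;

my $string = "1122a44";

# Consume the leading pairs of digits; the loop ends when \G(\d\d)
# fails at the "a".
my $count = 0;
$count++ while $string =~ m/\G(\d\d)/gc;

# Because of the "c" flag, the failed attempt did not clear pos().
my $kept = pos($string);

# An unanchored /g match resumes at that offset and may skip the "a".
my $found = ( $string =~ m/(\d\d)/g ) ? $1 : undef;

print "pairs matched in the loop: $count\n";   # 2
print "pos() after the loop: $kept\n";         # 4
print "next pair found: $found\n";             # 44
```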
Typically you use the "\G" anchor with the "c" flag when you want to try
a different match if one fails, such as in a tokenizer. Jeffrey Friedl
offers this example, which works in Perl 5.004 or later.
    while (<>) {
        chomp;
        PARSER: {
            m/ \G( \d+\b    )/gcx && do { print "number: $1\n"; redo; };
            m/ \G( \w+      )/gcx && do { print "word: $1\n"; redo; };
            m/ \G( \s+      )/gcx && do { print "space: $1\n"; redo; };
            m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
        }
    }
For each line, the PARSER loop first tries to match a series of digits
followed by a word boundary. This match has to start at the place the
last match left off (or the beginning of the string on the first match).
Since "m/ \G( \d+\b )/gcx" uses the "c" flag, if the string does not
match that regular expression, perl does not reset pos() and the next
match starts at the same position to try a different pattern.
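If you would rather collect the tokens than print them, the same idea can be sketched as a subroutine. The name tokenize() and the token layout are hypothetical, not part of the FAQ or of Friedl's example, and this version uses [^\w\s]+ for the "other" class so that punctuation tokens do not swallow the whitespace that follows them.

```perl
use strict;
use warnings;

# tokenize() is a hypothetical helper for illustration. It returns a
# list of [ type, text ] pairs for one line of input.
sub tokenize {
    my ($line) = @_;
    my @tokens;

    pos($line) = 0;    # start at the beginning of the string
    while ( pos($line) < length $line ) {
        # Each pattern is anchored with \G, and "c" keeps pos() in
        # place on failure so the next alternative starts at the
        # same offset.
        if    ( $line =~ m/\G(\d+\b)/gc )    { push @tokens, [ number => $1 ] }
        elsif ( $line =~ m/\G(\w+)/gc )      { push @tokens, [ word   => $1 ] }
        elsif ( $line =~ m/\G(\s+)/gc )      { push @tokens, [ space  => $1 ] }
        elsif ( $line =~ m/\G([^\w\s]+)/gc ) { push @tokens, [ other  => $1 ] }
        else                                 { last }    # defensive; every character fits a class above
    }
    return @tokens;
}

for my $token ( tokenize("abc 42, x9") ) {
    my ( $type, $text ) = @$token;
    print "$type: '$text'\n";
}
```

For the sample line this yields a word, a space, a number, an "other" token for the comma, another space, and a final word; "x9" counts as a word because \d+ cannot match at the "x".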
--------------------------------------------------------------------
The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.
If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.