Re: FAQ 6.11 Can I use Perl regular expressions to match balanced text?
- From: espie@xxxxxxxxx (Marc Espie)
- Date: Mon, 31 Dec 2007 14:49:31 +0000 (UTC)
In article <m7vl45-pe7.ln1@xxxxxxxxxxxxxxxxxxx>,
PerlFAQ Server <brian@xxxxxxxxxxxxxx> wrote:
6.11: Can I use Perl regular expressions to match balanced text?
Historically, Perl regular expressions were not capable of matching
balanced text. As of more recent versions of perl including 5.6.1
experimental features have been added that make it possible to do this.
Look at the documentation for the (??{ }) construct in recent perlre
manual pages to see an example of matching balanced parentheses. Be sure
to take special notice of the warnings present in the manual before
making use of this feature.
I've been using the following trick for many practical applications.
It helps to know why regexps cannot match balanced texts. Specifically,
normal regexps match rational languages, and rational languages cannot
count to an arbitrary number.
Balanced texts are algebraic languages, and they can be generated by
a grammar. The simplest example, with just the parentheses, would be:
D -> (D)D | <empty>
This is just one possible grammar, this one is non-ambiguous, which means
less combinatorial explosion later on.
However, one can use this grammar to build a good approximation of it as
a regular expression.
Specifically, set D0 = 0
then build Dn = (D_{n-1})D_{n-1} | empty
In perl terms:
$e = "";
for (1 .. n) {
$e = "(?:\($e\)$e)|";
}
then Dn is the rational language of all balanced expressions that do not
nest more than n open parentheses.
This can be used to build a regexp that matches all balanced expressions up
to an set level of nesting parentheses automatically. In many cases,
you don't have to go higher than n=15 or 20...
.
- References:
- FAQ 6.11 Can I use Perl regular expressions to match balanced text?
- From: PerlFAQ Server
- FAQ 6.11 Can I use Perl regular expressions to match balanced text?
- Prev by Date: Re: s/A/B/ and s/B/C/ but don't want A -> C
- Next by Date: Propblems with pp
- Previous by thread: FAQ 6.11 Can I use Perl regular expressions to match balanced text?
- Next by thread: FAQ 6.9 What is "/o" really for?
- Index(es):
Relevant Pages
|