Re: Small confusion about negative lookbehind
- From: Lasse Reichstein Nielsen <lrn@xxxxxxxxxx>
- Date: Tue, 31 May 2005 01:19:26 +0200
david.karr@xxxxxxxx writes:
> I'm writing a small test program to illustrate several aspects of
> regular expressions. In the section illustrating "lookaround"s, I
> found something I didn't understand. My testing is with JDK 1.4.2.
Hey, I didn't even know about look-behinds :)
> My candidate string is "ab".
>
> The expressions I'm testing this string against are the following,
> which also lists whether the string matched or not
....
> Looking at these, I first wonder what exactly is the semantic
> difference between a "lookbehind" and "lookahead" construct.
Both are zero-width predicates, which means (kind of) that they match
not a character, but a position between characters. See a string
as not just a sequence of characters, but as alternating characters
and in-between positions. These positions are where the cursor is
when you write (if you use a bar cursor, not a block, obviously :).
Regular expressions describe not only strings, but also the positions
between the chars in strings, e.g. "\b", which matches a position that
is at a word boundary (word character on one side, non-word character
on the other). The look-around patterns work just the same.
The exact predicate determines how the position is matched. For a
look-ahead, the zero-width position is matched if the following
characters are matched by the look-ahead expression. For the
look-behind, the zero-width position is matched if the previous
characters match the look-behind expression.
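To make the "position, not character" idea concrete, here is a small
sketch that lists where some zero-width patterns match in "ab". The
class name ZeroWidthDemo and the helper positions() are made up for
this illustration, and the code assumes a current JDK (it uses
generics, which JDK 1.4.2 does not have). Every match should have
start == end, i.e. it consumes no characters. Note that Java's "\b"
also matches at the very start and end of the input when the adjacent
character is a word character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // Collects the start index of every match of regex in input.
    // Throws if any match consumed characters (i.e. was not zero-width).
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (m.start() != m.end()) {
                throw new IllegalStateException("not zero-width: " + regex);
            }
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("\\b", "ab"));     // [0, 2]
        System.out.println(positions("(?=b)", "ab"));   // [1]
        System.out.println(positions("(?<=a)", "ab"));  // [1]
    }
}
```

The look-ahead "(?=b)" and the look-behind "(?<=a)" pick out the same
position in "ab" (index 1, between the two characters); they only
differ in which side of the position they inspect.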
So, "a(?=b)" matches an "a" followed by a zero-width string which is
followed by a "b". The matched substring of "ab" is "a".
"(?=a)b" matches a zero-width string which is followed by an "a",
followed by a "b". Since no position can be followed by both an "a"
and a "b", no string will match.
"(?<=a)b" matches a zero-width string preceded by an "a", followed
by a "b". The matched substring of "ab" is "b".
"a(?<=b)" matches an "a" followed by a zero-width string preceded by
a "b". Since that's not possible for any string, it fails.
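The four cases above can be checked with a few lines of Java. This is
only a sketch: the class name LookaroundDemo and the helper
firstMatch() are invented here, and it assumes the test uses
Matcher.find() (substring search) rather than Matcher.matches(), which
would require the whole of "ab" to be consumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Returns the first matched substring, or null if regex is not found in input.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] patterns = { "a(?=b)", "(?=a)b", "(?<=a)b", "a(?<=b)" };
        for (String p : patterns) {
            System.out.println(p + " -> " + firstMatch(p, "ab"));
        }
        // Prints:
        // a(?=b) -> a
        // (?=a)b -> null
        // (?<=a)b -> b
        // a(?<=b) -> null
    }
}
```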
> (?<!x)b // succeeds
"(?<!x)b" matches a zero-width string not preceded by an "x",
followed by a "b". The matched substring of "ab" is "b".
> a(?<!x) // succeeds(!)
"a(?<!x)" matches an "a" followed by a zero-width string not preceded
by an "x". The character before that position is the "a" just matched,
which is not an "x", so this matches the string "a", even as a
substring of "ab".
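A quick way to convince yourself is to run the two negative
look-behinds next to variants that look for "a" instead of "x". The
two contrast patterns "(?<!a)b" and "a(?<!a)" are my own additions,
not from the original test, and the class name NegativeLookbehindDemo
and helper firstMatch() are made up; as before, this assumes
Matcher.find() is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegativeLookbehindDemo {
    // Returns the first matched substring, or null if regex is not found in input.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The two patterns from the question: neither position is preceded by "x".
        System.out.println(firstMatch("(?<!x)b", "ab")); // b
        System.out.println(firstMatch("a(?<!x)", "ab")); // a

        // Contrast cases: here the forbidden character really is the preceding one.
        System.out.println(firstMatch("(?<!a)b", "ab")); // null
        System.out.println(firstMatch("a(?<!a)", "ab")); // null
    }
}
```

The last line shows the point most clearly: a look-behind placed
after "a" inspects the "a" that was just consumed, so "a(?<!a)" can
never match, while "a(?<!x)" always does.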
/L
--
Lasse Reichstein Nielsen - lrn@xxxxxxxxxx
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'