Re: Small confusion about negative lookbehind



david.karr@xxxxxxxx writes:

> I'm writing a small test program to illustrate several aspects of
> regular expressions. In the section illustrating "lookaround"s, I
> found something I didn't understand. My testing is with JDK 1.4.2.

Hey, I didn't even know about look-behinds :)

> My candidate string is "ab".
>
> The expressions I'm testing this string against are the following,
> which also lists whether the string matched or not
....
> Looking at these, I first wonder what exactly is the semantic
> difference between a "lookbehind" and "lookahead" construct.

Both are zero-width predicates, which means (kind of) that they match
not a character, but a position between characters. Think of a string
as not just a sequence of characters, but as alternating characters
and in-between positions. These positions are where the cursor can be
when you write (if you use a bar cursor, not a block, obviously :).

Regular expressions describe not only strings, but also the positions
between the chars in strings, e.g. "\b", which matches a position that
is at a word boundary (word-character on one side, non-word-character
or the end of the string on the other). The look-around patterns work
just the same.
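
To make the "positions, not characters" idea concrete, here is a small
sketch that lists every position where "\b" matches. The class and
method names are made up for illustration, and it uses generics, so it
needs a newer JDK than 1.4.2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original post.
class WordBoundaryPositions {
    // Collects the index of every position in input where \b matches.
    static List<Integer> boundaries(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(input);
        while (m.find()) {
            // For a zero-width match, start() and end() are equal:
            // no characters are consumed.
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Positions are counted as "number of characters to the left".
        System.out.println(boundaries("ab cd"));
    }
}
```

For "ab cd" the reported positions should be the two ends of each word,
never an index "inside" a character.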

The exact predicate determines how the position is matched. For a
look-ahead, the zero-width position is matched if the following
characters are matched by the look-ahead expression. For a
look-behind, the zero-width position is matched if the preceding
characters match the look-behind expression.
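
A quick way to see that a look-around consumes nothing is to use it on
its own and print where the match starts and ends. This is only a
sketch; the helper name "span" is invented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original post.
class ZeroWidthDemo {
    // Returns "start,end" of the first match found in input, or null.
    static String span(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + "," + m.end() : null;
    }

    public static void main(String[] args) {
        // Look-ahead: the position whose following character is "b".
        System.out.println(span("(?=b)", "ab"));
        // Look-behind: the position whose previous character is "a".
        System.out.println(span("(?<=a)", "ab"));
        // An ordinary character, for comparison: it has width 1.
        System.out.println(span("b", "ab"));
    }
}
```

Both look-arounds pick out the same position in "ab" (the one between
the two characters), once by looking right and once by looking left.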

So, "a(?=b)" matches an "a" followed by a zero-width string which is
followed by a "b". The matched substring of "ab" is "a".

"(?=a)b" matches a zero-width string which is followed by an "a",
followed by a "b". Since no position can be followed by both an "a"
and a "b", no string will match.

"(?<=a)b" matches a zero-width string preceded by an "a", followed
by a "b". The matched substring of "ab" is "b".

"a(?<=b)" matches an "a" followed by a zero-width string preceded by
a "b". The character just before that position is always the "a" that
was just matched, never a "b", so it fails for every string.
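
The four cases above can be checked with something like the following.
I'm assuming the original test used Matcher.find() (substring search)
rather than matches(), since we are talking about matched substrings;
the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original post.
class LookaroundCases {
    // Returns the first substring of input matched by regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] patterns = { "a(?=b)", "(?=a)b", "(?<=a)b", "a(?<=b)" };
        for (String p : patterns) {
            // Prints the matched substring, or "null" when nothing matches.
            System.out.println(p + " -> " + firstMatch(p, "ab"));
        }
    }
}
```

Note that with matches() instead of find(), "a(?=b)" would not succeed
on "ab", because the look-ahead does not consume the "b".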

> (?<!x)b // succeeds

"(?<!x)b" matches a zero-width string not preceded by an "x",
followed by a "b". The matched substring of "ab" is "b".

> a(?<!x) // succeeds(!)

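"a(?<!x)" matches an "a" followed by a zero-width string not preceded
by an "x". The character just before that position is the "a" itself,
which is not an "x", so the look-behind always succeeds. This matches
the string "a", even as a substring of "ab".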
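
Here is a sketch of the two negative look-behind cases, plus two
contrasting patterns of my own ("(?<!a)b" and "a(?<!a)") that show what
the look-behind is actually looking at. Again this assumes find()
semantics, and the names are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original post.
class NegativeLookbehindDemo {
    // Returns the first substring of input matched by regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The "b" in "ab" is preceded by "a", not "x": match.
        System.out.println(firstMatch("(?<!x)b", "ab"));
        // After matching "a", the previous character is that "a": match.
        System.out.println(firstMatch("a(?<!x)", "ab"));
        // Contrast: the "b" IS preceded by "a", so this one fails.
        System.out.println(firstMatch("(?<!a)b", "ab"));
        // Contrast: after matching "a", the previous character is
        // always "a", so this can never match anything.
        System.out.println(firstMatch("a(?<!a)", "ab"));
    }
}
```

So a look-behind placed after a literal mostly ends up testing that
literal itself, which is why "a(?<!x)" looks surprising at first.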

/L
--
Lasse Reichstein Nielsen - lrn@xxxxxxxxxx
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'