Re: Extracting substring with regexp



Alex wrote:
Pattern p = Pattern.compile("abc(.*)xyz");
Matcher m = p.matcher("xxxxxabc123xyz789xyzxxxxx");
if (m.find()) System.out.println(m.group(1));

should print "123" but, instead, it prints "123xyz789".
How can I force the regexp to find the first match?

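Short answer: By default, quantifiers are greedy, so ".*" matches as much as it can and the group runs up to the last "xyz" in the input. Use "abc(.*?)xyz" instead; the reluctant ".*?" stops at the first "xyz".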
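For reference, here is a self-contained version of the question's code that runs both patterns on the same input. The class name `FirstMatch` and the helper `firstGroup` are made up for this sketch; only `Pattern`, `Matcher`, `find()` and `group(1)` come from the original.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstMatch {
    // Hypothetical helper: returns group 1 of the first match of regex in input,
    // or null if the pattern does not match anywhere.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "xxxxxabc123xyz789xyzxxxxx";
        // Greedy: .* runs to the end of the input, then backs off to the LAST "xyz".
        System.out.println(firstGroup("abc(.*)xyz", input));  // prints 123xyz789
        // Reluctant: .*? grows one character at a time and stops at the FIRST "xyz".
        System.out.println(firstGroup("abc(.*?)xyz", input)); // prints 123
    }
}
```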

Long answer: Unqualified, the *, +, and ? quantifiers are greedy: they first match as much as they can, then backtrack one repetition at a time until the rest of the pattern matches (or until every option has failed). Appending `?' makes a quantifier reluctant: it first tries to match as little as possible and takes another repetition only when the rest of the pattern fails. Appending `+' makes it possessive: it matches as much as it can and never backtracks, so the overall match can fail where the greedy version would succeed.

Using m.find() on the string "aaa" with the pattern "(a*"+operator+")a", group 1 matches:
"": aa
"?": <empty string>
"+": <failure>
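The table can be reproduced with a short program. `QuantifierDemo` and `group1` are hypothetical names, and the sketch assumes `Matcher.find()` as in the question's code. With `matches()`, which must consume the whole string, the reluctant case would give "aa" instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Builds "(a*" + operator + ")a", runs find() on input, and returns group 1,
    // or null if the pattern does not match at any starting position.
    public static String group1(String operator, String input) {
        Matcher m = Pattern.compile("(a*" + operator + ")a").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // "" = greedy, "?" = reluctant, "+" = possessive
        for (String op : new String[] {"", "?", "+"}) {
            String g = group1(op, "aaa");
            // Brackets make the empty group visible.
            System.out.println("\"" + op + "\": " + (g == null ? "<failure>" : "[" + g + "]"));
        }
    }
}
```

The possessive case fails at every starting position: a*+ always swallows all remaining a's, leaving nothing for the final "a".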

--
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth