Re: buggy regexp
- From: ilyabo@xxxxxxxxx
- Date: 31 May 2007 10:43:25 -0700
Hello Robert,
> You nest "+" and "*" which can lead to bad backtracking (which you seem
> to experience). If you wait long enough you'll see the result.
You are right, it doesn't hang; it just runs very slowly. It takes
about two minutes to finish this match, and the more "a" characters
precede the <BR>s in the input string, the longer it takes.
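One way to sidestep the backtracking described above is a possessive quantifier, which Java's java.util.regex supports. The following is only a minimal sketch: it assumes the reduced Java pattern had the same shape as the Perl form of the expression, ^((<BR>)|([^<>]+))*$, and the class and method names (PossessiveDemo, isValid) are made up for illustration.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Assumed shape of the reduced pattern, with one change:
    // "[^<>]++" is possessive, so once it has consumed a run of text
    // it never hands characters back to the outer "*" for re-splitting.
    private static final Pattern SAFE =
            Pattern.compile("^((<BR>)|([^<>]++))*$");

    static boolean isValid(String s) {
        return SAFE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String bad = "aaaaaaaaaaaaaaaaaaaaaaaa <BR><Bx>";
        String good = "aaaaaaaaaaaaaaaaaaaaaaaa <BR><BR>";
        System.out.println(isValid(bad));   // prints "false"
        System.out.println(isValid(good));  // prints "true"
    }
}
```

With the plain "+" the engine can split the run of "a"s between loop iterations in a huge number of ways before giving up on <Bx>; the possessive form leaves it only one way, so the failing case should return promptly.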
> If you just want to test for the presence of "<BR>" you can do this:
> boolean match = str.indexOf("<BR>") != -1;
> Or as regexp
> Pattern p = Pattern.compile("<BR>");
> boolean match = p.matcher(s).find();
I actually want to check the validity of an HTML string in which only
certain tags are allowed. The original expression was:
(?ui)(([^<>]+)|(<\\/?(p|br|em|strong|strike|i|b|ul|ol|li)\\s*\\/?\\s*>))*
but I reduced it to make the problem clearer.
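For the full whitelist check, the same idea can be sketched as follows. This is an illustration under stated assumptions, not a tested replacement: TagWhitelist and isAllowed are hypothetical names, the (?ui) flags are passed as Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, the capturing groups are made non-capturing, and both the text run and the outer loop are made possessive.

```java
import java.util.regex.Pattern;

class TagWhitelist {
    // Text runs and whitelisted tags, repeated. "++" and "*+" are
    // possessive, so a failed match does not re-split the text runs.
    private static final Pattern ALLOWED = Pattern.compile(
            "(?:[^<>]++"
            + "|</?(?:p|br|em|strong|strike|i|b|ul|ol|li)\\s*/?\\s*>"
            + ")*+",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static boolean isAllowed(String html) {
        // matches() requires the whole string to fit the pattern.
        return ALLOWED.matcher(html).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("<p>Hello <b>world</b><br/></p>"));    // prints "true"
        System.out.println(isAllowed("aaaaaaaaaaaaaaaaaaaaaaaa <BR><Bx>")); // prints "false"
    }
}
```

Like the original expression, this only checks which tags appear; it does not verify that they are properly nested or closed.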
Interestingly, the same regular expression match works instantly in
Perl:
$_ = "aaaaaaaaaaaaaaaaaaaaaaaa <BR><Bx>";
print "yes" if (m/^((<BR>)|([^<>]+))*$/);
And if I move <Bx> to the beginning of the string, or change it to <BR>,
then it also works instantly in Java.
Ilya