Re: need help with regex

From: Alan Moore (jbigboote_at_yoyodyne.com)
Date: 02/08/05


Date: Tue, 08 Feb 2005 11:42:11 -0800

On Tue, 08 Feb 2005 18:35:05 +0800, - <nobody@hoem.om> wrote:

>i have a sample text:
>"Key: Value Key: Value2 Key: Value3 Subkey: apple Subkey: orange Key:
>Value 4"
>
>i need to extract:
>"Subkey: apple Subkey: orange"
>
>i have a regex expression:
>p.compile("Key: Value2\\s.*(Subkey:\\s.*)Key:");
>
>which only succeeds in extracting "Subkey: orange"
>
>i am pretty sure the solution is to repeat the portion using
>(Subkey:\\s.*)* <--- extra asterisk.

The problem with your regex is that the first ".*" is greedy: it
initially matches all the way to the end of the line. Then the regex
engine has to backtrack in order to match the rest of the pattern, but
it only backtracks as far as it has to, i.e., to the *last* occurrence
of "Subkey:". Adding the asterisk where you suggested only makes
things worse, because now the parenthesized expression doesn't have to
match even once, so the overall match can succeed with nothing
captured at all.
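
To see the backtracking behaviour described above, here is a minimal
sketch that runs both of your patterns against the sample text. The
class name GreedyDemo and the helper firstGroup are made up for the
illustration; only Pattern and Matcher from java.util.regex are real
API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Sample text from the question, joined onto one line.
    static final String TEXT =
        "Key: Value Key: Value2 Key: Value3 Subkey: apple Subkey: orange Key: Value 4";

    // Hypothetical helper: returns group 1 of the first match, or null if
    // there is no match or group 1 did not take part in the match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Original pattern: the first ".*" is greedy, so the group ends up
        // on the last "Subkey:" only.
        System.out.println("[" + firstGroup("Key: Value2\\s.*(Subkey:\\s.*)Key:", TEXT) + "]");

        // With the extra asterisk the group may match zero times, and the
        // greedy ".*" swallows everything up to the final "Key:".
        System.out.println("[" + firstGroup("Key: Value2\\s.*(Subkey:\\s.*)*Key:", TEXT) + "]");
    }
}
```

The first line printed is "[Subkey: orange ]" (note the trailing
space), and the second is "[null]", because the group never takes part
in the match.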

The simplest solution is to make the first ".*" non-greedy: ".*?".
You do need to add a quantifier to the subexpression, but just tacking
on another asterisk is a bad idea. Whenever you have a regex of the
form (x*)*, you run the risk that the regex will take forever to
report failure. I suggest you modify the subexpression so that it
doesn't rely on backtracking. Assuming the Subkey values can't
contain spaces, this should work:

  "Key: Value2\\s.*?((?:\\sSubkey:\\s\\S++)+)"

Notice that I also had to match the space preceding each "Subkey:"
in order for the quantifier to work, so the captured group starts with
a space. I also used a possessive plus inside the subexpression to
avoid the neverending nonmatch problem, although in this case it isn't
really necessary.
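
For completeness, a small sketch of how the pattern above might be
used. The class SubkeyExtract and the method extractSubkeys are
hypothetical names for the example, and the trim() call is my own
addition to drop the leading space that the group captures.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubkeyExtract {
    // Sample text from the question, joined onto one line.
    static final String TEXT =
        "Key: Value Key: Value2 Key: Value3 Subkey: apple Subkey: orange Key: Value 4";

    // The pattern from this post, unchanged: a reluctant ".*?" followed by
    // one or more " Subkey: value" pairs, with a possessive "\\S++".
    static final Pattern SUBKEYS =
        Pattern.compile("Key: Value2\\s.*?((?:\\sSubkey:\\s\\S++)+)");

    // Hypothetical helper: returns the run of Subkey pairs without the
    // leading space, or null if the pattern does not match.
    static String extractSubkeys(String input) {
        Matcher m = SUBKEYS.matcher(input);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(extractSubkeys(TEXT));
    }
}
```

On the sample text this prints "Subkey: apple Subkey: orange". It
still assumes, as above, that Subkey values contain no spaces.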


