Re: regular expression reverse match?

From: Ron Adam (radam2_at_tampabay.rr.com)
Date: 10/30/03


Date: Wed, 29 Oct 2003 23:30:36 GMT

On Wed, 29 Oct 2003 11:32:45 +0100, "Diez B. Roggisch"
<nospam-deets@web.de> wrote:

>Hi,
>
>> As it is, the program checks the input buffer after each keystroke and
>> determines if the buffer is acceptable against a pattern as the string
>> is being built. A reverse match. It allows new keystrokes to be
>> added to the buffer as long as it's within the pattern.
>
>I don't think its possible with regular expressions, or at least the way you
>are using them right now, that is to come from a "in the end, I want to
>look it like this, but it shall accept everything that prefixes a valid
>result".
>
>That won't work - the reason is simply that regular expressions are
>equivalent to finite state automata. I don't know if you are familiar with
>these, but the consist of states, which are nodes, and transitions between
>these, which are edges/arrows between the states.

I'm not familiar with the terms, but I am familiar with tree data
structures.

[clipped good explanation]

>However, its way more complicated to create this as a rex:
>
>"a(b(c)?)?"
>
>This looks straightforward, so you might be successful to create rexes like
>this using a preprocessing step - but the more complicated your rex gets,
>this approach will be hard to follow. Actually, I have currently no idea.
>Which doesn't mean its not doable, I just don't have enough time to think
>about it right now :)

I'm still definitely learning this, and appreciate the help.

In reply to Emily, I compared rex to simplifying a math problem like
this. I'm hoping this will lead to a method that will work.

> if this were a math problem it might look something like this.
>
> string <= yes$ | no$
> string = (y | ye | yes)$ | (n | no)$
> string = (y$ | ye$ | yes$) | (n$ | no$)
> string = y$ | ye$ | yes$ | n$ | no$

Using the same approach for the example above might be something like:

s <= a( b (c)? )?
s <= a( b | bc )?
s <= a | ab | abc
s = a | (a | ab) | ( a | ab | abc)
s = a | a | ab | a | ab | abc
s = a | ab | abc

I have no idea if what I'm doing here is actually valid. I'm sort of
thinking it out and learning as I go. Are there rules and methods to
manipulating regular expressions in this manner, rex algebra? Or is
this a subset of set theory? I think my statistics instructor did
something similar to this. It was a few years ago.

>It looks to me, that if you need this feature not on a per-case base where
>you can think about your rexes more thoroughly, you have to sort of roll
>out your own rex-implmenetation. Which isn't too hard, but definitely
>something thats more than just a couple of lines.
>
>Regards,
>
>Diez

Looks like I only <obvious understatement> need to create a few
functions to manipulate rex expressions. Simpler than a full
rex-emplementations, but still definitely more than a few lines of
code.

_Ron Adam



Relevant Pages

  • Re: "secure" file flag?
    ... necessary to flush the data through the drive cache. ... onto the platters after you have applied each pattern. ... > buffer contents is prepared for the next step of the erasure process, ... to the conclusion disk encryption is probably a lower-cost solution. ...
    (freebsd-hackers)
  • Re: RegEx partial matching
    ... If you don't want your buffer to be too big, ... you can safely discard the lower 10 characters. ... > like to be safe and just keep 3x the length of the pattern I'm looking ... I want to match full featured regular expressions not just fixed ...
    (comp.lang.java.programmer)
  • [newbie] _lfind syntax problem
    ... UINT bufferlen; ... BYTE *pattern; //holds pointer to pattern to search for ... Now, I've decided to use _lfind for this searching (I'm not using bsearch, ...
    (microsoft.public.vc.mfc)
  • [newbie] _lfind syntax problem
    ... UINT bufferlen; ... BYTE *pattern; //holds pointer to pattern to search for ... Now, I've decided to use _lfind for this searching (I'm not using bsearch, ...
    (microsoft.public.dotnet.languages.vc)
  • [newbie] _lfind syntax problem
    ... UINT bufferlen; ... BYTE *pattern; //holds pointer to pattern to search for ... Now, I've decided to use _lfind for this searching (I'm not using bsearch, ...
    (microsoft.public.vc.language)