RE: speed up string matching

From: C R (nautmann_at_yahoo.dk)
Date: 09/23/04


Date: Thu, 23 Sep 2004 22:46:41 +0200 (CEST)
To: beginners@perl.org

c r <nautmann@yahoo.dk> wrote:Thanks for replying!

Thomas Bätzler <t.baetzler@bringe.com> wrote:
Hi,

c r asked:
> I need to match an expression and its reverse to a very long string.
> When a match occurs all matching should stop and the position
> of the match should be returned.

Could you please illustrate this with an example or two?

Don't mind the lack of position return. I have:

$expres = '10_normal_characters'; $rev_expres = reverse $expres; $long_string = 'ARGBB...........';

if ($long_string =~ /$expres/i) { next;}

if ($long_string =~ /$rev_expres/i) {next;}

(the "next" function takes a different $expres reverses it and does the matching procedure again. This is repeted many thounsands of times and it takes days to finish).

Unless you specify the /g modifier, the RE engine stops at
the first match. Use the pos function to find the position
of the match.

> Question1: can I match the forward and reverse expression to
> the string on the same time and thereby save half the time it
> normally would take to find a match or does the matching just
> get slower?

Well, you'd have to merge your expressions somehow - the easiest
way would be to try and match /expr-a|expr-b/ but then I suspect
that for all but simple cases two separate matches would be faster.

> Question2: is the "fork" function what I should use in order
> to match a string with multiple expressions simultaneously?

Well, there is a certain overhead involved in keeping your processes
synchronized that would only be outweighed if you had a multi CPU
machine where both processes could run at once in the first place.
Even then it's a hassle.

So in order to match a very long string with multiple expressions simultaneously and faster than the matching procedure I have described above I need multiple computers?

If I were you I'd focus my energy in optimizing the expression.
If you're going to match many long strings with the same RE, you
could use the /o modifier to benefit. Also, you could try wether
a study() of the input strings speeds things up.

I don't know the "study" function, but I doubt it can solve the problem to satisfatory. please correct me if I am wrong!

I acknowledge that this is a serious programming challenge. Do you have any other ideas of how to tackle this problem (what about other hardware)?

HTH,
Thomas



Relevant Pages

  • Re: Regular Expression, to use or not to use...
    ... So I have something that will search the string 10x ... Yea thats true, like I said I still use them occasionally, as a hack, ... Also this is an extremly simple re, no |'s or complex expressions. ... >> simple string operations where I can come up with the expression in a ...
    (microsoft.public.dotnet.general)
  • Re: Small confusion about negative lookbehind
    ... > My candidate string is "ab". ... > The expressions I'm testing this string against are the following, ... but the position between characters. ... Regular expressions describe not only strings, ...
    (comp.lang.java.programmer)
  • Re: Regular Expression, to use or not to use...
    ... > Expressions for a long time now, ... > external tools to help me with. ... > order of 10 times slower then straight forward string parsing code. ... This could be a result of .NET engine, ...
    (microsoft.public.dotnet.general)
  • Re: Why not FP for Money?
    ... >> conversion of binary floats to decimal floats, and the string looks ... >> out of place in numeric expressions. ... > that using 'd' is a compromise to having no way to write ... Carlos Ribeiro ...
    (comp.lang.python)
  • Re: Reverse words in a string (Another Interview question)
    ... >> I'm using a linked list to contain words that will be output in reverse ... A stack is exactly that. ... C's string handling sucks. ... > void reverse_string(char * ostr) ...
    (comp.programming)

Loading