Re: why is pattern matching using '|' slower than 2 separate ones?

From: Ilya Zakharevich (nospam-abuse_at_ilyaz.org)
Date: 11/11/04


Date: Thu, 11 Nov 2004 01:20:09 +0000 (UTC)


[A complimentary Cc of this posting was sent to
Dave
<daveandniki@ntlworld.com>], who wrote in article <3r4kd.189$Av5.115@newsfe4-gui.ntli.net>:

$z = 'ZZZZZZZZZ'; # Or some such

> > $a=1 if $z=~/xxxxx|yyyyy/;
> > # $a=1 if $z=~/xxxxx/ or $z=~/yyyyy/;

> For the answer to this and more (if you are interested) have a look at
> Mastering Regular Expressions by Jeffrey Friedl

Did not see the newer edition. Does it describe the operation of REx
optimizer?

> The short answer is that the behaviour is entirely expected. The first
> version caused the Regex engine to do lots of switching at each position in
> the string to swap between looking for one then the other. It also hinders
> the engine from doing certain optimisations which are easy with the simple
> literal string.

This has little relation to what actually happens. The REx engine
proper is not even entered with these patterns. It is the optimizer
that rejects the match. And with the first version it tries to find
'x' or 'y' inside the string - which is much slower than looking for
'x' at each 5th position - as the second version does.

IMO, it is the "each 5th position" which helps - not switching between
two possibilities.
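
A minimal sketch for timing the two forms with the core Benchmark
module. The 10,000-character test string and the sub names are
assumptions for illustration, not taken from the original question.
The ratio depends on the perl version: later perls changed how
alternations of literal strings are compiled, so the gap may be
smaller or absent there.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test string: long, with no 'x' or 'y' in it,
# so both forms fail to match (the case discussed above).
my $z = 'Z' x 10_000;

# Each sub returns 1 on a match and 0 otherwise.
sub with_alternation { return $z =~ /xxxxx|yyyyy/ ? 1 : 0 }
sub with_two_matches { return ($z =~ /xxxxx/ or $z =~ /yyyyy/) ? 1 : 0 }

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    alternation => \&with_alternation,
    two_matches => \&with_two_matches,
});
```

A string that fails to match is the interesting case here, since the
whole string has to be scanned before the match can be rejected.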

Run with use re 'debugcolor' for details,
Ilya
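
A short sketch of the suggestion above, assuming the same kind of test
string as in the example. It uses 'debug' rather than 'debugcolor';
the two give the same information, with 'debugcolor' adding terminal
colors. The dump goes to STDERR, and its exact format varies between
perl versions, so no sample output is shown.

```perl
use strict;
use warnings;

# Hypothetical test string with no 'x' or 'y' in it.
my $z = 'ZZZZZZZZZ';
my ($alt, $lit);

{
    # Patterns compiled in this block dump their compiled form and a
    # trace of each match attempt to STDERR.
    use re 'debug';
    $alt = $z =~ /xxxxx|yyyyy/ ? 1 : 0;   # alternation form
    $lit = $z =~ /xxxxx/       ? 1 : 0;   # single literal form
}

print "alternation matched: $alt, literal matched: $lit\n";
```

Comparing the two traces shows what the optimizer does with each
pattern before the REx engine proper would be entered.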


