Re: why is pattern matching using '|' slower than 2 separate ones?
From: Ilya Zakharevich (nospam-abuse_at_ilyaz.org)
Date: 11/11/04
- Next message: René Scheibe: "sorting multiple arrays"
- Previous message: Andrew Tkachenko: "Re: better way using IO::Select and pool of non-blockig socks"
- In reply to: Dave: "Re: why is pattern matching using '|' slower than 2 separate ones?"
- Next in thread: ctcgag_at_hotmail.com: "Re: why is pattern matching using '|' slower than 2 separate ones?"
- Reply: ctcgag_at_hotmail.com: "Re: why is pattern matching using '|' slower than 2 separate ones?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 11 Nov 2004 01:20:09 +0000 (UTC)
[A complimentary Cc of this posting was sent to
Dave
<daveandniki@ntlworld.com>], who wrote in article <3r4kd.189$Av5.115@newsfe4-gui.ntli.net>:
$z = 'ZZZZZZZZZ'; # Or some such
> > $a=1 if $z=~/xxxxx|yyyyy/;
> > # $a=1 if $z=~/xxxxx/ or $z=~/yyyyy/;
> For the answer to this and more (if you are interested) have a look at
> Mastering Regular Expressions by Jeffrey Friedl
I have not seen the newer edition. Does it describe the operation of
the REx optimizer?
> The short answer is that the behaviour is entirely expected. The first
> version caused the Regex engine to do lots of switching at each position in
> the string to swap between looking for one then the other. It also hinders
> the engine from doing certain optimisations which are easy with the simple
> literal string.
This has little relation to what actually happens. The REx engine
proper is not even entered with these patterns. It is the optimizer
that rejects the match. With the first version it tries to find
'x' or 'y' inside the string - which is much slower than looking for
'x' at each 5th position - as the second version does.
IMO, it is the "each 5th position" which helps - not switching between
two possibilities.
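The comparison can be reproduced with a small benchmark. What follows
is only a sketch: the string length, the sub names and the use of the
core Benchmark module are assumptions for illustration, not part of
the original test. The numbers depend heavily on the perl version;
perls from 5.10 on compile alternations of literals differently, so
the gap described here may be smaller or absent there.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test subject: a long string containing neither 'x' nor 'y'.
my $z = 'Z' x 10_000;

# Variant 1: a single pattern with alternation.
sub with_alternation { return $z =~ /xxxxx|yyyyy/ ? 1 : 0 }

# Variant 2: two separate literal patterns joined with 'or'.
sub with_two_matches { return ($z =~ /xxxxx/ or $z =~ /yyyyy/) ? 1 : 0 }

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    alternation => \&with_alternation,
    two_matches => \&with_two_matches,
});
```

Both subs return the same answer for any input; only the time taken to
reject (or find) the match differs.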
Run with use re 'debugcolor' for details,
Ilya
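For anyone who wants to follow the suggestion above, here is a minimal
sketch of how the re pragma can be switched on for just the two
patterns in question. The variable names and the sample string are
assumptions; 'debugcolor' can be used in place of 'debug' to get the
same trace with terminal colours. The trace goes to STDERR and its
exact wording varies between perl versions, so look for the lines that
say how the optimizer handled (or rejected) the match rather than for
any fixed text.

```perl
use strict;
use warnings;

# Assumed sample data: no 'x' or 'y' anywhere, so neither pattern can match.
my $z = 'Z' x 40;

my ($alt_matched, $lit_matched);
{
    # Lexically scoped: only patterns compiled inside this block are traced.
    use re 'debug';

    $alt_matched = $z =~ /xxxxx|yyyyy/ ? 1 : 0;   # alternation form
    $lit_matched = $z =~ /xxxxx/       ? 1 : 0;   # plain literal form
}

# The match results go to STDOUT, separate from the debug trace on STDERR.
print "alternation matched: $alt_matched\n";
print "literal matched:     $lit_matched\n";
```

Running it as `perl script.pl 2>trace.txt` keeps the trace in a file
for side-by-side comparison of the two patterns.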