Re: Speed comparison of regex versus index, lc, and / /i



Ben Bullock wrote:
On Sat, 31 May 2008 01:49:14 +0000, A. Sinan Unur wrote:

There is one change I would make to both routines. Cache length $ss
before the loop. On my machine, that cut down the time by about 40%.

In the real situation almost every search for the search string ($ss)
in the text ($text) is going to fail, so I think it would make much
less difference than in the fake situation of repeating with the same
string over and over again.

I modified the program to use /o that A. Sinan Unur added, and also
added a routine to test Ben Morrow's idea and reran it using the
Benchmark module:

#!/usr/local/bin/perl
use warnings;
use strict;

my $text = <<EOF;
xhoster is the coolest perl programmer ever. xhoster is the
greatest. xhoster is the champion. xhoster is a babe magnet.
EOF
my $ss = "xhoster";

sub index_find
{
my @finds;
my $found = 0;
while (($found = index ($text, $ss, $found)) != -1) {
push @finds, $found;
$found += length ($ss);
}
return \@finds;
}

On my machine and version of Perl a get a speed improvement by using a C style for loop instead:

sub index_find
{
my @finds;
for ( my $found = 0; ( $found = index( $text, $ss, $found ) ) != -1; $found += length $ss ) {
push @finds, $found;
}
return \@finds;
}

sub regex_find
{
my @finds;
while ($text =~ /\Q$ss\E/g) {
push @finds, pos ($text) - length($ss);
}
return \@finds;
}

sub regex_unur_find
{
my @finds;
while ($text =~ /\Q$ss/go) {
push @finds, pos ($text) - length($ss);
}
return \@finds;
}

my $re = qr{\Q$ss};

sub regex_morrow_find
{
my @finds;
while ($text =~ /$re/g) {
push @finds, pos ($text) - length($ss);
}
return \@finds;
}

# check it all works

for (\&index_find, \&regex_find, \&regex_unur_find, \&regex_morrow_find) {
print "String found at ", (join ", ",@{&{$_}($text, $ss)}),"\n";
}

use Benchmark qw( cmpthese );

my $count = 100000;

cmpthese $count, {
'index' => sub {index_find($text,$ss)},
'regex' => sub {regex_find($text,$ss)},
'regex_unur' => sub {regex_unur_find($text,$ss)},
'regex_morrow' => sub {regex_morrow_find($text,$ss)},
};

Here are the results of five runs:

Rate regex_morrow regex regex_unur index
regex_morrow 116279/s -- -31% -40% -41%
regex 169492/s 46% -- -12% -14%
regex_unur 192308/s 65% 13% -- -2%
index 196078/s 69% 16% 2% --

On my machine and version of Perl regex_unur_find() and regex_morrow_find() are statically the same;

Rate regex regex_unur regex_morrow index_while index_for
regex 144442/s -- -5% -6% -16% -17%
regex_unur 151952/s 5% -- -1% -11% -13%
regex_morrow 154059/s 7% 1% -- -10% -12%
index_while 171325/s 19% 13% 11% -- -2%
index_for 174905/s 21% 15% 14% 2% --



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
.



Relevant Pages

  • Re: Speed comparison of regex versus index, lc, and / /i
    ... sub index_find ... push @finds, pos - length; ... xhoster is the coolest perl programmer ever. ... use Benchmark qw(cmpthese); ...
    (comp.lang.perl.misc)
  • Re: Speed comparison of regex versus index, lc, and / /i
    ... In the real situation almost every search for the search string ... xhoster is the coolest perl programmer ever. ... sub index_find ... push @finds, pos - length; ...
    (comp.lang.perl.misc)
  • Re: Speed comparison of regex versus index, lc, and / /i
    ... about special characters or syntax errors in the regex, ... sub index_find ... push @finds, pos - length; ... xhoster is the coolest perl programmer ever. ...
    (comp.lang.perl.misc)
  • RE: Email and Fax a Report
    ... Opening a report in a loop seems odd, ... By placing the sub on its own, ... Dim objOutlook As Outlook.Application ... Set newMail = objOutlook.CreateItem ...
    (microsoft.public.access.modulesdaovba)
  • Re: arrays verodern
    ... interessant wenn SUB ... unnötige Bytes in der Schleife hat, ... welches LOOP, ... mov ecx, CCount ...
    (de.comp.lang.delphi.misc)