Re: Speed comparison of regex versus index, lc, and / /i
- From: "John W. Krahn" <someone@xxxxxxxxxxx>
- Date: Sat, 31 May 2008 07:19:24 GMT
Ben Bullock wrote:
On Sat, 31 May 2008 01:49:14 +0000, A. Sinan Unur wrote:
There is one change I would make to both routines. Cache length $ss
before the loop. On my machine, that cut down the time by about 40%.
In the real situation almost every search for the search string ($ss)
in the text ($text) is going to fail, so I think it would make much
less difference than in the fake situation of repeating with the same
string over and over again.
I modified the program to use /o that A. Sinan Unur added, and also
added a routine to test Ben Morrow's idea and reran it using the
Benchmark module:
#!/usr/local/bin/perl
use warnings;
use strict;
my $text = <<EOF;
xhoster is the coolest perl programmer ever. xhoster is the
greatest. xhoster is the champion. xhoster is a babe magnet.
EOF
my $ss = "xhoster";
sub index_find
{
my @finds;
my $found = 0;
while (($found = index ($text, $ss, $found)) != -1) {
push @finds, $found;
$found += length ($ss);
}
return \@finds;
}
On my machine and version of Perl a get a speed improvement by using a C style for loop instead:
sub index_find
{
my @finds;
for ( my $found = 0; ( $found = index( $text, $ss, $found ) ) != -1; $found += length $ss ) {
push @finds, $found;
}
return \@finds;
}
sub regex_find
{
my @finds;
while ($text =~ /\Q$ss\E/g) {
push @finds, pos ($text) - length($ss);
}
return \@finds;
}
sub regex_unur_find
{
my @finds;
while ($text =~ /\Q$ss/go) {
push @finds, pos ($text) - length($ss);
}
return \@finds;
}
my $re = qr{\Q$ss};
sub regex_morrow_find
{
my @finds;
while ($text =~ /$re/g) {
push @finds, pos ($text) - length($ss);
}
return \@finds;
}
# check it all works
for (\&index_find, \®ex_find, \®ex_unur_find, \®ex_morrow_find) {
print "String found at ", (join ", ",@{&{$_}($text, $ss)}),"\n";
}
use Benchmark qw( cmpthese );
my $count = 100000;
cmpthese $count, {
'index' => sub {index_find($text,$ss)},
'regex' => sub {regex_find($text,$ss)},
'regex_unur' => sub {regex_unur_find($text,$ss)},
'regex_morrow' => sub {regex_morrow_find($text,$ss)},
};
Here are the results of five runs:
Rate regex_morrow regex regex_unur index
regex_morrow 116279/s -- -31% -40% -41%
regex 169492/s 46% -- -12% -14%
regex_unur 192308/s 65% 13% -- -2%
index 196078/s 69% 16% 2% --
On my machine and version of Perl regex_unur_find() and regex_morrow_find() are statically the same;
Rate regex regex_unur regex_morrow index_while index_for
regex 144442/s -- -5% -6% -16% -17%
regex_unur 151952/s 5% -- -1% -11% -13%
regex_morrow 154059/s 7% 1% -- -10% -12%
index_while 171325/s 19% 13% 11% -- -2%
index_for 174905/s 21% 15% 14% 2% --
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
.
- Follow-Ups:
- Re: Speed comparison of regex versus index, lc, and / /i
- From: Ben Bullock
- Re: Speed comparison of regex versus index, lc, and / /i
- References:
- Speed comparison of regex versus index, lc, and / /i
- From: Ben Bullock
- Re: Speed comparison of regex versus index, lc, and / /i
- From: xhoster
- Re: Speed comparison of regex versus index, lc, and / /i
- From: Ben Bullock
- Re: Speed comparison of regex versus index, lc, and / /i
- From: A. Sinan Unur
- Re: Speed comparison of regex versus index, lc, and / /i
- From: Ben Bullock
- Speed comparison of regex versus index, lc, and / /i
- Prev by Date: FAQ 4.64 How can I get the unique keys from two hashes?
- Next by Date: Re: FAQ 4.17 How do I find yesterday's date?
- Previous by thread: Re: Speed comparison of regex versus index, lc, and / /i
- Next by thread: Re: Speed comparison of regex versus index, lc, and / /i
- Index(es):
Relevant Pages
|
|